BLD: bump OpenBLAS version, use OpenBLAS for win-arm64 #29039

Open
wants to merge 7 commits into
base: main
from

Conversation

mattip
Member

@mattip mattip commented May 23, 2025

Closes #29035 by adding openblas support to the arm64 windows builds

This bumps the version of OpenBLAS from the 0.3.29 release to the latest develop HEAD (which had wheels for OpenBLAS on win-arm64), so it may impact other things.

@github-actions github-actions bot added the 36 - Build Build related PR label May 23, 2025
@mattip
Member Author

mattip commented May 25, 2025

Hmm. Wheel builder is failing to upload arm64 windows wheels since there is no micromamba

@mattip
Member Author

mattip commented May 25, 2025

Ahh, can't win-arm64 run x86 code using emulation? I wonder if the wheel uploader could just use the win-x86_64 micromamba install

Member

@seberg seberg left a comment

Not sure I follow how that is a problem with this PR, so approving: merge when you think it's done.

@charris
Member

charris commented May 25, 2025

Wheel builder is failing to upload arm64 windows wheels since there is no micromamba

I have the same problem uploading the numpy win_arm64 wheels for the 2.3.0rc1 release.

@charris
Member

charris commented May 25, 2025

I've grabbed the win_arm wheels on github and will try using those.

@matthew-brett
Contributor

@charris, @mattip - could you use my fix for the OpenBLAS wheel uploading for WoA? I installed anaconda-client via pip instead of using MicroMamba:

https://github.com/MacPython/openblas-libs/blob/main/.github/workflows/windows-arm.yml#L82

@mattip
Member Author

mattip commented May 25, 2025

Thanks @matthew-brett. I used the pip-install trick and also disallowed building a win-arm64 wheel without OpenBLAS. Please check the wheel building logs and/or the artifact to make sure the win-arm64 wheel is using OpenBLAS before merging.
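One way to check the artifact is to list the wheel's contents, since a wheel is a plain zip archive. The following is a minimal sketch, not part of the PR: the helper name `bundled_openblas_files` is hypothetical, and the demo archive uses made-up member names (including the assumption that the repair step bundles DLLs under a `numpy.libs/` directory).

```python
import io
import zipfile


def bundled_openblas_files(wheel):
    """Return archive members whose file name mentions 'openblas'.

    `wheel` is a path or a file-like object; a wheel is a plain zip archive.
    """
    with zipfile.ZipFile(wheel) as zf:
        return sorted(
            name for name in zf.namelist()
            if "openblas" in name.rsplit("/", 1)[-1].lower()
        )


# Demonstration on a tiny in-memory archive with made-up member names.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("numpy/__init__.py", "")
    zf.writestr("numpy.libs/libscipy_openblas64_-hypothetical.dll", b"")

print(bundled_openblas_files(demo))
```

An empty list for a real win-arm64 wheel would suggest that no OpenBLAS library was bundled, although it says nothing about whether the extension modules link against it.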

@mattip
Member Author

mattip commented May 25, 2025

CirrusCI macOS-arm64 builds are failing. There is this message, then another #28227 heisenbug failure:

Only [ghcr.io/cirruslabs/macos-runner:sonoma, ghcr.io/cirruslabs/macos-runner:sequoia] is allowed. Automatically upgraded to ghcr.io/cirruslabs/macos-runner:sequoia.

@mattip mattip force-pushed the openblas-win-arm64 branch from b277e71 to 6c8f0f7 Compare May 25, 2025 18:26
@mattip
Member Author

mattip commented May 26, 2025

The repair-wheel-command is not running. In the x86_64 logs I see for instance

Successfully built numpy-2.4.0.dev0-cp312-cp312-win_amd64.whl
##[endgroup]
##[group]Repairing wheel...

but in the arm64 run the build goes right to testing

Successfully built numpy-2.4.0.dev0-cp311-cp311-win_arm64.whl
##[endgroup]
##[group]Testing wheel...

The repair-wheel-command is inside a [tool.cibuildwheel.windows] section in the pyproject.toml. Is a different selector needed for win-arm64?

numpy/pyproject.toml

Lines 180 to 184 in 3c995e7

[tool.cibuildwheel.windows]
# This does not work, use CIBW_ENVIRONMENT_WINDOWS
environment = {PKG_CONFIG_PATH="./.openblas"}
config-settings = "setup-args=--vsenv setup-args=-Dallow-noblas=false build-dir=build"
repair-wheel-command = "bash -el ./tools/wheels/repair_windows.sh {wheel} {dest_dir}"

@joerick

joerick commented May 26, 2025

That's the right selector, but the build options output at the start of a build shows an override: https://github.com/numpy/numpy/actions/runs/15246740791/job/42874812888#step:8:8542

That's just a bit further down the file:

numpy/pyproject.toml

Lines 191 to 195 in f3edb9f

[[tool.cibuildwheel.overrides]]
select = "*-win_arm64"
config-settings = "setup-args=--vsenv setup-args=-Dallow-noblas=true build-dir=build"
repair-wheel-command = ""

@mattip
Member Author

mattip commented May 26, 2025

Ahh, thanks I missed that.

@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label May 26, 2025
@charris charris added this to the 2.3.0 release milestone May 26, 2025
@mattip
Member Author

mattip commented May 26, 2025

Wheel repairing is running but...

C:\mingw64\bin\strip.exe: ./numpy/fft/_pocketfft_umath.cp311-win_arm64.pyd: file format not recognized

Is this using an x86_64 mingw64 installation?
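To see which architecture a `.pyd` targets, independently of any toolchain, the machine field of its PE/COFF header can be read directly. The sketch below is illustrative only: `pe_machine` and `fake_pe` are hypothetical helpers, and the demo runs on a synthetic header rather than a real extension module. The machine codes are the standard PE values for x86, x86_64 and ARM64.

```python
import struct

# IMAGE_FILE_MACHINE_* values from the PE/COFF format.
PE_MACHINES = {0x014C: "x86", 0x8664: "x86_64", 0xAA64: "arm64"}


def pe_machine(data):
    """Return the target architecture recorded in a PE image's COFF header."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ signature")
    # The 4-byte offset of the PE signature is stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # The 2-byte machine field follows the signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return PE_MACHINES.get(machine, hex(machine))


def fake_pe(machine):
    """Build a minimal header-only stand-in for a .pyd (illustration only)."""
    header = bytearray(0x40)
    header[:2] = b"MZ"
    struct.pack_into("<I", header, 0x3C, 0x40)
    return bytes(header) + b"PE\0\0" + struct.pack("<H", machine)


print(pe_machine(fake_pe(0xAA64)))  # arm64
```

On a real file, something like `pe_machine(open(path, "rb").read(4096))` should be enough, assuming the PE header sits within the first few kilobytes.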

@mattip
Member Author

mattip commented May 26, 2025

Since delvewheel updated the way it mangles, maybe there is no more need to strip the pyds?

@mattip
Member Author

mattip commented May 26, 2025

Cool. OpenBLAS is properly mangled on both Windows platforms. Only the cp313t-win32 wheel testing failed, with "worker 'gw0' crashed while running '_core/tests/test_arrayprint.py::test_multithreaded_array_printing'". Is that a known thing?

The windows-arm64 wheels weigh in at about 9.5MB, where the windows-x86_64 ones are about 13MB.

Anyone want to take a look?

@matthew-brett
Contributor

Installed and imported OK. A couple of test failures.

============================================== FAILURES ===============================================
___________________________ TestComplexFunctions.test_branch_cuts_complex64 ___________________________ 

self = <test_umath.TestComplexFunctions object at 0x0000029C0E2DF650>

    @pytest.mark.xfail(IS_WASM, reason="doesn't work")
    def test_branch_cuts_complex64(self):
        # check branch cuts and continuity on them
        _check_branch_cut(np.log,   -0.5, 1j, 1, -1, True, np.complex64)  # noqa: E221
        _check_branch_cut(np.log2,  -0.5, 1j, 1, -1, True, np.complex64)  # noqa: E221
        _check_branch_cut(np.log10, -0.5, 1j, 1, -1, True, np.complex64)
        _check_branch_cut(np.log1p, -1.5, 1j, 1, -1, True, np.complex64)
        _check_branch_cut(np.sqrt,  -0.5, 1j, 1, -1, True, np.complex64)  # noqa: E221

        _check_branch_cut(np.arcsin, [ -2, 2],   [1j, 1j], 1, -1, True, np.complex64)
        _check_branch_cut(np.arccos, [ -2, 2],   [1j, 1j], 1, -1, True, np.complex64)
>       _check_branch_cut(np.arctan, [0 - 2j, 2j],  [1,  1], -1, 1, True, np.complex64)

self       = <test_umath.TestComplexFunctions object at 0x0000029C0E2DF650>

envs\py312\Lib\site-packages\numpy\_core\tests\test_umath.py:4295:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

f = <ufunc 'arctan'>, x0 = array([0.-2.j, 0.+2.j], dtype=complex64)
dx = array([1.+0.j, 1.+0.j], dtype=complex64), re_sign = -1, im_sign = 1, sig_zero_ok = True
dtype = <class 'numpy.complex64'>

    def _check_branch_cut(f, x0, dx, re_sign=1, im_sign=-1, sig_zero_ok=False,
                          dtype=complex):
        """
        Check for a branch cut in a function.

        Assert that `x0` lies on a branch cut of function `f` and `f` is
        continuous from the direction `dx`.

        Parameters
        ----------
        f : func
            Function to check
        x0 : array-like
            Point on branch cut
        dx : array-like
            Direction to check continuity in
        re_sign, im_sign : {1, -1}
            Change of sign of the real or imaginary part expected
        sig_zero_ok : bool
            Whether to check if the branch cut respects signed zero (if applicable)
        dtype : dtype
            Dtype to check (should be complex)

        """
        x0 = np.atleast_1d(x0).astype(dtype)
        dx = np.atleast_1d(dx).astype(dtype)

        if np.dtype(dtype).char == 'F':
            scale = np.finfo(dtype).eps * 1e2
            atol = np.float32(1e-2)
        else:
            scale = np.finfo(dtype).eps * 1e3
            atol = 1e-4

        y0 = f(x0)
        yp = f(x0 + dx * scale * np.absolute(x0) / np.absolute(dx))
        ym = f(x0 - dx * scale * np.absolute(x0) / np.absolute(dx))

        assert_(np.all(np.absolute(y0.real - yp.real) < atol), (y0, yp))
        assert_(np.all(np.absolute(y0.imag - yp.imag) < atol), (y0, yp))
>       assert_(np.all(np.absolute(y0.real - ym.real * re_sign) < atol), (y0, ym))
E       AssertionError: (array([-1.3112233-0.23887786j,  1.3112233+0.23887786j], dtype=complex64), array([-1.3112233-0.23887786j,  1.3112233+0.23887786j], dtype=complex64))

atol       = np.float32(0.01)
dtype      = <class 'numpy.complex64'>
dx         = array([1.+0.j, 1.+0.j], dtype=complex64)
f          = <ufunc 'arctan'>
im_sign    = 1
re_sign    = -1
scale      = np.float32(1.1920929e-05)
sig_zero_ok = True
x0         = array([0.-2.j, 0.+2.j], dtype=complex64)
y0         = array([-1.3112233-0.23887786j,  1.3112233+0.23887786j], dtype=complex64)
ym         = array([-1.3112233-0.23887786j,  1.3112233+0.23887786j], dtype=complex64)
yp         = array([-1.3112233-0.23887786j,  1.3112233+0.23887786j], dtype=complex64)

envs\py312\Lib\site-packages\numpy\_core\tests\test_umath.py:4540: AssertionError
_______________________ TestComplexFunctions.test_loss_of_precision[complex64] ________________________

self = <test_umath.TestComplexFunctions object at 0x0000029C0E2DFB90>, dtype = <class 'numpy.complex64'>

    @pytest.mark.xfail(
        # manylinux2014 uses glibc2.17
        _glibc_older_than("2.18"),
        reason="Older glibc versions are imprecise (maybe passes with SIMD?)"
    )
    @pytest.mark.xfail(IS_WASM, reason="doesn't work")
    @pytest.mark.parametrize('dtype', [
        np.complex64, np.complex128, np.clongdouble
    ])
    def test_loss_of_precision(self, dtype):
        """Check loss of precision in complex arc* functions"""
        if dtype is np.clongdouble and platform.machine() != 'x86_64':
            # Failures on musllinux, aarch64, s390x, ppc64le (see gh-17554)
            pytest.skip('Only works reliably for x86-64 and recent glibc')

        # Check against known-good functions

        info = np.finfo(dtype)
        real_dtype = dtype(0.).real.dtype
        eps = info.eps

        def check(x, rtol):
            x = x.astype(real_dtype)

            z = x.astype(dtype)
            d = np.absolute(np.arcsinh(x) / np.arcsinh(z).real - 1)
            assert_(np.all(d < rtol), (np.argmax(d), x[np.argmax(d)], d.max(),
                                      'arcsinh'))

            z = (1j * x).astype(dtype)
            d = np.absolute(np.arcsinh(x) / np.arcsin(z).imag - 1)
            assert_(np.all(d < rtol), (np.argmax(d), x[np.argmax(d)], d.max(),
                                      'arcsin'))

            z = x.astype(dtype)
            d = np.absolute(np.arctanh(x) / np.arctanh(z).real - 1)
            assert_(np.all(d < rtol), (np.argmax(d), x[np.argmax(d)], d.max(),
                                      'arctanh'))

            z = (1j * x).astype(dtype)
            d = np.absolute(np.arctanh(x) / np.arctan(z).imag - 1)
            assert_(np.all(d < rtol), (np.argmax(d), x[np.argmax(d)], d.max(),
                                      'arctan'))

        # The switchover was chosen as 1e-3; hence there can be up to
        # ~eps/1e-3 of relative cancellation error before it

        x_series = np.logspace(-20, -3.001, 200)
        x_basic = np.logspace(-2.999, 0, 10, endpoint=False)

        if dtype is np.clongdouble:
            if bad_arcsinh():
                pytest.skip("Trig functions of np.clongdouble values known "
                            "to be inaccurate on aarch64 and PPC for some "
                            "compilation configurations.")
            # It's not guaranteed that the system-provided arc functions
            # are accurate down to a few epsilons. (Eg. on Linux 64-bit)
            # So, give more leeway for long complex tests here:
            check(x_series, 50.0 * eps)
        else:
>           check(x_series, 2.1 * eps)

check      = <function TestComplexFunctions.test_loss_of_precision.<locals>.check at 0x0000029C2D9E4A40>
dtype      = <class 'numpy.complex64'>
eps        = np.float32(1.1920929e-07)
info       = finfo(resolution=1e-06, min=-3.4028235e+38, max=3.4028235e+38, dtype=float32)
real_dtype = dtype('float32')
self       = <test_umath.TestComplexFunctions object at 0x0000029C0E2DFB90>
x_basic    = array([0.00100231, 0.0019994 , 0.00398841, 0.0079561 , 0.01587084,
       0.0316592 , 0.06315387, 0.12597953, 0.25130435, 0.50130265])
x_series   = array([1.00000000e-20, 1.21736864e-20, 1.48198641e-20, 1.80412378e-20,
       2.19628372e-20, 2.67368693e-20, 3.254862...3.06526013e-04, 3.73155156e-04, 4.54267386e-04,       
       5.53010871e-04, 6.73218092e-04, 8.19554595e-04, 9.97700064e-04])

envs\py312\Lib\site-packages\numpy\_core\tests\test_umath.py:4392:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

x = array([9.99999968e-21, 1.21736865e-20, 1.48198648e-20, 1.80412373e-20,
       2.19628376e-20, 2.67368701e-20, 3.254862...55170e-04, 4.54267400e-04,
       5.53010846e-04, 6.73218106e-04, 8.19554611e-04, 9.97700030e-04],
      dtype=float32)
rtol = np.float32(2.503395e-07)

    def check(x, rtol):
        x = x.astype(real_dtype)

        z = x.astype(dtype)
        d = np.absolute(np.arcsinh(x) / np.arcsinh(z).real - 1)
        assert_(np.all(d < rtol), (np.argmax(d), x[np.argmax(d)], d.max(),
                                  'arcsinh'))

        z = (1j * x).astype(dtype)
        d = np.absolute(np.arcsinh(x) / np.arcsin(z).imag - 1)
>       assert_(np.all(d < rtol), (np.argmax(d), x[np.argmax(d)], d.max(),
                                  'arcsin'))
E       AssertionError: (np.int64(198), np.float32(0.0008195546), np.float32(3.5762787e-07), 'arcsin')  

d          = array([0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
       0.0000000e+00, 0.0000000e+00, 0.0000000e+00,... 0.0000000e+00, 5.9604645e-08,
       1.1920929e-07, 2.3841858e-07, 3.5762787e-07, 3.5762787e-07],
      dtype=float32)
dtype      = <class 'numpy.complex64'>
real_dtype = dtype('float32')
rtol       = np.float32(2.503395e-07)
x          = array([9.99999968e-21, 1.21736865e-20, 1.48198648e-20, 1.80412373e-20,
       2.19628376e-20, 2.67368701e-20, 3.254862...55170e-04, 4.54267400e-04,
       5.53010846e-04, 6.73218106e-04, 8.19554611e-04, 9.97700030e-04],
      dtype=float32)
z          = array([0.+9.99999968e-21j, 0.+1.21736865e-20j, 0.+1.48198648e-20j,
       0.+1.80412373e-20j, 0.+2.19628376e-20j, 0.+2...54267400e-04j, 0.+5.53010846e-04j, 0.+6.73218106e-04j,
       0.+8.19554611e-04j, 0.+9.97700030e-04j], dtype=complex64)

envs\py312\Lib\site-packages\numpy\_core\tests\test_umath.py:4363: AssertionError
======================================= short test summary info =======================================
FAILED envs/py312/Lib/site-packages/numpy/_core/tests/test_umath.py::TestComplexFunctions::test_branch_cuts_complex64 - AssertionError: (array([-1.3112233-0.23887786j,  1.3112233+0.23887786j], dtype=complex64), array([-...
FAILED envs/py312/Lib/site-packages/numpy/_core/tests/test_umath.py::TestComplexFunctions::test_loss_of_precision[complex64] - AssertionError: (np.int64(198), np.float32(0.0008195546), np.float32(3.5762787e-07), 'arcsin')
2 failed, 43460 passed, 1142 skipped, 2644 deselected, 28 xfailed, 5 xpassed in 80.00s (0:01:19)
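For context on the first failure, the branch-cut test expects the real part of `arctan` to change sign when stepping across the imaginary axis near `2j`. A minimal double-precision sketch using the standard library's `cmath` (not NumPy's complex64 code path, so it only illustrates the expected behaviour, not the failure itself):

```python
import cmath
import math

x0 = 2j      # lies on arctan's branch cut (imaginary axis, |Im| > 1)
step = 1e-9  # small real offset to either side of the cut

right = cmath.atan(x0 + step)
left = cmath.atan(x0 - step)

# The real parts are close to +pi/2 and -pi/2; the imaginary parts agree.
print(right)
print(left)
```

In the failing output above, `ym` has the same real part as `y0`, so no sign change was observed on that build.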

@andyfaff
Member

CirrusCI macOS-arm64 builds are failing. There is this message, then another #28227 heisenbug failure:

Only [ghcr.io/cirruslabs/macos-runner:sonoma, ghcr.io/cirruslabs/macos-runner:sequoia] is allowed. Automatically upgraded to ghcr.io/cirruslabs/macos-runner:sequoia.

I think the monterey image is no longer available, so it may be time to update it to Sonoma/Sequoia.

@mattip
Member Author

mattip commented May 27, 2025

@matthew-brett are those failures new to the wheels-with-openblas or do they occur also in the wheel artifacts from, say, this CI run? If the latter, maybe that should be part of a different PR to add win-arm64 testing and blocklist some trig functions?

I was too dismissive of the cirrus failure. I see it is not only the heisenbug, there are 47!! failures when using OpenBLAS with macos-arm64 (targeting macos_11 without accelerate). I don't know whether this is due to the automatic update to a newer macos version, or due to the newer OpenBLAS version. I don't see any issues in upstream OpenBLAS that might be relevant. I opened #29061 to change only the macOS version, let's see if the build passes there.

@mattip mattip force-pushed the openblas-win-arm64 branch from fd2732c to 5bc41aa Compare May 27, 2025 05:54
@mattip
Member Author

mattip commented May 27, 2025

Rebased off main to get the cirrus-ci update to sonoma, which passed CI. Let's see if the OpenBLAS update is the problem or the further macos update to sequoia is the problem.

@mattip
Member Author

mattip commented May 27, 2025

All that changed is the OpenBLAS version, and instead of passing there are now 43 failures 😞. @martin-frbg do the failures on macos-arm64 when moving from 0.3.29 to the latest develop HEAD ring any bells? I checked that both builds are using the scipy_openblas64 ilp64 interfaces. From what I can see, the failures are in float32 when using matmul or power-like operations. For instance here is the failure from test_ufunc_noncontiguous[matmul]:

E                    ACTUAL: array([[[ 1.,  2.,  3.,  4.,  5.,  6.],
E                           [ 7.,  8.,  9., 10., 11., 12.],
E                           [13., 14., 15., 16., 17., 18.],...
E                    DESIRED: array([[[7.812911e-03, 2.000110e+00, 3.200184e+01, 5.120308e+02,
E                            2.048128e+03, 8.192533e+03],
E                           [3.277197e+04, 1.310888e+05, 2.621793e+05, 5.243622e+05,...

@martin-frbg

No idea, there have been way too many changes since 0.3.29 but not that many that would affect OSX/Arm64 (I assume you are building for the "VORTEX" Apple M target there, which is mostly NeoverseN1 kernels ?) and none that manifest themselves as OpenBLAS errors

@mattip
Member Author

mattip commented May 27, 2025

I cannot reproduce the failures locally on a macbook M2 using Sequoia 15.4.1. I wonder what I am missing.

python3.12 -m venv /tmp/venv312
source /tmp/venv312/bin/activate
pip install cibuildwheel
export CIBW_BUILD="cp312*"
export CIBW_ARCH=arm64
export INSTALL_OPENBLAS=true
export CIBW_ENVIRONMENT_MACOS="MACOSX_DEPLOYMENT_TARGET='11.0' INSTALL_OPENBLAS=true RUNNER_OS=macOS PKG_CONFIG_PATH=$PWD/.openblas"
cibuildwheel
# wheel builds and tests without failure

@mattip
Member Author

mattip commented May 27, 2025

I assume you are building for the "VORTEX" Apple M target there

The config string is OpenBLAS 0.3.29.dev USE64BITINT DYNAMIC_ARCH NO_AFFINITY neoversen1 MAX_THREADS=64, which is identical to the run before this PR (just with OpenBLAS 0.3.29, no dev).
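Comparing two such config strings field by field makes it easy to confirm that only the version differs. This is a rough sketch: `parse_openblas_config` is a hypothetical helper, and the "old" string is reconstructed from the description in this comment rather than copied from a log.

```python
def parse_openblas_config(config):
    """Split an OpenBLAS config string into name, version, options and settings."""
    name, version, *tokens = config.split()
    settings = dict(t.split("=", 1) for t in tokens if "=" in t)
    options = {t for t in tokens if "=" not in t}
    return {"name": name, "version": version,
            "options": options, "settings": settings}


new = parse_openblas_config(
    "OpenBLAS 0.3.29.dev USE64BITINT DYNAMIC_ARCH NO_AFFINITY neoversen1 MAX_THREADS=64"
)
# Reconstructed from the comment text (an assumption, not a quoted log line).
old = parse_openblas_config(
    "OpenBLAS 0.3.29 USE64BITINT DYNAMIC_ARCH NO_AFFINITY neoversen1 MAX_THREADS=64"
)

differing = sorted(key for key in new if new[key] != old[key])
print(differing)  # ['version']
```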

@martin-frbg

I don't see this on the M1 in the GCC Compile farm either (targeted build, will try a dynamic_arch build too but don't expect that to be different). Can try with my M4 later today but don't see why that would be different except that it probably has a newer OS

@mattip
Member Author

mattip commented May 27, 2025

Grasping at straws: maybe this is due to some quirk in the VM hosting at Cirrus and the number of parallel test runners (currently set to -n auto in the testing script). I am playing with this in #29069

Labels
09 - Backport-Candidate PRs tagged should be backported 36 - Build Build related PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BLD: use OpenBLAS in the windows-arm64 build
7 participants