
ENH: Improve performance of np.linalg._linalg._commonType #28686


Open · wants to merge 2 commits into main

Conversation

eendebakpt (Contributor)

We improve the performance of _commonType, which benefits np.linalg.det and several other methods for small arrays.

  • We return the output of isComplexType so that the value can be reused by the calling methods.
  • We pass the dtypes instead of the arrays so that we can use a cache (see the sketch below).
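
A minimal sketch of the dtype-based, cached approach (illustrative names and a functools.lru_cache; not the exact code in this PR):

from functools import lru_cache

import numpy as np
from numpy import single, double, csingle, cdouble

# Real (computation) precision of each supported inexact dtype.
_real_of = {np.dtype(single): single, np.dtype(double): double,
            np.dtype(csingle): single, np.dtype(cdouble): double}

@lru_cache
def _common_type_from_dtypes(*dtypes):
    """Return (computation_type, result_type, is_complex) for the given dtypes.

    dtype objects are hashable, so the result can be memoized; ndarrays are
    not, which is why the caller passes a.dtype instead of the array itself.
    """
    is_complex = False
    result_type = single
    for dt in dtypes:
        if dt.kind in "fc":                   # floating or complex floating
            if dt.kind == "c":
                is_complex = True
            real = _real_of.get(dt)
            if real is None:                  # e.g. float16, longdouble
                raise TypeError(f"array type {dt.name} is unsupported in linalg")
            if real is double:
                result_type = double
        else:                                 # bool/integers: compute in double
            result_type = double
    if is_complex:
        result_type = csingle if result_type is single else cdouble
        return cdouble, result_type, True
    return double, result_type, False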

Benchmark:

np.linalg.det(x22): Mean +- std dev: [main_commontype] 3.38 us +- 0.15 us -> [pr_commontype] 2.90 us +- 0.17 us: 1.17x faster
np.linalg.det(x33): Mean +- std dev: [main_commontype] 3.57 us +- 0.28 us -> [pr_commontype] 3.01 us +- 0.23 us: 1.18x faster
np.linalg.inv(x22): Mean +- std dev: [main_commontype] 5.81 us +- 0.48 us -> [pr_commontype] 5.32 us +- 0.30 us: 1.09x faster
np.linalg.inv(x33): Mean +- std dev: [main_commontype] 6.83 us +- 0.49 us -> [pr_commontype] 6.14 us +- 0.34 us: 1.11x faster
np.linalg.eig(x22): Mean +- std dev: [main_commontype] 25.5 us +- 1.3 us -> [pr_commontype] 24.6 us +- 1.1 us: 1.04x faster

Geometric mean: 1.12x faster
Test script
# /// script
# requires-python = ">=3.10"
# dependencies = ['numpy', 'pyperf']
# ///

import pyperf

setup = """
import numpy as np
x22 = np.arange(4.).reshape( (2,2) ) + np.eye(2)
x33 = np.arange(9.).reshape( (3,3) ) + np.eye(3)
"""

runner = pyperf.Runner()
runner.timeit(name="np.linalg.det(x22)", stmt="np.linalg.det(x22)", setup=setup)
runner.timeit(name="np.linalg.det(x33)", stmt="np.linalg.det(x33)", setup=setup)
runner.timeit(name="np.linalg.inv(x22)", stmt="np.linalg.inv(x22)", setup=setup)
runner.timeit(name="np.linalg.inv(x33)", stmt="np.linalg.inv(x33)", setup=setup)
runner.timeit(name="np.linalg.eig(x22)", stmt="np.linalg.eig(x22)", setup=setup)

@eendebakpt eendebakpt changed the title ENH: Improve peformance of np.linalg._linalg._commonType ENH: Improve performance of np.linalg._linalg._commonType Apr 10, 2025
@mhvk (Contributor) left a comment

@eendebakpt - I'm wondering if this isn't just adding complexity for something where it might pay to take a closer look at what is being done. E.g., I'm rather confused why one cannot use promote_types and cut this short a bit. A quick test on det shows that the following passes all tests:

diff --git a/numpy/linalg/_linalg.py b/numpy/linalg/_linalg.py
index e181e1a5d8..67e36debb8 100644
--- a/numpy/linalg/_linalg.py
+++ b/numpy/linalg/_linalg.py
@@ -31,7 +31,7 @@
     reciprocal, overrides, diagonal as _core_diagonal, trace as _core_trace,
     cross as _core_cross, outer as _core_outer, tensordot as _core_tensordot,
     matmul as _core_matmul, matrix_transpose as _core_matrix_transpose,
-    transpose as _core_transpose, vecdot as _core_vecdot,
+    promote_types, transpose as _core_transpose, vecdot as _core_vecdot,
 )
 from numpy._globals import _NoValue
 from numpy.lib._twodim_base_impl import triu, eye
@@ -2367,10 +2367,9 @@ def det(a):
     """
     a = asarray(a)
     _assert_stacked_square(a)
-    t, result_t = _commonType(a)
-    signature = 'D->D' if isComplexType(t) else 'd->d'
-    r = _umath_linalg.det(a, signature=signature)
-    r = r.astype(result_t, copy=False)
+    r = _umath_linalg.det(a, dtype=promote_types(a.dtype, double))
+    if r.dtype != a.dtype:
+        r = r.astype(promote_types(a.dtype, single), copy=False)
     return r

With that, the test from your script on a 2x2 matrix improves:

x22 = np.arange(4.).reshape( (2,2) ) + np.eye(2)
%timeit np.linalg.det(x22)
7.09 -> 4.48 us

The only possible downside is that this does not raise an error on float16 (f2) input -- but why should it anyway?

p.s. For det at least, there is no need for _assert_stacked_square either - the gufunc will already check that there are at least 2 dimensions and that the last two are equal.
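
For reference, a quick illustration (not from the PR or the diff above) of how the two promote_types calls map the input dtypes -- computation in d/D, result in f/d/F/D -- including the float16 case mentioned above:

import numpy as np

for dt in (np.int64, np.float16, np.float32, np.float64,
           np.complex64, np.complex128):
    compute = np.promote_types(dt, np.double)  # dtype the gufunc computes in
    result = np.promote_types(dt, np.single)   # dtype of the returned value
    print(f"{np.dtype(dt)!s:>10} -> compute {compute}, result {result}")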

@jorenham (Member)

How about something like this:

_DTYPE_RANK = dict(zip(map(dtype, "fdFD"), range(4)))

max_rank = -1
for dtype in dtypes:
    if dtype.num < 11:  # <: integer | bool
        continue
    if (rank := _DTYPE_RANK.get(dtype)) is None:
        raise TypeError(...)
    if rank == 3:  # no need to go on
        return cdouble, cdouble
    if rank > max_rank:
        max_rank = rank

if max_rank > 1:
    return cdouble, (csingle, cdouble)[max_rank - 2]
else:
    return double, (single, double)[max_rank]

I didn't test it, but I expect this to be quite a bit faster (and it might even be correct, too). Anyway, even if not correct, I'm sure you get the idea.
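
For concreteness, a self-contained variant of this idea might look as follows (the function name is made up, a bitwise OR over the ranks replaces the plain max so that e.g. csingle + double combines to cdouble, and bool/integer input folds in as double):

import numpy as np
from numpy import single, double, csingle, cdouble

# bit 0 = needs double precision, bit 1 = complex: f=0b00, d=0b01, F=0b10, D=0b11
_DTYPE_RANK = dict(zip(map(np.dtype, "fdFD"), range(4)))

def _common_type_by_rank(*dtypes):
    """Rank-table sketch of _commonType; illustrative, not the PR's code."""
    combined = 0
    for dt in dtypes:
        if dt.num < 11:                 # bool and integers: computed in double
            combined |= 0b01
            continue
        if (rank := _DTYPE_RANK.get(dt)) is None:
            raise TypeError(f"array type {dt.name} is unsupported in linalg")
        combined |= rank                # OR acts as the join on this small lattice
    if combined & 0b10:                 # at least one complex input
        return cdouble, (csingle, cdouble)[combined & 0b01]
    return double, (single, double)[combined]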

mhvk (Contributor) commented Apr 11, 2025

Ideally, we don't rely on implementation details like type numbers... Also, no real reason to exclude user dtypes that know how to convert to double, etc. Using promote_types puts the burden where it belongs (and is in C, so pretty fast).

@eendebakpt (Contributor, Author)

The promote_types approach is a bit faster than the PR here, so I will look into that option.

The main performance gain is from the if r.dtype != a.dtype: check. I want to see whether we can handle that inside the astype.

@eendebakpt (Contributor, Author)

@mhvk To avoid the copy on the scalar, we can also check in astype whether a copy is needed. A prototype for this is:

main...eendebakpt:numpy:astype

The advantage over the if r.dtype != a.dtype check is that it makes calls to scalar.astype faster (when no conversion is needed) in other cases as well. The disadvantage is that this adds some more complexity on the C side (and a very minor slowdown for the path where a conversion is needed).

At this moment I have no strong preference for either option, so any arguments in either direction are welcome.

mhvk (Contributor) commented Apr 12, 2025

@eendebakpt - I think your patch makes sense in principle, but is perhaps a bit orthogonal to the goals here? At least, I wrote the if statement mostly to avoid the second call to promote_types.

@eendebakpt (Contributor, Author)

@mhvk Your patch looks good; I might end up refactoring this PR in that way.

It would be nice to also refactor the other methods calling _commonType in the same style. np.promote_types only handles 2 arguments (we need 3 for some methods). np.result_type handles any number of arguments, but is quite a bit slower. I opened #28710 to improve its performance, but it does not come close to the promote_types performance (mainly due to dispatcher overhead).
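
For example (illustrative only), the three-dtype case can be expressed by nesting two promote_types calls, which is what result_type computes in a single call:

import numpy as np

a, b, c = np.dtype("f4"), np.dtype("i8"), np.dtype("c8")

# promote_types is binary, so three dtypes need two nested calls ...
print(np.promote_types(np.promote_types(a, b), c))  # complex128
# ... whereas result_type accepts any number of arguments at once.
print(np.result_type(a, b, c))                      # complex128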

@eendebakpt (Contributor, Author)

p.s. For det at least, there is no need for _assert_stacked_square either - the gufunc will already check that there are at least 2 dimensions and that the last two are equal.

True, but _assert_stacked_square raises a LinAlgError and the gufunc a ValueError. So removing that check might lead to some backwards compatibility issues.
