Avoid copy in matrix-vector dots with negative strides? #28909

Open
ricardoV94 opened this issue May 6, 2025 · 3 comments

ricardoV94 commented May 6, 2025

The following is ~5x slower when A has negative strides:

```python
import numpy as np

A = np.zeros((512, 512))
x = np.zeros((512))
%timeit np.dot(A, x)  # 49.2 μs ± 19.2 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
A_flipped = A[::-1]
%timeit np.dot(A_flipped, x)  # 241 μs ± 37.9 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```
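For readers who want to confirm that the flipped array really is a negative-stride view rather than a copy, a minimal check is sketched below. It only uses standard NumPy attributes; the stride values in the comments assume float64 data.

```python
import numpy as np

A = np.zeros((512, 512))   # float64, C-contiguous
A_flipped = A[::-1]        # reversed rows: a view, no data is copied

print(A.strides)                        # (4096, 8)
print(A_flipped.strides)                # (-4096, 8): negative row stride
print(np.shares_memory(A, A_flipped))   # True: both use the same buffer
print(A_flipped.flags["C_CONTIGUOUS"])  # False
```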

I suspect this is due to a copy made before calling BLAS GEMV, since GEMV does not accept a negative leading dimension (LDA). If that is the case, the copy can be avoided: pass A with a positive LDA and a negative incy, so that BLAS writes y in reverse order and produces the same result.

```python
import numpy as np
from scipy.linalg.blas import dgemv

A = np.arange(9, dtype="float64").reshape((3, 3))
x = np.ones((3,))

# Reference: GEMV on the row-flipped view
y1 = np.empty(A.shape[0])
dgemv(1.0, A[::-1], x, 0.0, y1, overwrite_y=True)
# Same result from the unflipped A: incy=-1 makes BLAS fill y back to front
y2 = np.empty(A.shape[0])
dgemv(1.0, A, x, 0.0, y2, incy=-1, overwrite_y=True)
np.testing.assert_allclose(y1, y2)
```

If the columns have negative strides, x needs to be traversed in reverse as well (a negative incx).
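A minimal sketch of the column case, using the same SciPy `dgemv` wrapper and its `incx`/`incy` arguments. Note that SciPy converts the C-ordered `A` to Fortran order internally, so this only illustrates the BLAS semantics; it does not show that a copy is avoided.

```python
import numpy as np
from scipy.linalg.blas import dgemv

A = np.arange(9, dtype="float64").reshape((3, 3))
x = np.array([1.0, 2.0, 3.0])  # non-uniform, so the traversal order matters

# Columns flipped: equivalent to reading x back to front (incx=-1).
y_cols = dgemv(1.0, A, x, incx=-1)
np.testing.assert_allclose(y_cols, A[:, ::-1] @ x)

# Rows and columns flipped: reverse both the read of x and the write of y.
y_both = dgemv(1.0, A, x, incx=-1, incy=-1)
np.testing.assert_allclose(y_both, A[::-1, ::-1] @ x)
```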

I don't know if this is easy/worth doing in numpy, just wanted to share.
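The same identities can be applied at the NumPy level as a user-side workaround: multiply with the unflipped matrix and flip the vector instead. This is only a sketch. Whether it is actually faster depends on the NumPy build and BLAS backend, and the reversed `x` may itself be copied internally, although that copy has only n elements.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((512, 512))
x = rng.standard_normal(512)

# Rows flipped: flip the output vector instead of the matrix.
rows_flipped = np.dot(A, x)[::-1]
np.testing.assert_allclose(rows_flipped, np.dot(A[::-1], x))

# Columns flipped: flip the input vector instead of the matrix.
cols_flipped = np.dot(A, x[::-1])
np.testing.assert_allclose(cols_flipped, np.dot(A[:, ::-1], x))
```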

ricardoV94 changed the title from "Numpy matrix-vector dot with negative strides" to "Avoid copy in matrix-vector dots with negative strides?" on May 6, 2025

niranjanorkat commented May 9, 2025

I was inspecting cblas_matrixproduct in cblasfuncs.c (numpy/numpy/blob/main/numpy/core/src/common/cblasfuncs.c), which handles matrix multiplication.

Currently, the function checks for negative strides, broadcasts, and possible misalignment via _bad_strides(). If any of these conditions are met, it performs a full copy of either a or b at the beginning of the function to ensure compatibility with BLAS.

One enhancement would be to extend _bad_strides() to return a descriptive status (e.g., STRIDE_OK, STRIDE_NEGATIVE, etc.) rather than just a boolean, and then add logic that avoids the copy in specific cases, such as vector-vector, matrix-vector, or vector-matrix operations, by adjusting the lda, incX, and incY parameters instead of copying the array.
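To make the idea concrete, here is a rough Python sketch of what a status-returning check could look like. It is not the NumPy C code: `StrideStatus`, `classify_strides`, and the `STRIDE_BROADCAST` and `STRIDE_MISALIGNED` members are hypothetical names for illustration; only `STRIDE_OK` and `STRIDE_NEGATIVE` come from the proposal above.

```python
from enum import Enum, auto

import numpy as np


class StrideStatus(Enum):
    STRIDE_OK = auto()
    STRIDE_NEGATIVE = auto()
    STRIDE_BROADCAST = auto()   # hypothetical: zero stride on a dim > 1
    STRIDE_MISALIGNED = auto()  # hypothetical: not a multiple of itemsize


def classify_strides(arr: np.ndarray) -> StrideStatus:
    """Hypothetical status check; cases that still need a copy win."""
    itemsize = arr.itemsize
    if arr.ctypes.data % itemsize != 0:
        return StrideStatus.STRIDE_MISALIGNED
    status = StrideStatus.STRIDE_OK
    for dim, stride in zip(arr.shape, arr.strides):
        if stride % itemsize != 0:
            return StrideStatus.STRIDE_MISALIGNED
        if stride == 0 and dim > 1:
            return StrideStatus.STRIDE_BROADCAST
        if stride < 0:
            # Candidate for the no-copy path (negative incX / incY).
            status = StrideStatus.STRIDE_NEGATIVE
    return status


A = np.zeros((4, 4))
print(classify_strides(A))        # StrideStatus.STRIDE_OK
print(classify_strides(A[::-1]))  # StrideStatus.STRIDE_NEGATIVE
```

A caller could then copy only for the broadcast and misaligned statuses and handle `STRIDE_NEGATIVE` by flipping the sign of the relevant increment.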

As @ricardoV94 mentioned, it’s unclear whether the gain is worth the added complexity. Still, I wanted to get a second opinion — if this seems useful, I’d be happy to take it on.

cc: @eendebakpt

@eendebakpt
Contributor

Copying the array with negative strides is itself quite expensive (see %timeit A_flipped.copy()), so making an explicit copy first and then calling np.dot does not help.
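The three measurements can be reproduced outside IPython with the standard `timeit` module. The helper `per_call_us` below is a hypothetical convenience function, and no expected numbers are given because they vary with hardware and BLAS backend.

```python
import timeit

import numpy as np

A = np.zeros((512, 512))
x = np.zeros(512)
A_flipped = A[::-1]


def per_call_us(fn, number=200, repeat=5):
    """Best-of-`repeat` average time per call, in microseconds."""
    return min(timeit.repeat(fn, number=number, repeat=repeat)) / number * 1e6


timings = {
    "np.dot(A, x)": per_call_us(lambda: np.dot(A, x)),
    "np.dot(A_flipped, x)": per_call_us(lambda: np.dot(A_flipped, x)),
    "A_flipped.copy()": per_call_us(lambda: A_flipped.copy()),
}

for label, us in timings.items():
    print(f"{label:>22}: {us:8.1f} us per call")
```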

@niranjanorkat I cannot judge whether this is useful or not. It depends on how much complexity the additional code adds (probably hard to judge without actually implementing), and how often this case of negative strides occurs in actual code.

@ricardoV94
Author

> and how often this case of negative strides occurs in actual code.

Negative strides are common for us when computing gradients of Cumsum/Convolution functions.
