BUG: in-place fixed-width string multiply doesn't do overflow checking #29011
Comments
I also came across this here: https://github.com/numpy/numpy/actions/runs/15142188944/job/42568906722. I don't have a Mac and don't know where to begin debugging this. @jorenham, any ideas? |
the segfault happens at line 17 here: |
I wonder if we should move the mypy tests to a different runtime: linux, newer image, ... |
we're running mypy twice on python 3.11 now (including the mac one), so upgrading one of those makes sense that way I suppose |
Here is another, but in masked arrays: https://github.com/numpy/numpy/actions/runs/15168590479/job/42652733379?pr=29018 |
I can reproduce this locally very rarely, before the first success I had installed Below is a That gave me the following:
I am trying to reproduce the same in lldb with @ngoldbaum I was wondering if you have a sanitizer setup ready that may find something quickly? EDIT: And yeah, where this triggers seems very random, I saw at least once during polynomial imports also. test.py
|
I got a backtrace from lldb after a dozen tries. Can't say it looks enlightening, the issue is a malloc triggered by an array creation. But I still feel the problem is probably much earlier? Backtrace below (This was without the
|
Here's an ASAN report:
So this is an issue in the string ufuncs, of all places! |
Also this happens while the script is processing |
Here's the Python traceback from faulthandler:
|
I don't know if it is related, but gcc 15.1.1 warns
|
The problem is that the ufunc needs to sanity check that the |
ping @lysnikolaou - any chance you have bandwidth to look at this? |
That is fixed by #28985. |
Retitled now that we've found the root cause. You can trigger the same issue with this script:
This doesn't heap-overflow:
So maybe we're doing some bound checking elsewhere and just missing it in multiply... I can add bounds checking to multiply doing something like this:

diff --git a/numpy/_core/src/umath/string_buffer.h b/numpy/_core/src/umath/string_buffer.h
index 554f9ece51..725b789c4e 100644
--- a/numpy/_core/src/umath/string_buffer.h
+++ b/numpy/_core/src/umath/string_buffer.h
@@ -297,6 +297,18 @@ struct Buffer {
         return num_codepoints;
     }
 
+    inline size_t
+    buffer_width()
+    {
+        switch (enc) {
+            case ENCODING::ASCII:
+            case ENCODING::UTF8:
+                return after - buf;
+            case ENCODING::UTF32:
+                return (after - buf) / sizeof(npy_ucs4);
+        }
+    }
+
     inline Buffer<enc>&
     operator+=(npy_int64 rhs)
     {
@@ -396,7 +408,7 @@ struct Buffer {
             case ENCODING::ASCII:
             case ENCODING::UTF8:
                 // for UTF8 we treat n_chars as number of bytes
-                memcpy(other.buf, buf, len);
+                memcpy(other.buf, buf, len);
                 break;
             case ENCODING::UTF32:
                 memcpy(other.buf, buf, len * sizeof(npy_ucs4));
diff --git a/numpy/_core/src/umath/string_ufuncs.cpp b/numpy/_core/src/umath/string_ufuncs.cpp
index 5b4b67cda6..c6f319f746 100644
--- a/numpy/_core/src/umath/string_ufuncs.cpp
+++ b/numpy/_core/src/umath/string_ufuncs.cpp
@@ -176,13 +176,26 @@ string_multiply(Buffer<enc> buf1, npy_int64 reps, Buffer<enc> out)
     }
 
     if (len1 == 1) {
+        size_t width = out.buffer_width();
+        if (width < reps) {
+            reps = width;
+        }
         out.buffer_memset(*buf1, reps);
         out.buffer_fill_with_zeros_after_index(reps);
     }
     else {
+        size_t filled = 0;
+        size_t width;
         for (npy_int64 i = 0; i < reps; i++) {
+            width = out.buffer_width();
+            if (width < (filled + len1)) {
+                buf1.buffer_memcpy(out, width);
+                out += width;
+                break;
+            }
             buf1.buffer_memcpy(out, len1);
             out += len1;
+            filled += len1;
         }
         out.buffer_fill_with_zeros_after_index(0);
     }

I guess we only need to operate the arithmetic operators? Is there anything problematic besides multiply and add? |
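For anyone following along without the C++ in front of them, here is a rough pure-Python model of the clamping idea in the diff above. It is only a sketch: `bounded_multiply` and its `width` parameter are hypothetical names invented for illustration, not NumPy API, and it assumes the desired behaviour is to truncate at the end of the fixed-width output slot and zero-pad the rest. Whether NumPy should truncate or raise in this case is not settled by this sketch. It tracks the remaining space directly rather than copying the diff's bookkeeping.

```python
def bounded_multiply(src: bytes, reps: int, width: int) -> bytes:
    """Repeat `src` up to `reps` times into a zero-padded slot of `width` bytes.

    Hypothetical helper (not part of NumPy): models a fixed-width output
    buffer that must never be written past its end.
    """
    if width < 0:
        raise ValueError("width must be non-negative")

    out = bytearray(width)  # zero-filled, like an empty fixed-width slot
    if reps <= 0 or not src:
        return bytes(out)

    filled = 0
    for _ in range(reps):
        remaining = width - filled
        if remaining <= 0:
            break  # slot is full: stop instead of writing out of bounds
        chunk = src[:remaining]  # clamp the final copy to the space left
        out[filled:filled + len(chunk)] = chunk
        filled += len(chunk)

    return bytes(out)


if __name__ == "__main__":
    # A 2-byte string repeated 3 times does not fit in a 4-byte slot,
    # so the third copy is dropped rather than overflowing.
    print(bounded_multiply(b"ab", 3, 4))
```

The point of the model is that every copy is limited by the space left in the output, which is the check the in-place path was missing.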
I doubt it, but the thing to look for would be to check if the |
I doubt that many will use in-place string multiplication, as most users will probably be using it in similar ways as |
I opened #29060 |
See https://github.com/numpy/numpy/actions/runs/15138432787/job/42555808050?pr=29007