ENH: Use array indexing preparation routines for flatiter objects #28590

lysnikolaou · 2025-03-26T10:19:05Z

Use prepare_index in iter_subscript and iter_ass_subscript. This fixes various cases that were broken before:
- arr.flat[[True, True]]
- arr.flat[[1.0, 1.0]]
- arr.flat[()] = 0
Add more extensive tests for flatiter indexing operations.
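For readers who want to try the cases listed above, here is a minimal sketch. The `probe` helper is a hypothetical name used only for illustration. It reports either the selected values or the exception type, because the outcome for list-of-bool and list-of-float indices depends on whether the installed NumPy includes this change. The `arr.flat[()] = 0` case is deliberately left out, since the release note in this PR says it previously crashed the interpreter.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

# Long-standing forms that this PR is not expected to change.
single = arr.flat[2]        # integer index
window = arr.flat[1:4]      # slice
picked = arr.flat[[0, 5]]   # list of integers


def probe(index):
    """Return the selected values, or the name of the exception raised.

    Hypothetical helper for illustration; it is not part of NumPy.
    """
    try:
        return np.asarray(arr.flat[index]).tolist()
    except Exception as exc:
        return type(exc).__name__


# Version-dependent cases named in the PR description.
bool_list_outcome = probe([True, True])
float_list_outcome = probe([1.0, 1.0])

print("arr.flat[[True, True]] ->", bool_list_outcome)
print("arr.flat[[1.0, 1.0]]   ->", float_list_outcome)
```

Running this before and after the change is one quick way to see which branch of the behavior a given build takes.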

ngoldbaum · 2025-03-26T19:03:12Z

numpy/_core/tests/test_regression.py

-        assert_raises(ValueError, ia, x.flat, s, np.zeros(9, dtype=float))
-        assert_raises(ValueError, ia, x.flat, s, np.zeros(11, dtype=float))
+        assert_raises(IndexError, ia, x.flat, s, np.zeros(9, dtype=float))
+        assert_raises(IndexError, ia, x.flat, s, np.zeros(11, dtype=float))


While this is certainly more consistent and I'd even call it a bugfix, it is a behavior change and someone might have code relying on the old behavior. Needs a release note at least. You also need another release note for the new features.

Agreed. Added a release note that lists all the most important changes.

ngoldbaum · 2025-03-28T14:17:44Z

This is a big refactor, so I think we'll need at least two experienced developers to go over the C code changes, which might take a while. I'll try to do a pass focusing on the correctness of the C code soon. On a first, high-level pass this looks like mostly simplification and cleanup.

I think you should also try running the indexing benchmarks to see if there are any significant regressions in existing benchmarks. I think bench_indexing already captures several workflows that go through the changed low-level C code path.

It would also be nice to get new entries in the FlatIterIndexing benchmark for newly added functionality.
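As a rough idea of the shape such entries could take, here is a sketch in the asv style (a plain class with `setup` and `time_*` methods). The class name, method names, and array sizes are assumptions made for illustration, and which cases actually count as newly added functionality is for the PR to decide.

```python
import numpy as np


class FlatIterIndexingSketch:
    """Hypothetical asv-style benchmarks; names and sizes are illustrative."""

    def setup(self):
        self.a = np.ones((200, 200))
        # Index arrays are built once so only the indexing itself is timed.
        self.int_index = np.array([0, 10, 100], dtype=np.intp)
        self.bool_mask = np.zeros(self.a.size, dtype=bool)
        self.bool_mask[::7] = True

    def time_flat_int_array_index(self):
        self.a.flat[self.int_index]

    def time_flat_bool_array_index(self):
        self.a.flat[self.bool_mask]

    def time_flat_int_array_assign(self):
        self.a.flat[self.int_index] = 0.0
```

asv discovers `time_*` methods by name and calls `setup` before timing them, so the class needs no base class or registration.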

lysnikolaou · 2025-03-28T16:45:26Z

These are the results of running the (old & new) benchmarks:

| Change   | Before [93898621] <main>   | After [cfcdabf0] <use-prepare-index-flatiter>   |     Ratio | Benchmark (Parameter)                                       |
|----------|----------------------------|-------------------------------------------------|-----------|-------------------------------------------------------------|
| +        | 87.0±0.4ns                 | 43.0±0.2ms                                      | 494276    | bench_indexing.FlatIterIndexing.time_flat_empty_tuple_index |
| +        | 115±3ns                    | 479±8ns                                         |      4.15 | bench_indexing.FlatIterIndexing.time_flat_bool_index_0d     |
| +        | 39.4±0.3ms                 | 42.8±0.3ms                                      |      1.09 | bench_indexing.FlatIterIndexing.time_flat_ellipsis_index    |
| +        | 3.95±0.04ms                | 4.29±0.07ms                                     |      1.09 | bench_indexing.FlatIterIndexing.time_flat_slice_index       |

It looks like having special cases for tuple, ellipsis, etc. (instead of going through prepare_index) did have an impact on performance. Should we try and keep those special cases in?
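The Ratio column is simply the After time divided by the Before time once both are in the same unit. The sketch below recomputes it from the rounded values shown in the table. The helper name `parse_timing` is an assumption for illustration, and small differences from the reported ratios are expected because the table rounds the underlying measurements.

```python
import re

# Conversion factors from the unit suffixes used in the table to seconds.
_UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "μs": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_timing(cell):
    """Return the central value in seconds from a cell such as '87.0±0.4ns'."""
    match = re.fullmatch(r"([\d.]+)±[\d.]+(ns|us|μs|ms|s)", cell.strip())
    if match is None:
        raise ValueError(f"unrecognised timing cell: {cell!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SECONDS[unit]


# (benchmark, before, after, ratio reported in the table above)
rows = [
    ("time_flat_empty_tuple_index", "87.0±0.4ns", "43.0±0.2ms", 494276.0),
    ("time_flat_bool_index_0d", "115±3ns", "479±8ns", 4.15),
    ("time_flat_ellipsis_index", "39.4±0.3ms", "42.8±0.3ms", 1.09),
    ("time_flat_slice_index", "3.95±0.04ms", "4.29±0.07ms", 1.09),
]

ratios = {
    name: parse_timing(after) / parse_timing(before)
    for name, before, after, _ in rows
}

for name, _, _, reported in rows:
    print(f"{name}: recomputed {ratios[name]:.4g}, reported {reported:g}")
```

The first row stands out because its units differ: the time goes from nanoseconds to milliseconds, which is why its ratio is in the hundreds of thousands while the others are close to 1.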

ngoldbaum · 2025-03-28T17:01:15Z

> Should we try and keep those special cases in?

Probably

lysnikolaou · 2025-04-23T13:31:15Z

> > Should we try and keep those special cases in?
>
> Probably

I added a couple of special cases for an empty tuple and boolean indexes. This fixes the two worst performance regressions. I feel that the rest are acceptable, since this goes through a much more complex code path to make sure that everything is set up correctly.

ngoldbaum · 2025-04-23T17:46:16Z

I added the 2.3 milestone to make sure we don't drop reviewing this before cutting the release.

charris · 2025-05-19T18:25:18Z

> I added the 2.3 milestone

@ngoldbaum I am about to push this off to 2.4 unless you want to put it in very soon.

ngoldbaum · 2025-05-19T19:38:36Z

I spoke with Lysandros and he said it's OK to push this off. We'll coordinate on getting this reviewed soon.

seberg

Sorry for not looking at it much. Overall it looks nice. I need to do a pass to check for refcount issues, etc.

I am slightly worried that some of the bad bool cases should maybe have a FutureWarning (or just go to an error for a bit?!), to enforce correct behavior.

Overall, I am happy that this seems to have integrated well. The diff is a bit unwieldy, but that can't be helped.

seberg · 2025-06-12T12:54:26Z

doc/release/upcoming_changes/28590.improvement.rst

+
+  ``arr.flat[[True, True]]`` and ``arr.flat[[1.0, 1.0]]`` were incorrectly
+  treated as ``arr.flat[[1, 1]]``. They now raise an ``IndexError`` (unless
+  ``arr.flat[[True, True]]`` is a valid boolean index)


I think I am OK with this, but technically it is too fast a change.

It could make sense to emit a FutureWarning if the input isn't already an array. The way to avoid the warning would be to make sure the input is an array.
(I would also be happy to just go with a hard error and a warning it will work in the future, to not bother keeping the old stuff working, heh)

Basically it seems extremely niche, but it has the potential to modify code results.
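To make the FutureWarning idea concrete, here is a pure-Python model of such a check. It is only an illustration of the suggestion: the function name and the warning text are hypothetical, and the real check would live in the C indexing code rather than in Python.

```python
import warnings


def warn_on_ambiguous_flat_index(index):
    """Warn for list indices holding bools or floats; return True if warned.

    Hypothetical model of the proposed transition, not NumPy code. Inputs
    that are not plain lists (for example, arrays) pass through silently.
    """
    if isinstance(index, list) and any(
        isinstance(item, (bool, float)) for item in index
    ):
        warnings.warn(
            "list indices containing bools or floats will stop being "
            "treated as integer indices; pass an array to opt in now",
            FutureWarning,
            stacklevel=2,
        )
        return True
    return False


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warned_bool = warn_on_ambiguous_flat_index([True, True])
    warned_float = warn_on_ambiguous_flat_index([1.0, 1.0])
    warned_int = warn_on_ambiguous_flat_index([1, 1])

print(warned_bool, warned_float, warned_int, len(caught))
```

Checking `bool` explicitly matters here because `bool` is a subclass of `int` in Python, so a plain integer test would not separate `[True, True]` from `[1, 1]`.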

seberg · 2025-06-12T12:56:20Z

doc/release/upcoming_changes/28590.improvement.rst

+* Fixed crash when assigning to an empty index tuple:
+
+  ``arr.flat[()] = 0`` previously crashed the Python interpreter. It now
+  correctly assigns to the entire array.


I might prefer if this just errors. It seems weird to allow a 0-D index for something that is known to be exactly 1-D.
(In an ideal world, I might prefer if NumPy forced you to add ... for incomplete indices, but it is just too much of a change.)

But I don't feel strongly about it.

seberg · 2025-06-12T13:04:30Z