ENH: Use array indexing preparation routines for flatiter objects #28590

Open
wants to merge 10 commits into
base: main
from

Conversation

lysnikolaou
Member

  • Use prepare_index in iter_subscript and iter_ass_subscript. This fixes various cases that were broken before:
    • arr.flat[[True, True]]
    • arr.flat[[1.0, 1.0]]
    • arr.flat[()] = 0
  • Add more extensive tests for flatiter indexing operations

Closes #28314.

@lysnikolaou lysnikolaou changed the title [ENH] Use array indexing preparation routines for flatiter objects ENH: Use array indexing preparation routines for flatiter objects Mar 26, 2025
@lysnikolaou lysnikolaou force-pushed the use-prepare-index-flatiter branch from 198df6b to 75aaed0 Compare March 26, 2025 10:23
@lysnikolaou lysnikolaou force-pushed the use-prepare-index-flatiter branch from 75aaed0 to 9f2d51f Compare March 26, 2025 10:30
assert_raises(ValueError, ia, x.flat, s, np.zeros(9, dtype=float))
assert_raises(ValueError, ia, x.flat, s, np.zeros(11, dtype=float))
assert_raises(IndexError, ia, x.flat, s, np.zeros(9, dtype=float))
assert_raises(IndexError, ia, x.flat, s, np.zeros(11, dtype=float))
Member

While this is certainly more consistent and I'd even call it a bugfix, it is a behavior change and someone might have code relying on the old behavior. Needs a release note at least. You also need another release note for the new features.

Member Author

Agreed. Added a release note that lists all the most important changes.

@lysnikolaou lysnikolaou force-pushed the use-prepare-index-flatiter branch from 8f2b322 to 33109dd Compare March 28, 2025 13:52
@ngoldbaum
Member

This is a big refactor, so I think we'll need at least two experienced developers to go over the C code changes, which might take a while. I'll try to do a pass focusing on the correctness of the C code soon. On a first, high-level pass this looks like mostly simplification and cleanup.

I think you should also try running the indexing benchmarks to see if there are any significant regressions in existing benchmarks. I think bench_indexing already captures several workflows that go through the changed low-level C code path.

It would also be nice to get new entries in the FlatIterIndexing benchmark for newly added functionality.

@ngoldbaum ngoldbaum self-assigned this Mar 28, 2025
@lysnikolaou
Member Author

lysnikolaou commented Mar 28, 2025

These are the results of running the (old & new) benchmarks:

| Change   | Before [93898621] <main>   | After [cfcdabf0] <use-prepare-index-flatiter>   |     Ratio | Benchmark (Parameter)                                       |
|----------|----------------------------|-------------------------------------------------|-----------|-------------------------------------------------------------|
| +        | 87.0±0.4ns                 | 43.0±0.2ms                                      | 494276    | bench_indexing.FlatIterIndexing.time_flat_empty_tuple_index |
| +        | 115±3ns                    | 479±8ns                                         |      4.15 | bench_indexing.FlatIterIndexing.time_flat_bool_index_0d     |
| +        | 39.4±0.3ms                 | 42.8±0.3ms                                      |      1.09 | bench_indexing.FlatIterIndexing.time_flat_ellipsis_index    |
| +        | 3.95±0.04ms                | 4.29±0.07ms                                     |      1.09 | bench_indexing.FlatIterIndexing.time_flat_slice_index       |

It looks like having special cases for tuple, ellipses etc. (instead of going through prepare_index) did have an impact on performance. Should we try and keep those special cases in?
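For a rough local check outside of asv, a sketch like the following times the ellipsis and slice cases with `timeit`. The array size, the repeat count, and the helper name `time_flat_index` are assumptions made for illustration; they do not mirror the actual `FlatIterIndexing` benchmark setup.

```python
import timeit

import numpy as np


def time_flat_index(arr, index, number=20):
    """Return the average seconds per evaluation of ``arr.flat[index]``."""
    total = timeit.timeit(lambda: arr.flat[index], number=number)
    return total / number


# Hypothetical array size; the real benchmark parameters may differ.
a = np.ones((200, 500))
for name, index in [("ellipsis", Ellipsis), ("slice", slice(None, None, 2))]:
    per_call = time_flat_index(a, index)
    print(f"{name}: {per_call * 1e6:.1f} us per call")
```

Running this on both branches gives only a coarse comparison; asv remains the better tool for judging regressions.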

@ngoldbaum
Member

> Should we try and keep those special cases in?

Probably

@lysnikolaou
Member Author

> > Should we try and keep those special cases in?
>
> Probably

I added a couple of special cases for an empty tuple and boolean indexes. This fixes the two worst performance regressions. I feel that the rest are acceptable, since this goes through a much more complex code path to make sure that everything is set up correctly.

@ngoldbaum ngoldbaum added this to the 2.3.0 release milestone Apr 23, 2025
@ngoldbaum
Member

I added the 2.3 milestone to make sure we don't drop reviewing this before cutting the release.

@charris
Member

charris commented May 19, 2025

> I added the 2.3 milestone

@ngoldbaum I am about to push this off to 2.4 unless you want to put it in very soon.

@ngoldbaum
Member

I spoke with Lysandros and he said it's OK to push this off. We'll coordinate on getting this reviewed soon.

@charris charris modified the milestones: 2.3.0 release, 2.4.0 release May 19, 2025
@seberg seberg added the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Jun 11, 2025
@seberg seberg self-requested a review June 11, 2025 17:49
Member

@seberg seberg left a comment

Sorry for not looking at it much. Overall it looks nice; I still need to do a pass to check for refcount issues, etc.

I am slightly worried that some of the bad bool cases should maybe have a FutureWarning (or just go to an error for a bit?!), to enforce correct behavior.

Overall, I am happy that this seems to have integrated well. The diff is a bit unwieldy, but that can't be helped.


``arr.flat[[True, True]]`` and ``arr.flat[[1.0, 1.0]]`` were incorrectly
treated as ``arr.flat[[1, 1]]``. They now raise an ``IndexError`` (unless
``arr.flat[[True, True]]`` is a valid boolean index).
Member

I think I am OK with this, but it is technically too fast a change.

It could make sense to emit a FutureWarning if the input isn't already an array; the way to avoid the warning would be to make sure the input is an array.
(I would also be happy to just go with a hard error and a warning that it will work in the future, to not bother keeping the old stuff working, heh)

Basically it seems extremely niche, but it has the potential to change code results.

* Fixed crash when assigning to an empty index tuple:

``arr.flat[()] = 0`` previously crashed the Python interpreter. It now
correctly assigns to the entire array.
Member

I might prefer if this just errors; it seems weird to allow a 0-D index for something that is known to be exactly 1-D.
(In an ideal world, I might prefer if NumPy forced you to add ... for incomplete indices, but it is just too much of a change.)

But I don't feel strongly about it.

return obj;
if (PyTuple_Check(ind) && PyTuple_GET_SIZE(ind) == 0) {
Py_INCREF(self->ao);
return (PyObject *)self->ao;
Member

I would suggest moving the check into the general path to not diverge here, but it's OK here also.
(I.e. I think the index info will tell us about indexing 0 dims with 0 indices.)

}

Member

Nothing about this branch is correct, as testing a.flat[] against a.ravel()[] will tell you.

I could imagine just deprecating it, because it effectively indexes zero dimensions; as with a.flat[()], there seems little reason to do so.
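The comparison against `ravel()` suggested here can be sketched as a small helper. The names `index_outcome` and `flat_matches_ravel` are hypothetical and not part of NumPy or of this PR, and the candidate indices are illustrative only; results for edge cases depend on the NumPy version in use.

```python
import numpy as np


def index_outcome(obj, index):
    """Index ``obj`` and return ("ok", array) or (exception name, None)."""
    try:
        return "ok", np.asarray(obj[index])
    except Exception as exc:
        return type(exc).__name__, None


def flat_matches_ravel(arr, index):
    """Report whether ``arr.flat[index]`` behaves like ``arr.ravel()[index]``."""
    flat_status, flat_val = index_outcome(arr.flat, index)
    ravel_status, ravel_val = index_outcome(arr.ravel(), index)
    if flat_status != ravel_status:
        return False
    if flat_status != "ok":
        return True  # both raised the same exception type
    return flat_val.shape == ravel_val.shape and np.array_equal(flat_val, ravel_val)


a = np.arange(9).reshape((3, 3))
for index in ([1, 1], slice(1, 4), Ellipsis):
    print(repr(index), flat_matches_ravel(a, index))
```

Only read access is exercised, since the PR description notes that some assignments crashed the interpreter before this change.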

if (new == NULL) {
goto fail;
if (index_type == HAS_FANCY) {
ret = iter_subscript_int(self, (PyArrayObject *) indices[0].object, &cast_info);
Member

It must be an integer array here, I think. But I don't think it is guaranteed to be an intp array. A bit scary that no test seems to index with a differently sized integer, though?

Py_INCREF(type);
arrval = (PyArrayObject *)PyArray_FromAny(val, type, 0, 0,
Py_INCREF(dtype);
PyArrayObject *arrval = (PyArrayObject *)PyArray_FromAny(val, dtype, 0, 0,
Member

Would be good to pass the correct maxdims here, IIRC (fixes corner cases around object arrays, even if the choice of how that behaves is a matter of taste).

}

/* Check for Integer or Slice */
if (PyLong_Check(ind) || PySlice_Check(ind)) {
start = parse_index_entry(ind, &step_size, &n_steps,
Member

Seems like parse_index_entry should be deleted, but is not yet.

indices_2d = np.array([[1, 2], [3, 4]])
assert_array_equal(a.flat[indices_2d], indices_2d)

assert_array_equal(a.flat[[True, 1]], a.flat[[1, 1]])
Member

As mentioned above, it would be good to have a basic test here (or its own test) for e.g. int16 dtype inputs.

(And yes, you can force-cast to integer.)
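A minimal sketch of what such a test could look like, assuming that integer index arrays of any width should select the same elements as the equivalent list of Python ints. The function name `check_flat_small_int_index` and the chosen dtypes are illustrative, not taken from the PR.

```python
import numpy as np
from numpy.testing import assert_array_equal


def check_flat_small_int_index():
    a = np.arange(9).reshape((3, 3))
    expected = a.ravel()[[0, 4, 8]]

    # Index with integer arrays that are not necessarily intp-sized.
    for dtype in (np.int16, np.int32, np.int64):
        idx = np.array([0, 4, 8], dtype=dtype)
        assert_array_equal(a.flat[idx], expected)


check_flat_small_int_index()
```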

def test_flatiter_indexing_boolean(self):
a = np.arange(9).reshape((3, 3))
a.flat[True] = 10
assert_array_equal(a, np.array([[10, 1, 2], [3, 4, 5], [6, 7, 8]]))
Member

Technically wrong. a.flat[True] should effectively return a.reshape(1, a.size). So if anything, it would assign to everything.

In practice, I may be tempted to just deprecate it, since it seems somewhat useless?
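For reference, this is how a scalar `True` index behaves on a regular 1-D array, which is the behavior the comment compares against. The sketch deliberately uses `ravel()` rather than `flat`, because the `flat` result is exactly what is under discussion.

```python
import numpy as np

a = np.arange(9).reshape((3, 3))
r = a.ravel()  # ``a`` is contiguous, so this is a view of ``a``

# A scalar True index prepends a length-1 axis on a regular array.
print(r[True].shape)  # (1, 9)

# Assigning through that index therefore writes every element.
r[True] = 10
print(a)
```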

* @param allow_boolean whether to allow the boolean special case
*
* @returns the index_type or -1 on failure and fills the number of indices.
*/
Member

Maybe we should adjust that slightly and leave it on prepare_index_noarray? It's obvious to look for docs there if you look at prepare_index, but not vice-versa?
(but just nitpicking/suggestion.)

Labels
56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Fix flatiter indexing
4 participants