ENH, API: New sorting mechanism for DType API #28516

Open
wants to merge 30 commits into
base: main
Commits
4700c13
ENH: Allocate lock only once in StringDType quicksort
MaanasArora Mar 14, 2025
5fb0b5f
ENH: Add dtype slots for sorting and begin integration
MaanasArora Mar 26, 2025
76be21b
MAINT: Rename sort compare slot access function
MaanasArora Mar 26, 2025
59590d2
ENH: Add dtype slot sorting functionality support to all sort kinds
MaanasArora Mar 26, 2025
b89accd
ENH: Add descending flag to internal sorting functions
MaanasArora Mar 26, 2025
a437eb9
MAINT: Improve get dtype sort compare function name
MaanasArora Mar 27, 2025
aa63d11
MAINT: Fix doc typo
MaanasArora Mar 27, 2025
16e95a2
MAINT: Error out when non-legacy dtype has no sort_compare function
MaanasArora Mar 27, 2025
42e76d6
DOC: Add release notes for new dtype sorting API
MaanasArora Mar 28, 2025
88636cc
DOC: Add doc for sort compare slot in release notes
MaanasArora Mar 28, 2025
9d14ec1
DOC: Add note for potential deprecation of sort arrfuncs in release note
MaanasArora Mar 30, 2025
a556455
MAINT: Reorder dtype slots to prevent changing existing slot numbers
MaanasArora Mar 30, 2025
3c0957e
BUG: Error on missing `sort_compare` slot only when dtype is privatel…
MaanasArora Mar 30, 2025
9506798
DOC: Add C-API documentation for new sorting slots
MaanasArora Apr 1, 2025
6ce5351
ENH: Replace array object with context and auxdata in sortfunc signat…
MaanasArora Apr 5, 2025
96a53b2
BUG: Fix unnecessarily private function call due to underscore typo
MaanasArora Apr 5, 2025
9a2b100
MAINT: Fix whitespace typos
MaanasArora Apr 5, 2025
8d4c75d
ENH: Allow flexible sorting compare for arr or descr in npy_sort func…
MaanasArora Apr 11, 2025
50988ba
ENH: Add new sort func implementations and use in stringdtype
MaanasArora Apr 11, 2025
ca5797e
DOC: Fix missing newline in ctype doc
MaanasArora Apr 11, 2025
95cfd8f
DOC: Add sortfunc typedef docs
MaanasArora Apr 12, 2025
6dd4f4c
DOC: Fix missing newline in ctype doc
MaanasArora Apr 12, 2025
4fa813c
ENH: Define SortCompareFunc type
MaanasArora Apr 12, 2025
894911e
Update dtype sorting signatures: move context, move out auxdata to ge…
MaanasArora May 13, 2025
57687ac
MAINT: Check error in Get(Arg)SortFunc using return value
MaanasArora May 13, 2025
0edb4ea
DOC: Add missing newlines to c-types in array.rst
MaanasArora May 14, 2025
167301e
MAINT: Rename new sort funcs and restore older names for existing pub…
MaanasArora May 24, 2025
e6b8c1e
MAINT: Rename start pointer in new sort func documentation to data
MaanasArora May 24, 2025
579c351
ENH: Add flags to new get_(arg)sort_function
MaanasArora May 28, 2025
d854b00
DOC: Mention new sort func buffers to be contiguous
MaanasArora May 28, 2025
MAINT: Rename new sort funcs and restore older names for existing public API
MaanasArora committed May 24, 2025
commit 167301ebdc120d3b209b61659f43ce78ee4be5b6
12 changes: 6 additions & 6 deletions doc/source/reference/c-api/array.rst
@@ -1873,7 +1873,7 @@ described below.
 pointer. Currently this is used for zero-filling and clearing arrays storing
 embedded references.

-.. c:type:: int (PyArray_SortFunc)( \
+.. c:type:: int (PyArray_SortFuncWithContext)( \
 PyArrayMethod_Context *data, void *start, \
 npy_intp num, NpyAuxData *auxdata)

@@ -1883,7 +1883,7 @@ described below.
 slots, where *context* is passed in containing the descriptor for the
 array. Returns 0 on success, -1 on failure.

-.. c:type:: int (PyArray_ArgSortFunc)( \
+.. c:type:: int (PyArray_ArgSortFuncWithContext)( \
 PyArrayMethod_Context *data, void *start, \
 npy_intp *tosort, npy_intp num, NpyAuxData *auxdata)

@@ -3538,7 +3538,7 @@ member of ``PyArrayDTypeMeta_Spec`` struct.
 .. c:macro:: NPY_DT_get_sort_function

 .. c:type:: int *(PyArrayDTypeMeta_GetSortFunction)(PyArray_Descr *, \
-npy_intp sort_kind, int descending, PyArray_SortFunc **out_sort, \
+npy_intp sort_kind, int descending, PyArray_SortFuncWithContext **out_sort, \
 NpyAuxData **out_auxdata)

 If defined, sets a custom sorting function for the DType for each of
@@ -3547,7 +3547,7 @@ member of ``PyArrayDTypeMeta_Spec`` struct.
 .. c:macro:: NPY_DT_get_argsort_function

 .. c:type:: int *(PyArrayDTypeMeta_GetArgSortFunction)(PyArray_Descr *, \
-npy_intp sort_kind, int descending, PyArray_ArgSortFunc **out_argsort, \
+npy_intp sort_kind, int descending, PyArray_ArgSortFuncWithContext **out_argsort, \
 NpyAuxData **out_auxdata)

 If defined, sets a custom argsorting function for the DType for each of
@@ -3628,15 +3628,15 @@ DType API slots but for now we have exposed the legacy

 .. c:macro:: NPY_DT_PyArray_ArrFuncs_sort

-An array of PyArray_SortFunc of length ``NPY_NSORTS``. If set, allows
+An array of PyArray_SortFuncWithContext of length ``NPY_NSORTS``. If set, allows
 defining custom sorting implementations for each of the sorting
 algorithms numpy implements. If `NPY_DT_get_sort_function` is
 defined, it will be used instead. This slot may be deprecated in the
 future.

 .. c:macro:: NPY_DT_PyArray_ArrFuncs_argsort

-An array of PyArray_ArgSortFunc of length ``NPY_NSORTS``. If set,
+An array of PyArray_ArgSortFuncWithContext of length ``NPY_NSORTS``. If set,
 allows defining custom argsorting implementations for each of the
 sorting algorithms numpy implements. If `NPY_DT_get_argsort_function`
 is defined, it will be used instead. This slot may be deprecated in
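
As a rough illustration of the slot types documented in this file, the following self-contained sketch mimics the shape of a `PyArray_SortFuncWithContext` and of a `NPY_DT_get_sort_function`-style getter. Every `Demo*`/`demo_*` name is a hypothetical stand-in so that the code compiles without the NumPy headers. The sketch assumes `int` elements, ignores `sort_kind`, and returns a plain `int` status in the way the StringDType getter in this commit does; it is not the PR's implementation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for npy_intp, NpyAuxData, PyArray_Descr and
 * PyArrayMethod_Context; the real definitions live in the NumPy headers. */
typedef ptrdiff_t demo_intp;
typedef struct { int unused; } DemoAuxData;
typedef struct { int unused; } DemoDescr;
typedef struct { DemoDescr *const *descriptors; } DemoContext;

/* Same argument order as PyArray_SortFuncWithContext in the diff:
 * context, contiguous data pointer, element count, auxdata. */
typedef int (DemoSortFunc)(DemoContext *, void *, demo_intp, DemoAuxData *);

static int demo_cmp_asc(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

static int demo_cmp_desc(const void *a, const void *b)
{
    return demo_cmp_asc(b, a);
}

/* Sort functions report 0 on success and -1 on failure. */
static int demo_sort_asc(DemoContext *ctx, void *data, demo_intp num,
                         DemoAuxData *aux)
{
    (void)ctx; (void)aux;
    if (num < 0) {
        return -1;
    }
    qsort(data, (size_t)num, sizeof(int), demo_cmp_asc);
    return 0;
}

static int demo_sort_desc(DemoContext *ctx, void *data, demo_intp num,
                          DemoAuxData *aux)
{
    (void)ctx; (void)aux;
    if (num < 0) {
        return -1;
    }
    qsort(data, (size_t)num, sizeof(int), demo_cmp_desc);
    return 0;
}

/* Getter in the style of NPY_DT_get_sort_function: it selects a sort
 * function for the requested direction and reports no auxdata. */
static int demo_get_sort_function(DemoDescr *descr, demo_intp sort_kind,
                                  int descending, DemoSortFunc **out_sort,
                                  DemoAuxData **out_auxdata)
{
    (void)descr; (void)sort_kind;
    *out_sort = descending ? demo_sort_desc : demo_sort_asc;
    *out_auxdata = NULL;
    return 0;
}
```

A caller would first ask the getter for a function and then invoke it on a contiguous buffer, checking both return values.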
4 changes: 2 additions & 2 deletions doc/source/reference/c-api/types-and-structures.rst
@@ -494,8 +494,8 @@ PyArray_ArrFuncs
 PyArray_NonzeroFunc *nonzero;
 PyArray_FillFunc *fill;
 PyArray_FillWithScalarFunc *fillwithscalar;
-PyArray_SortFunc *sort[NPY_NSORTS];
-PyArray_ArgSortFunc *argsort[NPY_NSORTS];
+PyArray_SortFuncWithContext *sort[NPY_NSORTS];
+PyArray_ArgSortFuncWithContext *argsort[NPY_NSORTS];
 PyObject *castdict;
 PyArray_ScalarKindFunc *scalarkind;
 int **cancastscalarkindto;
8 changes: 4 additions & 4 deletions numpy/_core/include/numpy/dtype_api.h
@@ -485,16 +485,16 @@ typedef int (PyArray_CompareFuncWithDescr)(const void *, const void *,
 PyArray_Descr *);
Contributor Author

The naming is a bit weird here, but I didn't want to disturb the original type as it's used a lot. I think SortCompareFunc should still be a unique type, so I will do that (even if it is only a clone of this type).

Member

I have slightly mixed feelings. On the one hand, I think this is the pragmatic thing to have.
On the other hand, we could also look up this function from the np.less or np.greater ufunc to implement sorting, I think.
(The problem there is still how to deal with unordered elements; a compare ufunc would work better...)

Still, it seems pragmatic: even if it won't work well for e.g. structured dtypes (performance issues), it will always work and provides an easy entry point (we can also use this to define default comparison ufuncs).

So overall, I think I end up at just doing this, although I could imagine punting if we don't need it for StringDType (I suspect we do, though).

Would like to hear if @ngoldbaum has an opinion.

(A neater future path would also be if this were more of a header-only, code-binding-generator job, with us maybe making the sorting patterns available. I.e. if this were defined in a C++ class and our sort code were available, the DType could compile the full loop and avoid calling such a helper everywhere.)
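
To make the "unordered elements" concern concrete, here is a minimal sketch (not NumPy's implementation) of a three-way compare in the style of the sort-compare slot, together with a toy insertion sort that calls it. It assumes `double` elements and places NaN after all ordered values. `demo_sort_compare` and `demo_insertion_sort` are hypothetical names, and the descriptor argument is an unused `void *` stand-in.

```c
#include <math.h>
#include <stddef.h>

/* Three-way compare: negative if a sorts before b, zero if they tie,
 * positive if a sorts after b. NaN is treated as larger than every
 * ordered value, so the ordering stays total. */
static int demo_sort_compare(const void *a, const void *b, void *descr)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    int x_nan = isnan(x) != 0;
    int y_nan = isnan(y) != 0;
    (void)descr;
    if (x_nan || y_nan) {
        return x_nan - y_nan;
    }
    return (x > y) - (x < y);
}

/* Toy insertion sort driven only by the compare helper above. */
static void demo_insertion_sort(double *data, size_t num, void *descr)
{
    for (size_t i = 1; i < num; i++) {
        double key = data[i];
        size_t j = i;
        while (j > 0 && demo_sort_compare(&data[j - 1], &key, descr) > 0) {
            data[j] = data[j - 1];
            j--;
        }
        data[j] = key;
    }
}
```

A sort built only on a boolean "less than" has no such third outcome to lean on, because every comparison involving NaN is false; the three-way result lets the helper decide explicitly where unordered values go.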

Member

IMO this is fine, if only because it exists right now 😄

 typedef int (PyArray_SortCompareFunc)(const void *, const void *,
 PyArray_Descr *);
-typedef int (PyArray_SortFunc)(PyArrayMethod_Context *,
+typedef int (PyArray_SortFuncWithContext)(PyArrayMethod_Context *,
 void *, npy_intp,
 NpyAuxData *);
-typedef int (PyArray_ArgSortFunc)(PyArrayMethod_Context *,
+typedef int (PyArray_ArgSortFuncWithContext)(PyArrayMethod_Context *,
 void *, npy_intp *, npy_intp,
 NpyAuxData *);
Member

These two need different names and you need to leave the original typedefs in ndarraytypes.h that had these names, since they're public API.

Contributor Author

@MaanasArora May 24, 2025

Thanks for reviewing! This is done.


 typedef int *(PyArrayDTypeMeta_GetSortFunction)(PyArray_Descr *,
-npy_intp, int, PyArray_SortFunc **, NpyAuxData **);
+npy_intp, int, PyArray_SortFuncWithContext **, NpyAuxData **);
 typedef int *(PyArrayDTypeMeta_GetArgSortFunction)(PyArray_Descr *,
-npy_intp, int, PyArray_ArgSortFunc **, NpyAuxData **);
+npy_intp, int, PyArray_ArgSortFuncWithContext **, NpyAuxData **);

Member

New stuff in the public API needs new API docs as well as a release note describing the new features.

Maybe also as a proof-of-concept, it looks like both quaddtype and mpfdtype in numpy-user-dtypes implement sorting - would you be willing to update them to use the new API in a PR to numpy-user-dtypes that depends on this PR to numpy? That should give you a feeling for whether this API is helpful for someone writing a new user dtype. It'll also be a form of documentation - we don't have great docs for writing user dtypes besides the examples in numpy-user-dtypes.

Member

@ngoldbaum Mar 27, 2025

Also what should we do about the flags that got added before we made the dtype API public, e.g. NPY_DT_PyArray_ArrFuncs_compare? I guess we can deprecate them although I don't know how hard it would be to generate deprecation warnings if those are used.

Member

It's easy to generate a deprecation warning during registration (maybe a bit tedious, as you need an explicit check).
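
The "explicit check" could look roughly like the sketch below: walk the slot list supplied at registration and flag any legacy sorting slot. This is only an assumption about the shape of such a check. `DemoSlot` is a hypothetical stand-in for an ID-plus-pointer slot entry terminated by `{0, NULL}`, the `DEMO_SLOT_*` IDs are made up for the example, and the real code would emit a Python `DeprecationWarning` where the sketch simply returns 1.

```c
#include <stddef.h>

/* Hypothetical stand-in for a slot entry: an ID plus a function pointer,
 * with the array terminated by a {0, NULL} entry. */
typedef struct {
    int slot;
    void *pfunc;
} DemoSlot;

/* Hypothetical slot IDs; the real values are defined in dtype_api.h. */
enum {
    DEMO_SLOT_GET_SORT_FUNCTION = 11,
    DEMO_SLOT_LEGACY_SORT = 1041,
    DEMO_SLOT_LEGACY_ARGSORT = 1042
};

/* Returns 1 if any legacy sorting slot is present, otherwise 0.
 * Registration code would issue the deprecation warning at the point
 * where this sketch returns 1. */
static int demo_uses_legacy_sort_slot(const DemoSlot *slots)
{
    for (; slots->slot != 0; slots++) {
        if (slots->slot == DEMO_SLOT_LEGACY_SORT ||
                slots->slot == DEMO_SLOT_LEGACY_ARGSORT) {
            return 1;
        }
    }
    return 0;
}
```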

Contributor Author

Sure, I'll add API docs and a release note, and I'm willing to make a PR to numpy-user-dtypes! I will look into that.

Just to be clear, NPY_DT_PyArray_ArrFuncs_compare is still needed, right? We can move it to a new slot rather than an arrayfunc, but it's going to be different from the sort comparison for now if I'm thinking about this right (as it is user-facing rather than used in the sorting). Do we need to do this another way?

Member

We can't change slot numbers (unless they are guarded as private)! So the numbers are fixed (at least until they have gone unused for a while).
So yeah, I think we should keep it in the old slot for now; it may be easier to make the deprecation a follow-up.[^depr]

So, we just have to live with the numbering we got, I half thought I asked for an offset for the NPY_DT_PyArray_ArrFuncs slots, but maybe I didn't bother.
(It's not a big issue, the only thing is the convenience if slot numbers == slot offset so you don't need to translate it.)

[^depr]: I think this is as simple as asking users to compile with the new NumPy and then adding a PyArray_RUNTIME_VERSION check, but this PR is complicated enough due to API decisions for the new loops, etc.

Member

> So, we just have to live with the numbering we got, I half thought I asked for an offset for the NPY_DT_PyArray_ArrFuncs slots, but maybe I didn't bother.

There is an offset, _NPY_DT_ARRFUNCS_OFFSET:

#define NPY_DT_MAX_ARRFUNCS_SLOT \
NPY_NUM_DTYPE_PYARRAY_ARRFUNCS_SLOTS + _NPY_DT_ARRFUNCS_OFFSET

#endif /* NUMPY_CORE_INCLUDE_NUMPY___DTYPE_API_H_ */
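
The remark in the thread above about slot numbers versus slot offsets can be sketched as follows. This is an illustration only: the `DEMO_*` constants and `demo_slot_index` are hypothetical, the slot counts are invented, and the offset value of `1 << 10` is an assumption made for the example rather than something shown in this diff.

```c
#include <stddef.h>

/* Hypothetical values chosen for the example. */
#define DEMO_ARRFUNCS_OFFSET (1 << 10)
#define DEMO_NUM_DTYPE_SLOTS 12
#define DEMO_NUM_ARRFUNCS_SLOTS 22
#define DEMO_MAX_ARRFUNCS_SLOT \
    (DEMO_NUM_ARRFUNCS_SLOTS + DEMO_ARRFUNCS_OFFSET)

/* Translates a public slot number into a 0-based index into either the
 * DType slot table or the legacy ArrFuncs table. Returns -1 for a slot
 * number that belongs to neither range. */
static int demo_slot_index(int slot, int *is_arrfunc)
{
    if (slot >= 1 && slot <= DEMO_NUM_DTYPE_SLOTS) {
        *is_arrfunc = 0;
        return slot - 1;
    }
    if (slot > DEMO_ARRFUNCS_OFFSET && slot <= DEMO_MAX_ARRFUNCS_SLOT) {
        *is_arrfunc = 1;
        return slot - DEMO_ARRFUNCS_OFFSET - 1;
    }
    return -1;
}
```

Because the legacy slots sit above an offset, new DType slots can be appended in the low range without renumbering anything that is already public; the cost is the small translation step shown here.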
8 changes: 4 additions & 4 deletions numpy/_core/include/numpy/ndarraytypes.h
@@ -422,8 +422,8 @@ typedef int (PyArray_FromStrFunc)(char *s, void *dptr, char **endptr,

 typedef int (PyArray_FillFunc)(void *, npy_intp, void *);

-typedef int (PyArray_SortFuncWithArray)(void *, npy_intp, void *);
-typedef int (PyArray_ArgSortFuncWithArray)(void *, npy_intp *, npy_intp, void *);
+typedef int (PyArray_SortFunc)(void *, npy_intp, void *);
+typedef int (PyArray_ArgSortFunc)(void *, npy_intp *, npy_intp, void *);

 typedef int (PyArray_FillWithScalarFunc)(void *, npy_intp, void *, void *);

@@ -514,8 +514,8 @@ typedef struct {
 * Sorting functions
 * Can be NULL
 */
-PyArray_SortFuncWithArray *sort[NPY_NSORTS];
-PyArray_ArgSortFuncWithArray *argsort[NPY_NSORTS];
+PyArray_SortFunc *sort[NPY_NSORTS];
+PyArray_ArgSortFunc *argsort[NPY_NSORTS];

 /*
 * Dictionary of additional casting functions
4 changes: 2 additions & 2 deletions numpy/_core/src/multiarray/dtypemeta.h
@@ -300,7 +300,7 @@ PyArray_SETITEM(PyArrayObject *arr, char *itemptr, PyObject *v)

 static inline int
 PyArray_GetSortFunction(PyArray_Descr *descr,
-NPY_SORTKIND which, int descending, PyArray_SortFunc **out_sort,
+NPY_SORTKIND which, int descending, PyArray_SortFuncWithContext **out_sort,
 NpyAuxData **out_auxdata)
 {
 if (NPY_DT_SLOTS(NPY_DTYPE(descr))->get_sort_function == NULL) {
@@ -314,7 +314,7 @@ PyArray_GetSortFunction(PyArray_Descr *descr,

 static inline int
 PyArray_GetArgSortFunction(PyArray_Descr *descr,
-NPY_SORTKIND which, int descending, PyArray_ArgSortFunc **out_argsort,
+NPY_SORTKIND which, int descending, PyArray_ArgSortFuncWithContext **out_argsort,
 NpyAuxData **out_auxdata)
 {
 if (NPY_DT_SLOTS(NPY_DTYPE(descr))->get_argsort_function == NULL) {
22 changes: 11 additions & 11 deletions numpy/_core/src/multiarray/item_selection.c
@@ -1191,8 +1191,8 @@ PyArray_Choose(PyArrayObject *ip, PyObject *op, PyArrayObject *out,
 * over all but the desired sorting axis.
 */
 static int
-_new_sortlike(PyArrayObject *op, int axis, PyArray_SortFunc *sort,
-PyArray_SortFuncWithArray *sort_with_array, NpyAuxData *auxdata,
+_new_sortlike(PyArrayObject *op, int axis, PyArray_SortFuncWithContext *sort,
+PyArray_SortFunc *sort_with_array, NpyAuxData *auxdata,
 PyArray_PartitionFunc *part, npy_intp const *kth, npy_intp nkth)
 {
 npy_intp N = PyArray_DIM(op, axis);
@@ -1368,8 +1368,8 @@ _new_sortlike(PyArrayObject *op, int axis, PyArray_SortFunc *sort,
 }

 static PyObject*
-_new_argsortlike(PyArrayObject *op, int axis, PyArray_ArgSortFunc *argsort,
-PyArray_ArgSortFuncWithArray *argsort_with_array,
+_new_argsortlike(PyArrayObject *op, int axis, PyArray_ArgSortFuncWithContext *argsort,
+PyArray_ArgSortFunc *argsort_with_array,
 NpyAuxData *auxdata, PyArray_ArgPartitionFunc *argpart,
 npy_intp const *kth, npy_intp nkth)
 {
@@ -1574,8 +1574,8 @@ _new_argsortlike(PyArrayObject *op, int axis, PyArray_ArgSortFunc *argsort,
 NPY_NO_EXPORT int
 PyArray_Sort(PyArrayObject *op, int axis, NPY_SORTKIND which)
 {
-PyArray_SortFunc *sort = NULL;
-PyArray_SortFuncWithArray *sort_with_array = NULL;
+PyArray_SortFuncWithContext *sort = NULL;
+PyArray_SortFunc *sort_with_array = NULL;

 NpyAuxData *auxdata = NULL;

@@ -1710,7 +1710,7 @@ PyArray_Partition(PyArrayObject *op, PyArrayObject * ktharray, int axis,
 {
 PyArrayObject *kthrvl;
 PyArray_PartitionFunc *part;
-PyArray_SortFuncWithArray *sort;
+PyArray_SortFunc *sort;
 int n = PyArray_NDIM(op);
 int ret;

@@ -1761,8 +1761,8 @@ NPY_NO_EXPORT PyObject *
 PyArray_ArgSort(PyArrayObject *op, int axis, NPY_SORTKIND which)
 {
 PyArrayObject *op2;
-PyArray_ArgSortFunc *argsort = NULL;
-PyArray_ArgSortFuncWithArray *argsort_with_array = NULL;
+PyArray_ArgSortFuncWithContext *argsort = NULL;
+PyArray_ArgSortFunc *argsort_with_array = NULL;
 PyObject *ret;

 NpyAuxData *auxdata = NULL;
@@ -1831,7 +1831,7 @@ PyArray_ArgPartition(PyArrayObject *op, PyArrayObject *ktharray, int axis,
 {
 PyArrayObject *op2, *kthrvl;
 PyArray_ArgPartitionFunc *argpart;
-PyArray_ArgSortFuncWithArray *argsort;
+PyArray_ArgSortFunc *argsort;
 PyObject *ret;

 /*
@@ -1901,7 +1901,7 @@ PyArray_LexSort(PyObject *sort_keys, int axis)
 int elsize;
 int maxelsize;
 int object = 0;
-PyArray_ArgSortFuncWithArray *argsort;
+PyArray_ArgSortFunc *argsort;
 NPY_BEGIN_THREADS_DEF;

 if (!PySequence_Check(sort_keys)
8 changes: 4 additions & 4 deletions numpy/_core/src/multiarray/stringdtype/dtype.c
@@ -525,7 +525,7 @@ stringdtype_sort_compare(void *a, void *b, PyArray_Descr *descr) {

 int
 _stringdtype_sort(PyArrayMethod_Context *context, void *start, npy_intp num,
-NpyAuxData *auxdata, PyArray_SortFunc *sort) {
+NpyAuxData *auxdata, PyArray_SortFuncWithContext *sort) {
 PyArray_StringDTypeObject *descr = (PyArray_StringDTypeObject *)context->descriptors[0];

 NpyString_acquire_allocator(descr);
@@ -558,7 +558,7 @@ _stringdtype_timsort(PyArrayMethod_Context *context, void *start, npy_intp num,

 int
 stringdtype_get_sort_function(PyArray_Descr *descr,
-NPY_SORTKIND sort_kind, int descending, PyArray_SortFunc **out_sort,
+NPY_SORTKIND sort_kind, int descending, PyArray_SortFuncWithContext **out_sort,
 NpyAuxData **NPY_UNUSED(out_auxdata)) {

 switch (sort_kind) {
@@ -579,7 +579,7 @@ stringdtype_get_sort_function(PyArray_Descr *descr,

 int
 _stringdtype_argsort(PyArrayMethod_Context *context, void *vv, npy_intp *tosort,
-npy_intp num, NpyAuxData *auxdata, PyArray_ArgSortFunc *argsort) {
+npy_intp num, NpyAuxData *auxdata, PyArray_ArgSortFuncWithContext *argsort) {
 PyArray_StringDTypeObject *descr = (PyArray_StringDTypeObject *)context->descriptors[0];

 NpyString_acquire_allocator(descr);
@@ -612,7 +612,7 @@ _stringdtype_atimsort(PyArrayMethod_Context *context, void *vv, npy_intp *tosort

 int
 stringdtype_get_argsort_function(PyArray_Descr *descr,
-NPY_SORTKIND sort_kind, int descending, PyArray_ArgSortFunc **out_argsort) {
+NPY_SORTKIND sort_kind, int descending, PyArray_ArgSortFuncWithContext **out_argsort) {

 switch (sort_kind) {
 default:
4 changes: 2 additions & 2 deletions numpy/_core/src/npysort/npysort_common.h
@@ -19,15 +19,15 @@ extern "C" {

 static inline int
 handle_npysort_with_context(PyArrayMethod_Context *context, void *start, npy_intp num,
-NpyAuxData *auxdata, PyArray_SortFuncWithArray *sort)
+NpyAuxData *auxdata, PyArray_SortFunc *sort)
 {
 PyArray_Descr *descr = context->descriptors[0];
 return sort(start, num, descr);
 }

 static inline int
 handle_npyasort_with_context(PyArrayMethod_Context *context, void *vv, npy_intp *tosort,
-npy_intp num, NpyAuxData *auxdata, PyArray_ArgSortFuncWithArray *asort)
+npy_intp num, NpyAuxData *auxdata, PyArray_ArgSortFunc *asort)
 {
 PyArray_Descr *descr = context->descriptors[0];
 return asort(vv, tosort, num, descr);
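
The adapter pattern in this header, where a context-style entry point forwards to a function with the restored legacy signature, can be tried out in isolation with the sketch below. All `Demo*`/`demo_*` names are hypothetical stand-ins for the NumPy types, the element type is assumed to be `int`, and `qsort` stands in for NumPy's own sort kernels.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for npy_intp, PyArray_Descr and
 * PyArrayMethod_Context. */
typedef ptrdiff_t demo_intp;
typedef struct { int elsize; } DemoDescr;
typedef struct { DemoDescr *const *descriptors; } DemoContext;

/* Legacy shape, as in the restored PyArray_SortFunc typedef:
 * (data, element count, opaque array-or-descriptor pointer). */
typedef int (DemoLegacySortFunc)(void *, demo_intp, void *);

static int demo_cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* A legacy-style sort that ignores its opaque third argument. */
static int demo_legacy_sort(void *start, demo_intp num, void *descr)
{
    (void)descr;
    qsort(start, (size_t)num, sizeof(int), demo_cmp_int);
    return 0;
}

/* Adapter: pulls the descriptor out of the context and forwards to the
 * legacy function, mirroring handle_npysort_with_context in the diff. */
static int demo_handle_sort_with_context(DemoContext *context, void *start,
                                         demo_intp num,
                                         DemoLegacySortFunc *sort)
{
    DemoDescr *descr = context->descriptors[0];
    return sort(start, num, descr);
}
```

Keeping the legacy signature untouched and adapting at the call site is what allows the old public typedef names to be restored without breaking existing users.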