
BUG: scalars missing several methods for array api compat #27305

Open
Illviljan opened this issue Aug 28, 2024 · 17 comments
Labels
00 - Bug 40 - array API standard PRs and issues related to support for the array API standard

@Illviljan
Contributor

Illviljan commented Aug 28, 2024

Describe the issue:

I keep getting stuck trying to get the tests in https://github.com/data-apis/array-api-tests to pass. The remaining errors are often caused by NumPy arrays having turned into scalars, which the tests do not expect (https://data-apis.org/array-api/draft/API_specification/array_object.html).

Reading #26850, it seems the intention is for scalars to support the array API spec?

Reproduce the code example:

import numpy as np
import array_api_strict as xps


# Some examples:
xps.mean(xps.asarray(4, dtype=xps.float32)).__iadd__(1)
np.mean(np.asarray(4, dtype=np.float32)).__iadd__(1) # AttributeError: 'numpy.float32' object has no attribute '__iadd__'

xps.all(xps.asarray(True, dtype=xps.bool)).__ior__(False)
np.all(np.asarray(True, dtype=np.bool)).__ior__(False)  # AttributeError: 'numpy.bool' object has no attribute '__ior__'

xps.mean(xps.asarray(4, dtype=xps.float32)).__complex__()
np.mean(np.asarray(4, dtype=np.float32)).__complex__() # AttributeError: 'numpy.float32' object has no attribute '__complex__'

Error message:

No response

Python and NumPy Versions:

import sys, numpy; print(numpy.__version__); print(sys.version)
2.1.0
3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:42:31) [MSC v.1937 64 bit (AMD64)]

Runtime Environment:

No response

Context for the issue:

No response

@ngoldbaum
Member

So one issue with the in-place operators is that scalars are immutable. Maybe we could implement them but make them return a copy? But maybe an in-place operation returning a copy is confusing?

Implementing __complex__ makes sense. I guess the reason it's missing is that there isn't a PyNumberMethods slot for it, as far as I can see: https://docs.python.org/3/c-api/typeobj.html. You'd probably need to "manually" define a function named __complex__.
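A pure-Python sketch of that distinction, assuming two hypothetical classes (ScalarWithoutComplex and ScalarWithComplex) that stand in for a NumPy scalar type; they are not NumPy code. Because complex() falls back to __float__ when __complex__ is absent, complex(x) can succeed while the direct x.__complex__() call used in the report fails. Defining __complex__ as an ordinary method closes that gap.

```python
class ScalarWithoutComplex:
    """Hypothetical scalar: defines __float__ but not __complex__."""

    __slots__ = ("_value",)

    def __init__(self, value):
        self._value = float(value)

    def __float__(self):
        return self._value


class ScalarWithComplex(ScalarWithoutComplex):
    """Same scalar plus an explicit __complex__ method.

    PyNumberMethods has no slot for __complex__, so a C type would expose it
    as an ordinary method (a PyMethodDef entry) rather than an nb_* slot.
    """

    __slots__ = ()

    def __complex__(self):
        return complex(self._value, 0.0)


x = ScalarWithoutComplex(4)
print(complex(x))                 # complex() falls back to __float__
print(hasattr(x, "__complex__"))  # but the dunder itself is missing

y = ScalarWithComplex(4)
print(y.__complex__())            # callable directly, as the array API tests do
```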

@ngoldbaum ngoldbaum added 40 - array API standard PRs and issues related to support for the array API standard triage review Issue/PR to be discussed at the next triage meeting labels Aug 29, 2024
@ngoldbaum ngoldbaum added this to the 2.2.0 release milestone Sep 4, 2024
@ngoldbaum
Member

ngoldbaum commented Sep 4, 2024

This is the current behavior:

In [6]: a = np.int64(3)

In [7]: id(a)
Out[7]: 2199088643840

In [8]: a += 3

In [9]: id(a)
Out[9]: 2199088641120

I think implementing __iadd__ and making it return a copy doesn't actually change this behavior. So we should do that.
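To illustrate why an __iadd__ that returns a copy does not change the observable behavior, here is a pure-Python sketch with two hypothetical scalar classes (treated as immutable). One re-uses its __add__ implementation for __iadd__, roughly what pointing nb_inplace_add at the existing add function would do; the other omits __iadd__, so += falls back to __add__.

```python
class CopyOnIadd:
    """Hypothetical scalar (treated as immutable) whose __iadd__ returns a new object."""

    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return type(self)(self.value + other)

    # Re-use the binary implementation for the in-place operator.
    __iadd__ = __add__


class NoIadd:
    """The same scalar without __iadd__: += falls back to __add__."""

    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return type(self)(self.value + other)


def iadd_outcome(cls):
    """Return (same object after +=?, old value, new value)."""
    a = cls(3)
    before = a
    a += 3
    return a is before, before.value, a.value


print(iadd_outcome(CopyOnIadd))
print(iadd_outcome(NoIadd))
```

Both calls report a new object with the old value untouched; the only difference is that CopyOnIadd(3).__iadd__(3) can also be called directly, which is what the array API tests in the report do.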

I added a 2.2.0 milestone.

I think __iadd__ and __ior__ are probably the easiest, since there are PyNumberMethods slots for them.

For __complex__ someone would need to define a python function (in C) named __complex__ that does the conversion to a PyComplex.

@ngoldbaum ngoldbaum removed the triage review Issue/PR to be discussed at the next triage meeting label Sep 4, 2024
@Illviljan
Contributor Author

Setting up array-api-tests to run with scalars will probably be helpful as well.
It might smoke out more missing methods.
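As a cheap first pass before wiring up array-api-tests, one could list which dunder methods a scalar type lacks. The helper below is hypothetical and checks only a hand-picked subset of the array object methods from the spec linked above; the spec lists more.

```python
# A hand-picked subset of the array object dunder methods in the array API
# standard (assumed list for illustration, not the full specification).
ARRAY_API_DUNDERS = (
    "__add__", "__iadd__",
    "__or__", "__ior__",
    "__bool__", "__int__", "__float__", "__complex__", "__index__",
)


def missing_dunders(obj, names=ARRAY_API_DUNDERS):
    """Return the names in `names` that type(obj) does not define.

    Operators look special methods up on the type, not the instance,
    so the check is done on type(obj).
    """
    cls = type(obj)
    return [name for name in names if not hasattr(cls, name)]


# Python's own immutable int has no in-place methods either.
print(missing_dunders(3, ("__add__", "__iadd__", "__or__", "__ior__")))
```

Going by the errors in the report, missing_dunders(np.float32(4)) would be expected to include at least __iadd__ and __complex__.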

@Ishankoradia
Contributor

@ngoldbaum, have we decided to do this?
I'd like to give it a shot.

@ngoldbaum
Member

@Ishankoradia go ahead. Having a milestone means we're planning to do it. We also don't claim issues, just go ahead and work on it.

@Ishankoradia
Copy link
Contributor

Gotcha !!

@ngoldbaum, I am looking at core/__init__.pyi and I can see that the __iadd__ method only accepts NDArray. I am guessing my first step would be to add a method overload that accepts scalars. Am I heading in the right direction?

@ngoldbaum
Member

ngoldbaum commented Sep 30, 2024

No, in order to implement these functions you'll need to modify the C internals of NumPy.

Here is where the PyNumberMethods struct is set up for all of the NumPy scalar types:

static PyNumberMethods @name@_as_number = {
.nb_add = (binaryfunc)@name@_add,
.nb_subtract = (binaryfunc)@name@_subtract,
.nb_multiply = (binaryfunc)@name@_multiply,
.nb_remainder = (binaryfunc)@name@_remainder,
.nb_divmod = (binaryfunc)@name@_divmod,
.nb_power = (ternaryfunc)@name@_power,
.nb_negative = (unaryfunc)@name@_negative,
.nb_positive = (unaryfunc)@name@_positive,
.nb_absolute = (unaryfunc)@name@_absolute,
.nb_bool = (inquiry)@name@_bool,
.nb_invert = (unaryfunc)@name@_invert,
.nb_lshift = (binaryfunc)@name@_lshift,
.nb_rshift = (binaryfunc)@name@_rshift,
.nb_and = (binaryfunc)@name@_and,
.nb_xor = (binaryfunc)@name@_xor,
.nb_or = (binaryfunc)@name@_or,
.nb_int = (unaryfunc)@name@_int,
.nb_float = (unaryfunc)@name@_float,
.nb_floor_divide = (binaryfunc)@name@_floor_divide,
.nb_true_divide = (binaryfunc)@name@_true_divide,
/* TODO: This struct/initialization should not be split between files */
.nb_index = (unaryfunc)NULL, /* set in add_scalarmath below */
};

This is inside a file that is written in NumPy's custom templating language used for codegen internally.

You can see that none of the inplace methods listed in the CPython docs are implemented. I think we need to implement all of them, not just __iadd__ and __ior__. You'd also need to add tests and I'd also double check my analysis above that defining in-place operators that return copies is actually OK. Do other array libraries do that?
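For reference, the in-place counterparts of the binary slots in that struct can be enumerated mechanically. This Python sketch (a hypothetical helper, not NumPy code) derives each nb_inplace_* slot name and the Python dunder that slot backs. nb_matrix_multiply is not in the struct, so nb_inplace_matrix_multiply does not appear either.

```python
import operator

# Binary slots set in the @name@_as_number struct quoted above, mapped to the
# matching in-place function in Python's operator module. nb_divmod is left
# out because CPython has no in-place divmod slot.
BINARY_TO_INPLACE_OP = {
    "nb_add": "iadd",
    "nb_subtract": "isub",
    "nb_multiply": "imul",
    "nb_remainder": "imod",
    "nb_power": "ipow",
    "nb_lshift": "ilshift",
    "nb_rshift": "irshift",
    "nb_and": "iand",
    "nb_xor": "ixor",
    "nb_or": "ior",
    "nb_floor_divide": "ifloordiv",
    "nb_true_divide": "itruediv",
}


def inplace_slot_table():
    """Return (in-place slot, Python dunder, operator function) triples."""
    rows = []
    for binary_slot, op_name in BINARY_TO_INPLACE_OP.items():
        inplace_slot = "nb_inplace_" + binary_slot[len("nb_"):]
        rows.append((inplace_slot, f"__{op_name}__", getattr(operator, op_name)))
    return rows


for slot, dunder, func in inplace_slot_table():
    print(f"{slot:<26} {dunder:<14} operator.{func.__name__}")
```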

For __complex__, as noted above, you'd need to add an entry and implementation for __complex__ to the PyMethodDef array for the scalar types:

static PyMethodDef gentype_methods[] = {
.

If you've never touched the CPython C API before, this is probably a big project, although tbh it would be a decent way to learn about the C internals of NumPy or how to work with Python C extensions using the C API directly.

@Ishankoradia
Contributor

Thanks a ton, @ngoldbaum, this helps a lot.
I think I can do it. Although I have limited knowledge of Python C extensions, like you said, this is a great way to learn them and NumPy's C internals.

@ngoldbaum
Member

I found going through https://docs.python.org/3/extending/extending.html and https://llllllllll.github.io/c-extension-tutorial/ helped immensely to understand this stuff better.

@Ishankoradia
Copy link
Contributor

Ishankoradia commented Sep 30, 2024

Got it !! I will read through them before I dive in.

Thank you for sharing them.

@Ishankoradia
Copy link
Contributor

Ishankoradia commented Oct 4, 2024

@ngoldbaum, I have spent a lot of time reading the material you shared, and I now have a good understanding of how C extensions work. I built two (easy) extensions and was able to run them from Python.

I was looking at the template file you pointed out, scalarmath.c.src. It's very interesting. I think all @name@ placeholders are replaced by the correct dtypes, and I also see the corresponding method implementations in the generated file scalarmath.c. But I can't figure out where the source code for those functions (e.g. @name@_add) comes from. Could you point me to that file?

[updated]
Oh, is it the function @name@_ctype_add for .nb_add = (binaryfunc)@name@_add?

@ngoldbaum
Copy link
Member

Hi, sorry for taking a few days to respond, I've been on vacation.

The @name@ placeholder is part of the template system NumPy uses for codegen. All files with .c.src extensions use this template system.

The templating for the block of code I linked to is set up immediately above that code:

/**begin repeat
* #name = byte, ubyte, short, ushort, int, uint,
* long, ulong, longlong, ulonglong,
* half, float, double, longdouble,
* cfloat, cdouble, clongdouble#
**/

Here, the template system is saying to replace @name@ with each of the names in the comma-separated list, one for each scalar type.
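A toy Python model of that substitution, only to make the mechanics concrete. It handles a single #name variable and plain @name@ replacement, whereas NumPy's real template processor supports more (for example, several variables in one repeat block). The functions parse_repeat_names and expand_repeat are hypothetical.

```python
import re

# The repeat header quoted above, as it appears in scalarmath.c.src.
REPEAT_HEADER = """/**begin repeat
 * #name = byte, ubyte, short, ushort, int, uint,
 * long, ulong, longlong, ulonglong,
 * half, float, double, longdouble,
 * cfloat, cdouble, clongdouble#
 **/"""


def parse_repeat_names(header):
    """Extract the comma-separated values of the #name = ...# variable."""
    match = re.search(r"#name\s*=\s*(.*?)#", header, re.DOTALL)
    if match is None:
        raise ValueError("no #name = ...# variable found")
    # Join continuation lines, dropping the leading '*' of the C comment.
    body = re.sub(r"\n\s*\*", " ", match.group(1))
    return [item.strip() for item in body.split(",") if item.strip()]


def expand_repeat(template, names):
    """Emit one copy of `template` per name, with @name@ substituted."""
    return "\n".join(template.replace("@name@", name) for name in names)


names = parse_repeat_names(REPEAT_HEADER)
print(expand_repeat(".nb_add = (binaryfunc)@name@_add,", names[:3]))
```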

In order to implement __iadd__, __ior__, and the rest of the in-place operators, you're going to need to define new templates that define implementations for these operators (or maybe you can just re-use the existing implementations for the non-in-place operators? not sure), then add new entries to the table I linked to above for all the in-place operators, like nb_inplace_add.

@Ishankoradia
Copy link
Contributor

@ngoldbaum no problem. Thanks for getting back. (Also, hope you had a great vacation and got good time to recharge)

So my first attempt was to use the same table. I tried adding a new entry there for .nb_inplace_add like this:
[Screenshot 2024-10-07 at 21 53 36]

But my mistake was that I implemented a method @name@_ctype_inplace_add instead of @name@_inplace_add. I assumed ctype was some kind of prefix that the methods need, because I see it everywhere.

I have now added this, and I can see this dummy implementation in the generated file scalarmath.c:
[Screenshot 2024-10-07 at 22 11 48]

  1. What is that ctype prefix?
  2. Right now I have just copied the dummy implementation from _ctype_add. I guess I will have to break it down once I start implementing it for each data type, based on what the logic looks like?
  3. How are inputs to these methods handled here? Can I assume that in an in-place operation the first input will be self?

If you can clear these up, that would be great. Sorry for the back and forth. Thank you for all the help with this one.

@ngoldbaum
Copy link
Member

ngoldbaum commented Oct 14, 2024

> What is that ctype prefix?
>
> Right now I have just copied the dummy implementation from _ctype_add. I guess I will have to break it down once I start implementing it for each data type, based on what the logic looks like?

The ctype functions are used to define the "c-level" version of the operation. There's also a "python-level" version of the operation that is defined using a second level of templating to define a function named e.g. float_add. See the template function defined starting at line 1175 in scalarmath.c.src. I guess to get this to work, you'll also need to extend this to generate in-place python wrappers, along with the C-level wrappers. Although that said, maybe you can just re-use the existing wrappers that are already defined, so just set nb_inplace_add to e.g. (binaryfunc)@name@_add.
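A rough Python analogy of that two-level structure. The names float_ctype_add and Float32Scalar are hypothetical and only mirror the naming pattern described above; this is not NumPy's code. A ctype-style function works on raw values, and a Python-level wrapper unboxes the operands, calls it, and boxes the result, returning NotImplemented for types it does not understand. Rounding to float32 via struct is an assumption made for illustration.

```python
import struct


def _round_to_float32(value):
    """Round a Python float to the nearest float32 (assumed helper)."""
    return struct.unpack("f", struct.pack("f", value))[0]


def float_ctype_add(a, b):
    """'ctype'-level step: operate on raw, already-unboxed values."""
    return _round_to_float32(a + b)


class Float32Scalar:
    """Hypothetical boxed float32 scalar (not numpy.float32)."""

    __slots__ = ("value",)

    def __init__(self, value):
        self.value = _round_to_float32(float(value))

    def __add__(self, other):
        """'Python-level' wrapper: unbox, call the ctype step, box the result."""
        if isinstance(other, Float32Scalar):
            other_value = other.value
        elif isinstance(other, (int, float)):
            other_value = float(other)
        else:
            return NotImplemented  # let Python try the other operand
        return Float32Scalar(float_ctype_add(self.value, other_value))

    __radd__ = __add__


s = Float32Scalar(1.5) + 2
print(type(s).__name__, s.value)
```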

If you look up at my analysis when I originally opened the issue - numpy scalars already let you use the in-place operators, they just return a copy, so maybe in principle you can just re-use the existing implementations that are getting used already, but now by explicitly using the slot so people can call the dunder methods directly from Python.

> How are inputs to these methods handled here? Can I assume that in an in-place operation the first input will be self?

I think that's right, although instead of asking me and then waiting for a response, you should try experimenting to see for yourself. I would need to poke around with a C debugger to figure that out.
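One way to experiment with the argument-order question from the Python side, using hypothetical recorder classes (Left, Right, InPlace). At the Python level, __iadd__ always receives the left operand as self. If the left operand has neither __iadd__ nor __add__, though, x += y falls back to the right operand's __radd__, so the object being added to arrives as the second argument. That mirrors why a C-level binary slot such as nb_add cannot assume its first argument is the scalar whose type owns the slot. A C debugger is still the way to confirm what NumPy's slots actually receive.

```python
calls = []


class Left:
    """Hypothetical left operand with no __iadd__ and no __add__."""


class Right:
    """Hypothetical right operand that only knows the reflected add."""

    def __radd__(self, other):
        calls.append(("Right.__radd__", type(self).__name__, type(other).__name__))
        return "handled by Right"


class InPlace:
    """Hypothetical operand with a real __iadd__."""

    def __iadd__(self, other):
        calls.append(("InPlace.__iadd__", type(self).__name__, type(other).__name__))
        return self


x = Left()
x += Right()   # no Left.__iadd__/__add__, so the Right instance's __radd__(x) runs

y = InPlace()
y += 5         # InPlace.__iadd__(y, 5): the left operand is self

for call in calls:
    print(call)
```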

There are spin gdb and spin lldb commands. You'll also need to make sure you're building NumPy with debug symbols. There's no need to use a debug build of CPython unless you need to step through CPython, which is not needed for most things.

If you don't like debuggers, printf debugging also works :)

@Ishankoradia
Copy link
Contributor


Understood. Thank you Nathan !!
This helps a lot. I will get back in 3-4 days hopefully with something implemented.

@mhvk
Copy link
Contributor

mhvk commented Oct 15, 2024

Sorry for a late side comment, but reading this, are we sure that this earlier comment (#27305 (comment)) has been followed up on:

> You can see that none of the inplace methods listed in the CPython docs are implemented. I think we need to implement all of them, not just __iadd__ and __ior__. You'd also need to add tests and I'd also double check my analysis above that defining in-place operators that return copies is actually OK. Do other array libraries do that?

Do other array libraries in fact do that for immutable scalars? I ask because Python ints certainly do not implement __iadd__. Within Python, at least, those methods should not be used directly, but exercised simply by doing a += b (which goes through a.__add__(b) if there is no __iadd__).
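A quick check of that Python behavior using only the standard library (augmented_add is a hypothetical helper): int has no __iadd__, yet += still works through __add__ and produces a new object, while a mutable list implements __iadd__ and mutates in place.

```python
import operator


def augmented_add(value, other):
    """Apply value += other via operator.iadd and report whether it mutated in place.

    operator.iadd uses the same dispatch as the += statement: __iadd__ if the
    type defines it, otherwise __add__ (or the other operand's __radd__).
    """
    before = value
    result = operator.iadd(value, other)
    return result, result is before


print(hasattr(int, "__iadd__"))    # immutable int: no __iadd__
print(augmented_add(3, 4))         # falls back to int.__add__, new object
print(augmented_add([1], [2]))     # mutable list: list.__iadd__ mutates in place
```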

@seberg
Copy link
Member

seberg commented Oct 15, 2024

That is correct, inplace operators cannot and thus must not be implemented. __complex__ makes a lot of sense to be missing, precisely because it is not an nb_ slot, but rather only a Python defined method.

@charris charris modified the milestones: 2.2.0 release, 2.3.0 release Nov 22, 2024
@charris charris modified the milestones: 2.3.0 release, 2.4.0 release May 20, 2025
6 participants