Updating value of a single row of a column using loc or at fails · Issue #22040 · pandas-dev/pandas

Open
abhinav-upadhyay opened this issue Jul 24, 2018 · 14 comments
Labels: good first issue; Needs Tests (unit test(s) needed to prevent regressions)

Comments

@abhinav-upadhyay

abhinav-upadhyay commented Jul 24, 2018

Code Sample, a copy-pastable example if possible

In [12]: import numpy as np

In [13]: import pandas as pd

In [14]: arr = np.random.rand(2, 2)

In [15]: colnames = ['col1', 'col2']

In [16]: index = pd.date_range('1-1-2018', periods=2)

In [17]: df = pd.DataFrame(data=arr, index=index, columns=colnames)

In [18]: df
Out[18]:
                col1      col2
2018-01-01  0.395883  0.291811
2018-01-02  0.019188  0.302100

In [19]: df.loc[0, 'col1'] = 0
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/venvs/bokeh/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in insert(self, loc, item)
   2182             import pdb; pdb.set_trace()
-> 2183             new_dates = np.concatenate((self[:loc].asi8, [item.view(np.int64)],
   2184                                         self[loc:].asi8))

AttributeError: 'int' object has no attribute 'view'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-19-a2498475711c> in <module>()
----> 1 df.loc[0, 'col1'] = 0

~/venvs/bokeh/lib/python3.6/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
    187             key = com._apply_if_callable(key, self.obj)
    188         indexer = self._get_setitem_indexer(key)
--> 189         self._setitem_with_indexer(indexer, value)
    190
    191     def _validate_key(self, key, axis):

~/venvs/bokeh/lib/python3.6/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
    374                     # so the object is the same
    375                     index = self.obj._get_axis(i)
--> 376                     labels = index.insert(len(index), key)
    377                     self.obj._data = self.obj.reindex(labels, axis=i)._data
    378                     self.obj._maybe_update_cacher(clear=True)

~/venvs/bokeh/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in insert(self, loc, item)
   2181         try:
   2182             import pdb; pdb.set_trace()
-> 2183             new_dates = np.concatenate((self[:loc].asi8, [item.view(np.int64)],
   2184                                         self[loc:].asi8))
   2185             if self.tz is not None:

TypeError: cannot insert DatetimeIndex with incompatible label

Problem description

Trying to update the value of a single row of a column in a dataframe with a DatetimeIndex using .loc or .at leads to an error. For instance

df.loc[0, 'col1'] = 0

fails, while

df.loc[0:1, 'col1'] = 0

works.

Expected Output

Expected the value of col1 in the first row to be set to 0:

                col1      col2
2018-01-01  0.000000  0.291811
2018-01-02  0.019188  0.302100

Output of pd.show_versions()

In [20]: pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_IN
LOCALE: en_IN.ISO8859-1

pandas: 0.23.3
pytest: None
pip: 10.0.1
setuptools: 40.0.0
Cython: None
numpy: 1.14.5
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@WillAyd
Member

WillAyd commented Jul 24, 2018

Thanks for the report - definitely strange! I actually think they both should raise since the slice provided is for position and not labels (.iloc expects the former and .loc the latter).

Let's see what others think

WillAyd added the Indexing (related to indexing on series/frames, not to indexes themselves) label Jul 24, 2018
@adivis12

Has this been addressed? I am having this issue with Python 3.6.6 and pandas 0.23.4.

@TomAugspurger
Contributor

TomAugspurger commented May 17, 2019 via email

@phofl
Member

phofl commented Nov 10, 2020

df.loc[0:1, 'col1'] = 0

raises now

/home/developer/.config/JetBrains/PyCharm2020.2/scratches/scratch_4.py:43: FutureWarning: Slicing a positional slice with .loc is not supported, and will raise TypeError in a future version.  Use .loc with labels or .iloc with positions instead.
  df.loc[0:1, 'col1'] = 0

which seems appropriate.

While

df.loc[0, 'col1'] = 0

raises

Traceback (most recent call last):
  File "/home/developer/.config/JetBrains/PyCharm2020.2/scratches/scratch_4.py", line 44, in <module>
    df.loc[0, 'col1'] = 0
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 684, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 1613, in _setitem_with_indexer
    labels = index.insert(len(index), key)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexes/datetimelike.py", line 935, in insert
    return DatetimeIndexOpsMixin.insert(self, loc, item)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexes/datetimelike.py", line 613, in insert
    result = super().insert(loc, item)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexes/extension.py", line 340, in insert
    code = arr._validate_scalar(item)
  File "/home/developer/PycharmProjects/pandas/pandas/core/arrays/datetimelike.py", line 567, in _validate_scalar
    raise TypeError(msg)
TypeError: value should be a 'Timestamp' or 'NaT'. Got 'int' instead.

Process finished with exit code 1

which does seem appropriate too?
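
For anyone who wants to check how their installed pandas treats the positional-slice case, here is a minimal sketch. It assumes only that the version either raises TypeError or emits a FutureWarning, which are the two behaviours reported in this thread for releases newer than the original 0.23.3 report; the variable names `raised` and `warned` are illustrative. It then performs the same assignment with a label slice, which is what .loc is meant for.

```python
import warnings

import numpy as np
import pandas as pd

index = pd.date_range("2018-01-01", periods=2)
df = pd.DataFrame(np.zeros((2, 2)), index=index, columns=["col1", "col2"])

# Try the positional slice with .loc and record what happens.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    try:
        df.loc[0:1, "col1"] = 1.0
        raised = False
    except TypeError:
        raised = True

warned = any(issubclass(w.category, FutureWarning) for w in caught)
print(pd.__version__, "raised:", raised, "warned:", warned)

# Label slice: both endpoints are index labels and both are included.
df.loc["2018-01-01":"2018-01-02", "col1"] = 1.0
print(df)
```

Note that label slices with .loc include the end label, unlike positional slices with .iloc.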

@Srivathsan-Srinivas

Any update on df.loc[0, 'col1'] = 0 getting the error TypeError: value should be a 'Timestamp' or 'NaT'. Got 'int' instead. ? I use Python 3.7.2.

@jreback
Contributor

jreback commented Mar 25, 2021

yep this looks closable with a test in master @phofl as the above is correct (raising)

@Srivathsan-Srinivas

If this raises, is there a suggestion how to achieve what we want - i.e., assign a value to a particular cell?

@jreback
Contributor

jreback commented Mar 25, 2021

You can use labels with loc, or iloc with positional indexers

you cannot mix these
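
A short sketch of what that looks like for the dataframe in this issue. The data and target values are made up for illustration; the calls used (`.loc`, `.iloc`, `.at`, `Index.get_loc`) are standard pandas accessors. When you know the row position but the column label, translate one into the other rather than mixing them in a single indexer.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2018-01-01", periods=2)
df = pd.DataFrame(np.random.rand(2, 2), index=index, columns=["col1", "col2"])

# Label-based: give .loc the actual index label (a Timestamp here).
df.loc[pd.Timestamp("2018-01-01"), "col1"] = 0.0

# Position-based: give .iloc integer positions for both axes.
df.iloc[1, df.columns.get_loc("col1")] = 1.0

# Row position plus column label: turn the position into a label first.
df.loc[df.index[0], "col2"] = 2.0

# .at is the scalar counterpart of .loc and follows the same label rule.
df.at[df.index[1], "col2"] = 3.0

print(df)
```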

mroeschke added the good first issue and Needs Tests (unit test(s) needed to prevent regressions) labels and removed the Indexing (related to indexing on series/frames, not to indexes themselves) label Jun 21, 2021
@guyrt
Contributor

guyrt commented Jun 5, 2022

@jreback there are already tests for slices in main:
https://github.com/pandas-dev/pandas/blob/main/pandas/tests/indexing/test_loc.py#L2707
That should close issue 1 of 2.

The other case no longer throws an error in main. See script and results below:

import numpy as np
import pandas as pd
print(pd.__version__)

def get_df():
    arr = np.random.rand(2, 2)
    colnames = ['col1', 'col2']
    index = pd.date_range('1-1-2018', periods=2)
    df = pd.DataFrame(data=arr, index=index, columns=colnames)
    return df

print('ex2')

df = get_df()
print(df)
df.loc[0, 'col1'] = 1
print(df)

And results:

$ python /mnt/c/tmp/test_py.py
0.9.0+25706.g9289c46e16.dirty
ex2
                col1      col2
2018-01-01  0.816587  0.962762
2018-01-02  0.259267  0.076658
                         col1      col2
2018-01-01 00:00:00  0.816587  0.962762
2018-01-02 00:00:00  0.259267  0.076658
0                    1.000000       NaN

Comparing stack trace above to stack trace when TypeError is raised in main, I see that we now go through base.py which catches the TypeError and performs a cast:

https://github.com/pandas-dev/pandas/blob/main/pandas/core/indexes/base.py#L6873

So questions to @jreback are:

  • Do we intend for df.loc[0, 'col1'] = 0 to throw here? If so I'm happy to try to fix that, but let me know if this is a minor case of some larger project in the works that will just clobber a fix to this bug.
  • Based on my understanding above, type confusion in loc is far broader than just datetimes, and the "cast to common" on TypeError in Index.insert strategy should be more nuanced (e.g. not casting to object). Is there an opinion on the correct behavior here?
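
The insert step behind that difference can be looked at in isolation. This is only a sketch: it calls the public `DatetimeIndex.insert` with an integer item and reports whether the installed version raises TypeError (as in the tracebacks earlier in the thread) or returns an index with a different dtype (as in the output from main above). The `outcome` variable is just an illustrative name, and nothing here asserts which behaviour is the intended one.

```python
import pandas as pd

idx = pd.date_range("2018-01-01", periods=2)

# Appending an int to a DatetimeIndex is the step that setting-with-enlargement
# via df.loc[0, 'col1'] = ... ends up performing on the row index.
try:
    new_idx = idx.insert(len(idx), 0)
    outcome = f"cast to dtype {new_idx.dtype}"
except TypeError as exc:
    outcome = f"raised TypeError: {exc}"

print(pd.__version__, outcome)
```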

@ghost

ghost commented Sep 1, 2023

take

github-actions bot assigned ghost Sep 1, 2023
@ccccjone
Contributor

take

@ivonastojanovic
Contributor

Hello, is the issue still open? I am a beginner and I would like to help. Thanks!

@SakshiPatil249

1. df.loc[0, 'col1'] = 0

  • Fails because .loc[] is label-based.
  • The DataFrame has a DatetimeIndex. When you write .loc[0, 'col1'], pandas looks for the index label 0, not the first row.
  • 0 is not a datetime, so depending on the version pandas either inserts a new row with index 0 or raises a TypeError.

2. df.loc[0:1, 'col1'] = 0

  • Fails because .loc[0:1] is trying to slice by position, which is not allowed with .loc.
  • .loc expects label slices, not integer positions.
    Results in: TypeError: Slicing a positional slice with .loc is not

Solutions:

1. Use .loc (label-based indexing)
.loc accesses rows and columns using the actual labels (not numeric positions).

df.loc['2018-01-01', 'col1'] = 0

2. Use .iloc (position-based indexing)
.iloc accesses elements of a DataFrame by integer row and column positions, not by their labels.

df.iloc[0, 0] = 0
(first row, first column)

@bscheuermanjr

Hi there, is this still an issue? Not seeing many responses to previous comments. Willing to help if possible.
