1. Overview
In this tutorial, we'll learn how to solve the popular warning message in Pandas:
/tmp/ipykernel_4904/714243365.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Several different reasons can cause this warning message. We are going to cover most of them and their solutions.
2. Setup
For this example we are going to use a dummy DataFrame created by the method makeMixedDataFrame:
from pandas.util.testing import makeMixedDataFrame
df = makeMixedDataFrame()
data:
  | A | B | C | D |
---|---|---|---|---|
0 | 0.0 | 0.0 | foo1 | 2009-01-01 |
1 | 1.0 | 1.0 | foo2 | 2009-01-02 |
2 | 2.0 | 0.0 | foo3 | 2009-01-05 |
3 | 3.0 | 1.0 | foo4 | 2009-01-06 |
4 | 4.0 | 0.0 | foo5 | 2009-01-07 |
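The helper above lives in a testing module and may not be available in recent pandas releases. As a minimal sketch, assuming we only need the same five rows, the DataFrame can be built by hand from the values shown in the table (the construction below is our own stand-in, not the helper's implementation):

```python
import pandas as pd

# Hand-built stand-in for makeMixedDataFrame(); values copied from the table above.
df = pd.DataFrame(
    {
        "A": [0.0, 1.0, 2.0, 3.0, 4.0],
        "B": [0.0, 1.0, 0.0, 1.0, 0.0],
        "C": ["foo1", "foo2", "foo3", "foo4", "foo5"],
        # Business days only, which is why 2009-01-03 and 2009-01-04 are skipped.
        "D": pd.bdate_range("2009-01-01", periods=5),
    }
)

print(df)
```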
3. What are the reasons for SettingWithCopyWarning
3.1. Is Pandas DataFrame a Copy or a View?
Before jumping to solutions, let's try to answer the question in the title of this section: how can we tell the difference between a copy and a view?
Let's cover this with a few examples showing how to copy a DataFrame or select part of it:
df_2 = df
df_3 = df.copy()
df_4 = df[:]
df_5 = df.loc[:, :]
df_6 = df.iloc[0:2, :]
df_7 = df['D']
Let's verify which of them are copies and which are views:
print(df._is_view, '|', hex(id(df)), '|', df._is_copy)
print(df_2._is_view, '|', hex(id(df_2)), '|', df_2._is_copy)
print(df_3._is_view, '|', hex(id(df_3)), '|', df_3._is_copy)
print(df_4._is_view, '|', hex(id(df_4)), '|', df_4._is_copy)
print(df_5._is_view, '|', hex(id(df_5)), '|', df_5._is_copy)
print(df_6._is_view, '|', hex(id(df_6)), '|', df_6._is_copy)
print(df_7._is_view, '|', hex(id(df_7)), '|', df_7._is_copy)
The output helps us to understand Copies and Views better:
_is_view | hex(id(df)) | _is_copy |
---|---|---|
False | 0x7fde7ab88eb0 | None |
False | 0x7fde7ab88eb0 | None |
False | 0x7fde7abf24f0 | None |
False | 0x7fdec136f940 | <weakref at 0x7fde7ab8a680; to 'DataFrame' at 0x7fde7ab88eb0> |
False | 0x7fde7ab88eb0 | None |
False | 0x7fde7abf2730 | <weakref at 0x7fde7ab8a680; to 'DataFrame' at 0x7fde7ab88eb0> |
True | 0x7fde7abf2340 | None |
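So we can see that df['D'] is reported as a view (_is_view is True), while df[:] and df.iloc[0:2, :] are new objects that keep a weak reference to the original DataFrame in _is_copy. That reference is what pandas checks before showing SettingWithCopyWarning.
We can also see the addresses of all DataFrames: df_2 and df.loc[:, :] share the address of df, so they are the same object, while df.copy() lives at a new address.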
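Since _is_view and _is_copy are private attributes, their values may differ between pandas versions. A more conservative check, sketched here on a small stand-in DataFrame (the column values are our own assumption), is to compare object identity and to ask NumPy whether two arrays share memory:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0.0, 1.0, 2.0], "B": [0.0, 1.0, 0.0]})

df_2 = df         # plain assignment: a second name for the same object
df_3 = df.copy()  # deep copy by default: new object with its own data

same_object = df_2 is df
copy_is_new_object = df_3 is not df
# A deep copy must not share its buffer with the original column.
copy_shares_memory = np.shares_memory(df["A"].to_numpy(), df_3["A"].to_numpy())

print(same_object, copy_is_new_object, copy_shares_memory)
```

Identity (is) tells us whether two names point to one DataFrame, while np.shares_memory looks at the underlying buffers, which is the part that matters when we modify values.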
Another way to inspect the data behind a DataFrame is the base of its values array:
df_2.values.base
which will differ when the underlying values are different:
array([[0.0, 1.0, 2.0, 3.0, 4.0],
[0.0, 1.0, 0.0, 1.0, 0.0],
['foo1', 'foo2', 'foo3', 'foo4', 'foo5'],
[Timestamp('2009-01-01 00:00:00'),
Timestamp('2009-01-02 00:00:00'),
Timestamp('2009-01-05 00:00:00'),
Timestamp('2009-01-06 00:00:00'),
Timestamp('2009-01-07 00:00:00')]], dtype=object)
In some cases, relying on df_2.values might lead to contradictory results.
3.2. How data is accessed
Whether a warning is shown depends on how the DataFrame data is accessed.
Let's say that we would like to update values in column C. We can do this by:
df["C"][df["C"]=="foo3"] = "foo33"
but a warning will be produced:
/tmp/ipykernel_9907/1845991504.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df["C"][df["C"]=="foo3"] = "foo33"
To get and set the values without SettingWithCopyWarning we need to use loc:
df.loc[df["C"]=="foo3", "C"] = "foo333"
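One way to see why the chained form is fragile is to spell it out as two separate calls, compared with the single call made through loc. The sketch below uses a small stand-in DataFrame with only column C (our own assumption) and runs only the loc form:

```python
import pandas as pd

df = pd.DataFrame({"C": ["foo1", "foo2", "foo3", "foo4", "foo5"]})

mask = df["C"] == "foo3"

# Chained form, spelled out as two independent calls:
#   tmp = df.__getitem__("C")        # may return a view or a copy
#   tmp.__setitem__(mask, "foo33")   # writes to tmp, not necessarily to df

# Single call on the original object: row selector and column in one indexer.
df.loc[mask, "C"] = "foo333"

print(df["C"].tolist())
```

Because loc receives the rows and the column together, pandas knows the assignment targets df itself and there is no intermediate object to worry about.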
4. Fix SettingWithCopyWarning by method copy()
The first and simplest solution is to create a copy of the DataFrame and work with it. This can be done with the copy() method.
4.1. A value is trying to be set on a copy of a slice from a DataFrame
Let's do a short demo of this problem:
/tmp/ipykernel_4904/714243365.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
and its solution. Let's say that we take part of the initial DataFrame with:
df_new = df[['D', 'B']]
Our goal is to work only with this subset of columns and create a new column based on the existing ones:
df_new['E'] = df_new['B'] > 0
This will cause a warning:
/tmp/ipykernel_9907/381168311.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df_new['E'] = df_new['B'] > 0
This warning can be avoided by using the copy() method:
df_new = df[['D', 'B']].copy()
Using `copy()` is recommended for small and medium-sized DataFrames. For large ones and in production code, the extra copy may cause performance and memory issues.
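To make the effect of copy() concrete, here is a minimal sketch on a stand-in DataFrame holding only columns B and D (the values are taken from the setup table, the construction is our own). After the copy, the new column is added to df_new only and the original stays untouched:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "B": [0.0, 1.0, 0.0, 1.0, 0.0],
        "D": pd.bdate_range("2009-01-01", periods=5),
    }
)

# An explicit copy makes df_new independent of df.
df_new = df[["D", "B"]].copy()
df_new["E"] = df_new["B"] > 0

print(df_new["E"].tolist())
print("E" in df.columns)
```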
5. Fix SettingWithCopyWarning by method loc
In this section we will demonstrate the warning when working with a single DataFrame. In this case, the warning is caused by the way we access the data.
For example, if we would like to change all values in column C which are different from foo3, then we might use:
df["C"][df["C"]!="foo3"] = "foo"
This will raise the warning message:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
To perform the operation without SettingWithCopyWarning we need to use the loc attribute in this way:
df.loc[df["C"]!="foo3", "C"] = "foo"
The operation is completed without the warning: "Try using .loc[row_indexer,col_indexer] = value instead"
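If we want to confirm this rather than rely on the absence of console output, we can record warnings while the assignment runs. This is a minimal sketch on a stand-in DataFrame with only column C (our own assumption); it matches the warning by class name so it does not depend on where pandas defines that class:

```python
import warnings

import pandas as pd

df = pd.DataFrame({"C": ["foo1", "foo2", "foo3", "foo4", "foo5"]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning instead of hiding repeats
    df.loc[df["C"] != "foo3", "C"] = "foo"

# Keep only warnings whose class name mentions SettingWithCopy.
copy_warnings = [w for w in caught if "SettingWithCopy" in w.category.__name__]

print(df["C"].tolist())
print(len(copy_warnings))
```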
6. Conclusion
In this article, we looked at the reasons and solutions for the SettingWithCopyWarning warning in Pandas.
We focused on solving the original cause of the "A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead" warning rather than suppressing the message itself.
Finally, depending on your context, you may run into a slightly different problem, for example when working with a MultiIndex, which is explained in the pandas documentation section: Returning a view versus a copy