1. Overview
In this tutorial, we'll learn how to resolve a common warning in Pandas, SettingWithCopyWarning:
/tmp/ipykernel_4904/714243365.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Several different reasons can cause this warning message. We are going to cover most of them and their solutions.
2. Setup
For this example we are going to use a dummy DataFrame created by the testing helper makeMixedDataFrame, which ships with older pandas releases:
from pandas.util.testing import makeMixedDataFrame
df = makeMixedDataFrame()
The resulting data:
 | A | B | C | D |
---|---|---|---|---|
0 | 0.0 | 0.0 | foo1 | 2009-01-01 |
1 | 1.0 | 1.0 | foo2 | 2009-01-02 |
2 | 2.0 | 0.0 | foo3 | 2009-01-05 |
3 | 3.0 | 1.0 | foo4 | 2009-01-06 |
4 | 4.0 | 0.0 | foo5 | 2009-01-07 |
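Since makeMixedDataFrame is a testing helper and may not be importable in every pandas installation, the same frame can be built by hand. The following is a minimal sketch that reproduces the table above; the use of pd.bdate_range for column D is an assumption chosen to match the business-day dates shown.

```python
import pandas as pd

# Hand-built equivalent of the table above; no testing helper required.
df = pd.DataFrame(
    {
        "A": [0.0, 1.0, 2.0, 3.0, 4.0],
        "B": [0.0, 1.0, 0.0, 1.0, 0.0],
        "C": ["foo1", "foo2", "foo3", "foo4", "foo5"],
        # Five consecutive business days starting on 2009-01-01.
        "D": pd.bdate_range("2009-01-01", periods=5),
    }
)
print(df)
```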
3. What are the reasons for SettingWithCopyWarning
3.1. Is Pandas DataFrame a Copy or a View?
Before jumping to solutions, let's try to answer the question in the title of this section. How can we tell the difference between a copy and a view?
Let's cover this with a few examples showing how to copy a DataFrame or get some part of it:
df_2 = df
df_3 = df.copy()
df_4 = df[:]
df_5 = df.loc[:, :]
df_6 = df.iloc[0:2, :]
df_7 = df['D']
Let's verify which of them are copies and which are views:
print(df._is_view, '|', hex(id(df)), '|', df._is_copy)
print(df_2._is_view, '|', hex(id(df_2)), '|', df_2._is_copy)
print(df_3._is_view, '|', hex(id(df_3)), '|', df_3._is_copy)
print(df_4._is_view, '|', hex(id(df_4)), '|', df_4._is_copy)
print(df_5._is_view, '|', hex(id(df_5)), '|', df_5._is_copy)
print(df_6._is_view, '|', hex(id(df_6)), '|', df_6._is_copy)
print(df_7._is_view, '|', hex(id(df_7)), '|', df_7._is_copy)
The output helps us to understand Copies and Views better:
_is_view | hex(id(df)) | _is_copy |
---|---|---|
False | 0x7fde7ab88eb0 | None |
False | 0x7fde7ab88eb0 | None |
False | 0x7fde7abf24f0 | None |
False | 0x7fdec136f940 | <weakref at 0x7fde7ab8a680; to 'DataFrame' at 0x7fde7ab88eb0> |
False | 0x7fde7ab88eb0 | None |
False | 0x7fde7abf2730 | <weakref at 0x7fde7ab8a680; to 'DataFrame' at 0x7fde7ab88eb0> |
True | 0x7fde7abf2340 | None |
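So we can see that df['D'] is reported as a view (_is_view is True), while df[:] and df.iloc[0:2, :] carry a weak reference to the original DataFrame in _is_copy. This reference is what pandas checks before it raises SettingWithCopyWarning. df_2 has the same address as df because plain assignment only creates another name for the same object, and df.copy() returns a new object with _is_copy set to None. Note that _is_view and _is_copy are private attributes, so their behavior may change between pandas versions.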
We can also see the addresses of all DataFrames.
One more way to inspect the data behind a DataFrame is the base attribute of its values array:
df_2.values.base
which shows the array that the values were derived from:
array([[0.0, 1.0, 2.0, 3.0, 4.0],
[0.0, 1.0, 0.0, 1.0, 0.0],
['foo1', 'foo2', 'foo3', 'foo4', 'foo5'],
[Timestamp('2009-01-01 00:00:00'),
Timestamp('2009-01-02 00:00:00'),
Timestamp('2009-01-05 00:00:00'),
Timestamp('2009-01-06 00:00:00'),
Timestamp('2009-01-07 00:00:00')]], dtype=object)
In some cases using df_2.values might lead to misleading results, because for mixed dtypes it builds a new object array on every call.
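As an alternative to the private attributes above, memory sharing can be checked with public APIs. This is a minimal sketch, assuming a small all-float frame (not the tutorial's data) and using np.shares_memory, which reports whether two NumPy arrays overlap in memory.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0.0, 1.0, 2.0], "B": [0.0, 1.0, 0.0]})

alias = df               # another name for the same object
independent = df.copy()  # a deep copy with its own data

# Plain assignment never copies: both names point at one object.
same_object = alias is df

# A deep copy owns separate memory, so the column arrays do not overlap.
copy_shares = bool(
    np.shares_memory(df["A"].to_numpy(), independent["A"].to_numpy())
)

print(same_object, copy_shares)
```

Whether a slice such as df[:] shares memory with its parent depends on the pandas version and the dtypes involved, so it is worth checking in your own environment rather than assuming.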
3.2. How data is accessed
Whether a warning is shown depends on how the DataFrame data is accessed.
Let's say that we would like to update values in column C. We can do this by:
df["C"][df["C"]=="foo3"] = "foo33"
but a warning will be produced:
/tmp/ipykernel_9907/1845991504.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df["C"][df["C"]=="foo3"] = "foo33"
To get and set the values without SettingWithCopyWarning we need to use loc:
df.loc[df["C"]=="foo3", "C"] = "foo333"
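The difference between the two forms comes down to how many indexing calls Python makes. The sketch below uses a hypothetical Tracer class (not part of pandas) that only logs calls: the chained form first fetches an intermediate object and then assigns into it, while the single-call form assigns directly on the original object.

```python
class Tracer:
    """Hypothetical stand-in for a DataFrame that only logs indexing calls."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __getitem__(self, key):
        self.log.append(f"{self.name}.__getitem__({key!r})")
        # Return a new intermediate object, like the result of df["C"].
        return Tracer(f"{self.name}[{key!r}]", self.log)

    def __setitem__(self, key, value):
        self.log.append(f"{self.name}.__setitem__({key!r}, {value!r})")


calls = []
obj = Tracer("df", calls)

obj["C"]["mask"] = "foo33"   # chained: a get, then a set on the intermediate
obj["mask", "C"] = "foo33"   # one set on the original, analogous to df.loc[mask, "C"] = ...

for call in calls:
    print(call)
```

In the chained case pandas cannot always tell whether the intermediate object is a view or a copy of the original, which is why it warns.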
4. Fix SettingWithCopyWarning by method copy()
The first and simplest solution is to create a copy of the DataFrame and work with it. This can be done with the copy() method.
4.1. A value is trying to be set on a copy of a slice from a DataFrame
Let's do a short demo of the problem behind this message:
/tmp/ipykernel_4904/714243365.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
and of its solution. Let's say that we take part of the initial DataFrame with:
df_new = df[['D', 'B']]
Our goal is to work only with this subset of columns and create a new column based on the existing ones:
df_new['E'] = df_new['B'] > 0
This triggers the warning (in the captured output below the subset is named df_7):
/tmp/ipykernel_9907/381168311.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df_7['E'] = df_7['B'] > 0
This warning can be avoided by using the copy() method when the subset is created:
df_new = df[['D', 'B']].copy()
Using `copy()` is recommended for small and medium-sized DataFrames. For large DataFrames and production code, the extra copy costs additional memory and time and can cause performance issues.
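To illustrate what the copy buys us, here is a minimal sketch on a small stand-in frame (the data is made up for the example): the new column is added to the copied subset only, and the original DataFrame is left unchanged.

```python
import pandas as pd

# Small stand-in for the tutorial's DataFrame.
df = pd.DataFrame({"B": [0.0, 1.0, 0.0], "D": ["x", "y", "z"]})

df_new = df[["D", "B"]].copy()  # independent subset
df_new["E"] = df_new["B"] > 0   # the new column lives only in the copy

print(list(df_new.columns))
print(list(df.columns))
```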
5. Fix SettingWithCopyWarning by method loc
In this section we demonstrate the warning when we work with a single DataFrame. In this case the warning is caused by the way we access the data.
For example, if we would like to change all values in column C which are different from foo3, then we might use:
df["C"][df["C"]!="foo3"] = "foo"
This will trigger the warning message:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
To perform the operation without SettingWithCopyWarning we need to use the loc indexer in this way:
df.loc[df["C"]!="foo3", "C"] = "foo"
Now the operation completes without the warning "Try using .loc[row_indexer,col_indexer] = value instead".
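A quick way to confirm that the loc form really modifies the original DataFrame is to inspect the column afterwards. The sketch below rebuilds only column C with the tutorial's values and stores the boolean mask in a variable (the name mask is chosen for the example).

```python
import pandas as pd

df = pd.DataFrame({"C": ["foo1", "foo2", "foo3", "foo4", "foo5"]})

# Rows to change: everything except "foo3".
mask = df["C"] != "foo3"

# Row selector and column label go into a single loc call.
df.loc[mask, "C"] = "foo"

print(df["C"].tolist())
```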
6. When .loc still results in SettingWithCopyWarning
In some cases, even if we follow the recommendation and use df.loc[:, 'm'], we might still get the warning. For example:
df.loc[:, 'm'] = df['date'].dt.to_period('M')
produces the warning:
/tmp/ipykernel_12403/2549023945.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df.loc[:, 'm'] = df['date'].dt.to_period('M')
To resolve the warning we need to add the copy() method explicitly:
df.loc[:, 'm'] = df['date'].copy().dt.to_period('M')
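With that, the SettingWithCopyWarning no longer appears. If it persists, df itself was most likely created as a slice of another DataFrame; in that case call copy() at the point where df is created.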
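The situation in this section usually arises when the DataFrame being assigned to was itself derived from another one. The following sketch assumes a small made-up frame with a date column and a filtered subset named subset (both hypothetical); copying at creation time lets the loc assignment run without the warning, which the code checks by recording warnings.

```python
import warnings

import pandas as pd

df = pd.DataFrame(
    {
        "B": [0.0, 1.0, 0.0, 1.0],
        "date": pd.to_datetime(
            ["2009-01-01", "2009-01-02", "2009-02-05", "2009-02-06"]
        ),
    }
)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Copy at the point where the subset is created.
    subset = df[df["B"] > 0].copy()
    subset.loc[:, "m"] = subset["date"].dt.to_period("M")

# Keep only warnings of the kind this tutorial is about.
copy_warnings = [
    w for w in caught if w.category.__name__ == "SettingWithCopyWarning"
]

print(subset["m"].astype(str).tolist(), len(copy_warnings))
```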
7. Suppressing SettingWithCopyWarning
Sometimes SettingWithCopyWarning appears even though the code is correct and valid. Running the problematic cell a second time might also hide the warning.
To get rid of the warning completely, we can use filterwarnings('ignore') from the standard warnings module (older NumPy releases exposed the same module as np.warnings). Note that this silences all warnings, not only this one:
import warnings
warnings.filterwarnings('ignore')
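A blanket 'ignore' filter hides every warning, including ones you may want to see. A narrower option is to filter by message text and limit the filter to one block of code. The sketch below uses only the standard warnings module; emit_demo_warnings is a hypothetical stand-in that issues one warning with the same wording as the pandas message and one unrelated warning, so the example runs without depending on a particular pandas version.

```python
import warnings


def emit_demo_warnings():
    """Hypothetical stand-in: one pandas-like warning and one unrelated one."""
    warnings.warn(
        "\nA value is trying to be set on a copy of a slice from a DataFrame",
        UserWarning,
    )
    warnings.warn("something else worth seeing", UserWarning)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Ignore only messages that start with the SettingWithCopyWarning text;
    # \s* allows for the leading newline in the message.
    warnings.filterwarnings(
        "ignore", message=r"\s*A value is trying to be set on a copy"
    )
    emit_demo_warnings()

remaining = [str(w.message) for w in caught]
print(remaining)
```

Outside the with block the previous warning filters are restored, so the suppression does not leak into the rest of the notebook or script.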
8. Conclusion
In this article, we looked at the reasons and solutions for the warning SettingWithCopyWarning in Pandas.
We focused on fixing the underlying cause of the message:
"a value is trying to be set on a copy of a slice from a dataframe. try using .loc[row_indexer,col_indexer] = value instead".
We also covered how to suppress the message itself. Upgrading pandas to the latest version may also reduce how often the warning appears.
Finally, depending on your context, you may run into a slightly different problem, for example when working with a MultiIndex, which is explained in the pandas documentation section "Returning a view versus a copy".