The primary purpose is to avoid chained indexing and eliminate the SettingWithCopyWarning
.
Here chained indexing is something like dfc['A'][0] = 111
The document said chained indexing should be avoided in Returning a view versus a copy. Here is a slightly modified example from that document:
In [1]: import pandas as pd
In [2]: dfc = pd.DataFrame({'A':['aaa','bbb','ccc'],'B':[1,2,3]})
In [3]: dfc
Out[3]:
A B
0 aaa 1
1 bbb 2
2 ccc 3
In [4]: aColumn = dfc['A']
In [5]: aColumn[0] = 111
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
In [6]: dfc
Out[6]:
A B
0 111 1
1 bbb 2
2 ccc 3
Here the aColumn
is a view and not a copy from the original DataFrame, so modifying aColumn
will cause the original dfc
be modified too. Next, if we index the row first:
In [7]: zero_row = dfc.loc[0]
In [8]: zero_row['A'] = 222
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
In [9]: dfc
Out[9]:
A B
0 111 1
1 bbb 2
2 ccc 3
This time zero_row
is a copy, so the original dfc
is not modified.
From these two examples above, we see it's ambiguous whether or not you want to change the original DataFrame. This is especially dangerous if you write something like the following:
In [10]: dfc.loc[0]['A'] = 333
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
In [11]: dfc
Out[11]:
A B
0 111 1
1 bbb 2
2 ccc 3
This time it didn't work at all. Here we wanted to change dfc
, but we actually modified an intermediate value dfc.loc[0]
that is a copy and is discarded immediately. It’s very hard to predict whether the intermediate value like dfc.loc[0]
or dfc['A']
is a view or a copy, so it's not guaranteed whether or not original DataFrame will be updated. That's why chained indexing should be avoided, and pandas generates the SettingWithCopyWarning
for this kind of chained indexing update.
Now is the use of .copy()
. To eliminate the warning, make a copy to express your intention explicitly:
In [12]: zero_row_copy = dfc.loc[0].copy()
In [13]: zero_row_copy['A'] = 444 # This time no warning
Since you are modifying a copy, you know the original dfc
will never change and you are not expecting it to change. Your expectation matches the behavior, then the SettingWithCopyWarning
disappears.
Note, If you do want to modify the original DataFrame, the document suggests you use loc
:
In [14]: dfc.loc[0,'A'] = 555
In [15]: dfc
Out[15]:
A B
0 555 1
1 bbb 2
2 ccc 3