To remove any doubt, my solution was to make a deep copy of the slice instead of a regular copy. This may not be applicable depending on your context (Memory constraints / size of the slice, potential for performance degradation - especially if the copy occurs in a loop like it did for me, etc...)
To be clear, here is the warning I received:
/opt/anaconda3/lib/python3.6/site-packages/ipykernel/__main__.py:54:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I had doubts that the warning was thrown because of a column I was dropping on a copy of the slice. While not technically trying to set a value in the copy of the slice, that was still a modification of the copy of the slice. Below are the (simplified) steps I have taken to confirm the suspicion, I hope it will help those of us who are trying to understand the warning.
We knew that already but this is a healthy reminder. This is NOT what the warning is about.
>> data1 = {'A': [111, 112, 113], 'B':[121, 122, 123]}
>> df1 = pd.DataFrame(data1)
>> df1
A B
0 111 121
1 112 122
2 113 123
>> df2 = df1
>> df2
A B
0 111 121
1 112 122
2 113 123
# Dropping a column on df1 affects df2
>> df1.drop('A', axis=1, inplace=True)
>> df2
B
0 121
1 122
2 123
It is possible to avoid changes made on df1 to affect df2. Note: you can avoid importing copy.deepcopy
by doing df.copy()
instead.
>> data1 = {'A': [111, 112, 113], 'B':[121, 122, 123]}
>> df1 = pd.DataFrame(data1)
>> df1
A B
0 111 121
1 112 122
2 113 123
>> import copy
>> df2 = copy.deepcopy(df1)
>> df2
A B
0 111 121
1 112 122
2 113 123
# Dropping a column on df1 does not affect df2
>> df1.drop('A', axis=1, inplace=True)
>> df2
A B
0 111 121
1 112 122
2 113 123
This actually illustrates the warning.
>> data1 = {'A': [111, 112, 113], 'B':[121, 122, 123]}
>> df1 = pd.DataFrame(data1)
>> df1
A B
0 111 121
1 112 122
2 113 123
>> df2 = df1
>> df2
A B
0 111 121
1 112 122
2 113 123
# Dropping a column on df2 can affect df1
# No slice involved here, but I believe the principle remains the same?
# Let me know if not
>> df2.drop('A', axis=1, inplace=True)
>> df1
B
0 121
1 122
2 123
It is possible to avoid changes made on df2 to affect df1
>> data1 = {'A': [111, 112, 113], 'B':[121, 122, 123]}
>> df1 = pd.DataFrame(data1)
>> df1
A B
0 111 121
1 112 122
2 113 123
>> import copy
>> df2 = copy.deepcopy(df1)
>> df2
A B
0 111 121
1 112 122
2 113 123
>> df2.drop('A', axis=1, inplace=True)
>> df1
A B
0 111 121
1 112 122
2 113 123
Cheers!