This is a very basic question, but I cannot seem to find an answer.
I have a dataframe like this, called df:
A B C
a.1 b.1 c.1
a.2 b.2 c.2
a.3 b.3 c.3
Then I extract all the rows from df, where column 'B' has a value of 'b.2'. I assign these results to df_2.
df_2 = df[df['B'] == 'b.2']
df_2 becomes:
A B C
a.2 b.2 c.2
Then I copy all the values in column 'B' to a new column named 'D', so that df_2 becomes:
A B C D
a.2 b.2 c.2 b.2
When I perform the assignment like this:
df_2['D'] = df_2['B']
I get the following warning:
A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I have also tried using .loc when creating df_2 like this:
df_2 = df.loc[df['B'] == 'b.2']
However, I still get the warning.
Any help is greatly appreciated.
You can simply assign column B to the new column, like this:
df['D'] = df['B']
Example/Demo -
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([['a.1','b.1','c.1'],['a.2','b.2','c.2'],['a.3','b.3','c.3']],columns=['A','B','C'])
In [3]: df
Out[3]:
A B C
0 a.1 b.1 c.1
1 a.2 b.2 c.2
2 a.3 b.3 c.3
In [4]: df['D'] = df['B'] #<---What you want.
In [5]: df
Out[5]:
A B C D
0 a.1 b.1 c.1 b.1
1 a.2 b.2 c.2 b.2
2 a.3 b.3 c.3 b.3
In [6]: df.loc[0,'D'] = 'd.1'
In [7]: df
Out[7]:
A B C D
0 a.1 b.1 c.1 d.1
1 a.2 b.2 c.2 b.2
2 a.3 b.3 c.3 b.3
Following up on these solutions, here is some code illustrating them:
#
# Copying columns in pandas without the slice warning
#
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 3), columns=list('ABC'))
#
# Copy column B into a new column D
df.loc[:, 'D'] = df['B']
print(df)
#
# Create new column 'E' with value -99
#
# Assigning a filtered Series replaces the whole column: rows where 'B' > 0
# get the value of 'B', and all other rows become NaN (index alignment)
df['E'] = -99
print(df)
df['E'] = df[df['B'] > 0]['B'].copy()
print(df)
#
# Create new column 'F' with value -99.0 (a float, to match the dtype of 'B')
#
# Assigning through .loc only overwrites the rows where 'B' > 0
df['F'] = -99.0
df.loc[df['B'] > 0, 'F'] = df[df['B'] > 0]['B'].copy()
print(df)
How about:
df['D'] = df['B'].values
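What .values changes is that pandas receives a plain NumPy array, so the values are written by position instead of being matched on index labels. Here is a small sketch of the difference; the Series named shifted and the column names aligned and positional are made up for illustration and are not from the question:

```python
import pandas as pd

df = pd.DataFrame({'B': ['b.1', 'b.2', 'b.3']})

# Hypothetical Series whose labels are 1, 2, 3 instead of df's 0, 1, 2.
shifted = pd.Series(['x', 'y', 'z'], index=[1, 2, 3])

# Label-aligned assignment: row 0 has no matching label, so it becomes NaN.
df['aligned'] = shifted

# Positional assignment: the raw array is written in order, labels are ignored.
df['positional'] = shifted.values

print(df)
```

When both sides share the same index, as in df['D'] = df['B'], the two forms should give the same result.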
I think the correct access method is using the index:
df_2.loc[:,'D'] = df_2['B']
The problem is in the line before the one that throws the warning. When you create df_2, that is where you create a copy of a slice of a DataFrame. Use .copy() at that point instead, and you won't get the warning later on.
df_2 = df[df['B'] == 'b.2'].copy()
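Put together with the sample data from the question, the whole flow might look like the sketch below. The explicit .copy() makes df_2 an independent DataFrame, so adding column 'D' affects only df_2:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['a.1', 'a.2', 'a.3'],
    'B': ['b.1', 'b.2', 'b.3'],
    'C': ['c.1', 'c.2', 'c.3'],
})

# Take an explicit copy of the filtered rows so that df_2 owns its data.
df_2 = df[df['B'] == 'b.2'].copy()

# Add the new column to the independent copy; df itself is left unchanged.
df_2['D'] = df_2['B']

print(df_2)
print(df.columns.tolist())
```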
Here is your dataframe:
import pandas as pd
df = pd.DataFrame({
'A': ['a.1', 'a.2', 'a.3'],
'B': ['b.1', 'b.2', 'b.3'],
'C': ['c.1', 'c.2', 'c.3']})
Your answer is in the "Setting with enlargement" paragraph of the "Indexing and selecting data" section of the pandas documentation.
It says:
A DataFrame can be enlarged on either axis via .loc.
So what you need to do is simply one of these two:
df.loc[:, 'D'] = df.loc[:, 'B']
df.loc[:, 'D'] = df['B']
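As a rough illustration of enlargement on both axes, here is a sketch using the same dataframe. The extra row with label 3 and its values are invented for the example:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['a.1', 'a.2', 'a.3'],
    'B': ['b.1', 'b.2', 'b.3'],
    'C': ['c.1', 'c.2', 'c.3'],
})

# Column enlargement: 'D' does not exist yet, so .loc creates it.
df.loc[:, 'D'] = df['B']

# Row enlargement: label 3 does not exist yet, so .loc appends a new row.
df.loc[3] = ['a.4', 'b.4', 'c.4', 'b.4']

print(df)
```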
Source: Stackoverflow.com