I'm trying to set the entire column of a dataframe to a specific value.
In [1]: df
Out [1]:
issueid industry
0 001 xxx
1 002 xxx
2 003 xxx
3 004 xxx
4 005 xxx
From what I've seen, loc
is the best practice when replacing values in a dataframe (or isn't it?):
In [2]: df.loc[:,'industry'] = 'yyy'
However, I still received this much talked-about warning message:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
If I do
In [3]: df['industry'] = 'yyy'
I got the same warning message.
Any ideas? Working with Python 3.5.2 and pandas 0.18.1.
Seems to me that:
df1 = df[df['col1']==some_value] WILL NOT create a new DataFrame, basically, changes in df1 will be reflected in the parent df. This leads to the warning. Whereas, df1 = df[df['col1]]==some_value].copy() WILL create a new DataFrame, and changes in df1 will not be reflected in df. the copy() method is recommended if you don't want to make changes to your original df.
I had a similar issue before even with this approach df.loc[:,'industry'] = 'yyy'
, but once I refreshed the notebook, it ran well.
You may want to try refreshing the cells after you have df.loc[:,'industry'] = 'yyy'
.
You can do :
df['industry'] = 'yyy'
df.loc[:,'industry'] = 'yyy'
This does the magic. You are to add '.loc' with ':' for all rows. Hope it helps
This provides you with the possibility of adding conditions on the rows and then change all the cells of a specific column corresponding to those rows:
df.loc[(df['issueid'] == '001'), 'industry'] = str('yyy')
Assuming your Data frame is like 'Data' you have to consider if your data is a string or an integer. Both are treated differently. So in this case you need be specific about that.
import pandas as pd
data = [('001','xxx'), ('002','xxx'), ('003','xxx'), ('004','xxx'), ('005','xxx')]
df = pd.DataFrame(data,columns=['issueid', 'industry'])
print("Old DataFrame")
print(df)
df.loc[:,'industry'] = str('yyy')
print("New DataFrame")
print(df)
Now if want to put numbers instead of letters you must create and array
list_of_ones = [1,1,1,1,1]
df.loc[:,'industry'] = list_of_ones
print(df)
Or if you are using Numpy
import numpy as np
n = len(df)
df.loc[:,'industry'] = np.ones(n)
print(df)
You can use the assign
function:
df = df.assign(industry='yyy')
Source: Stackoverflow.com