I need to set the value of one column based on the value of another in a Pandas dataframe. This is the logic:
if df['c1'] == 'Value':
    df['c2'] = 10
else:
    df['c2'] = df['c3']
I am unable to get this to do what I want, which is to simply create a column with new values (or change the value of an existing column: either one works for me).
If I try to run the code above or if I write it as a function and use the apply method, I get the following:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Try a row-wise apply, so that the lambda can read both c1 and c3:
df['c2'] = df.apply(lambda row: 10 if row['c1'] == 'Value' else row['c3'], axis=1)
Note the tilde (~), which inverts the selection in the second line below. This approach uses vectorized pandas indexing, so it is faster than a row-by-row if/else.
df.loc[(df['c1'] == 'Value'), 'c2'] = 10
df.loc[~(df['c1'] == 'Value'), 'c2'] = df['c3']
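As a rough, self-contained sketch of the two .loc assignments above: the sample data is made up for illustration (only the column names come from the question), and the helper name is_value is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'c1': ['Value', 'Other', 'Value', 'Other'],
    'c3': [1, 2, 3, 4],
})

is_value = df['c1'] == 'Value'      # boolean mask, one entry per row
df.loc[is_value, 'c2'] = 10         # rows where the condition holds
df.loc[~is_value, 'c2'] = df['c3']  # remaining rows; values align on the index

print(df)
```

Storing the mask in a variable avoids evaluating the comparison twice and makes the role of ~ easier to see.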
I suggest doing it in two steps:
# set fixed value to 'c2' where the condition is met
df.loc[df['c1'] == 'Value', 'c2'] = 10
# copy value from 'c3' to 'c2' where the condition is NOT met
df.loc[df['c1'] != 'Value', 'c2'] = df.loc[df['c1'] != 'Value', 'c3']
I had a big dataset and .loc[] was taking too long, so I found a vectorized way to do it. Recall that you can assign the result of a comparison directly to a column, so this works:
file['Flag'] = (file['Claim_Amount'] > 0)
This gives a Boolean column, which is what I wanted, but you can multiply it by, say, 1 to get integers instead.
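A minimal sketch of the flag idea, assuming made-up claim amounts. The names claims and Flag_int are hypothetical, and astype(int) is used here as an alternative to multiplying by 1.

```python
import pandas as pd

# Hypothetical claims data; the column name Claim_Amount follows the answer above.
claims = pd.DataFrame({'Claim_Amount': [0, 120.5, 0, 30]})

claims['Flag'] = claims['Claim_Amount'] > 0      # boolean column
claims['Flag_int'] = claims['Flag'].astype(int)  # 1 where True, 0 where False

print(claims)
```

Both conversions should give the same 0/1 values; astype(int) just states the intent more explicitly.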
You can use np.where() to set values based on a specified condition:
# df
   c1  c2  c3
0   4   2   1
1   8   7   9
2   1   5   8
3   3   3   5
4   3   6   8
Now set (or change) the values in column c2 based on your condition:
df['c2'] = np.where(df.c1 == 8, 'X', df.c3)
   c1 c2  c3
0   4  1   1
1   8  X   9
2   1  8   8
3   3  5   5
4   3  8   8
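Applied to the logic in the question, np.where might look like the sketch below. The sample rows are invented for illustration; only the column names and the 'Value' / 10 rule come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the question's column names.
df = pd.DataFrame({'c1': ['Value', 'Other', 'Value'], 'c3': [7, 8, 9]})

# np.where(condition, value_if_true, value_if_false), evaluated element-wise
df['c2'] = np.where(df['c1'] == 'Value', 10, df['c3'])

print(df)
```

Because both branches are numeric here, c2 stays a numeric column.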
You can use pandas.DataFrame.mask to add virtually as many conditions as you need:
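import numpy as np
import pandas as pd

data = {'a': [1, 2, 3, 4, 5], 'b': [6, 8, 9, 10, 11]}
d = pd.DataFrame.from_dict(data, orient='columns')
# each entry is (value to match in column 'a', replacement value or Series)
c = {'c1': (2, 'Value1'), 'c2': (3, 'Value2'), 'c3': (5, d['b'])}
# object dtype so the column can hold both strings and numbers
d['new'] = pd.Series(np.nan, index=d.index, dtype=object)
for match, replacement in c.values():
    # assign the result back rather than using inplace=True on a column selection
    d['new'] = d['new'].mask(d['a'] == match, replacement)
d['new'] = d['new'].fillna('Else')
d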
Output:
   a   b     new
0  1   6    Else
1  2   8  Value1
2  3   9  Value2
3  4  10    Else
4  5  11      11
Try df.apply() if you have a small or medium-sized dataframe:
df['c2'] = df.apply(lambda x: 10 if x['c1'] == 'Value' else x['c3'], axis=1)
Otherwise, if you have a big dataframe, use the indexing techniques from the other answers.
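If you prefer a named function over a lambda (as the question mentions trying), a sketch could look like this. The function name pick_c2 and the sample rows are hypothetical.

```python
import pandas as pd

def pick_c2(row):
    # With axis=1, row is a Series holding a single row, so the comparison
    # yields one boolean and the plain if/else works.
    return 10 if row['c1'] == 'Value' else row['c3']

# Hypothetical sample data using the question's column names.
df = pd.DataFrame({'c1': ['Value', 'Other'], 'c3': [5, 6]})
df['c2'] = df.apply(pick_c2, axis=1)

print(df)
```

The key detail is axis=1: the function then receives one row at a time, which avoids the "truth value of a Series is ambiguous" error from the question.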
Source: Stackoverflow.com