I have a DataFrame, and I want to replace the values in a particular column that exceed a value with zero. I had thought this was a way of achieving this:
df[df.my_channel > 20000].my_channel = 0
If I extract the column into a separate Series, it's simple:
df2 = df.my_channel
df2[df2 > 20000] = 0
This does exactly what I want, but seems not to work with the channel as part of the original DataFrame.
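A minimal sketch that reproduces the problem. It assumes a small made-up frame; only the column name comes from the question. Boolean selection with df[...] always builds a new DataFrame, so the assignment lands on that temporary object and df itself is unchanged:

```python
import warnings

import pandas as pd

# Illustrative data; the values are assumptions, only 'my_channel' is from the question
df = pd.DataFrame({"my_channel": [100, 25000, 5000, 30000]})

with warnings.catch_warnings():
    # Depending on the pandas version, a chained-assignment warning may be emitted here
    warnings.simplefilter("ignore")
    # df[mask] returns a new DataFrame; the column assignment modifies that temporary copy
    df[df.my_channel > 20000].my_channel = 0

print(df.my_channel.tolist())  # [100, 25000, 5000, 30000]: df is unchanged
```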
The np.where function works as follows:
df['X'] = np.where(df['Y']>=50, 'yes', 'no')
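A small runnable sketch of the snippet above, assuming a made-up column Y (the values are illustrative only). np.where(condition, value_if_true, value_if_false) is evaluated element by element:

```python
import numpy as np
import pandas as pd

# Hypothetical column 'Y' matching the snippet above
df = pd.DataFrame({"Y": [10, 50, 75, 49]})

# Each row gets 'yes' where Y >= 50 and 'no' otherwise
df["X"] = np.where(df["Y"] >= 50, "yes", "no")

print(list(df["X"]))  # values: no, yes, yes, no
```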
In your case you would want:
import numpy as np
df['my_channel'] = np.where(df.my_channel > 20000, 0, df.my_channel)
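Applied to a small illustrative frame (the values are assumptions, not from the question), the line above keeps every value at or below 20000 and zeroes the rest:

```python
import numpy as np
import pandas as pd

# Illustrative data for the question's column
df = pd.DataFrame({"my_channel": [100, 25000, 5000, 30000]})

# 0 where the condition is True, the original value where it is False
df["my_channel"] = np.where(df.my_channel > 20000, 0, df.my_channel)

print(df["my_channel"].tolist())  # [100, 0, 5000, 0]
```

np.where returns a plain NumPy array without the index; that is fine here because the result is assigned straight back to the same frame, row for row.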
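The reason your original DataFrame does not update is that chained indexing may cause you to modify a copy rather than a view of your DataFrame. In df[df.my_channel > 20000].my_channel = 0, the Boolean selection df[...] already returns a new object, and the assignment then modifies that temporary object. The docs give this advice: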
When setting values in a pandas object, care must be taken to avoid what is called chained indexing.
You have a few alternatives:
loc + Boolean indexing
loc may be used for setting values and supports Boolean masks:
df.loc[df['my_channel'] > 20000, 'my_channel'] = 0
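A minimal sketch of the .loc approach on a made-up frame. The second column, other, is hypothetical and only there to show that it is left alone:

```python
import pandas as pd

df = pd.DataFrame({"my_channel": [100, 25000, 5000, 30000], "other": [1, 2, 3, 4]})

# A single .loc call selects rows (Boolean mask) and column (label) together,
# so the assignment writes into df itself rather than into a temporary copy
df.loc[df["my_channel"] > 20000, "my_channel"] = 0

print(df["my_channel"].tolist())  # [100, 0, 5000, 0]
print(df["other"].tolist())       # [1, 2, 3, 4], untouched
```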
mask + Boolean indexing
You can assign the result back to your series:
df['my_channel'] = df['my_channel'].mask(df['my_channel'] > 20000, 0)
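Or you can update your series in place. Note that this is itself a form of chained assignment: newer pandas versions may warn about it, and with Copy-on-Write enabled (the default starting with pandas 3.0) it no longer modifies df, so the assignment form above is the safer choice:
df['my_channel'].mask(df['my_channel'] > 20000, 0, inplace=True)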
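A minimal sketch of the assignment form of mask, which avoids the in-place call entirely; the data is illustrative. mask returns a new Series, so nothing changes until the result is assigned back:

```python
import pandas as pd

df = pd.DataFrame({"my_channel": [100, 25000, 5000, 30000]})

# Series.mask replaces values where the condition is True and returns a new Series
masked = df["my_channel"].mask(df["my_channel"] > 20000, 0)
print(df["my_channel"].tolist())  # [100, 25000, 5000, 30000], not modified yet

# Assigning the result back updates the DataFrame
df["my_channel"] = masked
print(df["my_channel"].tolist())  # [100, 0, 5000, 0]
```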
np.where + Boolean indexing
You can use NumPy by assigning your original series where your condition is not satisfied; however, the first two solutions are cleaner since they explicitly change only the specified values.
import numpy as np
df['my_channel'] = np.where(df['my_channel'] > 20000, 0, df['my_channel'])
Try this:
df.my_channel = df.my_channel.where(df.my_channel <= 20000, other=0)
or
df.my_channel = df.my_channel.mask(df.my_channel > 20000, other=0)
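A short sketch comparing the two calls on illustrative data (the values are assumptions). where keeps values where its condition is True, while mask replaces values where its condition is True, so complementary conditions give the same result, with one exception for missing values:

```python
import pandas as pd

df = pd.DataFrame({"my_channel": [100, 20000, 25000, 30000]})

# Complementary conditions: keep <= 20000 versus replace > 20000
kept = df.my_channel.where(df.my_channel <= 20000, other=0)
replaced = df.my_channel.mask(df.my_channel > 20000, other=0)

print(kept.equals(replaced))  # True
print(kept.tolist())          # [100, 20000, 0, 0]

# With NaN both comparisons are False, so where replaces NaN while mask keeps it
s = pd.Series([100.0, float("nan"), 30000.0])
print(s.where(s <= 20000, other=0).tolist())  # [100.0, 0.0, 0.0]
print(s.mask(s > 20000, other=0).tolist())    # [100.0, nan, 0.0]
```

If the column can contain missing values, pick whichever of the two matches how NaN should be treated.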
Try
df.loc[df.my_channel > 20000, 'my_channel'] = 0
Note: Since v0.20.0, ix has been deprecated in favour of loc / iloc, and it was removed entirely in pandas 1.0.
I would use a lambda function on a Series of a DataFrame like this:
f = lambda x: 0 if x > 20000 else x
df['my_channel'] = df['my_channel'].map(f)
I do not claim this is efficient (the lambda is called once per element in Python, so it is slower than the vectorized options above), but it works fine.
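A sketch comparing the element-wise lambda approach with a vectorized one, assuming the lambda should keep values at or below 20000 unchanged (as the question asks) rather than map them to another constant; the data is illustrative:

```python
import pandas as pd

df = pd.DataFrame({"my_channel": [100, 25000, 5000, 30000]})

# Element-wise: the lambda runs once per value in Python
f = lambda x: 0 if x > 20000 else x
via_map = df["my_channel"].map(f)

# Vectorized alternative for comparison
via_mask = df["my_channel"].mask(df["my_channel"] > 20000, 0)

print(via_map.tolist())          # [100, 0, 5000, 0]
print(via_map.equals(via_mask))  # True
```

Both produce the same values; the vectorized version avoids a Python function call per row, which matters on large frames.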
Source: Stackoverflow.com