Using this as a starting point:
a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
Out[8]:
one two three
0 10 1.2 4.2
1 15 70 0.03
2 8 5 0
I want to use something like an if
statement within pandas.
if df['one'] >= df['two'] and df['one'] <= df['three']:
df['que'] = df['one']
Basically, check each row via the if
statement, create new column.
The docs say to use .all
but there is no example...
This question is related to
python
pandas
if-statement
dataframe
One way is to use a Boolean series to index the column df['one']
. This gives you a new column where the True
entries have the same value as the same row as df['one']
and the False
values are NaN
.
The Boolean series is just given by your if
statement (although it is necessary to use &
instead of and
):
>>> df['que'] = df['one'][(df['one'] >= df['two']) & (df['one'] <= df['three'])]
>>> df
one two three que
0 10 1.2 4.2 10
1 15 70 0.03 NaN
2 8 5 0 NaN
If you want the NaN
values to be replaced by other values, you can use the fillna
method on the new column que
. I've used 0
instead of the empty string here:
>>> df['que'] = df['que'].fillna(0)
>>> df
one two three que
0 10 1.2 4.2 10
1 15 70 0.03 0
2 8 5 0 0
Use np.select
if you have multiple conditions to be checked from the dataframe and output a specific choice in a different column
conditions=[(condition1),(condition2)]
choices=["choice1","chocie2"]
df["new column"]=np.select=(condtion,choice,default=)
Note: No of conditions and no of choices should match, repeat text in choice if for two different conditions you have same choices
I think the closest to the OP's intuition is an inline if statement:
df['que'] = (df['one'] if ((df['one'] >= df['two']) and (df['one'] <= df['three']))
Use lambda expression:
df[df.apply(lambda x: x['col1'] != x['col2'], axis = 1)]
Wrap each individual condition in parentheses, and then use the &
operator to combine the conditions:
df.loc[(df['one'] >= df['two']) & (df['one'] <= df['three']), 'que'] = df['one']
You can fill the non-matching rows by just using ~
(the "not" operator) to invert the match:
df.loc[~ ((df['one'] >= df['two']) & (df['one'] <= df['three'])), 'que'] = ''
You need to use &
and ~
rather than and
and not
because the &
and ~
operators work element-by-element.
The final result:
df
Out[8]:
one two three que
0 10 1.2 4.2 10
1 15 70 0.03
2 8 5 0
You could use apply() and do something like this
df['que'] = df.apply(lambda x : x['one'] if x['one'] >= x['two'] and x['one'] <= x['three'] else "", axis=1)
or if you prefer not to use a lambda
def que(x):
if x['one'] >= x['two'] and x['one'] <= x['three']:
return x['one']
return ''
df['que'] = df.apply(que, axis=1)
You can use .equals
for columns or entire dataframes.
df['col1'].equals(df['col2'])
If they're equal, that statement will return True
, else False
.
Source: Stackoverflow.com