[python] pandas: multiple conditions while indexing data frame - unexpected behavior

I am filtering rows in a dataframe by values in two columns.

For some reason the OR operator behaves like I would expect AND operator to behave and vice versa.

My test code:

import pandas as pd

df = pd.DataFrame({'a': range(5), 'b': range(5) })

# let's insert some -1 values
df['a'][1] = -1
df['b'][1] = -1
df['a'][3] = -1
df['b'][4] = -1

df1 = df[(df.a != -1) & (df.b != -1)]
df2 = df[(df.a != -1) | (df.b != -1)]

print pd.concat([df, df1, df2], axis=1,
                keys = [ 'original df', 'using AND (&)', 'using OR (|)',])

And the result:

      original df      using AND (&)      using OR (|)    
             a  b              a   b             a   b
0            0  0              0   0             0   0
1           -1 -1            NaN NaN           NaN NaN
2            2  2              2   2             2   2
3           -1  3            NaN NaN            -1   3
4            4 -1            NaN NaN             4  -1

[5 rows x 6 columns]

As you can see, the AND operator drops every row in which at least one value equals -1. On the other hand, the OR operator requires both values to be equal to -1 to drop them. I would expect exactly the opposite result. Could anyone explain this behavior, please?

I am using pandas 0.13.1.

This question is related to python pandas boolean-logic

The answer is


As you can see, the AND operator drops every row in which at least one value equals -1. On the other hand, the OR operator requires both values to be equal to -1 to drop them.

That's right. Remember that you're writing the condition in terms of what you want to keep, not in terms of what you want to drop. For df1:

df1 = df[(df.a != -1) & (df.b != -1)]

You're saying "keep the rows in which df.a isn't -1 and df.b isn't -1", which is the same as dropping every row in which at least one value is -1.

For df2:

df2 = df[(df.a != -1) | (df.b != -1)]

You're saying "keep the rows in which either df.a or df.b is not -1", which is the same as dropping rows where both values are -1.

PS: chained access like df['a'][1] = -1 can get you into trouble. It's better to get into the habit of using .loc and .iloc.


You can also use query(), i.e.:

df_filtered = df.query('a == 4 & b != 2')

A little mathematical logic theory here:

"NOT a AND NOT b" is the same as "NOT (a OR b)", so:

"a NOT -1 AND b NOT -1" is equivalent of "NOT (a is -1 OR b is -1)", which is opposite (Complement) of "(a is -1 OR b is -1)".

So if you want exact opposite result, df1 and df2 should be as below:

df1 = df[(df.a != -1) & (df.b != -1)]
df2 = df[(df.a == -1) | (df.b == -1)]

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float

Examples related to boolean-logic

pandas: multiple conditions while indexing data frame - unexpected behavior How can I obtain the element-wise logical NOT of a pandas Series? How to test multiple variables against a value? Any good boolean expression simplifiers out there? if (boolean == false) vs. if (!boolean) How do I test if a variable does not equal either of two values? Differences in boolean operators: & vs && and | vs || Check if at least two out of three booleans are true How to convert "0" and "1" to false and true Does Python support short-circuiting?