[python] Remove rows not .isin('X')

Sorry, I'm just getting into pandas, and this seems like it should be a very straightforward question. How can I use isin('X') to remove rows that are in the list X? In R I would write !(a %in% b).

You can use numpy.logical_not to invert the boolean array returned by isin:

In [63]: s = pd.Series(np.arange(10.0))

In [64]: x = range(4, 8)

In [65]: mask = np.logical_not(s.isin(x))

In [66]: s[mask]
Out[66]: 
0    0.0
1    1.0
2    2.0
3    3.0
8    8.0
9    9.0
dtype: float64

As Wes McKinney points out in a comment, you can also use the ~ operator, which inverts a boolean Series directly:

s[~s.isin(x)]
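Here is the session above as a standalone script, assuming a current pandas/NumPy install. It checks that np.logical_not and ~ produce the same mask.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(10.0))
x = range(4, 8)  # values to exclude: 4, 5, 6, 7

# Two ways to invert the boolean mask returned by isin
mask_np = np.logical_not(s.isin(x))  # NumPy ufunc, returns a boolean Series
mask_tilde = ~s.isin(x)              # bitwise NOT on a boolean Series

result = s[mask_tilde]
print(mask_np.equals(mask_tilde))  # the two masks are identical
print(result)
```

The ~ form needs no extra import, which is why it is the more common spelling.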

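You can use the DataFrame.select method to keep only the rows whose index labels are in L, which drops every other row. Note that select was deprecated in pandas 0.21 and removed in 1.0, so this session only works on older versions: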

In [1]: df = pd.DataFrame([[1,2],[3,4]], index=['A','B'])

In [2]: df
Out[2]: 
   0  1
A  1  2
B  3  4

In [3]: L = ['A']

In [4]: df.select(lambda x: x in L)
Out[4]: 
   0  1
A  1  2
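DataFrame.select does not exist in pandas 1.0 and later. Testing the index with isin gives the same selection on current versions. Below is a minimal sketch that reuses df and L from the session. The drop(index=..., errors="ignore") line is one alternative way to remove rows by label.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=["A", "B"])
L = ["A"]

# Keep rows whose index label is in L (what df.select(lambda x: x in L) returned)
kept = df[df.index.isin(L)]

# Remove rows whose index label is in L
removed = df[~df.index.isin(L)]

# Same removal by label; errors="ignore" skips labels that are not in df
dropped = df.drop(index=L, errors="ignore")

print(kept)
print(removed)
```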

You have many options. Collating some of the answers above and the accepted answer from this post, you can use any of:
1. df[-df["column"].isin(["value"])]
2. df[~df["column"].isin(["value"])]
3. df[df["column"].isin(["value"]) == False]
4. df[np.logical_not(df["column"].isin(["value"]))]

Note: for option 4 you'll need to import numpy as np.
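Below is a small check that the four options select the same rows, run on a hypothetical DataFrame with a "column" column. It assumes a reasonably recent pandas, in which unary minus on a boolean Series behaves as inversion. ~ is still the idiomatic choice.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"column": ["value", "other", "value", "third"], "n": [1, 2, 3, 4]})
mask = df["column"].isin(["value"])  # True for rows we want to remove

opt1 = df[-mask]                 # unary minus; pandas inverts boolean Series
opt2 = df[~mask]                 # bitwise NOT, the usual idiom
opt3 = df[mask == False]         # noqa: E712 (linters flag "== False")
opt4 = df[np.logical_not(mask)]  # NumPy ufunc, needs numpy imported

print(opt2)
```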

Update: You can also use the .query method for this. It returns a new DataFrame, so it works with method chaining:
5. df.query("column not in @values")
where values is a list of the values that you don't want to include. The @ prefix tells query to look up the Python variable named values.
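The sketch below shows option 5 on a hypothetical DataFrame. The lists values and numbers are illustrative names only. Column names containing spaces must be wrapped in backticks inside the query string, which needs pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "column": ["a", "b", "c", "a"],
    "Column Name": [1, 2, 3, 4],
})
values = ["c"]      # hypothetical values to exclude from "column"
numbers = [2, 4]    # hypothetical values to exclude from "Column Name"

# "@values" refers to the Python variable `values`
by_query = df.query("column not in @values")

# Same rows as the boolean-mask approach
by_mask = df[~df["column"].isin(values)]

# Backticks let query refer to a column whose name contains a space
by_backticks = df.query("`Column Name` not in @numbers")

# query returns a DataFrame, so further methods can be chained on it
chained = df.query("column not in @values").sort_values("Column Name", ascending=False)
print(chained)
```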


All you have to do is create a subset of your dataframe where the isin method evaluates to False:

df = df[df['Column Name'].isin(['Value']) == False]
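The same pattern extends to several columns. Calling isin on a DataFrame tests every cell, and .any(axis=1) marks a row if any of the chosen columns matched. The sketch below uses hypothetical column names and a hypothetical exclusion list X.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Column Name": ["Value", "keep", "keep", "other"],
    "Other Column": ["x", "Value", "y", "z"],
})
X = ["Value"]  # hypothetical list of values to exclude

# Single column, as in the answer above
single = df[df["Column Name"].isin(X) == False]  # noqa: E712

# Several columns: drop a row if ANY of the listed columns holds a value from X
cols = ["Column Name", "Other Column"]
multi = df[~df[cols].isin(X).any(axis=1)]

print(multi)
```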
