I am trying to check if a certain value is contained in a python column. I'm using df.date.isin(['07311954'])
, which I do not doubt to be a good tool. The problem is that I have over 350K rows and the output won't show
all of them so that I can see if the value is actually contained. Put simply, I just want to know (Y/N) whether or not a specific value is contained in a column. My code follows:
import numpy as np
import pandas as pd
import glob
df = (pd.read_csv('/home/jayaramdas/anaconda3/Thesis/FEC_data/itpas2_data/itpas214.txt',\
sep='|', header=None, low_memory=False, names=['1', '2', '3', '4', '5', '6', '7', \
'8', '9', '10', '11', '12', '13', 'date', '15', '16', '17', '18', '19', '20', \
'21', '22']))
df.date.isin(['07311954'])
I think you need str.contains
, if you need rows where values of column date
contains string 07311954
:
print df[df['date'].astype(str).str.contains('07311954')]
Or if type
of date
column is string
:
print df[df['date'].str.contains('07311954')]
If you want check last 4 digits for string
1954
in column date
:
print df[df['date'].astype(str).str[-4:].str.contains('1954')]
Sample:
print df['date']
0 8152007
1 9262007
2 7311954
3 2252011
4 2012011
5 2012011
6 2222011
7 2282011
Name: date, dtype: int64
print df['date'].astype(str).str[-4:].str.contains('1954')
0 False
1 False
2 True
3 False
4 False
5 False
6 False
7 False
Name: date, dtype: bool
print df[df['date'].astype(str).str[-4:].str.contains('1954')]
cmte_id trans_typ entity_typ state employer occupation date \
2 C00119040 24K CCM MD NaN NaN 7311954
amount fec_id cand_id
2 1000 C00140715 H2MD05155
You can use any
:
print any(df.column == 07311954)
True #true if it contains the number, false otherwise
If you rather want to see how many times '07311954' occurs in a column you can use:
df.column[df.column == 07311954].count()
You can simply use this:
'07311954' in df.date.values
which returns True
or False
Here is the further explanation:
In pandas, using in
check directly with DataFrame and Series (e.g. val in df
or val in series
) will check whether the val
is contained in the Index.
BUT you can still use in
check for their values too (instead of Index)! Just using val in df.col_name.values
or val in series.values
. In this way, you are actually checking the val
with a Numpy array.
And .isin(vals)
is the other way around, it checks whether the DataFrame/Series values are in the vals
. Here vals
must be set or list-like. So this is not the natural way to go for the question.
Source: Stackoverflow.com