I am trying to find the number of times a certain value appears in one column.
I have made the dataframe with data = pd.DataFrame.from_csv('data/DataSet2.csv')
and now I want to find the number of times something appears in a column. How is this done?
I thought it was the below, where I am looking in the education column and counting the number of time ?
occurs.
The code below shows that I am trying to find the number of times 9th
appears and the error is what I am getting when I run the code
Code
missing2 = df.education.value_counts()['9th']
print(missing2)
Error
KeyError: '9th'
Try this:
(df[education]=='9th').sum()
for finding a specific value of a column you can use the code below
irrespective of the preference you can use the any of the method you like
df.col_name.value_counts().Value_you_are_looking_for
take example of the titanic dataset
df.Sex.value_counts().male
this gives a count of all male on the ship Although if you want to count a numerical data then you cannot use the above method because value_counts() is used only with series type of data hence fails So for that you can use the second method example
the second method is
#this is an example method of counting on a data frame
df[(df['Survived']==1)&(df['Sex']=='male')].counts()
this is not that efficient as value_counts() but surely will help if you want to count values of a data frame hope this helps
Couple of ways using count
or sum
In [338]: df
Out[338]:
col1 education
0 a 9th
1 b 9th
2 c 8th
In [335]: df.loc[df.education == '9th', 'education'].count()
Out[335]: 2
In [336]: (df.education == '9th').sum()
Out[336]: 2
In [337]: df.query('education == "9th"').education.count()
Out[337]: 2
An elegant way to count the occurrence of '?'
or any symbol in any column, is to use built-in function isin
of a dataframe object.
Suppose that we have loaded the 'Automobile' dataset into df
object.
We do not know which columns contain missing value ('?'
symbol), so let do:
df.isin(['?']).sum(axis=0)
DataFrame.isin(values)
official document says:
it returns boolean DataFrame showing whether each element in the DataFrame is contained in values
Note that isin
accepts an iterable as input, thus we need to pass a list containing the target symbol to this function. df.isin(['?'])
will return a boolean dataframe as follows.
symboling normalized-losses make fuel-type aspiration-ratio ...
0 False True False False False
1 False True False False False
2 False True False False False
3 False False False False False
4 False False False False False
5 False True False False False
...
To count the number of occurrence of the target symbol in each column, let's take sum
over all the rows of the above dataframe by indicating axis=0
.
The final (truncated) result shows what we expect:
symboling 0
normalized-losses 41
...
bore 4
stroke 4
compression-ratio 0
horsepower 2
peak-rpm 2
city-mpg 0
highway-mpg 0
price 4
easy but not efficient:
list(df.education).count('9th')
Source: Stackoverflow.com