[python] get dataframe row count based on conditions

I want to get the count of dataframe rows based on conditional selection. I tried the following code.

print df[(df.IP == head.idxmax()) & (df.Method == 'HEAD') & (df.Referrer == '"-"')].count()

output:

IP          57
Time        57
Method      57
Resource    57
Status      57
Bytes       57
Referrer    57
Agent       57
dtype: int64

The output shows the count for each an every column in the dataframe. Instead I need to get a single count where all of the above conditions satisfied? How to do this? If you need more explanation about my dataframe please let me know.

This question is related to python pandas

The answer is


For increased performance you should not evaluate the dataframe using your predicate. You can just use the outcome of your predicate directly as illustrated below:

In [1]: import pandas as pd
        import numpy as np
        df = pd.DataFrame(np.random.randn(20,4),columns=list('ABCD'))


In [2]: df.head()
Out[2]:
          A         B         C         D
0 -2.019868  1.227246 -0.489257  0.149053
1  0.223285 -0.087784 -0.053048 -0.108584
2 -0.140556 -0.299735 -1.765956  0.517803
3 -0.589489  0.400487  0.107856  0.194890
4  1.309088 -0.596996 -0.623519  0.020400

In [3]: %time sum((df['A']>0) & (df['B']>0))
CPU times: user 1.11 ms, sys: 53 µs, total: 1.16 ms
Wall time: 1.12 ms
Out[3]: 4

In [4]: %time len(df[(df['A']>0) & (df['B']>0)])
CPU times: user 1.38 ms, sys: 78 µs, total: 1.46 ms
Wall time: 1.42 ms
Out[4]: 4

Keep in mind that this technique only works for counting the number of rows that comply with your predicate.


In Pandas, I like to use the shape attribute to get number of rows.

df[df.A > 0].shape[0]

gives the number of rows matching the condition A > 0, as desired.