I want to get the count of dataframe rows based on conditional selection. I tried the following code.
print df[(df.IP == head.idxmax()) & (df.Method == 'HEAD') & (df.Referrer == '"-"')].count()
output:
IP 57
Time 57
Method 57
Resource 57
Status 57
Bytes 57
Referrer 57
Agent 57
dtype: int64
The output shows the count for each an every column in the dataframe. Instead I need to get a single count where all of the above conditions satisfied? How to do this? If you need more explanation about my dataframe please let me know.
For increased performance you should not evaluate the dataframe using your predicate. You can just use the outcome of your predicate directly as illustrated below:
In [1]: import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(20,4),columns=list('ABCD'))
In [2]: df.head()
Out[2]:
A B C D
0 -2.019868 1.227246 -0.489257 0.149053
1 0.223285 -0.087784 -0.053048 -0.108584
2 -0.140556 -0.299735 -1.765956 0.517803
3 -0.589489 0.400487 0.107856 0.194890
4 1.309088 -0.596996 -0.623519 0.020400
In [3]: %time sum((df['A']>0) & (df['B']>0))
CPU times: user 1.11 ms, sys: 53 µs, total: 1.16 ms
Wall time: 1.12 ms
Out[3]: 4
In [4]: %time len(df[(df['A']>0) & (df['B']>0)])
CPU times: user 1.38 ms, sys: 78 µs, total: 1.46 ms
Wall time: 1.42 ms
Out[4]: 4
Keep in mind that this technique only works for counting the number of rows that comply with your predicate.
In Pandas, I like to use the shape
attribute to get number of rows.
df[df.A > 0].shape[0]
gives the number of rows matching the condition A > 0
, as desired.
Source: Stackoverflow.com