[python] what is the most efficient way of counting occurrences in pandas?

I have a large (about 12M rows) DataFrame df with, say:

df.columns = ['word','documents','frequency']

So the following ran in a timely fashion:

word_grouping = df[['word','frequency']].groupby('word')
MaxFrequency_perWord = word_grouping[['frequency']].max().reset_index()
MaxFrequency_perWord.columns = ['word','MaxFrequency']

However, this is taking an unexpectedly long time to run:

Occurrences_of_Words = word_grouping[['word']].count().reset_index()

What am I doing wrong here? Is there a better way to count occurrences in a large DataFrame?

df.word.describe()

ran pretty well, so I really did not expect this Occurrences_of_Words DataFrame to take very long to build.

PS: If the answer is obvious and you feel the need to penalize me for asking this question, please include the answer as well. Thank you.


The answer is


I think df['word'].value_counts() should serve. By skipping the groupby machinery, you'll save some time. I'm not sure why count should be much slower than max. Both take some time to avoid missing values. (Compare with size.)

In any case, value_counts has been specifically optimized to handle object dtype, like your words, so I doubt you'll do much better than that.
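
For example, a rough way to compare the approaches on your own data might look like this (a minimal sketch: the column names come from the question, but the toy data is just a stand-in for the 12M-row frame):

import pandas as pd
import numpy as np

# stand-in for the large frame from the question
df = pd.DataFrame({
    'word': np.random.choice(['apple', 'banana', 'cherry'], size=1_000_000),
    'frequency': np.random.randint(1, 100, size=1_000_000),
})

# counts per word, skipping the groupby machinery
df['word'].value_counts()

# groupby-based equivalents, for comparison
df.groupby('word').size()            # counts all rows per group
df.groupby('word')['word'].count()   # skips NaN values, typically slower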


When you want to count the frequency of categorical data in a column of a pandas DataFrame, use: df['Column_Name'].value_counts()

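
A quick illustration (the column name and values here are made up for the example):

import pandas as pd

df = pd.DataFrame({'Column_Name': ['A', 'B', 'A', 'C', 'A', 'B']})

# one row per distinct value, sorted by count in descending order
df['Column_Name'].value_counts()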


Just an addition to the previous answers: let's not forget that real data may contain null values, so it's useful to include those in the count as well by passing the option dropna=False (the default is True).

An example:

>>> df['Embarked'].value_counts(dropna=False)
S      644
C      168
Q       77
NaN      2
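
The 'Embarked' column above looks like it comes from the Titanic dataset; the same behaviour can be reproduced on a tiny made-up Series:

import pandas as pd
import numpy as np

s = pd.Series(['S', 'C', 'S', np.nan, 'Q', 'S'])

s.value_counts()              # NaN entries are silently dropped
s.value_counts(dropna=False)  # NaN gets its own row in the result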