Frequency table for a single variable

Question

One last newbie pandas question for the day: how do I generate a table for a single Series?

For example:

    my_series = pandas.Series([1,2,2,3,3,3])
    pandas.magical_frequency_function(my_series)

    >> {
         1 : 1,
         2 : 2,
         3 : 3
       }

Lots of googling has led me to Series.describe() and pandas.crosstabs, but neither of these does quite what I need: one variable, counts by categories. Oh, and it'd be nice if it worked for different data types: strings, ints, etc.

User · Answer

The answer provided by @DSM is simple and straightforward, but I thought I'd add my own input to this question. If you look at the code for pandas.value_counts, you'll see that there is a lot going on.

If you need to calculate the frequency of many series, this could take a while. A faster implementation would be to use numpy.unique with return_counts=True.

Here is an example:

    import pandas as pd
    import numpy as np

    my_series = pd.Series([1,2,2,3,3,3])

    print(my_series.value_counts())
    3    3
    2    2
    1    1
    dtype: int64

Notice here that the item returned is a pandas.Series.

In comparison, numpy.unique returns a tuple with two items, the unique values and the counts.

    vals, counts = np.unique(my_series, return_counts=True)
    print(vals, counts)
    [1 2 3] [1 2 3]

You can then combine these into a dictionary:

    results = dict(zip(vals, counts))
    print(results)
    {1: 1, 2: 2, 3: 3}

And then into a pandas.Series:

    print(pd.Series(results))
    1    1
    2    2
    3    3
    dtype: int64
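The steps above can be folded into one small helper. This is only a sketch: the name freq_table is hypothetical (it is not a pandas or NumPy function), and it builds the Series directly from the two arrays instead of going through a dictionary.

```python
import numpy as np
import pandas as pd


def freq_table(values):
    """Hypothetical helper: counts per unique value, indexed by that value."""
    # np.unique returns the sorted unique values and, with
    # return_counts=True, how often each one occurs.
    vals, counts = np.unique(np.asarray(values), return_counts=True)
    return pd.Series(counts, index=vals)


my_series = pd.Series([1, 2, 2, 3, 3, 3])
table = freq_table(my_series)
print(table)
```

One difference worth keeping in mind: the result is ordered by value, whereas value_counts() orders by count by default. Because np.unique sorts its input, it may also fail on a Series that mixes types that cannot be compared with each other, such as ints and strings.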

User · Answer

Maybe .value_counts()?

    >>> import pandas
    >>> my_series = pandas.Series([1,2,2,3,3,3, "fred", 1.8, 1.8])
    >>> my_series
    0       1
    1       2
    2       2
    3       3
    4       3
    5       3
    6    fred
    7     1.8
    8     1.8
    >>> counts = my_series.value_counts()
    >>> counts
    3       3
    2       2
    1.8     2
    fred    1
    1       1
    >>> len(counts)
    5
    >>> sum(counts)
    9
    >>> counts["fred"]
    1
    >>> dict(counts)
    {1.8: 2, 2: 2, 3: 3, 1: 1, 'fred': 1}
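As a small follow-up sketch on the same mixed-type Series, value_counts() can also return proportions through its normalize parameter, and to_dict() gives the plain dictionary the question asked for. Label lookups use .loc here so that integer labels are never mistaken for positions.

```python
import pandas as pd

my_series = pd.Series([1, 2, 2, 3, 3, 3, "fred", 1.8, 1.8])

# Absolute counts, most frequent value first.
counts = my_series.value_counts()

# Relative frequencies: each count divided by the number of non-null values.
proportions = my_series.value_counts(normalize=True)

# A plain dict, the shape shown in the question.
as_dict = counts.to_dict()

print(counts.loc[3], counts.loc["fred"])
print(as_dict)
```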

User · Answer

You can use a list comprehension on a dataframe to count frequencies of the columns, as such:

    [my_series[c].value_counts() for c in list(my_series.select_dtypes(include=['O']).columns)]

Breakdown:

    my_series.select_dtypes(include=['O'])

Selects just the categorical data.

    list(my_series.select_dtypes(include=['O']).columns)

Turns the columns from above into a list.

    [my_series[c].value_counts() for c in list(my_series.select_dtypes(include=['O']).columns)]

Iterates through the list above and applies value_counts() to each of the columns.
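A runnable sketch of the same idea follows. The DataFrame and its column names are invented for illustration, and the text columns are cast to object dtype explicitly so that the select_dtypes(include=["object"]) selection behaves the same across pandas versions. It uses a dict comprehension instead of a list so each table stays attached to its column name.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "M", "M", "M"],
    "price": [1.0, 2.5, 1.0, 4.0],
}).astype({"color": object, "size": object})

# Keep only the object-dtype (text) columns; "price" is numeric and is skipped.
object_columns = df.select_dtypes(include=["object"]).columns

# One frequency table per selected column, keyed by column name.
tables = {c: df[c].value_counts() for c in object_columns}

for name, table in tables.items():
    print(name)
    print(table)
```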

User · Answer

For the frequency distribution of a variable with excessive values, you can collapse the values down into classes.

Here I have excessive values for the employrate variable, and there is no meaning in its frequency distribution with a direct value_counts(normalize=True):

                    country  employrate alcconsumption
    0           Afghanistan   55.700001            .03
    1               Albania   11.000000           7.29
    2               Algeria   11.000000            .69
    3               Andorra         nan          10.17
    4                Angola   75.699997           5.57
    ..                  ...         ...            ...
    208             Vietnam   71.000000           3.91
    209  West Bank and Gaza   32.000000
    210         Yemen, Rep.   39.000000             .2
    211              Zambia   61.000000           3.56
    212            Zimbabwe   66.800003           4.96

    [213 rows x 3 columns]

Frequency distribution with value_counts(normalize=True) and no classification. The length of the result here is 139, which seems meaningless as a frequency distribution:

    print(gm["employrate"].value_counts(sort=False, normalize=True))

    50.500000   0.005618
    61.500000   0.016854
    46.000000   0.011236
    64.500000   0.005618
    63.500000   0.005618
    ...
    58.599998   0.005618
    63.799999   0.011236
    63.200001   0.005618
    65.599998   0.005618
    68.300003   0.005618
    Name: employrate, Length: 139, dtype: float64

Putting in a classification, we map all values within a certain range to one class. As the code is written, 0-10 and 11-20 both become 1, 21-30 becomes 2, 31-40 becomes 3, and so forth:

    # Clean the column and convert it to numbers before classifying.
    gm["employrate"] = gm["employrate"].str.strip().dropna()
    gm["employrate"] = pd.to_numeric(gm["employrate"])

    # Replace each value with its class; values outside a range are left untouched.
    gm["employrate"] = np.where((gm["employrate"] <= 10) & (gm["employrate"] > 0), 1, gm["employrate"])
    gm["employrate"] = np.where((gm["employrate"] <= 20) & (gm["employrate"] > 10), 1, gm["employrate"])
    gm["employrate"] = np.where((gm["employrate"] <= 30) & (gm["employrate"] > 20), 2, gm["employrate"])
    gm["employrate"] = np.where((gm["employrate"] <= 40) & (gm["employrate"] > 30), 3, gm["employrate"])
    gm["employrate"] = np.where((gm["employrate"] <= 50) & (gm["employrate"] > 40), 4, gm["employrate"])
    gm["employrate"] = np.where((gm["employrate"] <= 60) & (gm["employrate"] > 50), 5, gm["employrate"])
    gm["employrate"] = np.where((gm["employrate"] <= 70) & (gm["employrate"] > 60), 6, gm["employrate"])
    gm["employrate"] = np.where((gm["employrate"] <= 80) & (gm["employrate"] > 70), 7, gm["employrate"])
    gm["employrate"] = np.where((gm["employrate"] <= 90) & (gm["employrate"] > 80), 8, gm["employrate"])
    gm["employrate"] = np.where((gm["employrate"] <= 100) & (gm["employrate"] > 90), 9, gm["employrate"])

    print(gm["employrate"].value_counts(sort=False, normalize=True))

After classification we have a clear frequency distribution. Here we can easily see that 37.64% of countries have an employ rate between 51-60%, and 11.79% of countries have an employ rate between 71-80%:

    5.000000   0.376404
    7.000000   0.117978
    4.000000   0.179775
    6.000000   0.264045
    8.000000   0.033708
    3.000000   0.028090
    Name: employrate, dtype: float64
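As an alternative to the chain of np.where calls, the same kind of binning can be sketched with pandas.cut. This is a swapped-in technique, not the method used in the answer above. The rates below are invented for illustration and the "51-60"-style labels are an assumption chosen to match the wording of the answer.

```python
import pandas as pd

# Hypothetical employment rates, invented for illustration only.
rates = pd.Series(
    [55.7, 51.4, 50.5, float("nan"), 75.7, 71.0, 32.0, 39.0, 61.0, 66.8]
)

# Bin edges 0, 10, ..., 100. pd.cut uses right-closed intervals by default,
# so the classes are (0, 10], (10, 20], ..., (90, 100].
edges = list(range(0, 101, 10))

# Human-readable labels such as "51-60". They are approximate for
# non-integer values: 50.5 falls in (50, 60] and is labelled "51-60".
labels = [f"{lo + 1}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])]

binned = pd.cut(rates, bins=edges, labels=labels)

# Share of non-null values per class, kept in bin order instead of by count.
freq = binned.value_counts(sort=False, normalize=True)
print(freq)
```

Unlike the np.where approach, this leaves the original column untouched and gives each class its own label, so two ranges cannot end up sharing one class number by accident.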
