Count the frequency that a value occurs in a dataframe column

Question

I have a dataset category cat a cat b cat a  I d like to be able to return something like  showing unique values and frequency  category   freq  cat a       2 cat b       1

User · Accepted Answer

Use groupby and count  In  37   df   pd DataFrame   a  list  abssbab     df groupby  a   count    Out 37       a a    a  2 b  3 s  2   3 rows x 1 columns   See the online docs  https   pandas pydata org pandas-docs stable user guide groupby html Also value counts   as  DSM has commented  many ways to skin a cat here In  38   df  a   value counts    Out 38    b    3 a    2 s    2 dtype  int64  If you wanted to add frequency back to the original dataframe use transform to return an aligned index  In  41   df  freq     df groupby  a    a   transform  count   df  Out 41       a freq 0  a    2 1  b    3 2  s    2 3  s    2 4  b    3 5  a    2 6  b    3   7 rows x 2 columns

User · Answer

your data    category  cat a cat b cat a   solution    df  freq     df groupby  category    category   transform  count    df    df drop duplicates

User · Answer

I believe this should work fine for any DataFrame columns list   def column list x       column list df          for col name in x columns          y   col name  len x col name  unique            column list df append y  return pd DataFrame column list df   column list df rename columns  0   Feature   1   Value count      The function  column list  checks the columns names and then checks the uniqueness of each column values

User · Answer

df apply pd value counts  fillna 0    value counts - Returns object containing counts of unique values  apply - count frequency in every column  If you set axis 1  you get frequency in every row  fillna 0  - make output more fancy  Changed NaN to 0

User · Answer

Without any libraries  you could do this instead   def to frequency table data       frequencytable          for key in data          if key in frequencytable              frequencytable key     1         else              frequencytable key    1     return frequencytable   Example   to frequency table  1 1 1 1 2 3 4 4    gt  gt  gt   1  4  2  1  3  1  4  2

User · Answer

If you want to apply to all columns you can use   df apply pd value counts    This will apply a column based aggregation function  in this case value counts  to each of the columns

User · Answer

df category value counts     This short little line of code will give you the output you want   If your column name has spaces you can use  df  category   value counts

User · Answer

n values   data income value counts     First unique value  count  n at most 50k   n values 0    Second unique value  count  n greater 50k   n values 1   n values   Output    lt  50K    34014  gt 50K     11208  Name  income  dtype  int64   Output   n greater 50k n at most 50k -  11208  34014

User · Answer

Using list comprehension and value counts for multiple columns in a df   my series c  value counts   for c in list my series select dtypes include   O    columns     https   stackoverflow com a 28192263 786326

User · Answer

If your DataFrame has values with the same type  you can also set return counts True in numpy unique     index  counts   np unique df values return counts True    np bincount   could be faster if your values are integers

User · Answer

You can also do this with pandas by broadcasting your columns as categories first  e g  dtype  category  e g   cats     client    hotel    currency    ota    user country    df cats    df cats  astype  category     and then calling describe   df cats  describe     This will give you a nice table of value counts and a bit more          client  hotel   currency    ota user country count   852845  852845  852845  852845  852845 unique  2554    17477   132 14  219 top 2198    13202   USD Hades   US freq    102562  8847    516500  242734  340992

User · Answer

metatoaster has already pointed this out   Go for Counter  It s blazing fast     import pandas as pd from collections import Counter import timeit import numpy as np  df   pd DataFrame np random randint 1  10000   100  2    columns   NumA    NumB      Timers   timeit -n 10000 df  NumA   value counts     10000 loops  best of 3  715   s per loop   timeit -n 10000 df  NumA   value counts   to dict     10000 loops  best of 3  796   s per loop   timeit -n 10000 Counter df  NumA      10000 loops  best of 3  74   s per loop   timeit -n 10000 df groupby   NumA    count     10000 loops  best of 3  1 29 ms per loop   Cheers

User · Answer

In 0 18 1 groupby together with count does not give the frequency of unique values    gt  gt  gt  df    a 0  a 1  b 2  s 3  s 4  b 5  a 6  b   gt  gt  gt  df groupby  a   count   Empty DataFrame Columns     Index   a  b  s    However  the unique values and their frequencies are easily determined using size    gt  gt  gt  df groupby  a   size   a a    2 b    3 s    2   With df a value counts   sorted values  in descending order  i e  largest value first  are returned by default

[python] Count the frequency that a value occurs in a dataframe column

Examples related to python

Examples related to pandas

Examples related to frequency