Calculate summary statistics of columns in dataframe

Question

I have a dataframe of the following form (for example):

    shopper_num  is_martian  number_of_items  count_pineapples  birth_country  tranpsortation_method
    1            FALSE       0                0                 MX
    2            FALSE       1                0                 MX
    3            FALSE       0                0                 MX
    4            FALSE       22               0                 MX
    5            FALSE       0                0                 MX
    6            FALSE       0                0                 MX
    7            FALSE       5                0                 MX
    8            FALSE       0                0                 MX
    9            FALSE       4                0                 MX
    10           FALSE       2                0                 MX
    11           FALSE       0                0                 MX
    12           FALSE       13               0                 MX
    13           FALSE       0                0                 CA
    14           FALSE       0                0                 US

How can I use Pandas to calculate summary statistics of each column (column data types are variable, some columns have no information), and then return a dataframe of the form:

    columnname, max, min, median
    is_martian, NA, NA, FALSE

And so on.

User · Answer

To clarify one point in EdChum's answer: per the documentation, you can include the object columns by using df.describe(include='all'). It won't provide many statistics, but it will provide a few pieces of information, including the count, the number of unique values, and the top value. This may be a new feature; I don't know, as I am a relatively new user.
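As a rough illustration of what include='all' adds, here is a minimal sketch on a shortened, made-up version of the question's data (the four rows below are an assumption for the example, not the original file). Text and boolean columns get count, unique, top and freq, while numeric columns keep the usual statistics.

```python
import numpy as np
import pandas as pd

# Abbreviated stand-in for the question's dataframe (illustrative values only).
df = pd.DataFrame({
    "shopper_num": [1, 2, 3, 4],
    "is_martian": [False, False, False, False],
    "number_of_items": [0, 1, 0, 22],
    "birth_country": ["MX", "MX", "CA", "US"],
    "tranpsortation_method": [np.nan] * 4,  # a column with no information
})

# include="all" describes every column, not only the numeric ones.
summary = df.describe(include="all")
print(summary)

# Individual statistics can be looked up by row label and column name.
print(summary.loc["top", "birth_country"])
print(summary.loc["count", "tranpsortation_method"])
```

Statistics that do not apply to a column (for example the mean of birth_country) show up as NaN in the result.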

User · Answer

Now there is the pandas_profiling package, which is a more complete alternative to df.describe().

If your pandas dataframe is df, the code below will return a complete analysis, including some warnings about missing values, skewness, etc. It presents histograms and correlation plots as well.

    import pandas_profiling
    pandas_profiling.ProfileReport(df)

See the example notebook detailing the usage.

User · Answer

describe may give you everything you want; otherwise you can perform aggregations using groupby and pass a list of agg functions: http://pandas.pydata.org/pandas-docs/stable/groupby.html#applying-multiple-functions-at-once

    In [43]: df.describe()

    Out[43]:
           shopper_num is_martian  number_of_items  count_pineapples
    count      14.0000         14        14.000000                14
    mean        7.5000          0         3.357143                 0
    std         4.1833          0         6.452276                 0
    min         1.0000      False         0.000000                 0
    25%         4.2500          0         0.000000                 0
    50%         7.5000          0         0.000000                 0
    75%        10.7500          0         3.500000                 0
    max        14.0000      False        22.000000                 0

    [8 rows x 4 columns]

Note that some columns cannot be summarised because there is no logical way to summarise them, for instance columns containing string data.

You can transpose the result if you prefer:

    In [47]: df.describe().transpose()

    Out[47]:
                     count      mean       std    min   25%  50%    75%    max
    shopper_num         14       7.5    4.1833      1  4.25  7.5  10.75     14
    is_martian          14         0         0  False     0    0      0  False
    number_of_items     14  3.357143  6.452276      0     0    0    3.5     22
    count_pineapples    14         0         0      0     0    0      0      0

    [4 rows x 8 columns]
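To get closer to the columnname / max / min / median layout asked for in the question, one option is to pass a list of function names to agg and transpose the result. The sketch below is only one way to do it: it uses DataFrame.agg rather than groupby, runs on a shortened, made-up version of the data, and leaves non-numeric columns as NaN, which is an assumption about what the asker wants for those columns.

```python
import numpy as np
import pandas as pd

# Abbreviated stand-in for the question's dataframe (illustrative values only).
df = pd.DataFrame({
    "shopper_num": [1, 2, 3, 4],
    "is_martian": [False, False, False, False],
    "number_of_items": [0, 1, 0, 22],
    "birth_country": ["MX", "MX", "CA", "US"],
    "tranpsortation_method": [np.nan] * 4,  # a column with no information
})

# Keep only numeric columns; boolean and string columns are excluded here.
numeric = df.select_dtypes(include="number")

# One row per statistic, then transpose so there is one row per column.
stats = numeric.agg(["max", "min", "median"]).transpose()

# Re-add the excluded columns as rows of NaN so every column is listed.
stats = stats.reindex(df.columns)
stats.index.name = "columnname"
print(stats)
```

Columns that were filtered out, and columns with no data at all, end up as NaN rows, so the result still has one row for every column of the original dataframe.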
