How to count the NaN values in a column in pandas DataFrame

Question

I want to find the number of NaN in each column of my data so that I can drop a column if it has fewer NaN than some threshold  I looked but wasn t able to find any function for this   value counts is too slow for me because most of the values are distinct and I m only interested in the NaN count

User · Answer

import pandas as pd import numpy as np    example DataFrame df   pd DataFrame   a   1 2 np nan    b   np nan 1 np nan       count the NaNs in a column num nan a   df loc   pd isna df  a        a    shape 0  num nan b   df loc   pd isna df  b        b    shape 0     summarize the num nan b print df  print      print f There are  num nan a  NaNs in column a   print f There are  num nan b  NaNs in column b     Gives as output        a    b 0  1 0  NaN 1  2 0  1 0 2  NaN  NaN  There are 1 NaNs in column a There are 2 NaNs in column b

User · Answer

df isnull   sum   will give the column-wise sum of missing values  If you want to know the sum of missing values in a particular column then following code will work  df column isnull   sum

User · Answer

For your task you can use pandas DataFrame dropna  https   pandas pydata org pandas-docs stable reference api pandas DataFrame dropna html    import pandas as pd import numpy as np  df   pd DataFrame   a    1  2  3  4  np nan                       b    1  2  np nan  4  np nan                       c    np nan  2  np nan  4  np nan    df   df dropna axis  columns   thresh 3   print df    Whith thresh parameter you can declare the max count  for NaN values for all columns in DataFrame   Code outputs        a    b 0  1 0  1 0 1  2 0  2 0 2  3 0  NaN 3  4 0  4 0 4  NaN  NaN

User · Answer

df1 isnull   sum     This will do the trick

User · Answer

There is a nice Dzone article from July 2017 which details various ways of summarising NaN values  Check it out here   The article I have cited provides additional value by   1  Showing a way to count and display NaN counts for every column so that one can easily decide whether or not to discard those columns and  2  Demonstrating a way to select those rows in specific which have NaNs so that they may be selectively discarded or imputed   Here s a quick example to demonstrate the utility of the approach - with only a few columns perhaps its usefulness is not obvious but I found it to be of help for larger data-frames   import pandas as pd import numpy as np    example DataFrame df   pd DataFrame   a   1 2 np nan    b   np nan 1 np nan       Check whether there are null values in columns null columns   df columns df isnull   any    print df null columns  isnull   sum       One can follow along further per the cited article

User · Answer

Used the solution proposed by  sushmit in my code    A possible variation of  the same can also be  colNullCnt      for z in range len df1 cols        colNullCnt append  df1 cols z   sum pd isnull trainPd df1 cols z         Advantage of this is that it returns the result for each of the columns in the df henceforth

User · Answer

Based on the most voted answer we can easily define a function that gives us a dataframe to preview the missing values and the   of missing values in each column  def missing values table df       mis val   df isnull   sum       mis val percent   100   df isnull   sum     len df      mis val table   pd concat  mis val  mis val percent   axis 1      mis val table ren columns   mis val table rename      columns    0    Missing Values   1      of Total Values        mis val table ren columns   mis val table ren columns          mis val table ren columns iloc   1     0  sort values         of Total Values   ascending False  round 1      print   quot Your selected dataframe has  quot    str df shape 1      quot  columns  n quot                 quot There are  quot    str mis val table ren columns shape 0                  quot  columns that have missing values  quot       return mis val table ren columns

User · Answer

based to the answer that was given and some improvements this is my approach   def PercentageMissin Dataset          this function will return the percentage of missing values in a dataset         if isinstance Dataset pd DataFrame           adict     a dictionary conatin keys columns names and values percentage of missin value in the columns         for col in Dataset columns              adict col   np count nonzero Dataset col  isnull    100  len Dataset col           return pd DataFrame adict index     of missing   columns adict keys        else          raise TypeError  can only be used with panda dataframe

User · Answer

import numpy as np import pandas as pd  raw data     first name     Jason   np nan   Tina    Jake    Amy              last name     Miller   np nan  np nan   Milner    Cooze              age    22  np nan  23  24  25             sex     m   np nan   f    m    f              Test1 Score    4  np nan  0  0  0            Test2 Score    25  np nan  np nan  0  0   results   pd DataFrame raw data  columns     first name    last name    age    sex    Test1 Score    Test2 Score        results        first name last name   age  sex  Test1 Score  Test2 Score 0      Jason    Miller  22 0    m          4 0         25 0 1        NaN       NaN   NaN  NaN          NaN          NaN 2       Tina       NaN  23 0    f          0 0          NaN 3       Jake    Milner  24 0    m          0 0          0 0 4        Amy     Cooze  25 0    f          0 0          0 0       You can use following function  which will give you output in Dataframe   Zero Values    Missing Values        of Total Values Total Zero Missing Values   Total Zero Missing Values Data Type     Just copy and paste following function and call it by passing your pandas Dataframe  def missing zero values table df           zero val    df    0 00  astype int  sum axis 0          mis val   df isnull   sum           mis val percent   100   df isnull   sum     len df          mz table   pd concat  zero val  mis val  mis val percent   axis 1          mz table   mz table rename          columns    0    Zero Values   1    Missing Values   2      of Total Values            mz table  Total Zero Missing Values     mz table  Zero Values     mz table  Missing Values           mz table    Total Zero Missing Values     100   mz table  Total Zero Missing Values     len df          mz table  Data Type     df dtypes         mz table   mz table              mz table iloc   1     0  sort values             of Total Values   ascending False  round 1          print   Your selected dataframe has     str df shape 1       columns and     str df shape 0       Rows  n                     There are     str mz table shape 0                     columns that have missing values              mz table to excel  D  sampledata missing and zero values xlsx   freeze panes  1 0   index   False          return mz table  missing zero values table results    Output  Your selected dataframe has 6 columns and 5 Rows  There are 6 columns that have missing values                Zero Values  Missing Values    of Total Values  Total Zero Missing Values    Total Zero Missing Values Data Type last name              0               2               40 0                          2                         40 0    object Test2 Score            2               2               40 0                          4                         80 0   float64 first name             0               1               20 0                          1                         20 0    object age                    0               1               20 0                          1                         20 0   float64 sex                    0               1               20 0                          1                         20 0    object Test1 Score            3               1               20 0                          4                         80 0   float64   If you want to keep it simple then you can use following function to get missing values in    def missing dff       print  round  dff isnull   sum     100  len dff   2  sort values ascending False     missing results      Test2 Score    40 0 last name      40 0 Test1 Score    20 0 sex            20 0 age            20 0 first name     20 0 dtype  float64

User · Answer

https   pandas pydata org pandas-docs stable generated pandas Series count html pandas Series count  pandas Series count Series count level None  source    Return number of non-NA null observations in the Series

User · Answer

You could subtract the total length from the count of non-nan values   count nan   len df  - df count     You should time it on your data  For small Series got a 3x speed up in comparison with the isnull solution

User · Answer

Since pandas 0 14 1 my suggestion here to have a keyword argument in the value counts method has been implemented   import pandas as pd df   pd DataFrame   a   1 2 np nan    b   np nan 1 np nan    for col in df      print df col  value counts dropna False   2     1  1     1 NaN    1 dtype  int64 NaN    2  1     1 dtype  int64

User · Answer

Lets assume df is a pandas DataFrame   Then    df isnull   sum axis   0    This will give number of NaN values in every column    If you need  NaN values in every row    df isnull   sum axis   1

User · Answer

One solution is finding out null value rows and converting them into dataframe and then checking the length of the new dataframe - nan rows   df df  column name   isnull    print len nan rows

User · Answer

Suppose you want to get the number of missing values NaN  in a column series  known as price in a dataframe called reviews   import the dataframe import pandas as pd  reviews   pd read csv     input wine-reviews winemag-data-130k-v2 csv   index col 0    To get the missing values  with n missing prices as the variable  simple do  n missing prices   sum reviews price isnull    print n missing prices    sum is the key method here  was trying to use count before i realized sum is the right method to use in this context

User · Answer

You can use value counts method and print values of np nan  s value counts dropna   False  np nan

User · Answer

One other simple option not suggested yet  to just count NaNs  would be adding in the shape to return the number of rows with NaN   df df  col name   isnull     col name   shape

User · Answer

For the 1st part count NaN we have multiple way  Method 1 count   due to the count will ignore the NaN which is different from size print len df  - df count     Method 2 isnull   isna chain with sum print df isnull   sum     print df isna   sum     Method 3 describe   info   notice this will output the  notnull  value count print df describe     print df info     Method from numpy print np count nonzero np isnan df values  axis 0     For the 2nd part of the question  If we would like drop the column by the thresh we can try with dropna  thresh  optional Require that many non-NA values   Thresh   n   no null value require  you can also get the by int x    len df   df   df dropna thresh   Thresh  axis   1

User · Answer

Here is the code for counting Null values column wise    df isna   sum

User · Answer

You can use the isna   method  or it s alias isnull   which is also compatible with older pandas versions  lt  0 21 0  and then sum to count the NaN values  For one column   In  1   s   pd Series  1 2 3  np nan  np nan    In  4   s isna   sum       or s isnull   sum   for older pandas versions Out 4   2   For several columns  it also works   In  5   df   pd DataFrame   a   1 2 np nan    b   np nan 1 np nan     In  6   df isna   sum   Out 6   a    1 b    2 dtype  int64

User · Answer

Please use below for particular column count  dataframe columnName isnull   sum

User · Answer

Hope this helps  import pandas as pd import numpy as np df   pd DataFrame   a   1 2 np nan    b   np nan 1 np nan   c   np nan 2 np nan    d   np nan np nan np nan      df isnull   sum   len df    100   Thres   40  df isnull   sum   len df    100    lt  Thres

User · Answer

if you are using Jupyter Notebook  How about         timeit  df isnull   any   any     or    timeit   df isnull   values sum     or  are there anywhere NaNs in the data  if yes  where    df isnull   any

User · Answer

To count zeroes   df df    0  count axis 0    To count NaN   df isnull   sum     or  df isna   sum

User · Answer

if its just counting nan values in a pandas column here is a quick way  import pandas as pd    df1 as an example data frame     col1 name of column for which you want to calculate the nan values sum pd isnull df1  col1

User · Answer

In case you need to get the non-NA  non-None  and NA  None  counts across different groups pulled out by groupby   gdf   df groupby   ColumnToGroupBy     def countna x       return  x isna    sum    gdf agg   count   countna   size      This returns the counts of non-NA  NA and total number of entries per group

User · Answer

The below will print all the Nan columns in descending order   df isnull   sum   sort values ascending   False    or  The below will print first 15 Nan columns in descending order   df isnull   sum   sort values ascending   False  head 15

[python] How to count the NaN values in a column in pandas DataFrame

Examples related to python

Examples related to pandas

Examples related to dataframe