How to drop rows of Pandas DataFrame whose value in a certain column is NaN

Question

I have this DataFrame and want only the records whose EPS column is not NaN:

    >>> df
                     STK_ID  EPS  cash
    STK_ID RPT_Date
    601166 20111231  601166  NaN   NaN
    600036 20111231  600036  NaN    12
    600016 20111231  600016  4.3   NaN
    601009 20111231  601009  NaN   NaN
    601939 20111231  601939  2.5   NaN
    000001 20111231  000001  NaN   NaN

i.e. something like df.drop(....) to get this resulting dataframe:

                     STK_ID  EPS  cash
    STK_ID RPT_Date
    600016 20111231  600016  4.3   NaN
    601939 20111231  601939  2.5   NaN

How do I do that?

User · Answer

This question is already resolved, but also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.

    In [24]: df = pd.DataFrame(np.random.randn(10,3))

    In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;

    In [26]: df
    Out[26]:
              0         1         2
    0       NaN       NaN       NaN
    1  2.677677 -1.466923 -0.750366
    2       NaN  0.798002 -0.906038
    3  0.672201  0.964789       NaN
    4       NaN       NaN  0.050742
    5 -1.250970  0.030561 -2.678622
    6       NaN  1.036043       NaN
    7  0.049896 -0.308003  0.823295
    8       NaN       NaN  0.637482
    9 -0.310130  0.078891       NaN

    In [27]: df.dropna()     # drop all rows that have any NaN values
    Out[27]:
              0         1         2
    1  2.677677 -1.466923 -0.750366
    5 -1.250970  0.030561 -2.678622
    7  0.049896 -0.308003  0.823295

    In [28]: df.dropna(how='all')     # drop only if ALL columns are NaN
    Out[28]:
              0         1         2
    1  2.677677 -1.466923 -0.750366
    2       NaN  0.798002 -0.906038
    3  0.672201  0.964789       NaN
    4       NaN       NaN  0.050742
    5 -1.250970  0.030561 -2.678622
    6       NaN  1.036043       NaN
    7  0.049896 -0.308003  0.823295
    8       NaN       NaN  0.637482
    9 -0.310130  0.078891       NaN

    In [29]: df.dropna(thresh=2)   # drop row if it does not have at least two values that are not NaN
    Out[29]:
              0         1         2
    1  2.677677 -1.466923 -0.750366
    2       NaN  0.798002 -0.906038
    3  0.672201  0.964789       NaN
    5 -1.250970  0.030561 -2.678622
    7  0.049896 -0.308003  0.823295
    9 -0.310130  0.078891       NaN

    In [30]: df.dropna(subset=[1])   # drop only if NaN in specific column (as asked in the question)
    Out[30]:
              0         1         2
    1  2.677677 -1.466923 -0.750366
    2       NaN  0.798002 -0.906038
    3  0.672201  0.964789       NaN
    5 -1.250970  0.030561 -2.678622
    6       NaN  1.036043       NaN
    7  0.049896 -0.308003  0.823295
    9 -0.310130  0.078891       NaN

There are also other options (see the docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows. Pretty handy!
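Because the session above uses random data, here is a small deterministic sketch of the same four calls. The frame and the column names a, b, c are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hand-built frame (illustrative data, not the question's DataFrame).
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0, np.nan],
    "b": [1.0, 2.0, np.nan, np.nan, 6.0],
    "c": [1.0, 3.0, np.nan, 5.0, np.nan],
})

no_nan = df.dropna()                 # keep rows with no NaN at all
not_all_nan = df.dropna(how="all")   # drop rows only when every value is NaN
two_or_more = df.dropna(thresh=2)    # keep rows with at least 2 non-NaN values
b_present = df.dropna(subset=["b"])  # only column "b" is inspected

print(list(no_nan.index))       # [0]
print(list(not_all_nan.index))  # [0, 1, 3, 4]
print(list(two_or_more.index))  # [0, 1, 3]
print(list(b_present.index))    # [0, 1, 4]
```

Each call returns a new DataFrame; df itself still has all five rows afterwards.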

User · Answer

Simplest of all solutions:

    filtered_df = df[df['EPS'].notnull()]

The above solution is way better than using np.isfinite().
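One concrete difference between the two approaches (my reading, not necessarily the reason the answer had in mind): np.isfinite also rejects infinite values, while notnull only rejects missing ones. A minimal sketch with made-up EPS values:

```python
import numpy as np
import pandas as pd

# Illustrative data: one NaN and one infinite value.
df = pd.DataFrame({"EPS": [4.3, np.nan, np.inf, 2.5]})

kept_notnull = df[df["EPS"].notnull()]       # drops only the NaN row
kept_isfinite = df[np.isfinite(df["EPS"])]   # drops the NaN row and the inf row

print(list(kept_notnull.index))   # [0, 2, 3]
print(list(kept_isfinite.index))  # [0, 3]
```

So if the column can contain inf and you want to keep it, notnull is the safer filter.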

User · Answer

In datasets with a large number of columns, it's even better to first see how many columns contain null values and how many don't:

    print("No. of columns containing null values")
    print(len(df.columns[df.isna().any()]))

    print("No. of columns not containing null values")
    print(len(df.columns[df.notna().all()]))

    print("Total no. of columns in the dataframe")
    print(len(df.columns))

For example, my dataframe contained 82 columns, of which 19 contained at least one null value.

Further, you can also automatically remove columns and rows depending on which has more null values. Here is the code which does this:

    df = df.drop(df.columns[df.isna().sum() > len(df.columns)], axis=1)
    df = df.dropna(axis=0).reset_index(drop=True)

Note: the above code removes all of your null values. If you want to keep null values, process them beforehand.
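A sketch of the same idea with an explicit cutoff instead of comparing against the column count. The 0.5 threshold and the name max_null_fraction are my own assumptions, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frame: "x" has one NaN, "y" has none, "z" is entirely NaN.
df = pd.DataFrame({
    "x": [1.0, np.nan, 3.0],
    "y": [1.0, 2.0, 3.0],
    "z": [np.nan, np.nan, np.nan],
})

cols_with_nulls = df.columns[df.isna().any()]      # at least one NaN
cols_without_nulls = df.columns[df.notna().all()]  # no NaN at all

# Hypothetical rule: drop columns where more than half of the values are NaN,
# then drop whatever rows still contain a NaN.
max_null_fraction = 0.5
sparse_cols = df.columns[df.isna().mean() > max_null_fraction]
trimmed = df.drop(columns=sparse_cols).dropna().reset_index(drop=True)

print(list(cols_with_nulls))     # ['x', 'z']
print(list(cols_without_nulls))  # ['y']
print(list(sparse_cols))         # ['z']
print(trimmed.shape)             # (2, 2)
```

Dropping the mostly empty column first means fewer rows are lost in the row-wise dropna that follows.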

User · Answer

Another version:

    df[~df['EPS'].isna()]

User · Answer

It may be added that '&' can be used to add additional conditions, e.g.

    df = df[(df.EPS > 2.0) & (df.EPS < 4.0)]

Notice that when evaluating the statements, pandas needs parentheses around each condition.
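A small runnable sketch of the combined filter, using made-up EPS values. It also shows that a comparison against NaN evaluates to False, so the range filter already excludes the NaN row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"EPS": [4.3, np.nan, 2.5, 1.0]})

# Each comparison is wrapped in parentheses because & binds more tightly
# than > and < in Python.
in_range = df[(df["EPS"] > 2.0) & (df["EPS"] < 4.0)]

# Same filter with the NaN check spelled out; the result is identical here
# because NaN > 2.0 and NaN < 4.0 are both False.
explicit = df[df["EPS"].notna() & (df["EPS"] > 2.0) & (df["EPS"] < 4.0)]

print(list(in_range.index))  # [2]
```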

User · Answer

You could use the dataframe method notnull, the inverse of isnull, or numpy.isnan:

    In [332]: df[df.EPS.notnull()]
    Out[332]:
       STK_ID  RPT_Date  STK_ID.1  EPS  cash
    2  600016  20111231    600016  4.3   NaN
    4  601939  20111231    601939  2.5   NaN

    In [334]: df[~df.EPS.isnull()]
    Out[334]:
       STK_ID  RPT_Date  STK_ID.1  EPS  cash
    2  600016  20111231    600016  4.3   NaN
    4  601939  20111231    601939  2.5   NaN

    In [347]: df[~np.isnan(df.EPS)]
    Out[347]:
       STK_ID  RPT_Date  STK_ID.1  EPS  cash
    2  600016  20111231    600016  4.3   NaN
    4  601939  20111231    601939  2.5   NaN
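To check that the three masks agree, here is a sketch that rebuilds something close to the question's DataFrame. Storing the index values as strings and leaving out the duplicate STK_ID column are my assumptions.

```python
import numpy as np
import pandas as pd

# Approximate reconstruction of the question's data (index dtype assumed).
index = pd.MultiIndex.from_tuples(
    [("601166", "20111231"), ("600036", "20111231"), ("600016", "20111231"),
     ("601009", "20111231"), ("601939", "20111231"), ("000001", "20111231")],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame(
    {"EPS": [np.nan, np.nan, 4.3, np.nan, 2.5, np.nan],
     "cash": [np.nan, 12, np.nan, np.nan, np.nan, np.nan]},
    index=index,
)

mask_notnull = df["EPS"].notnull()
mask_not_isnull = ~df["EPS"].isnull()
mask_not_isnan = ~np.isnan(df["EPS"])  # np.isnan needs a numeric column

result = df[mask_notnull]
print(list(result.index.get_level_values("STK_ID")))  # ['600016', '601939']
```

All three masks are identical boolean Series for a float column like this one.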

User · Answer

I know this has already been answered, but just for the sake of a purely pandas solution to this specific question, as opposed to the general description from Aman (which was wonderful), and in case anyone else happens upon this:

    import pandas as pd
    df = df[pd.notnull(df['EPS'])]

User · Answer

This is an old question which has been beaten to death, but I do believe there is some more useful information to be surfaced on this thread. Read on if you're looking for the answer to any of the following questions:

- Can I drop rows if any of its values have NaNs? What about if all of them are NaN?
- Can I only look at NaNs in specific columns when dropping rows?
- Can I drop rows with a specific count of NaN values?
- How do I drop columns instead of rows?
- I tried all of the options above but my DataFrame just won't update!

DataFrame.dropna: Usage, and Examples

It's already been said that df.dropna is the canonical method to drop NaNs from DataFrames, but there's nothing like a few visual cues to help along the way.

    # Setup
    df = pd.DataFrame({
        'A': [np.nan, 2, 3, 4],
        'B': [np.nan, np.nan, 2, 3],
        'C': [np.nan]*3 + [3]})

    df
         A    B    C
    0  NaN  NaN  NaN
    1  2.0  NaN  NaN
    2  3.0  2.0  NaN
    3  4.0  3.0  3.0

Below is a detail of the most important arguments and how they work, arranged in an FAQ format.

Can I drop rows if any of its values have NaNs? What about if all of them are NaN?

This is where the how='...' argument comes in handy. It can be one of

- 'any' (default) - drops rows if at least one column has NaN
- 'all' - drops rows only if all of its columns have NaNs

    # Removes all but the last row since there are no NaNs
    df.dropna()

         A    B    C
    3  4.0  3.0  3.0

    # Removes the first row only
    df.dropna(how='all')

         A    B    C
    1  2.0  NaN  NaN
    2  3.0  2.0  NaN
    3  4.0  3.0  3.0

Note: If you just want to see which rows are null (in other words, if you want a boolean mask of rows), use isna:

    df.isna()

           A      B      C
    0   True   True   True
    1  False   True   True
    2  False  False   True
    3  False  False  False

    df.isna().any(axis=1)

    0     True
    1     True
    2     True
    3    False
    dtype: bool

To get the inversion of this result, use notna instead.

Can I only look at NaNs in specific columns when dropping rows?

This is a use case for the subset=[...] argument. Specify a list of columns (or indexes with axis=1) to tell pandas you only want to look at these columns (or rows with axis=1) when dropping rows (or columns with axis=1).

    # Drop all rows with NaNs in A
    df.dropna(subset=['A'])

         A    B    C
    1  2.0  NaN  NaN
    2  3.0  2.0  NaN
    3  4.0  3.0  3.0

    # Drop all rows with NaNs in A OR B
    df.dropna(subset=['A', 'B'])

         A    B    C
    2  3.0  2.0  NaN
    3  4.0  3.0  3.0

Can I drop rows with a specific count of NaN values?

This is a use case for the thresh=... argument. Specify the minimum number of NON-NULL values as an integer.

    df.dropna(thresh=1)

         A    B    C
    1  2.0  NaN  NaN
    2  3.0  2.0  NaN
    3  4.0  3.0  3.0

    df.dropna(thresh=2)

         A    B    C
    2  3.0  2.0  NaN
    3  4.0  3.0  3.0

    df.dropna(thresh=3)

         A    B    C
    3  4.0  3.0  3.0

The thing to note here is that you need to specify how many NON-NULL values you want to keep, rather than how many NULL values you want to drop. This is a pain point for new users. Luckily the fix is easy: if you have a count of NULL values, simply subtract it from the column size to get the correct thresh argument for the function.

    required_min_null_values_to_drop = 2 # drop rows with at least 2 NaN
    df.dropna(thresh=df.shape[1] - required_min_null_values_to_drop + 1)

         A    B    C
    2  3.0  2.0  NaN
    3  4.0  3.0  3.0

How do I drop columns instead of rows?

Use the axis=... argument; it can be axis=0 or axis=1. It tells the function whether you want to drop rows (axis=0) or drop columns (axis=1).

    df.dropna()

         A    B    C
    3  4.0  3.0  3.0

    # Every column contains at least one NaN, so the result is empty.
    df.dropna(axis=1)

    Empty DataFrame
    Columns: []
    Index: [0, 1, 2, 3]

    # Here's a different example requiring the column to have all NaN rows
    # to be dropped. In this case no columns satisfy the condition.
    df.dropna(axis=1, how='all')

         A    B    C
    0  NaN  NaN  NaN
    1  2.0  NaN  NaN
    2  3.0  2.0  NaN
    3  4.0  3.0  3.0

    # Here's a different example requiring a column to have at least 2 NON-NULL
    # values. Column C has fewer than 2 NON-NULL values, so it should be dropped.
    df.dropna(axis=1, thresh=2)

         A    B
    0  NaN  NaN
    1  2.0  NaN
    2  3.0  2.0
    3  4.0  3.0

I tried all of the options above but my DataFrame just won't update!

dropna, like most other functions in the pandas API, returns a new DataFrame (a copy of the original with changes) as the result, so you should assign it back if you want to see changes.

    df.dropna(...) # wrong
    df.dropna(..., inplace=True) # right, but not recommended
    df = df.dropna(...) # right

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html

    DataFrame.dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False)
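The thresh arithmetic and the "won't update" point can be checked together in one short sketch. The helper name drop_rows_with_at_least is hypothetical, introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [np.nan, 2, 3, 4],
    "B": [np.nan, np.nan, 2, 3],
    "C": [np.nan] * 3 + [3],
})

def drop_rows_with_at_least(frame, n_nulls):
    """Hypothetical helper: drop rows holding n_nulls or more NaN values."""
    # thresh counts NON-NULL values, so convert the NaN count first.
    return frame.dropna(thresh=frame.shape[1] - n_nulls + 1)

result = drop_rows_with_at_least(df, 2)

# The same selection written as an explicit mask on the per-row NaN count.
by_mask = df[df.isna().sum(axis=1) < 2]

print(list(result.index))  # [2, 3]
print(len(df))             # 4, the original frame is untouched
```

Since dropna returned a new object, df keeps all four rows until the result is assigned back to it.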

User · Answer

Yet another solution, which uses the fact that np.nan != np.nan:

    In [149]: df.query("EPS == EPS")
    Out[149]:
                     STK_ID  EPS  cash
    STK_ID RPT_Date
    600016 20111231  600016  4.3   NaN
    601939 20111231  601939  2.5   NaN
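A quick sketch, on made-up data, showing why the trick works and that it selects the same rows as dropna with subset:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "EPS": [np.nan, 4.3, 2.5],
    "cash": [12.0, np.nan, np.nan],
})

# NaN is the only float value that is not equal to itself.
nan_is_unequal = np.nan != np.nan

via_query = df.query("EPS == EPS")       # rows where EPS equals itself
via_dropna = df.dropna(subset=["EPS"])   # rows where EPS is not NaN

print(nan_is_unequal)          # True
print(list(via_query.index))   # [1, 2]
```

The query form is compact, but dropna(subset=...) states the intent more directly for readers who do not know the NaN comparison rule.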

User · Answer

Simple and easy way:

    df.dropna(subset=['EPS'], inplace=True)

Source: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html

User · Answer

You can use this:

    df.dropna(subset=['EPS'], how='all', inplace=True)
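With a single column in subset, how='all' and the default how='any' select the same rows; the distinction only matters once subset lists several columns. A sketch on made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "EPS": [np.nan, 4.3, np.nan],
    "cash": [np.nan, np.nan, 12.0],
})

# One column: 'all' and 'any' are equivalent.
single_all = df.dropna(subset=["EPS"], how="all")
single_any = df.dropna(subset=["EPS"], how="any")

# Two columns: 'any' drops a row if either value is NaN,
# 'all' drops it only if both are NaN.
both_any = df.dropna(subset=["EPS", "cash"], how="any")
both_all = df.dropna(subset=["EPS", "cash"], how="all")

print(list(single_all.index))  # [1]
print(list(both_any.index))    # []
print(list(both_all.index))    # [1, 2]
```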

User · Answer

Don't drop, just take the rows where EPS is not NA:

    df = df[df['EPS'].notna()]
