Finding non-numeric rows in dataframe in pandas

Question

I have a large dataframe in pandas that, apart from the column used as index, is supposed to have only numeric values:

    df = pd.DataFrame({'a': [1, 2, 3, 'bad', 5],
                       'b': [0.1, 0.2, 0.3, 0.4, 0.5],
                       'item': ['a', 'b', 'c', 'd', 'e']})
    df = df.set_index('item')

How can I find the row of the dataframe df that has a non-numeric value in it?

In this example it's the fourth row in the dataframe, which has the string 'bad' in the a column. How can this row be found programmatically?

User · Accepted Answer

You could use np.isreal to check the type of each element (applymap applies a function to each element in the DataFrame):

    In [11]: df.applymap(np.isreal)
    Out[11]:
              a     b
    item
    a      True  True
    b      True  True
    c      True  True
    d     False  True
    e      True  True

If all values in a row are True, then they are all numeric:

    In [12]: df.applymap(np.isreal).all(1)
    Out[12]:
    item
    a        True
    b        True
    c        True
    d       False
    e        True
    dtype: bool

So, to get the sub-DataFrame of rogues (note: the negation, ~, of the above finds the rows which have at least one rogue non-numeric value):

    In [13]: df[~df.applymap(np.isreal).all(1)]
    Out[13]:
            a    b
    item
    d     bad  0.4

To find the location of the first offender, you could use argmin:

    In [14]: np.argmin(df.applymap(np.isreal).all(1))
    Out[14]: 'd'

As @CTZhu points out, it may be slightly faster to check whether each element is an instance of either int or float (there is some additional overhead with np.isreal):

    df.applymap(lambda x: isinstance(x, (int, float)))
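A note for readers on recent pandas: newer releases deprecate applymap in favor of DataFrame.map, so here is a minimal sketch of the same isinstance idea that avoids both by mapping column by column. The helper name is_number and the choice to exclude bool are my own assumptions, not part of the answer above.

```python
import numbers

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 'bad', 5],
                   'b': [0.1, 0.2, 0.3, 0.4, 0.5],
                   'item': ['a', 'b', 'c', 'd', 'e']}).set_index('item')


def is_number(x):
    # Hypothetical helper: treat real numbers as numeric, but not booleans
    # (bool is a subclass of int in Python).
    return isinstance(x, numbers.Real) and not isinstance(x, bool)


# Element-wise check, done one column at a time with Series.map.
numeric_mask = df.apply(lambda col: col.map(is_number))

# Rows where every cell is numeric.
all_numeric = numeric_mask.all(axis=1)

# Rows with at least one non-numeric cell.
bad_rows = df[~all_numeric]

# Label of the first offending row (only meaningful if any row is bad).
first_bad_label = (~all_numeric).idxmax()

print(bad_rows)
print(first_bad_label)
```

If no row is bad, idxmax simply returns the first label, so check `(~all_numeric).any()` before relying on it.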

User · Answer

There are already some great answers to this question; however, here is a nice snippet that I use regularly to drop rows if they have non-numeric values in some columns:

    # Eliminate invalid data from dataframe (see Example below for more context)
    num_df = (df.drop(data_columns, axis=1)
                .join(df[data_columns].apply(pd.to_numeric, errors='coerce')))
    num_df = num_df[num_df[data_columns].notnull().all(axis=1)]

The way this works is that we first drop all the data_columns from the df, and then use a join to put them back in after passing them through pd.to_numeric (with option 'coerce', such that all non-numeric entries are converted to NaN). The result is saved to num_df.

On the second line we use a filter that keeps only rows where all values are not null.

Note that pd.to_numeric coerces to NaN everything that cannot be converted to a numeric value, so strings that represent numeric values will not be removed. For example, '1.25' will be recognized as the numeric value 1.25.

Disclaimer: pd.to_numeric was introduced in pandas version 0.17.0.

Example:

    In [1]: import pandas as pd

    In [2]: df = pd.DataFrame({"item": ["a", "b", "c", "d", "e"],
       ...:                    "a": [1, 2, 3, "bad", 5],
       ...:                    "b": [0.1, 0.2, 0.3, 0.4, 0.5]})

    In [3]: df
    Out[3]:
         a    b item
    0    1  0.1    a
    1    2  0.2    b
    2    3  0.3    c
    3  bad  0.4    d
    4    5  0.5    e

    In [4]: data_columns = ['a', 'b']

    In [5]: num_df = (df
       ...:           .drop(data_columns, axis=1)
       ...:           .join(df[data_columns].apply(pd.to_numeric, errors='coerce')))

    In [6]: num_df
    Out[6]:
      item   a    b
    0    a   1  0.1
    1    b   2  0.2
    2    c   3  0.3
    3    d NaN  0.4
    4    e   5  0.5

    In [7]: num_df[num_df[data_columns].notnull().all(axis=1)]
    Out[7]:
      item  a    b
    0    a  1  0.1
    1    b  2  0.2
    2    c  3  0.3
    4    e  5  0.5
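One thing worth keeping in mind with this approach: after coercion, a value that was already missing looks the same as one that failed to convert. The following is a small sketch of how the two cases might be told apart; the sample Series and the variable names are made up for illustration.

```python
import pandas as pd

# Illustrative data: a number, a numeric string, a bad string,
# an already-missing value, and a float.
raw = pd.Series([1, '1.25', 'bad', None, 3.5])

# Non-convertible entries become NaN; '1.25' is parsed as 1.25.
converted = pd.to_numeric(raw, errors='coerce')

# True only where coercion failed on a value that was actually present.
failed = converted.isna() & raw.notna()

print(converted.tolist())
print(raw[failed].tolist())
```

Here `converted` is NaN at both the 'bad' entry and the None entry, while `failed` flags only 'bad'.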

User · Answer

Just to give an idea: you could convert the column to strings, since strings are easier to work with. However, this does not work with strings that contain digits, like bad123. (~ takes the complement of the selection.)

    df['a'] = df['a'].astype(str)
    df[~df['a'].str.contains('0|1|2|3|4|5|6|7|8|9')]
    df['a'] = df['a'].astype(object)

You can use '|'.join([str(i) for i in range(10)]) to generate '0|1|...|8|9'.

Or use the np.isreal() function, just like the most voted answer:

    df[~df['a'].apply(lambda x: np.isreal(x))]
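To get around the bad123 problem while staying with strings, one option is to require the whole string to look like a number rather than merely contain a digit. This is only a sketch: the regular expression below is my own simplified notion of "a number" (optional sign, digits, optional decimal part) and the sample data is invented. It relies on Series.str.fullmatch, which needs a reasonably recent pandas.

```python
import pandas as pd

# Illustrative column with a plain bad string, a string containing digits,
# and a numeric string.
col = pd.Series([1, 2, 3, 'bad', 5, 'bad123', '4.5'])

# Assumed pattern: optional sign, then "12", "12.", "12.5" or ".5".
number_pattern = r'[+-]?(?:\d+(?:\.\d*)?|\.\d+)'

# The entire string must match, so 'bad123' is rejected.
looks_numeric = col.astype(str).str.fullmatch(number_pattern).astype(bool)

non_numeric = col[~looks_numeric]
print(non_numeric)
```

Note that this treats the string '4.5' as numeric, which may or may not be what you want.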

User · Answer

Sorry about the confusion, this should be the correct approach. Do you want to capture only 'bad', and not things like 'good'? Or just any non-numerical values?

    In [15]: np.where(np.any(np.isnan(df.convert_objects(convert_numeric=True)), axis=1))
    Out[15]: (array([3]),)
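convert_objects was later deprecated and has since been removed from pandas, so the snippet above will not run on current versions. A rough equivalent, assuming pd.to_numeric with errors='coerce' is an acceptable substitute for the conversion step, might look like this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 'bad', 5],
                   'b': [0.1, 0.2, 0.3, 0.4, 0.5],
                   'item': ['a', 'b', 'c', 'd', 'e']}).set_index('item')

# Coerce every column; anything non-numeric becomes NaN.
coerced = df.apply(pd.to_numeric, errors='coerce')

# Integer positions of rows that contain at least one NaN after coercion.
bad_positions = np.flatnonzero(coerced.isna().any(axis=1))
print(bad_positions)
```

np.flatnonzero returns the positions as a plain array instead of the one-element tuple that np.where gives.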

User · Answer

    # Original code
    df = pd.DataFrame({'a': [1, 2, 3, 'bad', 5],
                       'b': [0.1, 0.2, 0.3, 0.4, 0.5],
                       'item': ['a', 'b', 'c', 'd', 'e']})
    df = df.set_index('item')

Convert to numeric using 'coerce', which fills bad values with NaN:

    a = pd.to_numeric(df.a, errors='coerce')

Use isna to return a boolean index:

    idx = a.isna()

Apply that index to the data frame:

    df[idx]

Output (returns the row with the bad data in it):

            a    b
    item
    d     bad  0.4
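If you do not know in advance which column holds the bad value, the same coerce-then-isna idea can be run over every column to report the offending cells. This is a sketch under that assumption; bad_mask and bad_cells are names I made up.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 'bad', 5],
                   'b': [0.1, 0.2, 0.3, 0.4, 0.5],
                   'item': ['a', 'b', 'c', 'd', 'e']}).set_index('item')

# Coerce all columns at once.
coerced = df.apply(pd.to_numeric, errors='coerce')

# A cell is bad if it had a value but that value could not be converted.
bad_mask = coerced.isna() & df.notna()

# Collect (row label, column name, original value) for each bad cell.
bad_cells = [
    (row, col, df.at[row, col])
    for col in df.columns
    for row in df.index[bad_mask[col].to_numpy()]
]
print(bad_cells)
```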

User · Answer

In case you are working with a column of string values, you can use the very useful function Series.str.isnumeric(), like so:

    a = pd.Series(['hi', 'hola', '2.31', '288', '312', '1312', '0,21', '0.23'])

What I do is copy that column to a new column, apply str.replace('.', '') and str.replace(',', ''), and then select the numeric values:

    a = a.str.replace('.', '')
    a = a.str.replace(',', '')
    a.str.isnumeric()

    Out[15]:
    0    False
    1    False
    2     True
    3     True
    4     True
    5     True
    6     True
    7     True
    dtype: bool

Good luck, all!
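For context on why the separators have to be stripped first: str.isnumeric only accepts strings made up entirely of numeric characters, so decimals, signs and exponents all fail. A short comparison with pd.to_numeric, on sample strings chosen for illustration:

```python
import pandas as pd

s = pd.Series(['288', '2.31', '-5', '1e3'])

# Only '288' consists purely of numeric characters.
is_numeric = s.str.isnumeric()

# to_numeric parses all four as numbers.
parsed = pd.to_numeric(s, errors='coerce')

print(is_numeric.tolist())
print(parsed.tolist())
```

So if the strings may contain signs or exponents, coercing with pd.to_numeric is probably the more forgiving check.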
