Drop rows containing empty cells from a pandas DataFrame

Question

I have a `pd.DataFrame` that was created by parsing some Excel spreadsheets. One of its columns has empty cells. For example, below is the output for the frequency of that column; 32320 records have missing values for Tenant.

    >>> value_counts(Tenant, normalize=False)
                               32320
    Thunderhead                 8170
    Big Data Others             5700
    Cloud Cruiser               5700
    Partnerpedia                5700
    Comcast                     5700
    SDP                         5700
    Agora                       5700
    dtype: int64

I am trying to drop rows where Tenant is missing, however the `.isnull()` option does not recognize the missing values.

    >>> df['Tenant'].isnull().sum()
    0

The column has data type "Object". What is happening in this case? How can I drop records where Tenant is missing?

User · Answer

Pythonic + Pandorable: `df[df['col'].astype(bool)]`

Empty strings are falsy, which means you can filter on bool values like this:

    df = pd.DataFrame({
        'A': range(5),
        'B': ['foo', '', 'bar', '', 'xyz']
    })
    df
       A    B
    0  0  foo
    1  1
    2  2  bar
    3  3
    4  4  xyz

    df['B'].astype(bool)
    0     True
    1    False
    2     True
    3    False
    4     True
    Name: B, dtype: bool

    df[df['B'].astype(bool)]
       A    B
    0  0  foo
    2  2  bar
    4  4  xyz

If your goal is to remove not only empty strings, but also strings only containing whitespace, use `str.strip` beforehand:

    df[df['B'].str.strip().astype(bool)]
       A    B
    0  0  foo
    2  2  bar
    4  4  xyz

Faster than you Think

`.astype` is a vectorised operation, and this is faster than every option presented thus far. At least, from my tests. YMMV.

Here is a timing comparison; I've thrown in some other methods I could think of.

Benchmarking code, for reference:

    import numpy as np
    import pandas as pd
    import perfplot

    df1 = pd.DataFrame({
        'A': range(5),
        'B': ['foo', '', 'bar', '', 'xyz']
    })

    perfplot.show(
        setup=lambda n: pd.concat([df1] * n, ignore_index=True),
        kernels=[
            lambda df: df[df['B'].astype(bool)],
            lambda df: df[df['B'] != ''],
            lambda df: df[df['B'].replace('', np.nan).notna()],  # optimized 1-col
            lambda df: df.replace({'B': {'': np.nan}}).dropna(subset=['B']),
        ],
        labels=['astype', "!= ''", "replace + notna", "replace + dropna"],
        n_range=[2**k for k in range(1, 15)],
        xlabel='N',
        logx=True,
        logy=True,
        equality_check=pd.DataFrame.equals)
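For readers without perfplot installed, a rough version of the same comparison can be sketched with the standard-library `timeit`. The frame size, the repeat count, and the names `base`, `big`, `by_astype` and `by_compare` are illustrative assumptions rather than part of the answer, and the timings will vary with machine and pandas version.

```python
import timeit

import pandas as pd

# Illustrative data: the answer's 5-row frame repeated many times.
# dtype=object is forced so the strings are plain Python objects.
base = pd.DataFrame({
    'A': range(5),
    'B': pd.Series(['foo', '', 'bar', '', 'xyz'], dtype=object),
})
big = pd.concat([base] * 2000, ignore_index=True)


def by_astype(df):
    # Empty strings are falsy, so they become False in the mask.
    return df[df['B'].astype(bool)]


def by_compare(df):
    # Explicit comparison against the empty string.
    return df[df['B'] != '']


# Both approaches should select exactly the same rows.
assert by_astype(big).equals(by_compare(big))

for fn in (by_astype, by_compare):
    seconds = timeit.timeit(lambda: fn(big), number=20)
    print(f'{fn.__name__}: {seconds / 20 * 1e3:.3f} ms per call')
```

Checking that the two results are equal before timing them mirrors the `equality_check` argument in the perfplot call.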

User · Answer

Sometimes a cell contains white space that you can't see. Use

    df['col'] = df['col'].replace(' ', np.nan)

to replace the white space with NaN (this matches cells whose entire value is that string), then

    df = df.dropna(subset=['col'])
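When the amount of white space is not known in advance, one option is a regular-expression replacement. The sketch below is a variation on the answer above, not its original code: the sample data and the pattern `r'^\s*$'` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Made-up column mixing real values, an empty string, and whitespace-only cells.
df = pd.DataFrame({'col': ['a', '', ' ', '   ', 'b', '\t']})

# r'^\s*$' matches a cell that is empty or consists only of whitespace.
df['col'] = df['col'].replace(r'^\s*$', np.nan, regex=True)

# With the blanks converted to NaN, dropna can remove those rows.
cleaned = df.dropna(subset=['col'])
print(cleaned)
```

Assigning the result back to `df['col']` avoids relying on `inplace=True` applied to a column selection.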

User · Answer

Pandas will recognise a value as null if it is a `np.nan` object, which will print as `NaN` in the DataFrame. Your missing values are probably empty strings, which Pandas doesn't recognise as null. To fix this, you can convert the empty strings (or whatever is in your empty cells) to `np.nan` objects using `replace()`, and then call `dropna()` on your DataFrame to delete rows with null tenants.

To demonstrate, we create a DataFrame with some random values and some empty strings in a Tenant column:

    >>> import pandas as pd
    >>> import numpy as np
    >>>
    >>> df = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
    >>> df['Tenant'] = np.random.choice(['Babar', 'Rataxes', ''], 10)
    >>> print(df)
              A         B   Tenant
    0 -0.588412 -1.179306    Babar
    1 -0.008562  0.725239
    2  0.282146  0.421721  Rataxes
    3  0.627611 -0.661126    Babar
    4  0.805304 -0.834214
    5 -0.514568  1.890647    Babar
    6 -1.188436  0.294792  Rataxes
    7  1.471766 -0.267807    Babar
    8 -1.730745  1.358165  Rataxes
    9  0.066946  0.375640

Now we replace any empty strings in the Tenant column with `np.nan` objects, like so:

    >>> df['Tenant'] = df['Tenant'].replace('', np.nan)
    >>> print(df)
              A         B   Tenant
    0 -0.588412 -1.179306    Babar
    1 -0.008562  0.725239      NaN
    2  0.282146  0.421721  Rataxes
    3  0.627611 -0.661126    Babar
    4  0.805304 -0.834214      NaN
    5 -0.514568  1.890647    Babar
    6 -1.188436  0.294792  Rataxes
    7  1.471766 -0.267807    Babar
    8 -1.730745  1.358165  Rataxes
    9  0.066946  0.375640      NaN

Now we can drop the null values:

    >>> df.dropna(subset=['Tenant'], inplace=True)
    >>> print(df)
              A         B   Tenant
    0 -0.588412 -1.179306    Babar
    2  0.282146  0.421721  Rataxes
    3  0.627611 -0.661126    Babar
    5 -0.514568  1.890647    Babar
    6 -1.188436  0.294792  Rataxes
    7  1.471766 -0.267807    Babar
    8 -1.730745  1.358165  Rataxes
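The same two steps can also be chained so that the original frame is left untouched. This is a minimal sketch of that variation; the seeded generator and the name `cleaned` are assumptions added here so the example is repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable
df = pd.DataFrame(rng.standard_normal((10, 2)), columns=list('AB'))
df['Tenant'] = rng.choice(['Babar', 'Rataxes', ''], 10)

# Limit the replacement to the Tenant column, then drop its NaN rows.
# Neither call uses inplace=True, so df itself is left unchanged.
cleaned = df.replace({'Tenant': {'': np.nan}}).dropna(subset=['Tenant'])

print(len(df), len(cleaned))
```

The nested dictionary restricts the replacement to one column, which is useful when empty strings elsewhere in the frame are meaningful.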

User · Answer

`value_counts` omits NaN by default, so you're most likely dealing with `""`.

So you can just filter them out like this:

    filter = df["Tenant"] != ""
    dfNew = df[filter]
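To confirm which kind of "missing" a column holds, the counts can be checked directly. The series below is made-up sample data (tenant names borrowed from the question), and `n_empty` and `n_nan` are illustrative names.

```python
import numpy as np
import pandas as pd

# Made-up sample mixing empty strings with a genuine NaN.
tenant = pd.Series(['Agora', '', 'SDP', np.nan, '', 'Comcast'], dtype=object)

# dropna=False makes value_counts report NaN as its own row.
print(tenant.value_counts(dropna=False))

n_empty = int((tenant == '').sum())  # cells holding an empty string
n_nan = int(tenant.isna().sum())     # cells that are truly missing
print(n_empty, n_nan)
```

If `n_empty` is large while `n_nan` is zero, the situation matches the question: the blanks are strings, so `isnull()` cannot see them.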

User · Answer

If you don't care which columns hold the missing values, and assuming the DataFrame is named `New` and you want to assign the result back to the same variable, simply run

    New = New.drop_duplicates()

Note that this only removes repeated rows, so a row with an empty Tenant disappears only if it duplicates an earlier row.

If you specifically want to remove the rows with empty values in the column Tenant, this will do the work:

    New = New[New.Tenant != '']

This may also be used for removing rows with a specific value; just change the string to the value that you want.

Note: if you have NaN instead of an empty string, then use

    New = New.dropna(subset=['Tenant'])
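When a column may hold both empty strings and NaN, the two checks can be combined into one mask. The frame below is made-up sample data, and `only_nonempty`, `keep` and `cleaned` are illustrative names.

```python
import numpy as np
import pandas as pd

New = pd.DataFrame({
    'Tenant': pd.Series(['Agora', '', np.nan, 'SDP'], dtype=object),
    'n': [1, 2, 3, 4],
})

# NaN != '' evaluates to True, so this filter alone keeps the NaN row.
only_nonempty = New[New['Tenant'] != '']

# Require both conditions to drop empty strings and NaN together.
keep = New['Tenant'].notna() & (New['Tenant'] != '')
cleaned = New[keep]

print(cleaned)
```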

User · Answer

You can use this variation:

    import pandas as pd
    vals = {
        'name': ['n1', 'n2', 'n3', 'n4', 'n5', 'n6', 'n7'],
        'gender': ['m', 'f', 'f', 'f', 'f', 'c', 'c'],
        'age': [39, 12, 27, 13, 36, 29, 10],
        'education': ['ma', None, 'school', None, 'ba', None, None]
    }
    df_vals = pd.DataFrame(vals)  # converting dict to dataframe

This will output (* highlights only the desired rows):

       age education gender name
    0   39        ma      m   n1 *
    1   12      None      f   n2
    2   27    school      f   n3 *
    3   13      None      f   n4
    4   36        ba      f   n5 *
    5   29      None      c   n6
    6   10      None      c   n7

So to drop everything that does not have an 'education' value, use the code below:

    df_vals = df_vals[~df_vals['education'].isnull()]

('~' indicates NOT)

Result:

       age education gender name
    0   39        ma      m   n1
    2   27    school      f   n3
    4   36        ba      f   n5
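The negated `isnull()` mask has two common equivalents, sketched below on a trimmed-down copy of the same data. The names `by_mask` and `by_dropna` are illustrative.

```python
import pandas as pd

df_vals = pd.DataFrame({
    'name': ['n1', 'n2', 'n3', 'n4', 'n5', 'n6', 'n7'],
    'education': ['ma', None, 'school', None, 'ba', None, None],
})

# notna() is the direct complement of isnull(), so no ~ is needed.
by_mask = df_vals[df_vals['education'].notna()]

# dropna with subset= restricts the check to the named column.
by_dropna = df_vals.dropna(subset=['education'])

# Both spellings select the same rows.
assert by_mask.equals(by_dropna)
print(by_mask)
```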
