Replacing blank values (white space) with NaN in pandas

Question

I want to find all values in a Pandas dataframe that contain whitespace (any arbitrary amount) and replace those values with NaNs.

Any ideas how this can be improved?

Basically I want to turn this:

                       A    B    C
    2000-01-01 -0.532681  foo    0
    2000-01-02  1.490752  bar    1
    2000-01-03 -1.387326  foo    2
    2000-01-04  0.814772  baz
    2000-01-05 -0.222552         4
    2000-01-06 -1.176781  qux

Into this:

                       A    B    C
    2000-01-01 -0.532681  foo    0
    2000-01-02  1.490752  bar    1
    2000-01-03 -1.387326  foo    2
    2000-01-04  0.814772  baz  NaN
    2000-01-05 -0.222552  NaN    4
    2000-01-06 -1.176781  qux  NaN

I've managed to do it with the code below, but man is it ugly. It's not Pythonic and I'm sure it's not the most efficient use of pandas either. I loop through each column and do boolean replacement against a column mask generated by applying a function that does a regex search of each value, matching on whitespace.

    for i in df.columns:
        df[i][df[i].apply(lambda i: True if re.search('^\s*$', str(i)) else False)]=None

It could be optimized a bit by only iterating through fields that could contain empty strings:

    if df[i].dtype == np.dtype('object')

But that's not much of an improvement. And finally, this code sets the target strings to None, which works with Pandas' functions like fillna(), but it would be nice for completeness if I could actually insert a NaN directly instead of None.

User · Answer

For a very fast and simple solution where you check equality against a single value, you can use the mask method:

    df.mask(df == ' ')
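A minimal sketch of the mask approach, assuming the blank cells hold exactly one space. The column names and values are made up for illustration. Note that an equality check like this only catches that one exact value, not empty strings or longer runs of spaces.

```python
import pandas as pd

# Hypothetical sample data: one blank cell (a single space) per column.
df = pd.DataFrame({"B": ["foo", " ", "baz"], "C": [0, 1, " "]})

# mask() replaces every cell where the condition is True;
# with no replacement given, the cell becomes NaN.
masked = df.mask(df == " ")
print(masked)
```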

User · Answer

If you are importing the data from a CSV file, it can be as simple as this:

    df = pd.read_csv(file_csv, na_values=' ')

This will create the data frame as well as replace blank values with NaN.

User · Answer

    print(df.isnull().sum())  # check the number of null values in each column
    modifiedDf = df.fillna("NaN")  # replace empty/null values with the string "NaN"
    # modifiedDf = df.dropna()  # alternatively, remove rows with empty values
    print(modifiedDf.isnull().sum())  # check the number of null values in each column

User · Answer

These are all close to the right answer, but I wouldn't say any of them solves the problem while remaining readable to others reading your code. I'd say the answer is a combination of BrenBarn's answer and tuomasttik's comment below that answer. BrenBarn's answer uses the isspace builtin, but does not support removing empty strings, as OP requested (and I would tend to consider that the standard use case of replacing strings with null).

I rewrote it with .apply, which works element by element on a pd.Series. On a pd.DataFrame, .apply passes whole columns to the function, so use applymap with the same lambda there.

Python 3:

To replace empty strings or strings of entirely spaces:

    df = df.apply(lambda x: np.nan if isinstance(x, str) and (x.isspace() or not x) else x)

To replace strings of entirely spaces:

    df = df.apply(lambda x: np.nan if isinstance(x, str) and x.isspace() else x)

To use this in Python 2, you'll need to replace str with basestring.

Python 2:

To replace empty strings or strings of entirely spaces:

    df = df.apply(lambda x: np.nan if isinstance(x, basestring) and (x.isspace() or not x) else x)

To replace strings of entirely spaces:

    df = df.apply(lambda x: np.nan if isinstance(x, basestring) and x.isspace() else x)
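Here is a small sketch of the isspace idea applied to a Series, where .apply calls the function once per element. The helper name blank_to_nan and the sample values are hypothetical, not from the answer.

```python
import numpy as np
import pandas as pd

def blank_to_nan(value):
    """Return NaN for empty or whitespace-only strings, else the value unchanged."""
    if isinstance(value, str) and (not value or value.isspace()):
        return np.nan
    return value

# Hypothetical sample: an empty string, a run of spaces, a string with an
# inner space (which should be kept), and a non-string value.
s = pd.Series(["foo", "", "   ", "fo o", 3])

# Series.apply calls blank_to_nan on each element in turn.
cleaned = s.apply(blank_to_nan)
print(cleaned)
```

Using a named function instead of a lambda keeps the condition readable and makes it easy to reuse.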

User · Answer

Simplest of all solutions:

    df = df.replace(r'^\s*$', np.nan, regex=True)
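A short runnable sketch of that one-liner, using a cut-down version of the question's data (the exact rows here are chosen for illustration). The pattern `^\s*$` only matches cells that are empty or consist entirely of whitespace.

```python
import numpy as np
import pandas as pd

# Cut-down sample: a single space, a run of spaces, and an empty string.
df = pd.DataFrame(
    [
        [-0.532681, "foo", 0],
        [0.814772, "baz", " "],
        [-0.222552, "   ", 4],
        [-1.176781, "qux", ""],
    ],
    columns=["A", "B", "C"],
)

# regex=True makes replace() treat the first argument as a pattern.
cleaned = df.replace(r"^\s*$", np.nan, regex=True)
print(cleaned)
```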

User · Answer

If you want to replace an empty string and records with only spaces, the correct answer is:

    df = df.replace(r'^\s*$', np.nan, regex=True)

The accepted answer

    df.replace(r'\s+', np.nan, regex=True)

does not replace an empty string! You can try it yourself with the given example, slightly updated:

    df = pd.DataFrame([
        [-0.532681, 'foo', 0],
        [1.490752, 'bar', 1],
        [-1.387326, 'fo o', 2],
        [0.814772, 'baz', ' '],
        [-0.222552, '   ', 4],
        [-1.176781, 'qux', ''],
    ], columns='A B C'.split(), index=pd.date_range('2000-01-01', '2000-01-06'))

Note also that 'fo o' is not replaced with NaN, though it contains a space. Further note that a simple

    df.replace(r'', np.nan)

does not work either - try it out.
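To see why the two patterns behave differently, here is a sketch using only Python's re module. It assumes search-style matching (the pattern may match anywhere in the value), which is how I understand pandas to apply a regex when the replacement is not a string. The sample list is made up.

```python
import re

# Hypothetical cell values: empty, blank, and non-blank strings.
samples = ["", " ", "   ", "fo o", "foo"]

anywhere = re.compile(r"\s+")   # at least one whitespace character anywhere
entire = re.compile(r"^\s*$")   # the whole value is empty or whitespace

hits_anywhere = [s for s in samples if anywhere.search(s)]
hits_entire = [s for s in samples if entire.search(s)]

print(hits_anywhere)  # misses '' and catches 'fo o'
print(hits_entire)    # catches '' and leaves 'fo o' alone
```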

User · Answer

I think df.replace() does the job, since pandas 0.13:

    df = pd.DataFrame([
        [-0.532681, 'foo', 0],
        [1.490752, 'bar', 1],
        [-1.387326, 'foo', 2],
        [0.814772, 'baz', ' '],
        [-0.222552, '   ', 4],
        [-1.176781, 'qux', '  '],
    ], columns='A B C'.split(), index=pd.date_range('2000-01-01', '2000-01-06'))

    # replace field that's entirely space (or empty) with NaN
    print(df.replace(r'^\s*$', np.nan, regex=True))

Produces:

                       A    B    C
    2000-01-01 -0.532681  foo    0
    2000-01-02  1.490752  bar    1
    2000-01-03 -1.387326  foo    2
    2000-01-04  0.814772  baz  NaN
    2000-01-05 -0.222552  NaN    4
    2000-01-06 -1.176781  qux  NaN

As Temak pointed out, use df.replace(r'^\s+$', np.nan, regex=True) in case your valid data contains white spaces.

User · Answer

This should work:

    df.loc[df.Variable == '', 'Variable'] = 'Value'

or

    df.loc[df.Variable1 == '', 'Variable2'] = 'Value'
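A minimal sketch of the .loc pattern with NaN as the assigned value. The frame and column names are invented for illustration. The row selector picks rows where one column equals the empty string, and the column selector says which column gets overwritten.

```python
import numpy as np
import pandas as pd

# Hypothetical data: both columns contain an empty string.
df = pd.DataFrame({"B": ["foo", "", "qux"], "C": [0, 4, ""]})

# Set column C to NaN only in rows where C is the empty string.
df.loc[df["C"] == "", "C"] = np.nan
print(df)
```

Only the named column changes, so the empty string in column B is still there afterwards. This approach has to be repeated per column.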

User · Answer

How about:

    d = d.applymap(lambda x: np.nan if isinstance(x, basestring) and x.isspace() else x)

The applymap function applies a function to every cell of the dataframe. (In Python 3, use str instead of basestring.)
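A Python 3 sketch of the cell-by-cell approach on a whole DataFrame. Since pandas 2.1 the method is called DataFrame.map and applymap is deprecated, so this picks whichever exists. The sample frame is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column C mixes integers, a blank, and an empty string.
df = pd.DataFrame(
    {"A": [1.5, 2.5, 3.5], "B": ["foo", "   ", "qux"], "C": [0, " ", ""]}
)

# DataFrame.map is the newer name for applymap (pandas 2.1+);
# fall back to applymap on older versions.
elementwise = df.map if hasattr(df, "map") else df.applymap

cleaned = elementwise(lambda x: np.nan if isinstance(x, str) and x.isspace() else x)
print(cleaned)
```

Because "".isspace() is False, the empty string in column C survives. Add an `or not x` check inside the condition if empty strings should become NaN too.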

User · Answer

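You can also use a filter (a boolean mask) to do it:

    df = pd.DataFrame([
        [-0.532681, 'foo', 0],
        [1.490752, 'bar', 1],
        [-1.387326, 'foo', 2],
        [0.814772, 'baz', ' '],
        [-0.222552, '   ', 4],
        [-1.176781, 'qux', '  '],
    ])
    # True where a cell is empty or holds only whitespace
    blank = df.apply(lambda col: col.astype(str).str.strip() == '')
    df[blank] = np.nan

A column that is numeric apart from the blanks can then be converted, for example df[2] = df[2].astype(float).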

User · Answer

This worked for me. When I import my CSV file I added na_values=' '. Spaces are not included in the default NaN values.

    df = pd.read_csv(filepath, na_values=' ')
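A quick sketch of the difference, reading the same CSV text with and without na_values. The CSV content is invented and is read from an in-memory string rather than a file. Only fields that are exactly one space are caught here, so longer runs of spaces would need their own entries in the list.

```python
import io
import pandas as pd

# Hypothetical CSV with a single-space field in columns B and C.
csv_text = "A,B,C\n1,foo,0\n2, ,1\n3,baz, \n"

# Default parsing keeps the single space as an ordinary string.
default = pd.read_csv(io.StringIO(csv_text))

# With na_values, a field that is exactly one space is read as NaN.
with_na = pd.read_csv(io.StringIO(csv_text), na_values=[" "])

print(default)
print(with_na)
```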

User · Answer

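I did this:

    df = df.apply(lambda x: x.str.strip()).replace('', np.nan)

This works when every column holds only strings. For frames that mix strings with other types, strip cell by cell instead:

    df = df.applymap(lambda x: x.strip() if isinstance(x, str) else x).replace('', np.nan)

(In pandas 2.1 and later, applymap is named DataFrame.map.) Either way, you strip all strings, then replace empty strings with np.nan.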
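One caveat with strip-then-replace is worth a small sketch: on a column that mixes strings and numbers, the .str accessor returns NaN for the non-string elements, so real data can be lost. The sample Series below is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing an integer with strings.
mixed = pd.Series([0, " a ", "  "], dtype=object)

# .str methods yield NaN for non-string elements, so the integer 0 is lost.
via_str = mixed.str.strip().replace("", np.nan)

# Stripping only real strings keeps the integer intact.
via_map = mixed.map(lambda x: x.strip() if isinstance(x, str) else x).replace("", np.nan)

print(via_str)
print(via_map)
```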

User · Answer

This is not an elegant solution, but what does seem to work is saving to XLSX and then importing it back. The other solutions on this page did not work for me, unsure why.

    data.to_excel(filepath, index=False)
    data = pd.read_excel(filepath)
