Replace invalid values with None in Pandas DataFrame

Question

Is there any method to replace values with None in Pandas in Python?

You can use df.replace('pre', 'post') to replace one value with another, but this does not work when the replacement is None; if you try, you get a strange result.

Here is an example:

    df = DataFrame(['-', 3, 2, 5, 1, -5, -1, '-', 9])
    df.replace('-', 0)

which returns the expected result. But

    df.replace('-', None)

returns the following:

       0
    0  -    # this isn't replaced
    1  3
    2  2
    3  5
    4  1
    5 -5
    6 -1
    7 -1    # this is changed to -1
    8  9

Why is such a strange result returned?

Since I want to load this data frame into a MySQL database, I can't put NaN values into any element of my data frame and want to put None instead. Of course, you can first change '-' to NaN and then convert NaN to None, but I want to know why the dataframe behaves in such a surprising way.

Tested on pandas 0.12.0 dev on Python 2.7 and OS X 10.8. Python is the pre-installed version on OS X, and I installed pandas using the SciPy Superpack script, for your information.

User · Answer

Using replace and assigning a new df:

    import pandas as pd
    df = pd.DataFrame(['-', 3, 2, 5, 1, -5, -1, '-', 9])
    dfnew = df.replace('-', 0)
    print(dfnew)

    (venv) D:\assets>py teste2.py
       0
    0  0
    1  3
    2  2
    3  5
    4  1
    5 -5

User · Answer

    df = pd.DataFrame(['-', 3, 2, 5, 1, -5, -1, '-', 9])
    df = df.where(df != '-', None)
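
The where approach above can be checked with a short, self-contained sketch. It tests the result with isna(), which treats None and NaN alike; whether the stored value is literally None can depend on the pandas version and the column dtype, so that part is left as an assumption rather than asserted.

```python
import pandas as pd

df = pd.DataFrame(['-', 3, 2, 5, 1, -5, -1, '-', 9])

# Keep each entry where the condition holds; use None everywhere else.
result = df.where(df != '-', None)

# isna() treats both None and NaN as missing, so this check does not
# depend on which of the two a given pandas version stores.
print(result[0].isna().tolist())
```

Note that where returns a new object, so df itself still contains the '-' strings unless the result is assigned back.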

User · Answer

I prefer the solution using replace with a dict because of its simplicity and elegance:

    df.replace({'-': None})

You can also have more replacements:

    df.replace({'-': None, 'None': None})

Even for larger sets of replacements, it is always obvious and clear what is replaced by what, which is much harder with long lists, in my opinion.
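
As a rough illustration of the dict form with more than one placeholder, here is a minimal sketch. The column names and the 'n/a' marker are hypothetical and not taken from the question.

```python
import pandas as pd

# Hypothetical data with two different placeholder strings.
df = pd.DataFrame({'x': ['-', 3, 'n/a', 5], 'y': [1, 'n/a', '-', 4]})

# Each key is a value to look for; each dict value is its replacement.
result = df.replace({'-': None, 'n/a': None})

# Boolean frame marking which cells are now missing.
print(result.isna())
```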

User · Answer

Setting null values can be done with np.nan:

    import numpy as np
    df.replace('-', np.nan)

The advantage is that df.last_valid_index() recognizes these as invalid.
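
To make the last_valid_index() point concrete, here is a small sketch. It assumes a hypothetical series whose final entry is the placeholder, since in the question's data the last row is a valid number anyway.

```python
import numpy as np
import pandas as pd

# Hypothetical series whose final entry is the placeholder.
s = pd.Series([3, 2, '-'])

# As a plain string, '-' counts as a valid value.
print(s.last_valid_index())

# Once replaced with NaN, the last valid entry is the one before it.
cleaned = s.replace('-', np.nan)
print(cleaned.last_valid_index())
```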

User · Answer

Actually, in later versions of pandas this will give a TypeError:

    df.replace('-', None)
    TypeError: If "to_replace" and "value" are both None then regex must be a mapping

You can do it by passing either a list or a dictionary:

    In [11]: df.replace(['-'], [None])  # or df.replace('-', {0: None})
    Out[11]:
          0
    0  None
    1     3
    2     2
    3     5
    4     1
    5    -5
    6    -1
    7  None
    8     9

But I recommend using NaNs rather than None:

    In [12]: df.replace('-', np.nan)
    Out[12]:
         0
    0  NaN
    1    3
    2    2
    3    5
    4    1
    5   -5
    6   -1
    7  NaN
    8    9
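
If None is only needed when writing to the database, as in the question, one option is to keep NaN inside pandas and convert at the boundary. The following is a minimal sketch under that assumption; the name rows and the idea of handing tuples to a DB-API cursor are illustrative, and no database call is shown.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['-', 3, 2, 5, 1, -5, -1, '-', 9])

# Work with NaN inside pandas, as recommended above.
clean = df.replace('-', np.nan)

# Convert to None only when leaving pandas: cast to object first so the
# column can hold None, then keep values only where they are not missing.
as_none = clean.astype(object).where(clean.notna(), None)

# Hypothetical hand-off: plain tuples, e.g. for a DB-API executemany call.
rows = list(as_none.itertuples(index=False, name=None))
print(rows[0], rows[1])
```

The cast to object matters here: on a float column, None passed to where may be stored as NaN again.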

User · Answer

    df.replace('-', np.nan).astype("object")

This will ensure that you can use isnull() later on your dataframe.
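
A short sketch of how isnull() might then be used on the result. It only shows the technique on the question's sample data and makes no claim about which dtypes need the cast.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['-', 3, 2, 5, 1, -5, -1, '-', 9])

# Replace the placeholder with NaN and keep the column as object dtype.
cleaned = df.replace('-', np.nan).astype(object)

# isnull() returns a boolean mask; sum() counts the missing entries.
missing = cleaned.isnull()
print(cleaned[0].dtype, int(missing[0].sum()))
```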

User · Answer

Before proceeding with this post, it is important to understand the difference between NaN and None. One is a float type, the other is an object type. Pandas is better suited to working with scalar types, as many methods on these types can be vectorised. Pandas does try to handle None and NaN consistently, but NumPy cannot.

My suggestion (and Andy's) is to stick with NaN.

But to answer your question...

pandas >= 0.18: Use the na_values=['-'] argument with read_csv

If you loaded this data from CSV/Excel, I have good news for you. You can quash this at the root during data loading instead of having to write a fix with code as a subsequent step.

Most of the pd.read_* functions (such as read_csv and read_excel) accept a na_values parameter.

file.csv

    A,B
    -,1
    3,-
    2,-
    5,3
    1,-2
    -5,4
    -1,-1
    -,0
    9,0

Now, to convert the - characters into NaNs, do:

    import pandas as pd
    df = pd.read_csv('file.csv', na_values=['-'])
    df

         A    B
    0  NaN  1.0
    1  3.0  NaN
    2  2.0  NaN
    3  5.0  3.0
    4  1.0 -2.0
    5 -5.0  4.0
    6 -1.0 -1.0
    7  NaN  0.0
    8  9.0  0.0

And similarly for other functions/file formats.

P.S.: On v0.24+, you can preserve integer type even if your column has NaNs (yes, talk about having the cake and eating it too). You can specify dtype='Int32':

    df = pd.read_csv('file.csv', na_values=['-'], dtype='Int32')
    df

         A    B
    0  NaN    1
    1    3  NaN
    2    2  NaN
    3    5    3
    4    1   -2
    5   -5    4
    6   -1   -1
    7  NaN    0
    8    9    0

    df.dtypes

    A    Int32
    B    Int32
    dtype: object

The dtype is not a conventional int type, but rather a Nullable Integer Type. There are other options.

Handling Numeric Data: pd.to_numeric with errors='coerce'

If you're dealing with numeric data, a faster solution is to use pd.to_numeric with the errors='coerce' argument, which coerces invalid values (values that cannot be cast to numeric) to NaN.

    pd.to_numeric(df['A'], errors='coerce')

    0    NaN
    1    3.0
    2    2.0
    3    5.0
    4    1.0
    5   -5.0
    6   -1.0
    7    NaN
    8    9.0
    Name: A, dtype: float64

To retain (nullable) integer dtype, use

    pd.to_numeric(df['A'], errors='coerce').astype('Int32')

    0    NaN
    1      3
    2      2
    3      5
    4      1
    5     -5
    6     -1
    7    NaN
    8      9
    Name: A, dtype: Int32

To coerce multiple columns, use apply:

    df[['A', 'B']].apply(pd.to_numeric, errors='coerce').astype('Int32')

         A    B
    0  NaN    1
    1    3  NaN
    2    2  NaN
    3    5    3
    4    1   -2
    5   -5    4
    6   -1   -1
    7  NaN    0
    8    9    0

...and assign the result back after.

More information can be found in this answer.
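
For trying the na_values approach without a file on disk, the same data can be read from an in-memory string. This is a sketch that assumes the file is comma-separated (the read_csv default); the variable name csv_text is illustrative.

```python
import io

import pandas as pd

# Same contents as the file.csv example, held in memory.
# Assumption: the columns are comma-separated.
csv_text = "A,B\n-,1\n3,-\n2,-\n5,3\n1,-2\n-5,4\n-1,-1\n-,0\n9,0\n"

# A lone '-' is read as missing; '-5' and '-1' are still parsed as numbers.
df = pd.read_csv(io.StringIO(csv_text), na_values=['-'], dtype='Int32')

print(df.dtypes)
print(df.isna().sum())
```

Recent pandas versions display the missing entries of a nullable integer column as <NA> rather than NaN.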

User · Answer

With Pandas version >= 1.0.0, I would use DataFrame.replace or Series.replace:

    df.replace(old_val, pd.NA, inplace=True)

This is better for two reasons: it uses pd.NA instead of None or np.nan, and it replaces the value in-place, which could be more memory efficient.
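
A minimal sketch of the pd.NA variant on the question's sample data, using the non-inplace form so the original frame stays available for comparison. The follow-up call to convert_dtypes() is an optional extra, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(['-', 3, 2, 5, 1, -5, -1, '-', 9])

# Non-inplace variant: returns a new frame and leaves df untouched.
result = df.replace('-', pd.NA)
print(int(result[0].isna().sum()))

# Optional: let pandas pick a nullable dtype for the column where it can.
typed = result.convert_dtypes()
print(typed.dtypes)
```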

User · Answer

where is probably what you're looking for. So:

    data = data.where(data != '-', None)

From the pandas docs: where "[returns] an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other."
