Convert Pandas column containing NaNs to dtype int

Question

I read data from a .csv file to a Pandas dataframe as below. For one of the columns, namely id, I want to specify the column type as int. The problem is that the id series has missing/empty values.

When I try to cast the id column to integer while reading the .csv, I get:

    df = pd.read_csv("data.csv", dtype={'id': int})
    error: Integer column has NA values

Alternatively, I tried to convert the column type after reading as below, but this time I get:

    df = pd.read_csv("data.csv")
    df[['id']] = df[['id']].astype(int)
    error: Cannot convert NA to integer

How can I tackle this?

User · Answer

It is now possible to create a pandas column containing NaNs with an integer dtype. Support was officially added in pandas 0.24.0.

pandas 0.24.x release notes. Quote: "Pandas has gained the ability to hold integer dtypes with missing values."

User · Answer

    import pandas as pd

    df = pd.read_csv("data.csv")
    df['id'] = pd.to_numeric(df['id'])

User · Answer

In version 0.24, pandas gained the ability to hold integer dtypes with missing values.

Nullable Integer Data Type

Pandas can represent integer data with possibly missing values using arrays.IntegerArray. This is an extension type implemented within pandas. It is not the default dtype for integers and will not be inferred; you must explicitly pass the dtype into array() or Series:

    arr = pd.array([1, 2, np.nan], dtype=pd.Int64Dtype())
    pd.Series(arr)

    0      1
    1      2
    2    NaN
    dtype: Int64

To convert a column to nullable integers, use:

    df['myCol'] = df['myCol'].astype('Int64')
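As a minimal sketch of the conversion described above, the following builds a small frame and casts a float column to the nullable dtype. The frame and its values are made up for illustration; only the astype('Int64') call comes from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the NaN forces the column to float64 on construction.
df = pd.DataFrame({"myCol": [1.0, 2.0, np.nan]})
print(df["myCol"].dtype)  # float64

# Cast to the nullable extension dtype; the missing entry stays missing.
df["myCol"] = df["myCol"].astype("Int64")
print(df["myCol"].dtype)  # Int64
print(df["myCol"].isna().tolist())  # [False, False, True]
```

Note the capital I: the lowercase 'int64' is the NumPy dtype, which cannot hold missing values.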

User · Answer

If you absolutely want to combine integers and NaNs in a column, you can use the 'object' data type:

    df['col'] = (
        df['col'].fillna(0)
        .astype(int)
        .astype(object)
        .where(df['col'].notnull())
    )

This will replace NaNs with an integer (doesn't matter which), convert to int, convert to object and finally reinsert NaNs.

User · Answer

The lack of NaN rep in integer columns is a pandas "gotcha". The usual workaround is to simply use floats.
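A small sketch of the gotcha, using made-up values: pandas silently stores the data as float64 once a NaN is present, and a direct cast to a NumPy integer dtype raises an error.

```python
import numpy as np
import pandas as pd

# A missing value makes pandas store the integers as float64.
ids = pd.Series([1, 2, np.nan])
print(ids.dtype)  # float64

# Casting straight to a NumPy integer dtype fails while NaN is present.
try:
    ids.astype("int64")
    cast_failed = False
except ValueError:
    cast_failed = True
print(cast_failed)  # True
```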

User · Answer

Assuming your DateColumn, formatted as 3312018.0, should be converted to 03/31/2018 as a string, and some records are missing or 0:

    df['DateColumn'] = df['DateColumn'].astype(int)
    df['DateColumn'] = df['DateColumn'].astype(str)
    df['DateColumn'] = df['DateColumn'].apply(lambda x: x.zfill(8))
    df.loc[df['DateColumn'] == '00000000', 'DateColumn'] = '01011980'
    df['DateColumn'] = pd.to_datetime(df['DateColumn'], format="%m%d%Y")
    df['DateColumn'] = df['DateColumn'].apply(lambda x: x.strftime('%m/%d/%Y'))

User · Answer

If you want to use it when you chain methods, you can use assign:

    df = df.assign(col=lambda x: x['col'].astype('Int64'))

User · Answer

If you can modify your stored data, use a sentinel value for a missing id. A common use case, inferred from the column name, is that id is an integer strictly greater than zero. In that case you could use 0 as a sentinel value, so that you can write:

    if row['id']:
        regular_process(row)
    else:
        special_process(row)
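A sketch of the sentinel idea, assuming the stored data already uses 0 for a missing id. The CSV text and the two handler functions are hypothetical stand-ins for the regular and special processing mentioned above.

```python
import io
import pandas as pd

# Hypothetical stored data where 0 marks a missing id.
csv_text = "id,name\n1,a\n0,b\n3,c\n"
df = pd.read_csv(io.StringIO(csv_text), dtype={"id": "int64"})

def regular_process(row):
    # Placeholder for the normal handling of a row with a real id.
    return f"regular:{row['id']}"

def special_process(row):
    # Placeholder for handling a row whose id is the sentinel.
    return "special"

# 0 is falsy, so a plain truth test separates sentinel rows from real ids.
results = [
    regular_process(row) if row["id"] else special_process(row)
    for _, row in df.iterrows()
]
print(results)  # ['regular:1', 'special', 'regular:3']
```

This only works if 0 can never be a legitimate id in the data.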

User · Answer

My use case is munging data prior to loading into a DB table:

    df[col] = df[col].fillna(-1)
    df[col] = df[col].astype(int)
    df[col] = df[col].astype(str)
    df[col] = df[col].replace('-1', np.nan)

Remove NaNs, convert to int, convert to str and then reinsert NaNs.

It's not pretty but it gets the job done.
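The four steps can be sketched on a tiny made-up column as follows. It assumes, as the answer does implicitly, that -1 never occurs as a real value in the data.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
df = pd.DataFrame({"col": [10.0, np.nan, 30.0]})
col = "col"

df[col] = df[col].fillna(-1)             # placeholder for missing values
df[col] = df[col].astype(int)            # drops the ".0" float formatting
df[col] = df[col].astype(str)            # integers rendered as text
df[col] = df[col].replace("-1", np.nan)  # put the missing values back

print(df[col].dropna().tolist())  # ['10', '30']
```

The result is a column of integer-looking strings with missing values, which suits a text-based load step but not arithmetic.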

User · Answer

I had the problem a few weeks ago with a few discrete features which were formatted as 'object'. This solution seemed to work:

    for col in discrete:
        df[col] = pd.to_numeric(df[col], errors='coerce').astype(pd.Int64Dtype())
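A runnable sketch of that loop on made-up data. The column name and the unparseable entry "x" are illustrative; errors='coerce' turns anything that cannot be parsed into NaN before the cast to the nullable dtype.

```python
import pandas as pd

# Hypothetical object column with numeric strings, junk and a missing value.
df = pd.DataFrame({"a": ["1", "2", "x", None]}, dtype=object)
discrete = ["a"]

for col in discrete:
    # Unparseable entries become NaN, then the column is cast to nullable Int64.
    df[col] = pd.to_numeric(df[col], errors="coerce").astype(pd.Int64Dtype())

print(df["a"].dtype)  # Int64
print(df["a"].isna().tolist())  # [False, False, True, True]
```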

User · Answer

Most solutions here tell you how to use a placeholder integer to represent nulls. That approach isn't helpful if you can't be sure that the placeholder integer won't show up in your source data, though. My method formats floats without their decimal part and converts nulls to None. The result is an object column that will look like an integer field with null values when written to a CSV:

    keep_df[col] = keep_df[col].apply(lambda x: None if pandas.isnull(x) else '{0:.0f}'.format(pandas.to_numeric(x)))

User · Answer

Use pd.to_numeric():

    df["DateColumn"] = pd.to_numeric(df["DateColumn"])

Simple and clean.

User · Answer

I ran into this issue working with pyspark. As this is a Python frontend for code running on a JVM, it requires type safety, and using float instead of int is not an option. I worked around the issue by wrapping pd.read_csv in a function that fills user-defined columns with user-defined fill values before casting them to the required type. Here is what I ended up using:

    def custom_read_csv(file_path, custom_dtype=None, fill_values=None, **kwargs):
        if custom_dtype is None:
            return pd.read_csv(file_path, **kwargs)
        else:
            assert 'dtype' not in kwargs.keys()
            df = pd.read_csv(file_path, dtype={}, **kwargs)
            for col, typ in custom_dtype.items():
                if fill_values is None or col not in fill_values.keys():
                    fill_val = -1
                else:
                    fill_val = fill_values[col]
                df[col] = df[col].fillna(fill_val).astype(typ)
        return df

User · Answer

First remove the rows which contain NaN. Then do the integer conversion on the remaining rows. Finally, insert the removed rows again. Hope it will work.

User · Answer

Try this:

    df[['id']] = df[['id']].astype(pd.Int64Dtype())

If you print its dtypes, you will get id Int64 instead of the normal int64.

User · Answer

You could use .dropna() if it is OK to drop the rows with the NaN values:

    df = df.dropna(subset=['id'])

Alternatively, use .fillna() and .astype() to replace the NaN with values and convert them to int.

I ran into this problem when processing a CSV file with large integers, some of which were missing (NaN). Using float as the type was not an option, because I might lose precision.

My solution was to use str as the intermediate type. Then you can convert the string to int as you please later in the code. I replaced NaN with 0, but you could choose any value:

    df = pd.read_csv(filename, dtype={'id': str})
    df["id"] = df["id"].fillna("0").astype(int)

For illustration, here is an example of how floats may lose precision:

    s = "12345678901234567890"
    f = float(s)
    i = int(f)
    i2 = int(s)
    print(f, i, i2)

And the output is:

    1.2345678901234567e+19 12345678901234567168 12345678901234567890
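The str-intermediate approach can be sketched end to end with an in-memory CSV. The data is made up: the id 9007199254740993 (2**53 + 1) was picked because it fits in int64 but cannot be represented exactly as a float, and 0 is assumed to be a safe fill value.

```python
import io
import pandas as pd

BIG = 9007199254740993  # 2**53 + 1: fits in int64, not exactly in float64
csv_text = f"id,name\n{BIG},a\n,b\n42,c\n"

# Default parsing: the missing id forces float64, and the large id is altered.
as_float = pd.read_csv(io.StringIO(csv_text))
lossy = int(as_float["id"].iloc[0])
print(lossy == BIG)  # False

# Read as str first, fill the gap, then convert to a true integer column.
df = pd.read_csv(io.StringIO(csv_text), dtype={"id": str})
df["id"] = df["id"].fillna("0").astype("int64")
print(df["id"].tolist())  # [9007199254740993, 0, 42]
```

Values beyond the int64 range, such as the 20-digit number in the float illustration, would still overflow in the final cast.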

User · Answer

As of pandas 1.0.0 you can use pandas.NA values. This does not force integer columns with missing values to be floats. When reading in your data, all you have to do is:

    df = pd.read_csv("data.csv", dtype={'id': 'Int64'})

Notice that 'Int64' is surrounded by quotes and the I is capitalized. This distinguishes pandas' 'Int64' from numpy's int64. As a side note, this will also work with .astype():

    df['id'] = df['id'].astype('Int64')

Documentation here: https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html
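A self-contained sketch of the read step, using an in-memory CSV in place of data.csv. The file contents are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV with one empty id field.
csv_text = "id,name\n1,a\n,b\n3,c\n"

# Request the nullable integer dtype at read time.
df = pd.read_csv(io.StringIO(csv_text), dtype={"id": "Int64"})

print(df["id"].dtype)  # Int64
print(df["id"].isna().tolist())  # [False, True, False]
```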
