Pandas: convert dtype 'object' to int

Question

I've read an SQL query into Pandas and the values are coming in as dtype 'object', although they are strings, dates and integers. I am able to convert the date 'object' to a Pandas datetime dtype, but I'm getting an error when trying to convert the string and integers.

Here is an example:

    >>> import pandas as pd
    >>> df = pd.read_sql_query('select * from my_table', conn)
    >>> df
        id    date          purchase
     1  abc1  2016-05-22    1
     2  abc2  2016-05-29    0
     3  abc3  2016-05-22    2
     4  abc4  2016-05-22    0

    >>> df.dtypes
    id          object
    date        object
    purchase    object
    dtype: object

Converting the df['date'] to a datetime works:

    >>> pd.to_datetime(df['date'])
    1  2016-05-22
    2  2016-05-29
    3  2016-05-22
    4  2016-05-22
    Name: date, dtype: datetime64[ns]

But I get an error when trying to convert the df['purchase'] to an integer:

    >>> df['purchase'].astype(int)
    ....
    pandas/lib.pyx in pandas.lib.astype_intsafe (pandas/lib.c:16667)()
    pandas/src/util.pxd in util.set_value_at (pandas/lib.c:67540)()

    TypeError: long() argument must be a string or a number, not 'java.lang.Long'

NOTE: I get a similar error when I try .astype('float').

And when trying to convert to a string, nothing seems to happen:

    >>> df['id'].apply(str)
    1 abc1
    2 abc2
    3 abc3
    4 abc4
    Name: id, dtype: object

User · Answer

In my case, I had a df with mixed data:

    df:
                          0   1   2  ...          242           243          244
    0  2020-04-22T04:00:00Z   0   0  ...  3,094,409.5  13,220,425.7  5,449,201.1
    1  2020-04-22T06:00:00Z   0   0  ...  3,716,941.5   8,452,012.9  6,541,599.9

The floats are actually objects, but I need them to be real floats.

To fix it, referencing @AMC's comment above:

    def coerce_to_float(val):
        try:
            return float(val)
        except (TypeError, ValueError):  # e.g. None or non-numeric text
            return val

    # on pandas >= 2.1, DataFrame.map is the new name for the deprecated applymap
    df = df.applymap(coerce_to_float)
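If the object strings contain thousands separators such as "3,094,409.5", float() raises ValueError on them, so coerce_to_float hands them back unchanged and the column stays object. The sketch below is a minimal variant, assuming commas only ever appear as thousands separators and using a hypothetical two-column frame in place of the real one. It strips the commas before converting and applies the function with Series.map column by column, which sidesteps the applymap/DataFrame.map rename between pandas versions.

```python
import pandas as pd

# Hypothetical frame mirroring the data above: every cell is an object (string).
df = pd.DataFrame(
    {
        0: ["2020-04-22T04:00:00Z", "2020-04-22T06:00:00Z"],
        242: ["3,094,409.5", "3,716,941.5"],
    },
    dtype=object,
)


def coerce_to_float(val):
    """Return val as a float if it is numeric text, otherwise return it unchanged."""
    try:
        # Assumption: commas are thousands separators, so dropping them is safe.
        return float(str(val).replace(",", ""))
    except ValueError:
        return val


# Series.map works element-wise and re-infers the dtype of each resulting column.
df = df.apply(lambda col: col.map(coerce_to_float))
print(df.dtypes)
```

Column 242 comes out as float64, while the timestamp column is left as object because float() rejects its strings.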

User · Answer

Follow these steps:

1. Clean your file: open your data file in CSV format, check whether there is a "?" in place of empty values, and delete all of them.
2. Drop the rows containing missing values, e.g. df.dropna(subset=["normalized-losses"], axis=0, inplace=True)
3. Now use astype for the conversion: df["normalized-losses"] = df["normalized-losses"].astype(int)

Note: If you still find errors in your program, inspect your CSV file again: open it in Excel to check whether there is a "?" in your required column, delete it, save the file, then go back and run your program.

Comment "success" if it works!
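The manual "?" cleanup can also be done while loading the file. This is a minimal sketch, assuming the missing values are marked with a literal "?" and using a small hypothetical in-memory CSV in place of the real file: passing na_values="?" to pd.read_csv makes those cells arrive as NaN, after which steps 2 and 3 apply unchanged.

```python
import io

import pandas as pd

# Hypothetical CSV in which "?" marks a missing value.
raw = io.StringIO("make,normalized-losses\naudi,164\nbmw,?\nvolvo,95\n")

# Treat "?" as missing while parsing instead of deleting it by hand in Excel.
df = pd.read_csv(raw, na_values="?")

# Step 2: drop rows where the target column is missing.
df = df.dropna(subset=["normalized-losses"])

# Step 3: the remaining values are whole numbers, so astype(int) succeeds.
df["normalized-losses"] = df["normalized-losses"].astype(int)
print(df)
```

For a frame that is already loaded, df.replace("?", float("nan")) achieves the same first step.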

User · Answer

My train data contains three features that are object. Applying astype converts the object into numeric, but before that you need to perform some preprocessing steps:

    train.dtypes

    C12       object
    C13       object
    C14       object

    train['C14'] = train.C14.astype(int)

    train.dtypes

    C12       object
    C13       object
    C14       int32
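The answer doesn't spell out the preprocessing. One common step is to locate the values that would make astype(int) fail. This sketch assumes a hypothetical C14 column with one non-numeric entry ("n/a"): pd.to_numeric with errors="coerce" turns unparseable entries into NaN, so you can inspect them, drop (or fix) those rows, and then cast.

```python
import pandas as pd

# Hypothetical object column with one value that int() cannot parse.
train = pd.DataFrame({"C14": ["15706", "15704", "n/a", "15701"]}, dtype=object)

# Unparseable entries become NaN, which makes them easy to find.
numeric = pd.to_numeric(train["C14"], errors="coerce")
bad_values = train.loc[numeric.isna(), "C14"].tolist()
print(bad_values)  # the raw values that would break astype(int)

# Keep only the parseable rows, then cast to int.
train = train.loc[numeric.notna()].copy()
train["C14"] = numeric[numeric.notna()].astype(int)
print(train.dtypes)
```

The cast yields int64 on most platforms. The int32 shown in the answer is what astype(int) produces on Windows with NumPy 1.x.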

User · Answer

pandas >= 1.0

convert_dtypes

The (self) accepted answer doesn't take into consideration the possibility of NaNs in object columns.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
         'a': [1, 2, np.nan],
         'b': [True, False, np.nan]}, dtype=object)
    df

         a      b
    0    1   True
    1    2  False
    2  NaN    NaN

    df['a'].astype(str).astype(int)  # raises ValueError

This chokes because the NaN is converted to the string "nan", and further attempts to coerce to integer will fail. To avoid this issue, we can soft-convert columns to their corresponding nullable type using convert_dtypes:

    df.convert_dtypes()

          a      b
    0     1   True
    1     2  False
    2  <NA>   <NA>

    df.convert_dtypes().dtypes

    a      Int64
    b    boolean
    dtype: object

If your data has junk text mixed in with your ints, you can use pd.to_numeric as an initial step:

    s = pd.Series(['1', '2', '...'])
    s.convert_dtypes()  # converts to string, which is not what we want

    0      1
    1      2
    2    ...
    dtype: string

    # coerces non-numeric junk to NaNs
    pd.to_numeric(s, errors='coerce')

    0    1.0
    1    2.0
    2    NaN
    dtype: float64

    # one final `convert_dtypes` call to convert to nullable int
    pd.to_numeric(s, errors='coerce').convert_dtypes()

    0       1
    1       2
    2    <NA>
    dtype: Int64
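To apply the same to_numeric plus convert_dtypes recipe to a whole frame, convert only the columns you expect to be numeric: to_numeric with errors="coerce" would otherwise turn genuine text such as the "abcN" ids into NaN. A sketch on a hypothetical frame shaped like the question's, with one junk "..." purchase value:

```python
import pandas as pd

# Hypothetical frame shaped like the question's: everything arrives as object,
# and one purchase value is non-numeric junk.
df = pd.DataFrame(
    {
        "id": ["abc1", "abc2", "abc3", "abc4"],
        "purchase": ["1", "0", "...", "0"],
    },
    dtype=object,
)

# Only the columns that should hold numbers; "id" is left alone.
numeric_cols = ["purchase"]

for col in numeric_cols:
    # Junk becomes NaN, then convert_dtypes picks the nullable Int64 type.
    df[col] = pd.to_numeric(df[col], errors="coerce").convert_dtypes()

print(df.dtypes)
```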

User · Answer

Cannot comment, so posting this as an answer, which is somewhat in between @piRSquared / @cyril's solution and @cs95's:

As noted by @cs95, if your data contains NaNs or Nones, converting to string type will throw an error when trying to convert to int afterwards.

However, if your data consists of (numerical) strings, using convert_dtypes will convert it to string type unless you use pd.to_numeric as suggested by @cs95 (potentially combined with df.apply()).

In the case that your data consists only of numerical strings (including NaNs or Nones but without any non-numeric "junk"), a possibly simpler alternative would be to convert first to float and then to one of the nullable-integer extension dtypes provided by pandas (already present in version 0.24) (see also this answer):

    df['purchase'].astype(float).astype('Int64')

Note that there has been recent discussion on this on GitHub (currently an unresolved, closed issue though) and that in the case of very long 64-bit integers you may have to convert explicitly to float128 to avoid approximations during the conversions.
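A small sketch of the float-then-Int64 route, assuming a hypothetical object column of numeric strings with one NaN. The second part illustrates the precision caveat: float64 represents every integer exactly only up to 2**53, so larger 64-bit values can be rounded on their way through float.

```python
import numpy as np
import pandas as pd

# Hypothetical object column: numeric strings plus one missing value.
s = pd.Series(["1", "0", np.nan, "2"], dtype=object)

# Strings -> float64 (NaN stays NaN) -> nullable Int64 (NaN becomes <NA>).
converted = s.astype(float).astype("Int64")
print(converted)

# Precision caveat: 2**53 + 1 has no exact float64 representation.
big = 2**53 + 1
print(int(float(big)) == big)  # False: the value was rounded to 2**53
```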

User · Answer

It's simple: pd.factorize(df.purchase)[0]

Example:

    labels, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'])

    labels
    # array([0, 0, 1, 2, 0])

    uniques
    # array(['b', 'a', 'c'], dtype=object)
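Note that pd.factorize returns category codes assigned in order of first appearance, not the numbers the strings spell. A sketch with hypothetical purchase values shows the difference: factorize suits label encoding, while pd.to_numeric gives the numeric values themselves.

```python
import pandas as pd

# Hypothetical purchase column stored as strings.
purchase = pd.Series(["1", "0", "2", "0"], dtype=object)

# Codes follow the order in which each distinct value first appears.
codes, uniques = pd.factorize(purchase)
print(codes)    # "1" -> 0, "0" -> 1, "2" -> 2
print(uniques)

# The actual numeric values.
numeric = pd.to_numeric(purchase)
print(numeric.tolist())
```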

User · Answer

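This was my data:

    # list of columns
    l1 = ['PM2.5', 'PM10', 'TEMP', 'BP', 'RH', 'WS', 'CO', 'O3', 'Nox', 'SO2']

    for i in l1:
        for j in range(0, 8431):  # rows = 8431
            df.at[j, i] = int(df.at[j, i])

I recommend using this only with small data: it converts one cell at a time in a Python loop (O(rows × columns) conversions), so it is very slow on large frames.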
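Assigning one cell at a time also leaves each column's dtype as object, because an object column can hold Python ints without being recast. The vectorized sketch below assumes a hypothetical frame whose listed columns contain only integer strings (no NaN). It casts whole columns with DataFrame.astype and a column-to-dtype mapping.

```python
import pandas as pd

# Hypothetical data: integers stored as strings in object columns.
df = pd.DataFrame({"PM10": ["12", "30"], "CO": ["1", "2"]}, dtype=object)
cols = ["PM10", "CO"]

# Cell-by-cell conversion (the loop approach): values become ints, dtype stays object.
slow = df.copy()
for j in range(len(slow)):
    for c in cols:
        slow.at[j, c] = int(slow.at[j, c])
print(slow.dtypes)

# Vectorized conversion: every listed column is cast in a single call.
df = df.astype({c: int for c in cols})
print(df.dtypes)
```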

User · Answer

Documenting the answer that worked for me, based on the comment by @piRSquared.

I needed to convert to a string first, then an integer.

    >>> df['purchase'].astype(str).astype(int)
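The traceback mentions java.lang.Long, which suggests (an inference, not something the question confirms) that the database driver returned Java Long objects rather than Python ints. The sketch below uses a hypothetical FakeLong class as a stand-in to show why the string detour helps: str() works on any object, and the resulting digit strings parse cleanly as ints.

```python
import pandas as pd


class FakeLong:
    """Hypothetical stand-in for a driver object (like java.lang.Long)
    that prints as a number but is not a Python int."""

    def __init__(self, value):
        self._value = value

    def __str__(self):
        return str(self._value)


purchase = pd.Series([FakeLong(1), FakeLong(0), FakeLong(2), FakeLong(0)])

try:
    purchase.astype(int)  # int() cannot handle the unknown object type
except (TypeError, ValueError) as exc:
    print("direct cast failed:", exc)

# str() works on any object, and int() can parse the resulting digit strings.
converted = purchase.astype(str).astype(int)
print(converted.tolist())
```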
