Change column type in pandas

Question

I want to convert a table, represented as a list of lists, into a Pandas DataFrame. As an extremely simplified example:

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a)

What is the best way to convert the columns to the appropriate types, in this case columns 2 and 3 into floats? Is there a way to specify the types while converting to DataFrame? Or is it better to create the DataFrame first and then loop through the columns to change the type for each column? Ideally I would like to do this in a dynamic way, because there can be hundreds of columns and I don't want to specify exactly which columns are of which type. All I can guarantee is that each column contains values of the same type.

User · Answer

Here is a function that takes as its arguments a DataFrame and a list of columns and coerces all data in the columns to numbers.

# df is the DataFrame, and column_list is a list of column names as strings (e.g. ["col1", "col2", "col3"])
# dependencies: pandas
import pandas as pd

def coerce_df_columns_to_numeric(df, column_list):
    # Modifies df in place: values that cannot be parsed as numbers become NaN
    df[column_list] = df[column_list].apply(pd.to_numeric, errors='coerce')

So, for your example:

import pandas as pd

def coerce_df_columns_to_numeric(df, column_list):
    df[column_list] = df[column_list].apply(pd.to_numeric, errors='coerce')

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a, columns=['col1','col2','col3'])

coerce_df_columns_to_numeric(df, ['col2','col3'])
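To see what errors='coerce' does when a cell can't be parsed, here is a small sketch reusing the function above. The 'n/a' entry is made-up data, not part of the question; it should become NaN while the other values become floats.

```python
import pandas as pd

def coerce_df_columns_to_numeric(df, column_list):
    # Convert the listed columns in place; unparseable values become NaN
    df[column_list] = df[column_list].apply(pd.to_numeric, errors='coerce')

# Hypothetical data: 'n/a' stands in for a value that is not a number
a = [['a', '1.2', '4.2'], ['b', '70', 'n/a'], ['x', '5', '0']]
df = pd.DataFrame(a, columns=['col1', 'col2', 'col3'])

coerce_df_columns_to_numeric(df, ['col2', 'col3'])
print(df.dtypes)
print(df['col3'].isna().tolist())  # only the 'n/a' cell is missing
```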

User · Answer

You have four main options for converting types in pandas:

1. to_numeric() - provides functionality to safely convert non-numeric types (e.g. strings) to a suitable numeric type. (See also to_datetime() and to_timedelta().)

2. astype() - convert (almost) any type to (almost) any other type (even if it's not necessarily sensible to do so). Also allows you to convert to categorical types (very useful).

3. infer_objects() - a utility method to convert object columns holding Python objects to a pandas type if possible.

4. convert_dtypes() - convert DataFrame columns to the "best possible" dtype that supports pd.NA (pandas' object to indicate a missing value).

Read on for more detailed explanations and usage of each of these methods.

1. to_numeric()

The best way to convert one or more columns of a DataFrame to numeric values is to use pandas.to_numeric().

This function will try to change non-numeric objects (such as strings) into integers or floating-point numbers as appropriate.

Basic usage

The input to to_numeric() is a Series or a single column of a DataFrame.

>>> s = pd.Series(["8", 6, "7.5", 3, "0.9"])  # mixed string and numeric values
>>> s
0      8
1      6
2    7.5
3      3
4    0.9
dtype: object

>>> pd.to_numeric(s)  # convert everything to float values
0    8.0
1    6.0
2    7.5
3    3.0
4    0.9
dtype: float64

As you can see, a new Series is returned. Remember to assign this output to a variable or column name to continue using it:

# convert Series
my_series = pd.to_numeric(my_series)

# convert column "a" of a DataFrame
df["a"] = pd.to_numeric(df["a"])

You can also use it to convert multiple columns of a DataFrame via the apply() method:

# convert all columns of DataFrame
df = df.apply(pd.to_numeric)

# convert just columns "a" and "b"
df[["a", "b"]] = df[["a", "b"]].apply(pd.to_numeric)

As long as your values can all be converted, that's probably all you need.

Error handling

But what if some values can't be converted to a numeric type?

to_numeric() also takes an errors keyword argument that allows you to force non-numeric values to be NaN, or simply ignore columns containing these values.

Here's an example using a Series of strings s which has the object dtype:

>>> s = pd.Series(['1', '2', '4.7', 'pandas', '10'])
>>> s
0         1
1         2
2       4.7
3    pandas
4        10
dtype: object

The default behaviour is to raise if it can't convert a value. In this case, it can't cope with the string 'pandas':

>>> pd.to_numeric(s)  # or pd.to_numeric(s, errors='raise')
ValueError: Unable to parse string

Rather than fail, we might want 'pandas' to be considered a missing/bad numeric value. We can coerce invalid values to NaN as follows using the errors keyword argument:

>>> pd.to_numeric(s, errors='coerce')
0     1.0
1     2.0
2     4.7
3     NaN
4    10.0
dtype: float64

The third option for errors is just to ignore the operation if an invalid value is encountered:

>>> pd.to_numeric(s, errors='ignore')
# the original Series is returned untouched

This last option is particularly useful when you want to convert your entire DataFrame but don't know which of your columns can be converted reliably to a numeric type. In that case, just write:

df.apply(pd.to_numeric, errors='ignore')

The function will be applied to each column of the DataFrame. Columns that can be converted to a numeric type will be converted, while columns that cannot (e.g. they contain non-digit strings or dates) will be left alone.

Downcasting

By default, conversion with to_numeric() will give you either an int64 or float64 dtype (or whatever integer width is native to your platform).

That's usually what you want, but what if you wanted to save some memory and use a more compact dtype, like float32 or int8?

to_numeric() gives you the option to downcast to either 'integer', 'signed', 'unsigned' or 'float'. Here's an example for a simple series s of integer type:

>>> s = pd.Series([1, 2, -7])
>>> s
0    1
1    2
2   -7
dtype: int64

Downcasting to 'integer' uses the smallest possible integer that can hold the values:

>>> pd.to_numeric(s, downcast='integer')
0    1
1    2
2   -7
dtype: int8

Downcasting to 'float' similarly picks a smaller than normal floating type:

>>> pd.to_numeric(s, downcast='float')
0    1.0
1    2.0
2   -7.0
dtype: float32

2. astype()

The astype() method enables you to be explicit about the dtype you want your DataFrame or Series to have. It's very versatile in that you can try and go from one type to any other.

Basic usage

Just pick a type: you can use a NumPy dtype (e.g. np.int16), some Python types (e.g. bool), or pandas-specific types (like the categorical dtype).

Call the method on the object you want to convert and astype() will try and convert it for you:

# convert all DataFrame columns to the int64 dtype
df = df.astype(int)

# convert column "a" to int64 dtype and "b" to complex type
df = df.astype({"a": int, "b": complex})

# convert Series to float16 type
s = s.astype(np.float16)

# convert Series to Python strings
s = s.astype(str)

# convert Series to categorical type - see docs for more details
s = s.astype('category')

Notice I said "try" - if astype() does not know how to convert a value in the Series or DataFrame, it will raise an error. For example, if you have a NaN or inf value you'll get an error trying to convert it to an integer.

As of pandas 0.20.0, this error can be suppressed by passing errors='ignore'. Your original object will be returned untouched.

Be careful

astype() is powerful, but it will sometimes convert values "incorrectly". For example:

>>> s = pd.Series([1, 2, -7])
>>> s
0    1
1    2
2   -7
dtype: int64

These are small integers, so how about converting to an unsigned 8-bit type to save memory?

>>> s.astype(np.uint8)
0      1
1      2
2    249
dtype: uint8

The conversion worked, but the -7 was wrapped round to become 249 (i.e. 2^8 - 7).

Trying to downcast using pd.to_numeric(s, downcast='unsigned') instead could help prevent this error.

3. infer_objects()

Version 0.21.0 of pandas introduced the method infer_objects() for converting columns of a DataFrame that have an object datatype to a more specific type (soft conversions).

For example, here's a DataFrame with two columns of object type. One holds actual integers and the other holds strings representing integers:

>>> df = pd.DataFrame({'a': [7, 1, 5], 'b': ['3', '2', '1']}, dtype='object')
>>> df.dtypes
a    object
b    object
dtype: object

Using infer_objects(), you can change the type of column 'a' to int64:

>>> df = df.infer_objects()
>>> df.dtypes
a     int64
b    object
dtype: object

Column 'b' has been left alone since its values were strings, not integers. If you wanted to force both columns to an integer type, you could use df.astype(int) instead.

4. convert_dtypes()

Version 1.0 and above includes a method convert_dtypes() to convert Series and DataFrame columns to the best possible dtype that supports the pd.NA missing value.

Here "best possible" means the type most suited to hold the values. For example, this is a pandas integer type if all of the values are integers (or missing values): an object column of Python integer objects is converted to Int64, and a column of NumPy int32 values will become the pandas dtype Int32.

With our object DataFrame df, we get the following result:

>>> df.convert_dtypes().dtypes
a     Int64
b    string
dtype: object

Since column 'a' held integer values, it was converted to the Int64 type (which is capable of holding missing values, unlike int64).
Column 'b' contained string objects, so it was changed to pandas' string dtype.

By default, this method will infer the type from object values in each column. We can change this by passing infer_objects=False:

>>> df.convert_dtypes(infer_objects=False).dtypes
a    object
b    string
dtype: object

Now column 'a' remained an object column: pandas knows it can be described as an 'integer' column (internally it ran infer_dtype) but didn't infer exactly what dtype of integer it should have, so it did not convert it. Column 'b' was again converted to 'string' dtype as it was recognised as holding 'string' values.
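Since pandas 2.2, errors='ignore' in pd.to_numeric() is deprecated, and the suggested replacement is to catch the exception yourself. Here is a minimal sketch of the "convert whatever can be converted, leave the rest alone" idea from the error-handling section, written that way. The helper name to_numeric_where_possible is made up for this example.

```python
import pandas as pd

def to_numeric_where_possible(df):
    """Return a copy of df in which every column that pd.to_numeric can
    fully parse is converted; all other columns are left unchanged."""
    out = df.copy()
    for col in out.columns:
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            # Column holds values that cannot be parsed as numbers: keep it
            pass
    return out

# The question's data: column 0 holds labels, columns 1 and 2 numeric strings
a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = to_numeric_where_possible(pd.DataFrame(a))
print(df.dtypes)
```

This needs no list of column names, which matches the question's wish to handle hundreds of columns without specifying each type by hand.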

User · Answer

pandas >= 1.0

Here's a chart that summarises some of the most important conversions in pandas.

Conversions to string are trivial (.astype(str)) and are not shown in the figure.

"Hard" versus "Soft" conversions

Note that "conversions" in this context could refer either to converting text data into its actual data type (hard conversion), or to inferring more appropriate data types for data in object columns (soft conversion). To illustrate the difference, take a look at:

df = pd.DataFrame({'a': ['1', '2', '3'], 'b': [4, 5, 6]}, dtype=object)
df.dtypes

a    object
b    object
dtype: object

# Actually converts string to numeric - hard conversion
df.apply(pd.to_numeric).dtypes

a    int64
b    int64
dtype: object

# Infers better data types for object data - soft conversion
df.infer_objects().dtypes

a    object  # no change
b     int64
dtype: object

# Same as infer_objects, but converts to equivalent ExtensionType
df.convert_dtypes().dtypes
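The output of the last call isn't shown above. A small sketch that runs all three conversions on the same df: going by pandas' documented behaviour, convert_dtypes() should leave 'a' as a (nullable) string column, because it never parses text, and turn 'b' into the nullable Int64 dtype.

```python
import pandas as pd

df = pd.DataFrame({'a': ['1', '2', '3'], 'b': [4, 5, 6]}, dtype=object)

hard = df.apply(pd.to_numeric)   # parses the strings in 'a' into numbers
soft = df.infer_objects()        # only re-labels 'b', which already holds ints
ext = df.convert_dtypes()        # soft conversion to nullable extension dtypes

for name, frame in [("hard", hard), ("soft", soft), ("ext", ext)]:
    print(name, dict(frame.dtypes.astype(str)))
```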

User · Answer

Starting with pandas 1.0.0, we have pandas.DataFrame.convert_dtypes. You can even control which types to convert!

In [40]: df = pd.DataFrame(
    ...:     {
    ...:         "a": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
    ...:         "b": pd.Series(["x", "y", "z"], dtype=np.dtype("O")),
    ...:         "c": pd.Series([True, False, np.nan], dtype=np.dtype("O")),
    ...:         "d": pd.Series(["h", "i", np.nan], dtype=np.dtype("O")),
    ...:         "e": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
    ...:         "f": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float")),
    ...:     }
    ...: )

In [41]: dff = df.copy()

In [42]: df
Out[42]:
   a  b      c    d     e      f
0  1  x   True    h  10.0    NaN
1  2  y  False    i   NaN  100.5
2  3  z    NaN  NaN  20.0  200.0

In [43]: df.dtypes
Out[43]:
a      int32
b     object
c     object
d     object
e    float64
f    float64
dtype: object

In [44]: df = df.convert_dtypes()

In [45]: df.dtypes
Out[45]:
a      Int32
b     string
c    boolean
d     string
e      Int64
f    float64
dtype: object

In [46]: dff = dff.convert_dtypes(convert_boolean=False)

In [47]: dff.dtypes
Out[47]:
a      Int32
b     string
c     object
d     string
e      Int64
f    float64
dtype: object

(These outputs are from pandas 1.0. Since pandas 1.2, convert_dtypes() also has a convert_floating parameter, enabled by default, so column f is converted to Float64 in newer versions.)
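One practical consequence of the nullable dtypes that convert_dtypes() picks is that missing values are stored as pd.NA instead of NaN. A small sketch using columns c and e from the example above:

```python
import numpy as np
import pandas as pd

# Column e: floats whose non-missing values are all whole numbers.
# Column c: an object column holding booleans plus a missing value.
df = pd.DataFrame({
    "e": pd.Series([10, np.nan, 20], dtype="float"),
    "c": pd.Series([True, False, np.nan], dtype="object"),
})

converted = df.convert_dtypes()
print(converted.dtypes)

# The missing entries are now represented by pd.NA rather than NaN
print(converted.loc[1, "e"] is pd.NA)
print(converted.loc[2, "c"] is pd.NA)
```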

User · Answer

I thought I had the same problem, but actually I have a slight difference that makes the problem easier to solve. For others looking at this question, it's worth checking the format of your input list. In my case the numbers are initially floats, not strings as in the question:

a = [['a', 1.2, 4.2], ['b', 70, 0.03], ['x', 5, 0]]

But by processing the list too much before creating the dataframe, I lose the types and everything becomes a string.

Creating the data frame via a NumPy array:

df = pd.DataFrame(np.array(a))

df
Out[5]:
   0    1     2
0  a  1.2   4.2
1  b   70  0.03
2  x    5     0

df[1].dtype
Out[7]: dtype('O')

gives the same data frame as in the question, where the entries in columns 1 and 2 are considered as strings. However, doing

df = pd.DataFrame(a)

df
Out[10]:
   0     1     2
0  a   1.2  4.20
1  b  70.0  0.03
2  x   5.0  0.00

df[1].dtype
Out[11]: dtype('float64')

does actually give a data frame with the columns in the correct format.
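To check this on your own data, here is a sketch comparing the two constructors on the float-valued list from this answer. np.array() has to pick a single dtype for the whole array, so mixing text and numbers turns everything into strings, whereas pd.DataFrame() infers a dtype per column.

```python
import numpy as np
import pandas as pd

a = [['a', 1.2, 4.2], ['b', 70, 0.03], ['x', 5, 0]]

via_numpy = pd.DataFrame(np.array(a))  # every cell becomes a string first
direct = pd.DataFrame(a)               # pandas infers a dtype per column

print(via_numpy.dtypes)
print(direct.dtypes)
```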

User · Answer

When I've only needed to specify specific columns, and I want to be explicit, I've used (per DOCS LOCATION):

dataframe = dataframe.astype({'col_name_1': 'int', 'col_name_2': 'float64'})  # etc.

So, using the original question, but providing column names to it:

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a, columns=['col_name_1', 'col_name_2', 'col_name_3'])
df = df.astype({'col_name_2': 'float64', 'col_name_3': 'float64'})
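Because the question asks for a dynamic approach, the mapping passed to astype() doesn't have to be typed out by hand. A minimal sketch, assuming (as in the question) that every column after the first holds numbers stored as strings:

```python
import pandas as pd

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a, columns=['col_name_1', 'col_name_2', 'col_name_3'])

# Build the {column: dtype} mapping programmatically instead of by hand;
# here every column except the first is assumed to hold floats
dtype_map = {col: 'float64' for col in df.columns[1:]}
df = df.astype(dtype_map)
print(df.dtypes)
```

Unlike to_numeric(errors='coerce'), astype() raises if any selected column contains a value it cannot convert, which can be preferable when bad data should stop the pipeline.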

User · Answer

How about this?

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
df
Out[16]:
  one  two three
0   a  1.2   4.2
1   b   70  0.03
2   x    5     0

df.dtypes
Out[17]:
one      object
two      object
three    object

df[['two', 'three']] = df[['two', 'three']].astype(float)

df.dtypes
Out[19]:
one       object
two      float64
three    float64

User · Answer

The code below will change the data type of the listed columns (add as many column names as you need):

df[['col_name1', 'col_name2']] = df[['col_name1', 'col_name2']].astype(data_type)

In place of data_type, give the data type you want, e.g. str, float or int.

User · Answer

How about creating two dataframes, each with different data types for their columns, and then combining them?

import pandas as pd

d1 = pd.DataFrame(columns=['float_column'], dtype=float)
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
d1 = pd.concat([d1, pd.DataFrame(columns=['string_column'], dtype=str)])

Results:

In[8]: d1.dtypes
Out[8]:
float_column      float64
string_column      object
dtype: object

After the dataframe is created, you can populate it with floating point values in the 1st column, and strings (or any data type you desire) in the 2nd column.
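If the goal is just an empty DataFrame whose columns already carry different dtypes, a sketch that skips the concatenation step is to build it from a dict of empty, typed Series (the column names are the ones from the example above):

```python
import pandas as pd

# One empty Series per column, each with its own dtype
d1 = pd.DataFrame({
    'float_column': pd.Series(dtype='float64'),
    'string_column': pd.Series(dtype='object'),
})
print(d1.dtypes)
```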
