Assign pandas dataframe column dtypes

Question

I want to set the dtypes of multiple columns in a pd.DataFrame. (I have a file that I've had to manually parse into a list of lists, as the file was not amenable to pd.read_csv.)

    import pandas as pd
    print(pd.DataFrame([['a', '1'], ['b', '2']],
                       dtype={'x': 'object', 'y': 'int'},
                       columns=['x', 'y']))

I get

    ValueError: entry not a 2- or 3- tuple

The only way I can set them is by looping through each column variable and recasting with astype.

    dtypes = {'x': 'object', 'y': 'int'}
    mydata = pd.DataFrame([['a', '1'], ['b', '2']],
                          columns=['x', 'y'])
    for c in mydata.columns:
        mydata[c] = mydata[c].astype(dtypes[c])
    print(mydata['y'].dtype)   # => int64

Is there a better way?

User · Answer

Another way to set the column types is to first construct a numpy record array with your desired types, fill it out, and then pass it to a DataFrame constructor.

    import pandas as pd
    import numpy as np

    x = np.empty((10,), dtype=[('x', np.uint8), ('y', np.float64)])
    df = pd.DataFrame(x)

    df.dtypes ->

    x      uint8
    y    float64
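To tie this back to the question's list of lists, here is a minimal sketch of filling such an array from already-parsed rows. The names `rows`, `record_dtype` and `rec` are illustrative and not from the answer, and it assumes every row has the same fields and that the numeric strings convert cleanly.

```python
import numpy as np
import pandas as pd

# Parsed data as in the question: every field is still a string.
rows = [['a', '1'], ['b', '2']]

# One (name, dtype) pair per column; object holds arbitrary Python strings.
record_dtype = [('x', object), ('y', np.int64)]

# A structured array is built from a list of tuples, so convert each row
# to a tuple and cast the numeric field explicitly.
rec = np.array([(r[0], int(r[1])) for r in rows], dtype=record_dtype)

# The DataFrame picks up column names and dtypes from the array's fields.
df = pd.DataFrame(rec)
print(df.dtypes)
```

The explicit `int(...)` cast is where a malformed value would raise, so bad input fails at parse time rather than later.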

User · Answer

I'm facing a similar problem. In my case I have thousands of files from Cisco logs that I need to parse manually.

In order to be flexible with fields and types, I have successfully tested using StringIO + read_csv, which does accept a dict for the dtype specification.

I usually get each of the files (5k-20k lines) into a buffer and create the dtype dictionaries dynamically.

Eventually I concatenate (with categorical, thanks to 0.19) these dataframes into a large data frame that I dump into hdf5.

Something along these lines:

    import pandas as pd
    import io

    output = io.StringIO()
    output.write('A,1,20,31\n')
    output.write('B,2,21,32\n')
    output.write('C,3,22,33\n')
    output.write('D,4,23,34\n')

    output.seek(0)

    df = pd.read_csv(output, header=None,
                     names=["A", "B", "C", "D"],
                     dtype={"A": "category", "B": "float32", "C": "int32", "D": "float64"},
                     sep=","
                     )

    df.info()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 5 entries, 0 to 4
    Data columns (total 4 columns):
    A    5 non-null category
    B    5 non-null float32
    C    5 non-null int32
    D    5 non-null float64
    dtypes: category(1), float32(1), float64(1), int32(1)
    memory usage: 205.0 bytes
    None

Not very pythonic, but it does the job.

Hope it helps.

JC
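The "create the dtype dictionaries dynamically" step could look roughly like the sketch below. The `fields` list and the `rows` data are hypothetical stand-ins for whatever the parser produces, and the simple `','.join` assumes no field contains a comma (the csv module would be the safer choice otherwise).

```python
import io
import pandas as pd

# Hypothetical field spec: (column name, dtype) pairs decided at parse time.
fields = [('A', 'category'), ('B', 'float32'), ('C', 'int32'), ('D', 'float64')]

# Manually parsed rows; every value is still a string.
rows = [['A', '1', '20', '31'], ['B', '2', '21', '32']]

# Write the rows into an in-memory text buffer as CSV lines.
buffer = io.StringIO()
for row in rows:
    buffer.write(','.join(row) + '\n')
buffer.seek(0)

# Build both the column names and the dtype dict from the same spec.
df = pd.read_csv(buffer, header=None,
                 names=[name for name, _ in fields],
                 dtype=dict(fields))
print(df.dtypes)
```

Keeping names and dtypes in one list means the two cannot drift apart when the field layout changes between files.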

User · Answer

You're better off using typed np.arrays, and then passing the data and column names as a dictionary.

    import numpy as np
    import pandas as pd

    # Feature: np arrays are 1: efficient, 2: can be pre-sized
    x = np.array(['a', 'b'], dtype=object)
    y = np.array([1, 2], dtype=np.int32)
    df = pd.DataFrame({
        'x': x,  # Feature: column name is near data array
        'y': y,
    })
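If the data starts out row-oriented, as in the question, one way to get to typed arrays is to transpose first. This is only a sketch: `rows`, `dtypes` and `data` are illustrative names, and it relies on the dict preserving column order (Python 3.7+).

```python
import numpy as np
import pandas as pd

# Row-oriented parsed data; every value is a string.
rows = [['a', '1'], ['b', '2']]

# Desired dtype per column, in column order.
dtypes = {'x': object, 'y': np.int32}

# zip(*rows) transposes the rows into one tuple per column.
columns = zip(*rows)

# Build one typed array per column; astype converts '1' -> 1 for the int column.
data = {
    name: np.array(col).astype(dtype)
    for (name, dtype), col in zip(dtypes.items(), columns)
}

df = pd.DataFrame(data)
print(df.dtypes)
```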

User · Answer

You can set the types explicitly with pandas DataFrame.astype(dtype, copy=True, raise_on_error=True, **kwargs) and pass in a dictionary with the dtypes you want as dtype.

Here's an example:

    import pandas as pd
    wheel_number = 5
    car_name = 'jeep'
    minutes_spent = 4.5

    # set the columns
    data_columns = ['wheel_number', 'car_name', 'minutes_spent']

    # create an empty dataframe
    data_df = pd.DataFrame(columns=data_columns)
    df_temp = pd.DataFrame([[wheel_number, car_name, minutes_spent]], columns=data_columns)
    # note: DataFrame.append was removed in pandas 2.0; pd.concat replaces it there
    data_df = data_df.append(df_temp, ignore_index=True)

    In [11]: data_df.dtypes
    Out[11]:
    wheel_number     float64
    car_name          object
    minutes_spent    float64
    dtype: object

    data_df = data_df.astype(dtype={"wheel_number": "int64",
                                    "car_name": "object", "minutes_spent": "float64"})

Now you can see that it's changed:

    In [18]: data_df.dtypes
    Out[18]:
    wheel_number       int64
    car_name          object
    minutes_spent    float64
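Applied to the data from the question, the dictionary form of astype replaces the per-column loop with a single call. A minimal sketch, assuming the string values in `y` are all valid integers ('int64' is used instead of 'int' so the result does not depend on the platform):

```python
import pandas as pd

# Same data as in the question: both columns start out as strings.
mydata = pd.DataFrame([['a', '1'], ['b', '2']], columns=['x', 'y'])

# Map column name -> target dtype; columns left out of the dict are untouched.
mydata = mydata.astype({'x': 'object', 'y': 'int64'})

print(mydata.dtypes)
```

astype returns a new DataFrame, so the result has to be assigned back.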

User · Answer

Since 0.17, you have to use the explicit conversions:

    pd.to_datetime, pd.to_timedelta and pd.to_numeric

(As mentioned below, no more "magic": convert_objects has been deprecated in 0.17.)

    df = pd.DataFrame({'x': {0: 'a', 1: 'b'}, 'y': {0: '1', 1: '2'}, 'z': {0: '2018-05-01', 1: '2018-05-02'}})

    df.dtypes

    x    object
    y    object
    z    object
    dtype: object

    df

       x  y           z
    0  a  1  2018-05-01
    1  b  2  2018-05-02

You can apply these to each column you want to convert:

    df["y"] = pd.to_numeric(df["y"])
    df["z"] = pd.to_datetime(df["z"])
    df

       x  y          z
    0  a  1 2018-05-01
    1  b  2 2018-05-02

    df.dtypes

    x            object
    y             int64
    z    datetime64[ns]
    dtype: object

and confirm the dtype is updated.

OLD/DEPRECATED ANSWER for pandas 0.12 - 0.16: You can use convert_objects to infer better dtypes:

    In [21]: df
    Out[21]:
       x  y
    0  a  1
    1  b  2

    In [22]: df.dtypes
    Out[22]:
    x    object
    y    object
    dtype: object

    In [23]: df.convert_objects(convert_numeric=True)
    Out[23]:
       x  y
    0  a  1
    1  b  2

    In [24]: df.convert_objects(convert_numeric=True).dtypes
    Out[24]:
    x    object
    y     int64
    dtype: object

Magic! (Sad to see it deprecated.)
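With many columns, the explicit converters can be driven from a mapping instead of one statement per column. A small sketch; the `converters` dict is an illustrative name, not a pandas feature:

```python
import pandas as pd

df = pd.DataFrame({'x': ['a', 'b'],
                   'y': ['1', '2'],
                   'z': ['2018-05-01', '2018-05-02']})

# Column name -> converter function; columns not listed stay as they are.
converters = {'y': pd.to_numeric, 'z': pd.to_datetime}

for column, convert in converters.items():
    df[column] = convert(df[column])

print(df.dtypes)
```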

User · Answer

For those coming from Google (etc.) such as myself:

convert_objects has been deprecated since 0.17. If you use it, you get a warning like this one:

    FutureWarning: convert_objects is deprecated.  Use the data-type specific converters
    pd.to_datetime, pd.to_timedelta and pd.to_numeric.

You should do something like the following:

    df = df.astype(float)  # the np.float alias was removed in NumPy 1.24; plain float is equivalent
    df["A"] = pd.to_numeric(df["A"])
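When a manually parsed column may contain values that are not numbers, pd.to_numeric accepts errors='coerce', which produces NaN for those entries instead of raising. A short sketch with made-up data:

```python
import pandas as pd

# Made-up example: column A has one entry that is not a number.
df = pd.DataFrame({'A': ['1', '2', 'n/a'], 'B': ['x', 'y', 'z']})

# errors='coerce' turns unparseable values into NaN instead of raising,
# which makes the column float64.
df['A'] = pd.to_numeric(df['A'], errors='coerce')

print(df.dtypes)
```

Because NaN is a float, the column ends up as float64 even though the valid entries are integers.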
