[python] How to convert datatype:object to float64 in python?

I am going around in circles and tried so many different ways so I guess my core understanding is wrong. I would be grateful for help in understanding my encoding/decoding issues.

I import the dataframe from SQL and it seems that some datatypes:float64 are converted to Object. Thus, I cannot do any calculation. I fail to convert the Object back to float64.

df.head()

Date        WD  Manpower 2nd     CTR    2ndU    T1    T2      T3      T4 

2013/4/6    6   NaN     2,645   5.27%   0.29    407     533     454     368
2013/4/7    7   NaN     2,118   5.89%   0.31    257     659     583     369
2013/4/13   6   NaN     2,470   5.38%   0.29    354     531     473   383
2013/4/14   7   NaN     2,033   6.77%   0.37    396     748     681     458
2013/4/20   6   NaN     2,690   5.38%   0.29    361     528     541     381

df.dtypes

WD             float64
Manpower       float64
2nd             object
CTR             object
2ndU           float64
T1              object
T2              object
T3              object
T4              object
T5              object

dtype: object

SQL table:

enter image description here

This question is related to python pandas

The answer is


convert_objects is deprecated.

For pandas >= 0.17.0, use pd.to_numeric

df["2nd"] = pd.to_numeric(df["2nd"])

X = np.array(X, dtype=float)

You can use this to convert to array of float in python 3.7.6


You can try this:

df['2nd'] = pd.to_numeric(df['2nd'].str.replace(',', ''))
df['CTR'] = pd.to_numeric(df['CTR'].str.replace('%', ''))

I had this problem in a DataFrame (df) created from an Excel-sheet with several internal header rows.

After cleaning out the internal header rows from df, the columns' values were of "non-null object" type (DataFrame.info()).

This code converted all numerical values of multiple columns to int64 and float64 in one go:

for i in range(0, len(df.columns)):
    df.iloc[:,i] = pd.to_numeric(df.iloc[:,i], errors='ignore')
    # errors='ignore' lets strings remain as 'non-null objects'

Or you can use regular expression to handle multiple items as the general case of this issue,

df['2nd'] = pd.to_numeric(df['2nd'].str.replace(r'[,.%]','')) 
df['CTR'] = pd.to_numeric(df['CTR'].str.replace(r'[^\d%]',''))