I tried to convert a column from data type float64
to int64
using:
df['column name'].astype(int64)
but got an error:
NameError: name 'int64' is not defined
The column has number of people but was formatted as 7500000.0
, any idea how I can simply change this float64
into int64
?
consider using
df['column name'].astype('Int64')
nan
will be changed to NaN
This seems to be a little buggy in Pandas 0.23.4?
If there are np.nan values then this will throw an error as expected:
df['col'] = df['col'].astype(np.int64)
But doesn't change any values from float to int as I would expect if "ignore" is used:
df['col'] = df['col'].astype(np.int64,errors='ignore')
It worked if I first converted np.nan:
df['col'] = df['col'].fillna(0).astype(np.int64)
df['col'] = df['col'].astype(np.int64)
Now I can't figure out how to get null values back in place of the zeroes since this will convert everything back to float again:
df['col'] = df['col'].replace(0,np.nan)
You can need to pass in the string 'int64'
:
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1.0, 2.0]}) # some test dataframe
>>> df['a'].astype('int64')
0 1
1 2
Name: a, dtype: int64
There are some alternative ways to specify 64-bit integers:
>>> df['a'].astype('i8') # integer with 8 bytes (64 bit)
0 1
1 2
Name: a, dtype: int64
>>> import numpy as np
>>> df['a'].astype(np.int64) # native numpy 64 bit integer
0 1
1 2
Name: a, dtype: int64
Or use np.int64
directly on your column (but it returns a numpy.array
):
>>> np.int64(df['a'])
array([1, 2], dtype=int64)
Source: Stackoverflow.com