[python] Pandas: ValueError: cannot convert float NaN to integer

I get ValueError: cannot convert float NaN to integer for following:

df = pandas.read_csv('zoom11.csv')
df[['x']] = df[['x']].astype(int)
  • The "x" is obviously a column in the csv file, but I cannot spot any float NaN in the file, and dont get what does it mean by this.
  • When I read the column as String, then it has values like -1,0,1,...2000, all look very nice int numbers to me.
  • When I read the column as float, then this can be loaded. Then it shows values as -1.0,0.0 etc, still there are no any NaN-s
  • I tried with error_bad_lines = False and dtype parameter in read_csv to no avail. It just cancels loading with same exception.
  • The file is not small (10+ M rows), so cannot inspect it manually, when I extract a small header part, then there is no error, but it happens with full file. So it is something in the file, but cannot detect what.
  • Logically the csv should not have missing values, but even if there is some garbage then I would be ok to skip the rows. Or at least identify them, but I do not see way to scan through file and report conversion errors.

Update: Using the hints in comments/answers I got my data clean with this:

# x contained NaN
df = df[~df['x'].isnull()]

# Y contained some other garbage, so null check was not enough
df = df[df['y'].str.isnumeric()]

# final conversion now worked
df[['x']] = df[['x']].astype(int)
df[['y']] = df[['y']].astype(int)

This question is related to python pandas csv

The answer is


ValueError: cannot convert float NaN to integer

From v0.24, you actually can. Pandas introduces Nullable Integer Data Types which allows integers to coexist with NaNs.

Given a series of whole float numbers with missing data,

s = pd.Series([1.0, 2.0, np.nan, 4.0])
s

0    1.0
1    2.0
2    NaN
3    4.0
dtype: float64

s.dtype
# dtype('float64')

You can convert it to a nullable int type (choose from one of Int16, Int32, or Int64) with,

s2 = s.astype('Int32') # note the 'I' is uppercase
s2

0      1
1      2
2    NaN
3      4
dtype: Int32

s2.dtype
# Int32Dtype()

Your column needs to have whole numbers for the cast to happen. Anything else will raise a TypeError:

s = pd.Series([1.1, 2.0, np.nan, 4.0])

s.astype('Int32')
# TypeError: cannot safely cast non-equivalent float64 to int32

I know this has been answered but wanted to provide alternate solution for anyone in the future:

You can use .loc to subset the dataframe by only values that are notnull(), and then subset out the 'x' column only. Take that same vector, and apply(int) to it.

If column x is float:

df.loc[df['x'].notnull(), 'x'] = df.loc[df['x'].notnull(), 'x'].apply(int)

Also, even at the lastest versions of pandas if the column is object type you would have to convert into float first, something like:

df['column_name'].astype(np.float).astype("Int32")

NB: You have to go through numpy float first and then to nullable Int32, for some reason.

The size of the int if it's 32 or 64 depends on your variable, be aware you may loose some precision if your numbers are to big for the format.


if you have null value then in doing mathematical operation you will get this error to resolve it use df[~df['x'].isnull()]df[['x']].astype(int) if you want your dataset to be unchangeable.


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float

Examples related to csv

Pandas: ValueError: cannot convert float NaN to integer Export result set on Dbeaver to CSV Convert txt to csv python script How to import an Excel file into SQL Server? "CSV file does not exist" for a filename with embedded quotes Save Dataframe to csv directly to s3 Python Data-frame Object has no Attribute (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape How to write to a CSV line by line? How to check encoding of a CSV file