[python] Convert floats to ints in Pandas?

I've been working with data imported from a CSV. Pandas changed some columns to float, so now the numbers in these columns get displayed as floating points! However, I need them to be displayed as integers, or, without comma. Is there a way to convert them to integers or not display the comma?

This question is related to python pandas floating-point integer dataset

The answer is


The columns that needs to be converted to int can be mentioned in a dictionary also as below

df = df.astype({'col1': 'int', 'col2': 'int', 'col3': 'int'})

>>> import pandas as pd
>>> right = pd.DataFrame({'C': [1.002, 2.003], 'D': [1.009, 4.55], 'key': ['K0', 'K1']})
>>> print(right)
           C      D key
    0  1.002  1.009  K0
    1  2.003  4.550  K1
>>> right['C'] = right.C.astype(int)
>>> print(right)
       C      D key
    0  1  1.009  K0
    1  2  4.550  K1

Here's a simple function that will downcast floats into the smallest possible integer type that doesn't lose any information. For examples,

  • 100.0 can be converted from float to integer, but 99.9 can't (without losing information to rounding or truncation)

  • Additionally, 1.0 can be downcast all the way to int8 without losing information, but the smallest integer type for 100_000.0 is int32

Code examples:

import numpy as np
import pandas as pd

def float_to_int( s ):
    if ( s.astype(np.int64) == s ).all():
        return pd.to_numeric( s, downcast='integer' )
    else:
        return s

# small integers are downcast into 8-bit integers
float_to_int( np.array([1.0,2.0]) )
Out[1]:array([1, 2], dtype=int8)

# larger integers are downcast into larger integer types
float_to_int( np.array([100_000.,200_000.]) )
Out[2]: array([100000, 200000], dtype=int32)

# if there are values to the right of the decimal
# point, no conversion is made
float_to_int( np.array([1.1,2.2]) )
Out[3]: array([ 1.1,  2.2])

To convert all float columns to int

>>> df = pd.DataFrame(np.random.rand(5, 4) * 10, columns=list('PQRS'))
>>> print(df)
...     P           Q           R           S
... 0   4.395994    0.844292    8.543430    1.933934
... 1   0.311974    9.519054    6.171577    3.859993
... 2   2.056797    0.836150    5.270513    3.224497
... 3   3.919300    8.562298    6.852941    1.415992
... 4   9.958550    9.013425    8.703142    3.588733

>>> float_col = df.select_dtypes(include=['float64']) # This will select float columns only
>>> # list(float_col.columns.values)

>>> for col in float_col.columns.values:
...     df[col] = df[col].astype('int64')

>>> print(df)
...     P   Q   R   S
... 0   4   0   8   1
... 1   0   9   6   3
... 2   2   0   5   3
... 3   3   8   6   1
... 4   9   9   8   3

Expanding on @Ryan G mentioned usage of the pandas.DataFrame.astype(<type>) method, one can use the errors=ignore argument to only convert those columns that do not produce an error, which notably simplifies the syntax. Obviously, caution should be applied when ignoring errors, but for this task it comes very handy.

>>> df = pd.DataFrame(np.random.rand(3, 4), columns=list('ABCD'))
>>> df *= 10
>>> print(df)
...           A       B       C       D
... 0   2.16861 8.34139 1.83434 6.91706
... 1   5.85938 9.71712 5.53371 4.26542
... 2   0.50112 4.06725 1.99795 4.75698

>>> df['E'] = list('XYZ')
>>> df.astype(int, errors='ignore')
>>> print(df)
...     A   B   C   D   E
... 0   2   8   1   6   X
... 1   5   9   5   4   Y
... 2   0   4   1   4   Z

From pandas.DataFrame.astype docs:

errors : {‘raise’, ‘ignore’}, default ‘raise’

Control raising of exceptions on invalid data for provided dtype.

  • raise : allow exceptions to be raised
  • ignore : suppress exceptions. On error return original object

New in version 0.20.0.


This is a quick solution in case you want to convert more columns of your pandas.DataFrame from float to integer considering also the case that you can have NaN values.

cols = ['col_1', 'col_2', 'col_3', 'col_4']
for col in cols:
   df[col] = df[col].apply(lambda x: int(x) if x == x else "")

I tried with else x) and else None), but the result is still having the float number, so I used else "".


Use the pandas.DataFrame.astype(<type>) function to manipulate column dtypes.

>>> df = pd.DataFrame(np.random.rand(3,4), columns=list("ABCD"))
>>> df
          A         B         C         D
0  0.542447  0.949988  0.669239  0.879887
1  0.068542  0.757775  0.891903  0.384542
2  0.021274  0.587504  0.180426  0.574300
>>> df[list("ABCD")] = df[list("ABCD")].astype(int)
>>> df
   A  B  C  D
0  0  0  0  0
1  0  0  0  0
2  0  0  0  0

EDIT:

To handle missing values:

>>> df
          A         B     C         D
0  0.475103  0.355453  0.66  0.869336
1  0.260395  0.200287   NaN  0.617024
2  0.517692  0.735613  0.18  0.657106
>>> df[list("ABCD")] = df[list("ABCD")].fillna(0.0).astype(int)
>>> df
   A  B  C  D
0  0  0  0  0
1  0  0  0  0
2  0  0  0  0

Considering the following data frame:

>>> df = pd.DataFrame(10*np.random.rand(3, 4), columns=list("ABCD"))
>>> print(df)
...           A         B         C         D
... 0  8.362940  0.354027  1.916283  6.226750
... 1  1.988232  9.003545  9.277504  8.522808
... 2  1.141432  4.935593  2.700118  7.739108

Using a list of column names, change the type for multiple columns with applymap():

>>> cols = ['A', 'B']
>>> df[cols] = df[cols].applymap(np.int64)
>>> print(df)
...    A  B         C         D
... 0  8  0  1.916283  6.226750
... 1  1  9  9.277504  8.522808
... 2  1  4  2.700118  7.739108

Or for a single column with apply():

>>> df['C'] = df['C'].apply(np.int64)
>>> print(df)
...    A  B  C         D
... 0  8  0  1  6.226750
... 1  1  9  9  8.522808
... 2  1  4  2  7.739108

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float

Examples related to floating-point

Convert list or numpy array of single element to float in python Convert float to string with precision & number of decimal digits specified? Float and double datatype in Java C convert floating point to int Convert String to Float in Swift How do I change data-type of pandas data frame to string with a defined format? How to check if a float value is a whole number Convert floats to ints in Pandas? Converting Float to Dollars and Cents Format / Suppress Scientific Notation from Python Pandas Aggregation Results

Examples related to integer

Python: create dictionary using dict() with integer keys? How to convert datetime to integer in python Can someone explain how to append an element to an array in C programming? How to get the Power of some Integer in Swift language? python "TypeError: 'numpy.float64' object cannot be interpreted as an integer" What's the difference between integer class and numeric class in R PostgreSQL: ERROR: operator does not exist: integer = character varying C++ - how to find the length of an integer Converting binary to decimal integer output Convert floats to ints in Pandas?

Examples related to dataset

How to convert a Scikit-learn dataset to a Pandas dataset? Convert floats to ints in Pandas? Convert DataSet to List C#, Looping through dataset and show each record from a dataset column How to add header to a dataset in R? How I can filter a Datatable? Stored procedure return into DataSet in C# .Net Looping through a DataTable adding a datatable in a dataset How to fill Dataset with multiple tables?