[python] Python pandas: how to specify data types when reading an Excel file?

I am importing an excel file into a pandas dataframe with the pandas.read_excel() function.

One of the columns is the primary key of the table: it's all numbers, but it's stored as text (the little green triangle in the top left of the Excel cells confirms this).

However, when I import the file into a pandas dataframe, the column gets imported as a float. This means that, for example, '0614' becomes 614.

Is there a way to specify the datatype when importing a column? I understand this is possible when importing CSV files but couldn't find anything in the syntax of read_excel().

The only solution I can think of is to add an arbitrary letter at the beginning of the text (converting '0614' into 'A0614') in Excel, to make sure the column is imported as text, and then chopping off the 'A' in python, so I can match it to other tables I am importing from SQL.

This question is related to python pandas dataframe

The answer is


If you don't know the column names and you want to specify str data type to all columns:

table = pd.read_excel("path_to_filename")
cols = table.columns
conv = dict(zip(cols ,[str] * len(cols)))
table = pd.read_excel("path_to_filename", converters=conv)

If you are able to read the excel file correctly and only the integer values are not showing up. you can specify like this.

df = pd.read_excel('my.xlsx',sheetname='Sheet1', engine="openpyxl", dtype=str)

this should change your integer values into a string and show in dataframe


In case if you are not aware of the number and name of columns in dataframe then this method can be handy:

column_list = []
df_column = pd.read_excel(file_name, 'Sheet1').columns
for i in df_column:
    column_list.append(i)
converter = {col: str for col in column_list} 
df_actual = pd.read_excel(file_name, converters=converter)

where column_list is the list of your column names.


The read_excel() function has a converters argument, where you can apply functions to input in certain columns. You can use this to keep them as strings. Documentation:

Dict of functions for converting values in certain columns. Keys can either be integers or column labels, values are functions that take one input argument, the Excel cell content, and return the transformed content.

Example code:

pandas.read_excel(my_file, converters = {my_str_column: str})

If your key has a fixed number of digits, you should probably store as text rather than as numeric data. You can use the converters argument or read_excel for this.

Or, if this does not work, just manipulate your data once it's read into your dataframe:

df['key_zfill'] = df['key'].astype(str).str.zfill(4)

  names   key key_zfill
0   abc     5      0005
1   def  4962      4962
2   ghi   300      0300
3   jkl    14      0014
4   mno    20      0020

Starting with v0.20.0, the dtype keyword argument in read_excel() function could be used to specify the data types that needs to be applied to the columns just like it exists for read_csv() case.

Using converters and dtype arguments together on the same column name would lead to the latter getting shadowed and the former gaining preferance.


1) Inorder for it to not interpret the dtypes but rather pass all the contents of it's columns as they were originally in the file before, we could set this arg to str or object so that we don't mess up our data. (one such case would be leading zeros in numbers which would be lost otherwise)

pd.read_excel('file_name.xlsx', dtype=str)            # (or) dtype=object

2) It even supports a dict mapping wherein the keys constitute the column names and values it's respective data type to be set especially when you want to alter the dtype for a subset of all the columns.

# Assuming data types for `a` and `b` columns to be altered
pd.read_excel('file_name.xlsx', dtype={'a': np.float64, 'b': np.int32})

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float

Examples related to dataframe

Trying to merge 2 dataframes but get ValueError How to show all of columns name on pandas dataframe? Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Display all dataframe columns in a Jupyter Python Notebook How to convert column with string type to int form in pyspark data frame? Display/Print one column from a DataFrame of Series in Pandas Binning column with python pandas Selection with .loc in python Set value to an entire column of a pandas dataframe