[python] Reading an Excel file in Python using pandas

I am trying to read an Excel file this way:

newFile = pd.ExcelFile(r"PATH\FileName.xlsx")
ParsedData = pd.io.parsers.ExcelFile.parse(newFile)

This throws an error saying that two arguments are expected. I don't know what the second argument is. What I am trying to achieve is to convert an Excel file to a DataFrame. Am I doing it the right way, or is there another way to do this using pandas?

This question is related to python python-2.7 pandas

The answer is


Close: first you call ExcelFile, but then you call the .parse method and pass it the sheet name.

>>> xl = pd.ExcelFile("dummydata.xlsx")
>>> xl.sheet_names
[u'Sheet1', u'Sheet2', u'Sheet3']
>>> df = xl.parse("Sheet1")
>>> df.head()
                  Tid  dummy1    dummy2    dummy3    dummy4    dummy5  \
0 2006-09-01 00:00:00       0  5.894611  0.605211  3.842871  8.265307   
1 2006-09-01 01:00:00       0  5.712107  0.605211  3.416617  8.301360   
2 2006-09-01 02:00:00       0  5.105300  0.605211  3.090865  8.335395   
3 2006-09-01 03:00:00       0  4.098209  0.605211  3.198452  8.170187   
4 2006-09-01 04:00:00       0  3.338196  0.605211  2.970015  7.765058   

     dummy6  dummy7    dummy8    dummy9  
0  0.623354       0  2.579108  2.681728  
1  0.554211       0  7.210000  3.028614  
2  0.567841       0  6.940000  3.644147  
3  0.581470       0  6.630000  4.016155  
4  0.595100       0  6.350000  3.974442  

What you're doing is calling the method which lives on the class itself, rather than the instance, which is okay (although not very idiomatic), but if you're doing that you would also need to pass the sheet name:

>>> parsed = pd.io.parsers.ExcelFile.parse(xl, "Sheet1")
>>> parsed.columns
Index([u'Tid', u'dummy1', u'dummy2', u'dummy3', u'dummy4', u'dummy5', u'dummy6', u'dummy7', u'dummy8', u'dummy9'], dtype=object)
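
To illustrate the class-versus-instance point without needing an Excel file, here is a minimal sketch in plain Python. The Workbook class is a hypothetical stand-in, not the pandas class; it only mimics the shape of a parse method that requires a sheet name.

```python
class Workbook:
    """Hypothetical stand-in for an Excel file object (not pandas)."""

    def __init__(self, sheets):
        self.sheets = sheets  # maps sheet name -> list of rows

    def parse(self, sheet_name):
        return self.sheets[sheet_name]


wb = Workbook({"Sheet1": [[1, 2], [3, 4]]})

# Instance call: Python passes wb as `self` automatically.
via_instance = wb.parse("Sheet1")

# Class call: the instance must be passed explicitly, then the sheet name.
via_class = Workbook.parse(wb, "Sheet1")

# Leaving out the sheet name gives a "missing argument" TypeError,
# the same kind of error described in the question.
try:
    Workbook.parse(wb)
except TypeError as exc:
    error_message = str(exc)
```

Both calls return the same rows; the instance form is just shorter and is what most Python code uses.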

This is a much simpler and easier way.

import pandas
df = pandas.read_excel('your_xls_xlsx_filename', sheet_name='Sheet 1')
# or using the sheet index, starting at 0
df = pandas.read_excel('your_xls_xlsx_filename', sheet_name=2)

Check out the documentation for full details: http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.read_excel.html

Note: older pandas versions spelled this keyword sheetname. That spelling is deprecated (it raises a FutureWarning) in newer pandas versions; use sheet_name instead.


I thought I should add here that if you want to access rows or columns to loop through them, you can do this:

import pandas as pd

# open the file
xlsx = pd.ExcelFile(r"PATH\FileName.xlsx")

# get the first sheet as an object
sheet1 = xlsx.parse(0)
    
# get the first column as a list you can loop through
# where there is a 0 in the code below, change it to the row or column number you want
column = sheet1.iloc[:, 0].tolist()

# get the first row as a list you can loop through
row = sheet1.iloc[0, :].tolist()

Edit:

An earlier version of this answer used sheet1.icol(0) and sheet1.irow(0). The methods icol(i) and irow(i) are deprecated now. Use sheet1.iloc[:,i] to get the i-th column and sheet1.iloc[i,:] to get the i-th row.
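
Here is a small sketch of looping over a column and a row with iloc. The DataFrame is made-up data standing in for a parsed sheet, so the column names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data standing in for a sheet returned by xlsx.parse(0).
sheet1 = pd.DataFrame({"Tid": ["a", "b", "c"], "dummy1": [1, 2, 3]})

# Positional access: first column and first row, converted to plain lists.
first_column = sheet1.iloc[:, 0].tolist()
first_row = sheet1.iloc[0, :].tolist()

for value in first_column:
    print(value)

# To walk every row, itertuples yields one tuple per row.
rows = [tuple(r) for r in sheet1.itertuples(index=False)]
```

Calling tolist() is optional; a Series returned by iloc can also be iterated directly.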


I think this should satisfy your need:

import pandas as pd

# Read the excel sheet to pandas dataframe
df = pd.read_excel(r"PATH\FileName.xlsx", sheet_name=0)

You just need to feed the path to your file to pd.read_excel

import pandas as pd

file_path = "./my_excel.xlsx"
data_frame = pd.read_excel(file_path)

Check out the documentation to explore parameters such as skiprows, which ignores rows when loading the Excel file.
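
As a rough sketch of what skiprows does, the example below writes a tiny workbook to an in-memory buffer and reads it back. It assumes the openpyxl engine is installed, and the sheet contents are invented for illustration.

```python
import io

import pandas as pd

# Build a small workbook in memory; the first two rows mimic a title block.
raw = pd.DataFrame([
    ["Report title", None],
    ["Generated for illustration", None],
    ["name", "value"],
    ["a", 1],
    ["b", 2],
])
buffer = io.BytesIO()
raw.to_excel(buffer, index=False, header=False, engine="openpyxl")
buffer.seek(0)

# skiprows=2 drops the two title rows, so the third row becomes the header.
df = pd.read_excel(buffer, skiprows=2, engine="openpyxl")
print(df)
```

With a real file you would pass its path instead of the buffer.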


Here is an updated method with syntax that is more common in python code. It also prevents you from opening the same file multiple times.

import pandas as pd

sheet1, sheet2 = None, None
with pd.ExcelFile(r"PATH\FileName.xlsx") as reader:
    sheet1 = pd.read_excel(reader, sheet_name='Sheet1')
    sheet2 = pd.read_excel(reader, sheet_name='Sheet2')

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html


To load a sheet by its position instead of by name (often you simply want the first sheet), you can do this:

import pandas as pd
myexcel = pd.ExcelFile("C:/filename.xlsx")
myexcel = myexcel.parse(myexcel.sheet_names[0])

Since .sheet_names returns a list of sheet names, it is easy to load one or more sheets by simply calling the list element(s).
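
Building on that, here is a sketch that uses sheet_names to load the first sheet and then every sheet into a dict. It assumes openpyxl is installed; the two-sheet workbook is created in memory purely for illustration.

```python
import io

import pandas as pd

# Create a hypothetical two-sheet workbook in memory.
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"x": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"y": [3, 4]}).to_excel(writer, sheet_name="Sheet2", index=False)
buffer.seek(0)

with pd.ExcelFile(buffer, engine="openpyxl") as xl:
    names = xl.sheet_names
    # First sheet, selected by position in the list of names.
    first = xl.parse(names[0])
    # Every sheet, keyed by its name.
    all_sheets = {name: xl.parse(name) for name in names}
```

Opening the workbook once and parsing each sheet from the same ExcelFile avoids reopening the file for every sheet.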


import pandas as pd

data = pd.read_excel(r'YourPath.xlsx')

print(data)
