How to read a .xlsx file using the pandas library in IPython?

Question

I want to read a .xlsx file using the pandas library in Python and port the data to a PostgreSQL table.

All I could do up until now is:

import pandas as pd
data = pd.ExcelFile("File Name")

Now I know that the step got executed successfully, but I want to know how I can parse the Excel file that has been read, so that I can understand how the data in the Excel file maps to the data in the variable data.

I learnt that data is a DataFrame object, if I'm not wrong. So how do I parse this DataFrame object to extract each line, row by row?

User · Accepted Answer

I usually create a dictionary containing a DataFrame for every sheet:

xl_file = pd.ExcelFile(file_name)

dfs = {sheet_name: xl_file.parse(sheet_name) 
          for sheet_name in xl_file.sheet_names}

Update: In pandas version 0.21.0+ you will get this behavior more cleanly by passing sheet_name=None to read_excel:

dfs = pd.read_excel(file_name, sheet_name=None)

In 0.20 and prior, this was sheetname rather than sheet_name (this is now deprecated in favor of the above):

dfs = pd.read_excel(file_name, sheetname=None)
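To get from that dictionary to individual rows, which is what the question asks for, something like the sketch below should work. It assumes pandas 0.21+ with openpyxl installed. The workbook demo.xlsx, its sheet names and its columns are made up so that the snippet runs on its own.

```python
import os
import tempfile

import pandas as pd

# Build a small two-sheet workbook so the example is self-contained.
# The file name, sheet names and columns are hypothetical.
file_name = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
with pd.ExcelWriter(file_name) as writer:
    pd.DataFrame({"sku": [1, 2], "label": ["a", "b"]}).to_excel(
        writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"sku": [3], "label": ["c"]}).to_excel(
        writer, sheet_name="Sheet2", index=False)

# sheet_name=None returns a dict mapping sheet name -> DataFrame.
dfs = pd.read_excel(file_name, sheet_name=None)

# Walk every sheet row by row; itertuples yields one namedtuple per row,
# with one attribute per column.
rows = []
for sheet, df in dfs.items():
    for row in df.itertuples(index=False):
        rows.append((sheet, row.sku, row.label))

print(rows)
```

Each tuple in rows is then in a shape that can be handed to whatever inserts into the database. itertuples is used here rather than iterrows because it keeps the column dtypes and is usually faster.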

User · Answer

Instead of using a sheet name, in case you don't know it or can't open the Excel file to check (in my case: Python 3.6.7, Ubuntu 18.04), I leave the sheet argument out, so read_excel reads the first sheet by default, and pass index_col=0 to use the first column as the row index:

import pandas as pd
file_name = 'some_data_file.xlsx'
df = pd.read_excel(file_name, index_col=0)
print(df.head())  # print the first 5 rows

User · Answer

import pandas as pd

# Assign spreadsheet filename to file
file = 'example.xlsx'

# Load spreadsheet
xl = pd.ExcelFile(file)

# Print the sheet names
print(xl.sheet_names)

# Load a sheet into a DataFrame by name: df1
df1 = xl.parse('Sheet1')

User · Answer

pandas' read_excel function works much like read_csv:

dfs = pd.read_excel(xlsx_file, sheetname="sheet1")

The help output below comes from an older pandas release, where the parameter was still called sheetname; in 0.21+ it is sheet_name.

Help on function read_excel in module pandas.io.excel:

read_excel(io, sheetname=0, header=0, skiprows=None, skip_footer=0, index_col=None, names=None, parse_cols=None, parse_dates=False, date_parser=None, na_values=None, thousands=None, convert_float=True, has_index_names=None, converters=None, true_values=None, false_values=None, engine=None, squeeze=False, **kwds)
    Read an Excel table into a pandas DataFrame

    Parameters
    ----------
    io : string, path object (pathlib.Path or py._path.local.LocalPath),
        file-like object, pandas ExcelFile, or xlrd workbook.
        The string could be a URL. Valid URL schemes include http, ftp, s3,
        and file. For file URLs, a host is expected. For instance, a local
        file could be file://localhost/path/to/workbook.xlsx
    sheetname : string, int, mixed list of strings/ints, or None, default 0

        Strings are used for sheet names, Integers are used in zero-indexed
        sheet positions.

        Lists of strings/integers are used to request multiple sheets.

        Specify None to get all sheets.

        str|int -> DataFrame is returned.
        list|None -> Dict of DataFrames is returned, with keys representing
        sheets.

        Available Cases

        * Defaults to 0 -> 1st sheet as a DataFrame
        * 1 -> 2nd sheet as a DataFrame
        * "Sheet1" -> 1st sheet as a DataFrame
        * [0,1,"Sheet5"] -> 1st, 2nd & 5th sheet as a dictionary of DataFrames
        * None -> All sheets as a dictionary of DataFrames

    header : int, list of ints, default 0
        Row (0-indexed) to use for the column labels of the parsed
        DataFrame. If a list of integers is passed those row positions will
        be combined into a ``MultiIndex``
    skiprows : list-like
        Rows to skip at the beginning (0-indexed)
    skip_footer : int, default 0
        Rows at the end to skip (0-indexed)
    index_col : int, list of ints, default None
        Column (0-indexed) to use as the row labels of the DataFrame.
        Pass None if there is no such column.  If a list is passed,
        those columns will be combined into a ``MultiIndex``
    names : array-like, default None
        List of column names to use. If file contains no header row,
        then you should explicitly pass header=None
    converters : dict, default None
        Dict of functions for converting values in certain columns. Keys can
        either be integers or column labels, values are functions that take one
        input argument, the Excel cell content, and return the transformed
        content.
    true_values : list, default None
        Values to consider as True

        .. versionadded:: 0.19.0

    false_values : list, default None
        Values to consider as False

        .. versionadded:: 0.19.0

    parse_cols : int or list, default None
        * If None then parse all columns,
        * If int then indicates last column to be parsed
        * If list of ints then indicates list of column numbers to be parsed
        * If string then indicates comma separated list of column names and
          column ranges (e.g. "A:E" or "A,C,E:F")
    squeeze : boolean, default False
        If the parsed data only contains one column then return a Series
    na_values : scalar, str, list-like, or dict, default None
        Additional strings to recognize as NA/NaN. If dict passed, specific
        per-column NA values. By default the following values are interpreted
        as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN',
        '-nan', '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'nan'.
    thousands : str, default None
        Thousands separator for parsing string columns to numeric.  Note that
        this parameter is only necessary for columns stored as TEXT in Excel,
        any numeric columns will automatically be parsed, regardless of display
        format.
    keep_default_na : bool, default True
        If na_values are specified and keep_default_na is False the default NaN
        values are overridden, otherwise they're appended to.
    verbose : boolean, default False
        Indicate number of NA values placed in non-numeric columns
    engine : string, default None
        If io is not a buffer or path, this must be set to identify io.
        Acceptable values are None or xlrd
    convert_float : boolean, default True
        convert integral floats to int (i.e., 1.0 --> 1). If False, all numeric
        data will be read in as floats: Excel stores all numbers as floats
        internally
    has_index_names : boolean, default None
        DEPRECATED: for version 0.17+ index names will be automatically
        inferred based on index_col.  To read Excel output from 0.16.2 and
        prior that had saved index names, use True.

    Returns
    -------
    parsed : DataFrame or Dict of DataFrames
        DataFrame from the passed in Excel file.  See notes in sheetname
        argument for more information on when a Dict of Dataframes is returned.
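A quick way to see the return types described in that help text is the sketch below. It uses the newer sheet_name spelling (pandas 0.21+), assumes openpyxl is installed, and writes a throwaway workbook whose file name, sheet names and column are invented for the example.

```python
import os
import tempfile

import pandas as pd

# Hypothetical three-sheet workbook, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "workbook.xlsx")
with pd.ExcelWriter(path) as writer:
    for name in ["Sheet1", "Sheet2", "Sheet3"]:
        pd.DataFrame({"value": [1, 2, 3]}).to_excel(
            writer, sheet_name=name, index=False)

by_position = pd.read_excel(path, sheet_name=0)          # int  -> DataFrame
by_name = pd.read_excel(path, sheet_name="Sheet2")       # str  -> DataFrame
several = pd.read_excel(path, sheet_name=[0, "Sheet3"])  # list -> dict
everything = pd.read_excel(path, sheet_name=None)        # None -> dict of all sheets

print(type(by_position).__name__, type(by_name).__name__)
print(list(several), list(everything))
```

Note that when a list is passed, the dictionary keys are the specifiers you asked for (here the integer 0 and the string "Sheet3"), not necessarily the sheet names.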

User · Answer

If you call read_excel() on a file object you opened yourself with open(), make sure to pass the mode 'rb' to open() so the file is read as binary; this avoids encoding errors.
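A minimal sketch of that, assuming openpyxl is installed. The workbook is generated on the fly under a made-up name so the snippet is self-contained, and engine="openpyxl" is passed explicitly only so the example does not depend on pandas guessing the format from the handle.

```python
import os
import tempfile

import pandas as pd

# Throwaway workbook so the snippet runs on its own (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "binary_demo.xlsx")
pd.DataFrame({"x": [10, 20]}).to_excel(path, index=False)

# An .xlsx file is a zip archive, so it has to be opened in binary mode.
with open(path, "rb") as handle:
    df = pd.read_excel(handle, engine="openpyxl")

print(df)
```

Opening the same file in text mode ("r") makes Python try to decode the zip bytes as text, which is where the encoding errors come from.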

User · Answer

pd.read_excel(file_name)

Sometimes this code gives an error for .xlsx files: XLRDError: Excel xlsx file; not supported

Instead, you can use the openpyxl engine to read the Excel file:

df_samples = pd.read_excel(r'filename.xlsx', engine='openpyxl')

It worked for me.

User · Answer

The following worked for me:

from pandas import read_excel
# change it to your sheet name; you can find it at the bottom left of your Excel file
my_sheet = 'Sheet1'
# change it to the name of your Excel file
file_name = 'products_and_categories.xlsx'
df = read_excel(file_name, sheet_name=my_sheet)
print(df.head())  # shows headers with top 5 rows
