How to read a Parquet file into Pandas DataFrame

Question

How can I read a modestly sized Parquet data-set into an in-memory Pandas DataFrame without setting up a cluster computing infrastructure such as Hadoop or Spark? This is only a moderate amount of data that I would like to read in-memory with a simple Python script on a laptop. The data does not reside on HDFS. It is either on the local file system or possibly in S3. I do not want to spin up and configure other services like Hadoop, Hive or Spark.

I thought Blaze/Odo would have made this possible: the Odo documentation mentions Parquet, but the examples all seem to go through an external Hive runtime.

User · Accepted Answer

pandas 0.21 introduces new functions for Parquet:

    pd.read_parquet('example_pa.parquet', engine='pyarrow')

or

    pd.read_parquet('example_fp.parquet', engine='fastparquet')

The linked pandas documentation explains:

"These engines are very similar and should read/write nearly identical parquet format files. These libraries differ by having different underlying dependencies (fastparquet by using numba, while pyarrow uses a c-library)."

User · Answer

Aside from pandas, Apache pyarrow also provides a way to transform Parquet into a DataFrame. The code is simple, just type:

    import pyarrow.parquet as pq

    df = pq.read_table(source=your_file_path).to_pandas()

For more information, see the Apache pyarrow documentation, "Reading and Writing Single Files".

User · Answer

Update: since the time I answered this, there has been a lot of work in this area. Look at Apache Arrow for better reading and writing of Parquet. Also see: http://wesmckinney.com/blog/python-parquet-multithreading/

There is a Python Parquet reader that works relatively well: https://github.com/jcrobak/parquet-python

It will create Python objects and then you will have to move them to a Pandas DataFrame, so the process will be slower than pd.read_csv, for example.
