Pandas DataFrame concat vs append

Question

I have a list of 4 pandas dataframes containing a day of tick data that I want to merge into a single data frame. I cannot understand the behavior of concat on my timestamps. See details below:

    data

    [<class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 35228 entries, 2013-03-28 00:00:07.089000+02:00 to 2013-03-28 18:59:20.357000+02:00
    Data columns:
    Price       4040  non-null values
    Volume      4040  non-null values
    BidQty      35228  non-null values
    BidPrice    35228  non-null values
    AskPrice    35228  non-null values
    AskQty      35228  non-null values
    dtypes: float64(6),
     <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 33088 entries, 2013-04-01 00:03:17.047000+02:00 to 2013-04-01 18:59:58.175000+02:00
    Data columns:
    Price       3969  non-null values
    Volume      3969  non-null values
    BidQty      33088  non-null values
    BidPrice    33088  non-null values
    AskPrice    33088  non-null values
    AskQty      33088  non-null values
    dtypes: float64(6),
     <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 50740 entries, 2013-04-02 00:03:27.470000+02:00 to 2013-04-02 18:59:58.172000+02:00
    Data columns:
    Price       7326  non-null values
    Volume      7326  non-null values
    BidQty      50740  non-null values
    BidPrice    50740  non-null values
    AskPrice    50740  non-null values
    AskQty      50740  non-null values
    dtypes: float64(6),
     <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 60799 entries, 2013-04-03 00:03:06.994000+02:00 to 2013-04-03 18:59:58.180000+02:00
    Data columns:
    Price       8258  non-null values
    Volume      8258  non-null values
    BidQty      60799  non-null values
    BidPrice    60799  non-null values
    AskPrice    60799  non-null values
    AskQty      60799  non-null values
    dtypes: float64(6)]

Using append I get:

    pd.DataFrame().append(data)

    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 179855 entries, 2013-03-28 00:00:07.089000+02:00 to 2013-04-03 18:59:58.180000+02:00
    Data columns:
    AskPrice    179855  non-null values
    AskQty      179855  non-null values
    BidPrice    179855  non-null values
    BidQty      179855  non-null values
    Price       23593  non-null values
    Volume      23593  non-null values
    dtypes: float64(6)

Using concat I get:

    pd.concat(data)

    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 179855 entries, 2013-03-27 22:00:07.089000+02:00 to 2013-04-03 16:59:58.180000+02:00
    Data columns:
    Price       23593  non-null values
    Volume      23593  non-null values
    BidQty      179855  non-null values
    BidPrice    179855  non-null values
    AskPrice    179855  non-null values
    AskQty      179855  non-null values
    dtypes: float64(6)

Notice how the index changes when using concat. Why is that happening, and how would I go about using concat to reproduce the results obtained using append? (Since concat seems so much faster: 24.6 ms per loop vs 3.02 s per loop.)
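
One way to check whether concat alters a time-zone-aware index is to compare the first and last timestamps of the result with those of the inputs. The following is a minimal sketch, not the asker's code: the two small frames and the fixed +02:00 offset are hypothetical stand-ins for the tick data, and it assumes a recent pandas version, where DataFrame.append is no longer available.

```python
from datetime import timedelta, timezone

import pandas as pd

# Hypothetical stand-in data: two small frames indexed at a fixed +02:00 offset.
tz = timezone(timedelta(hours=2))
day1 = pd.DataFrame(
    {"Price": [1.0, 2.0]},
    index=pd.date_range("2013-03-28 00:00:07", periods=2, freq="min", tz=tz),
)
day2 = pd.DataFrame(
    {"Price": [3.0, 4.0]},
    index=pd.date_range("2013-04-01 00:03:17", periods=2, freq="min", tz=tz),
)

combined = pd.concat([day1, day2])

# The combined index should start at the first input's first timestamp,
# end at the last input's last timestamp, and keep the +02:00 offset.
first_ok = combined.index[0] == day1.index[0]
last_ok = combined.index[-1] == day2.index[-1]
offset_ok = combined.index[0].utcoffset() == timedelta(hours=2)
print(first_ok, last_ok, offset_ok)
```

If any of the three checks is False on real data, the index was shifted during concatenation, which is the symptom described above.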

User · Answer

Pandas concat vs append vs join vs merge

Concat gives the flexibility to join along either axis (all rows or all columns).
Append is the specific case of concat with axis=0 and join='outer'.
Join is based on the indexes (set by set_index); the how parameter takes one of ['left', 'right', 'inner', 'outer'].
Merge is based on particular columns of each of the two dataframes; these columns are passed as parameters such as 'left_on', 'right_on', and 'on'.
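
The distinctions above can be sketched with two tiny frames. This is only an illustration: the frames `left` and `right` and their column names are hypothetical. It does not call append, because DataFrame.append was removed in pandas 2.0; the concat call with axis=0 and join='outer' stands in for it.

```python
import pandas as pd

# Hypothetical example frames sharing a "key" column.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# concat: stack along an axis (axis=0 stacks rows, axis=1 places frames side by side).
# With axis=0 and join="outer" this is what append used to do.
stacked = pd.concat([left, right], axis=0, join="outer", ignore_index=True)

# join: align on the index (set here with set_index), controlled by `how`.
joined = left.set_index("key").join(right.set_index("key"), how="inner")

# merge: align on one or more columns, named with `on` or `left_on`/`right_on`.
merged = left.merge(right, on="key", how="outer")

print(stacked.shape, joined.shape, merged.shape)
```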

User · Answer

I have implemented a tiny benchmark (please find the code on Gist) to evaluate pandas' concat and append. I updated the code snippet and the results after the comment by ssk08 - thanks a lot! The benchmark ran on a Mac OS X 10.13 system with Python 3.6.2 and pandas 0.20.3.

+--------+-------------------------------------+-------------------------------------+
|        | ignore_index=False                  | ignore_index=True                   |
+--------+--------+--------+-------------------+--------+--------+-------------------+
| size   | append | concat | append/concat (%) | append | concat | append/concat (%) |
+--------+--------+--------+-------------------+--------+--------+-------------------+
| small  | 0.4635 | 0.4891 | 94.77             | 0.4056 | 0.3314 | 122.39            |
+--------+--------+--------+-------------------+--------+--------+-------------------+
| medium | 0.5532 | 0.6617 | 83.60             | 0.3605 | 0.3521 | 102.37            |
+--------+--------+--------+-------------------+--------+--------+-------------------+
| large  | 0.9558 | 0.9442 | 101.22            | 0.6670 | 0.6749 | 98.84             |
+--------+--------+--------+-------------------+--------+--------+-------------------+

For the small and medium sizes, append is slightly faster with ignore_index=False and concat is slightly faster with ignore_index=True; for the large size the two are within about 1% of each other either way.

tl;dr No significant difference between concat and append.
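
A timing harness along these lines could look like the sketch below. It is not the code from the Gist: the helper names `make_frames` and `time_concat` and the frame sizes are hypothetical, and it times only pd.concat, since DataFrame.append was removed in pandas 2.0 and cannot be measured on current versions.

```python
import timeit

import pandas as pd


def make_frames(n_frames, n_rows):
    # Hypothetical data: identical single-column frames.
    return [pd.DataFrame({"A": range(n_rows)}) for _ in range(n_frames)]


def time_concat(frames, ignore_index, repeats=5):
    # Best-of-N wall-clock time (seconds) for one pd.concat call over all frames.
    return min(
        timeit.repeat(
            lambda: pd.concat(frames, ignore_index=ignore_index),
            number=1,
            repeat=repeats,
        )
    )


frames = make_frames(n_frames=10, n_rows=1_000)
for flag in (False, True):
    print(f"ignore_index={flag}: {time_concat(frames, flag):.6f} s")
```

Taking the minimum over several repeats reduces the influence of unrelated system activity on the measurement.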

User · Answer

So what you are doing with append and concat is almost equivalent. The difference is the empty DataFrame. For some reason this causes a big slowdown; I am not sure exactly why and will have to look at some point. Below is a recreation of basically what you did.

I almost always use concat (though in this case they are equivalent, except for the empty frame); if you don't use the empty frame they will be the same speed.

    In [17]: df1 = pd.DataFrame(dict(A = range(10000)), index=pd.date_range('20130101', periods=10000, freq='s'))

    In [18]: df1
    Out[18]:
    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 10000 entries, 2013-01-01 00:00:00 to 2013-01-01 02:46:39
    Freq: S
    Data columns (total 1 columns):
    A    10000  non-null values
    dtypes: int64(1)

    In [19]: df4 = pd.DataFrame()

The concat:

    In [20]: %timeit pd.concat([df1, df2, df3])
    1000 loops, best of 3: 270 us per loop

This is the equivalent of your append:

    In [21]: %timeit pd.concat([df4, df1, df2, df3])
    10 loops, best of 3: 56.8 ms per loop
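
If an empty frame may end up in the list, one option is to drop it before concatenating. The sketch below is an illustration under stated assumptions: df2 and df3 are not defined in the session above, so copies of df1 stand in for them, and it makes no claim about how large the slowdown is on current pandas versions.

```python
import pandas as pd

idx = pd.date_range("20130101", periods=10000, freq="s")
df1 = pd.DataFrame({"A": range(10000)}, index=idx)
# Assumption: df2 and df3 are not shown in the answer; copies of df1 stand in here.
df2, df3 = df1.copy(), df1.copy()
df4 = pd.DataFrame()  # the empty frame

frames = [df4, df1, df2, df3]
# Drop empty frames so concat only sees frames that carry data.
non_empty = [f for f in frames if not f.empty]
result = pd.concat(non_empty)

print(len(non_empty), result.shape)
```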

User · Answer

One more thing you have to keep in mind is that the append() method in pandas doesn't modify the original object. Instead, it creates a new one with the combined data. Because every call creates a new object and copies the data buffers, it performs poorly when called repeatedly. You'd better use the concat() function when doing multiple append operations.
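
The pattern can be sketched as follows, with hypothetical `chunks` standing in for data that arrives piece by piece. Because DataFrame.append was removed in pandas 2.0, the slower pattern is written here as a repeated two-frame concat, which copies the accumulated data at every step in the same way.

```python
import pandas as pd

# Hypothetical chunks standing in for data that arrives piece by piece.
chunks = [pd.DataFrame({"A": [i, i + 1]}) for i in range(0, 6, 2)]

# Slower pattern: grow the result one piece at a time.
# Each step builds a new frame and copies everything accumulated so far.
grown = chunks[0]
for chunk in chunks[1:]:
    grown = pd.concat([grown, chunk], ignore_index=True)

# Usually preferable: collect the pieces in a list and concatenate once.
combined = pd.concat(chunks, ignore_index=True)

print(combined["A"].tolist())
```

Both approaches produce the same frame, and the inputs in `chunks` are left unchanged; only the number of intermediate copies differs.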
