Python pandas merge multiple dataframes

Question

I have different dataframes and need to merge them together based on the date column. If I only had two dataframes, I could use df1.merge(df2, on='date'). To do it with three dataframes, I use df1.merge(df2.merge(df3, on='date'), on='date'); however, it becomes really complex and unreadable to do it with multiple dataframes.

All dataframes have one column in common, date, but they don't have the same number of rows nor columns, and I only need those rows in which each date is common to every dataframe.

So, I'm trying to write a recursive function that returns a dataframe with all data, but it didn't work. How should I merge multiple dataframes then?

I tried different ways and got errors like "out of range", "KeyError: 0" (also 1, 2, 3) and "can not merge DataFrame with instance of type <class 'NoneType'>".

This is the script I wrote:

    dfs = [df1, df2, df3]  # list of dataframes

    def mergefiles(dfs, countfiles, i=0):
        if i == (countfiles - 2):  # it gets to the second to last and merges it with the last
            return

        dfm = dfs[i].merge(mergefiles(dfs[i+1], countfiles, i=i+1), on='date')
        return dfm

    print(mergefiles(dfs, len(dfs)))

An example:

df_1:

    May 19, 2017    1,200.00    0.1
    May 18, 2017    1,100.00    0.1
    May 17, 2017    1,000.00    0.1
    May 15, 2017    1,901.00    0.1

df_2:

    May 20, 2017    2,200.00    1000000    0.2
    May 18, 2017    2,100.00    1590000    0.2
    May 16, 2017    2,000.00    1230000    0.2
    May 15, 2017    2,902.00    1000000    0.2

df_3:

    May 21, 2017    3,200.00    2000000    0.3
    May 17, 2017    3,100.00    2590000    0.3
    May 16, 2017    3,000.00    2230000    0.3
    May 15, 2017    3,903.00    2000000    0.3

Expected merge result:

    May 15, 2017    1,901.00    0.1    2,902.00    1000000    0.2    3,903.00    2000000    0.3

User · Answer

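@dannyeuu's answer is correct. pd.concat naturally does a join on the index columns if you set the axis option to 1. The default is an outer join, but you can specify an inner join too. Here is an example:

    import pandas as pd

    x = pd.DataFrame({'a': [2, 4, 3, 4, 5, 2, 3, 4, 2, 5],
                      'b': [2, 3, 4, 1, 6, 6, 5, 2, 4, 2],
                      'val': [1, 4, 4, 3, 6, 4, 3, 6, 5, 7],
                      'val2': [2, 4, 1, 6, 4, 2, 8, 6, 3, 9]})
    x.set_index(['a', 'b'], inplace=True)
    x.sort_index(inplace=True)

    # y: a copy of x with one extra row, one extra column and a different row order
    y = x.copy()
    y.loc[(14, 14), :] = [3, 1]
    y['other'] = range(0, 11)
    y.sort_values('val', inplace=True)

    # z: another copy with its own extra row and column
    z = x.copy()
    z.loc[(15, 15), :] = [3, 4]
    z['another'] = range(0, 22, 2)
    z.sort_values('val2', inplace=True)

    # rows are aligned on the (a, b) index, not on their position
    pd.concat([x, y, z], axis=1)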
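
The answer above mentions that an inner join is possible but only shows the default outer join. A minimal sketch of the inner variant for the question's case follows, assuming each frame has a unique `date` column; the frames and the `price_*` column names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample frames; only "May 15, 2017" appears in all three.
df1 = pd.DataFrame({"date": ["May 17, 2017", "May 15, 2017"],
                    "price_1": [1000.0, 1901.0]})
df2 = pd.DataFrame({"date": ["May 16, 2017", "May 15, 2017"],
                    "price_2": [2000.0, 2902.0]})
df3 = pd.DataFrame({"date": ["May 21, 2017", "May 15, 2017"],
                    "price_3": [3200.0, 3903.0]})

# Move the shared column into the index so concat can align on it.
frames = [df.set_index("date") for df in (df1, df2, df3)]

# join="inner" keeps only index values present in every frame.
common = pd.concat(frames, axis=1, join="inner")
print(common)
```

Calling `common.reset_index()` afterwards turns `date` back into an ordinary column if that is preferred.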

User · Answer

@everestial007's solution worked for me. This is how I improved it for my use case, which is to give the columns of each df a different suffix so I can more easily differentiate between the dfs in the final merged dataframe.

    from functools import reduce
    import pandas as pd

    dfs = [df1, df2, df3, df4]
    suffixes = [f"_{i}" for i in range(len(dfs))]
    # add suffixes to each df
    dfs = [dfs[i].add_suffix(suffixes[i]) for i in range(len(dfs))]
    # remove suffix from the merging column
    dfs = [dfs[i].rename(columns={f"date{suffixes[i]}": "date"}) for i in range(len(dfs))]
    # merge
    dfs = reduce(lambda left, right: pd.merge(left, right, how='outer', on='date'), dfs)

User · Answer

If you are filtering by common date, this will return it:

    dfs = [df1, df2, df3]
    checker = dfs[-1]
    # the date is assumed to be in the column labelled 0
    check = set(checker.loc[:, 0])

    for df in dfs[:-1]:
        check = check.intersection(set(df.loc[:, 0]))

    print(checker[checker.loc[:, 0].isin(check)])
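
The snippet above only prints the matching rows of the last frame. A small sketch of the same set-intersection idea that filters every frame is shown here, assuming the date lives in a column named `date` rather than in column `0`; the sample frames are hypothetical.

```python
import pandas as pd

# Hypothetical sample frames sharing a "date" column.
df1 = pd.DataFrame({"date": ["May 18, 2017", "May 15, 2017"], "a": [1, 2]})
df2 = pd.DataFrame({"date": ["May 18, 2017", "May 16, 2017", "May 15, 2017"],
                    "b": [3, 4, 5]})
df3 = pd.DataFrame({"date": ["May 16, 2017", "May 15, 2017"], "c": [6, 7]})
dfs = [df1, df2, df3]

# Intersect the date sets of all frames.
common_dates = set(dfs[0]["date"])
for df in dfs[1:]:
    common_dates &= set(df["date"])

# Keep only the rows whose date is present in every frame.
filtered = [df[df["date"].isin(common_dates)] for df in dfs]
for df in filtered:
    print(df)
```

This keeps the frames separate; they still need a merge or concat to end up side by side.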

User · Answer

Thank you for your help @jezrael, @zipa and @everestial007; both answers are what I need. If I wanted to make it recursive, this would also work as intended:

    def mergefiles(dfs=[], on=''):
        """Merge a list of dataframes based on one column."""
        if len(dfs) == 1:
            return "List only has one element."

        elif len(dfs) == 2:
            df1 = dfs[0]
            df2 = dfs[1]
            df = df1.merge(df2, on=on)
            return df

        # Merge the first and second dataframes into a new dataframe
        df = dfs[0].merge(dfs[1], on=on)

        # Create a new list with the merged dataframe
        dfl = []
        dfl.append(df)

        # Join lists
        dfl = dfl + dfs[2:]
        dfm = mergefiles(dfl, on)
        return dfm
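
The recursion above folds the list from left to right, which is the pattern `functools.reduce` already implements. A compact sketch of the same idea follows; `merge_all` and the sample frames are hypothetical names used only for illustration, and unlike the function above it returns the single frame when the list has one element.

```python
from functools import reduce

import pandas as pd


def merge_all(dfs, on):
    """Inner-merge every DataFrame in dfs on one shared column, left to right."""
    if not dfs:
        raise ValueError("dfs must contain at least one DataFrame")
    # reduce returns the only element unchanged when dfs has length 1.
    return reduce(lambda left, right: left.merge(right, on=on), dfs)


# Hypothetical sample data; "d3" is the only date shared by all frames.
df1 = pd.DataFrame({"date": ["d1", "d2", "d3"], "a": [1, 2, 3]})
df2 = pd.DataFrame({"date": ["d2", "d3"], "b": [4, 5]})
df3 = pd.DataFrame({"date": ["d3", "d4"], "c": [6, 7]})

merged = merge_all([df1, df2, df3], on="date")
print(merged)
```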

User · Answer

functools.reduce and pd.concat are both good solutions, but in terms of execution time pd.concat is the best.

    from functools import reduce
    import pandas as pd

    dfs = [df1, df2, df3]
    nan_value = 0

    # both solutions align on the index, so the date must be the index of each df

    # solution 1 (fast)
    result_1 = pd.concat(dfs, join='outer', axis=1).fillna(nan_value)

    # solution 2
    result_2 = reduce(lambda df_left, df_right: pd.merge(df_left, df_right,
                                                         left_index=True, right_index=True,
                                                         how='outer'),
                      dfs).fillna(nan_value)

User · Answer

Looks like the data has the same columns, so you can:

    df1 = pd.DataFrame(data1)
    df2 = pd.DataFrame(data2)

    merged_df = pd.concat([df1, df2])

User · Answer

There are two solutions for this, but both return the columns of each dataframe separately:

    import functools

    import numpy as np
    import pandas as pd

    dfs = [df1, df2, df3]

    df_final = functools.reduce(lambda left, right: pd.merge(left, right, on='date'), dfs)
    print(df_final)

               date     a_x  b_x       a_y      b_y  c_x         a        b  c_y
    0  May 15, 2017  900.00  0.2  1,900.00  1000000  0.2  2,900.00  2000000  0.2

    # prefix each column with the position of its dataframe in the list
    k = np.arange(len(dfs)).astype(str)
    df = pd.concat([x.set_index('date') for x in dfs], axis=1, join='inner', keys=k)
    df.columns = df.columns.map('_'.join)
    print(df)

                     0_a  0_b       1_a      1_b  1_c       2_a      2_b  2_c
    date
    May 15, 2017  900.00  0.2  1,900.00  1000000  0.2  2,900.00  2000000  0.2

User · Answer

Below is the cleanest, most comprehensible way of merging multiple dataframes if complex queries aren't involved.

Simply merge on the DATE column using the OUTER method (to get all the data).

    import pandas as pd
    from functools import reduce

    df1 = pd.read_table('file1.csv', sep=',')
    df2 = pd.read_table('file2.csv', sep=',')
    df3 = pd.read_table('file3.csv', sep=',')

Now, load all the files you have as dataframes into a list, and then merge them using merge and reduce.

    # compile the list of dataframes you want to merge
    data_frames = [df1, df2, df3]

Note: you can add as many dataframes to the above list as you like. This is the good part about this method: no complex queries involved.

To keep the values that belong to the same date on the same row, merge on DATE:

    df_merged = reduce(lambda left, right: pd.merge(left, right, on=['DATE'],
                                                    how='outer'), data_frames)

    # if you want to fill the values that don't exist in the rows of the
    # merged dataframe, fill them with the required string:
    df_merged = reduce(lambda left, right: pd.merge(left, right, on=['DATE'],
                                                    how='outer'), data_frames).fillna('void')

Now the output will have the values from the same date on the same rows. You can fill the non-existing data from different frames for different columns using fillna().

Then write the merged data to a csv file if desired:

    df_merged.to_csv('merged.txt', sep=',', na_rep='.', index=False)

This should give you:

    DATE    VALUE1    VALUE2    VALUE3
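
The answer above uses an outer merge, which keeps every date. The question asks only for dates present in every frame, and a sketch with `how='inner'` on the question's sample data is shown here. The column names (`price_1`, `volume_2`, `rate_3`, and so on) are assumptions, since the question's frames are unlabelled, and the values are entered as plain numbers.

```python
from functools import reduce

import pandas as pd

# Sample data from the question; column names are hypothetical.
df1 = pd.DataFrame({
    "date": ["May 19, 2017", "May 18, 2017", "May 17, 2017", "May 15, 2017"],
    "price_1": [1200.0, 1100.0, 1000.0, 1901.0],
    "rate_1": [0.1, 0.1, 0.1, 0.1],
})
df2 = pd.DataFrame({
    "date": ["May 20, 2017", "May 18, 2017", "May 16, 2017", "May 15, 2017"],
    "price_2": [2200.0, 2100.0, 2000.0, 2902.0],
    "volume_2": [1000000, 1590000, 1230000, 1000000],
    "rate_2": [0.2, 0.2, 0.2, 0.2],
})
df3 = pd.DataFrame({
    "date": ["May 21, 2017", "May 17, 2017", "May 16, 2017", "May 15, 2017"],
    "price_3": [3200.0, 3100.0, 3000.0, 3903.0],
    "volume_3": [2000000, 2590000, 2230000, 2000000],
    "rate_3": [0.3, 0.3, 0.3, 0.3],
})

# how="inner" drops any date that is missing from either side of a merge,
# so after folding over the list only dates common to all frames remain.
merged = reduce(
    lambda left, right: pd.merge(left, right, on="date", how="inner"),
    [df1, df2, df3],
)
print(merged)
```

Because the value columns already have distinct names, no `_x`/`_y` suffixes are generated here.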

User · Answer

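Look at this related question, "pandas three-way joining multiple dataframes on columns":

    filenames = ['fn1', 'fn2', 'fn3', 'fn4']
    # index_col is the name of the column shared by all files (e.g. the date)
    dfs = [pd.read_csv(filename, index_col=index_col) for filename in filenames]
    dfs[0].join(dfs[1:])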
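
A small sketch of `DataFrame.join` with a list of frames is given below, using in-memory frames instead of files. The frames and labels are hypothetical, and the example assumes unique index values and non-overlapping column names. By default `join` performs a left join; `how="inner"` keeps only the index values found in every frame.

```python
import pandas as pd

# Hypothetical frames indexed by a shared key; "d3" is the only common label.
a = pd.DataFrame({"a": [1, 2, 3]}, index=["d1", "d2", "d3"])
b = pd.DataFrame({"b": [4, 5]}, index=["d2", "d3"])
c = pd.DataFrame({"c": [6, 7]}, index=["d3", "d4"])

# Passing a list joins all frames on their indexes in one call.
joined = a.join([b, c], how="inner")
print(joined)
```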
