Let's say you have two CSV files like these:

csv1.csv:

```
id,name
1,Armin
2,Sven
```

csv2.csv:

```
id,place,year
1,Reykjavik,2017
2,Amsterdam,2018
3,Berlin,2019
```

And you want the result to look like this (csv3.csv):

```
id,name,place,year
1,Armin,Reykjavik,2017
2,Sven,Amsterdam,2018
3,,Berlin,2019
```
Then you can use the following snippet to do that:
```python
import pandas as pd

# the file names
f1 = "csv1.csv"
f2 = "csv2.csv"
out_f = "csv3.csv"

# read the files
df1 = pd.read_csv(f1)
df2 = pd.read_csv(f2)

# get the column names (keys) of each file
keys1 = list(df1)
keys2 = list(df2)

# merge both files
for idx, row in df2.iterrows():
    data = df1[df1['id'] == row['id']]
    # if no row with this id exists yet, add the whole row
    if data.empty:
        next_idx = len(df1)
        for key in keys2:
            df1.at[next_idx, key] = df2.at[idx, key]
    # if a row with this id exists, add only the missing keys with their values
    else:
        i = int(data.index[0])
        for key in keys2:
            if key not in keys1:
                df1.at[i, key] = df2.at[idx, key]

# save the merged file.
# Adding rows/columns with .at fills the gaps with NaN, which turns integer
# columns into floats (2017 -> 2017.0); convert_dtypes() turns them back into integers.
# Default quoting is kept so that values containing commas get quoted
# instead of making the writer raise an error.
df1.convert_dtypes().to_csv(out_f, index=False, encoding='utf-8')
```
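For comparison, pandas already provides a full outer join, `pd.merge(..., how="outer")`, which should give the same table for this sample data in a single call. This is a swapped-in alternative rather than the loop above. The sample data is inlined with `io.StringIO` here so the sketch runs without any files on disk.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch runs without files
csv1 = "id,name\n1,Armin\n2,Sven\n"
csv2 = "id,place,year\n1,Reykjavik,2017\n2,Amsterdam,2018\n3,Berlin,2019\n"

df1 = pd.read_csv(io.StringIO(csv1))
df2 = pd.read_csv(io.StringIO(csv2))

# Full outer join on "id": keep every id found in either file and leave
# the cells empty (NaN) where one side has no matching row
merged = pd.merge(df1, df2, on="id", how="outer")

result = merged.to_csv(index=False)
print(result)
```

To write the file instead, call `merged.to_csv("csv3.csv", index=False)`. One behavioral difference: if both files share a non-key column, `pd.merge` keeps both copies with `_x`/`_y` suffixes, whereas the loop above keeps the first file's value for ids it already has.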
By wrapping this in a loop, you can apply the same merge to many files, as in your case (200 CSV files).
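Here is a minimal sketch of such a loop. It assumes every file has an `id` column and that each file contributes its own distinct non-key columns. Instead of the row-by-row loop, it uses `functools.reduce` to outer-join the files one after another with `pd.merge`. The helper name `merge_csv_files`, the `part_*.csv` file names, and the third demo file are hypothetical.

```python
import functools
import glob
import os
import tempfile

import pandas as pd


def merge_csv_files(paths, key="id"):
    """Outer-join the CSV files in `paths` on the column `key`, left to right."""
    frames = [pd.read_csv(p) for p in paths]
    if not frames:
        raise ValueError("no CSV files to merge")
    # reduce() folds the list: merge(merge(f1, f2), f3), ...
    return functools.reduce(
        lambda left, right: pd.merge(left, right, on=key, how="outer"),
        frames,
    )


if __name__ == "__main__":
    # Demo: hypothetical files written to a temporary folder
    samples = {
        "part_001.csv": "id,name\n1,Armin\n2,Sven\n",
        "part_002.csv": "id,place,year\n1,Reykjavik,2017\n2,Amsterdam,2018\n3,Berlin,2019\n",
        "part_003.csv": "id,team\n2,Blue\n4,Red\n",  # hypothetical extra file
    }
    with tempfile.TemporaryDirectory() as folder:
        for name, text in samples.items():
            with open(os.path.join(folder, name), "w", encoding="utf-8") as fh:
                fh.write(text)
        # sorted() makes the column order follow the file names
        paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
        merged = merge_csv_files(paths)
        # convert_dtypes() keeps whole-number columns as integers even
        # where the outer join introduced missing values
        print(merged.convert_dtypes().to_csv(index=False))
```

For your 200 files, you would pass something like `sorted(glob.glob("your_folder/*.csv"))` (adjust the pattern) and write the result with `to_csv(..., index=False)`.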