[python] Merging two CSV files using Python

When I'm working with csv files, I often use the pandas library. It makes things like this very easy. For example:

import pandas as pd

a = pd.read_csv("filea.csv")
b = pd.read_csv("fileb.csv")
b = b.dropna(axis=1)
merged = a.merge(b, on='title')
merged.to_csv("output.csv", index=False)

Some explanation follows. First, we read in the csv files:

>>> a = pd.read_csv("filea.csv")
>>> b = pd.read_csv("fileb.csv")
>>> a
   title  stage    jan    feb
0   darn  3.001  0.421  0.532
1     ok  2.829  1.036  0.751
2  three  1.115  1.146  2.921
>>> b
   title    mar    apr    may       jun  Unnamed: 5
0   darn  0.631  1.321  0.951    1.7510         NaN
1     ok  1.001  0.247  2.456    0.3216         NaN
2  three  0.285  1.283  0.924  956.0000         NaN

and we see there's an extra column of data (note that the first line of fileb.csv -- title,mar,apr,may,jun, -- has an extra comma at the end). We can get rid of that easily enough:

>>> b = b.dropna(axis=1)
>>> b
   title    mar    apr    may       jun
0   darn  0.631  1.321  0.951    1.7510
1     ok  1.001  0.247  2.456    0.3216
2  three  0.285  1.283  0.924  956.0000

Now we can merge a and b on the title column:

>>> merged = a.merge(b, on='title')
>>> merged
   title  stage    jan    feb    mar    apr    may       jun
0   darn  3.001  0.421  0.532  0.631  1.321  0.951    1.7510
1     ok  2.829  1.036  0.751  1.001  0.247  2.456    0.3216
2  three  1.115  1.146  2.921  0.285  1.283  0.924  956.0000

and finally write this out:

>>> merged.to_csv("output.csv", index=False)

producing:

title,stage,jan,feb,mar,apr,may,jun
darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751
ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216
three,1.115,1.146,2.921,0.285,1.283,0.924,956.0

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to csv

Pandas: ValueError: cannot convert float NaN to integer Export result set on Dbeaver to CSV Convert txt to csv python script How to import an Excel file into SQL Server? "CSV file does not exist" for a filename with embedded quotes Save Dataframe to csv directly to s3 Python Data-frame Object has no Attribute (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape How to write to a CSV line by line? How to check encoding of a CSV file

Examples related to dictionary

JS map return object python JSON object must be str, bytes or bytearray, not 'dict Python update a key in dict if it doesn't exist How to update the value of a key in a dictionary in Python? How to map an array of objects in React C# Dictionary get item by index Are dictionaries ordered in Python 3.6+? Split / Explode a column of dictionaries into separate columns with pandas Writing a dictionary to a text file? enumerate() for dictionary in python

Examples related to merge

Pandas Merging 101 Python: pandas merge multiple dataframes Git merge with force overwrite Merge two dataframes by index Visual Studio Code how to resolve merge conflicts with git? merge one local branch into another local branch Merging dataframes on index with pandas Git merge is not possible because I have unmerged files Git merge develop into feature branch outputs "Already up-to-date" while it's not How merge two objects array in angularjs?

Examples related to key

How do I check if a Key is pressed on C++ Map<String, String>, how to print both the "key string" and "value string" together Python: create dictionary using dict() with integer keys? SSH Key: “Permissions 0644 for 'id_rsa.pub' are too open.” on mac SSL: error:0B080074:x509 certificate routines:X509_check_private_key:key values mismatch How to get the stream key for twitch.tv How to get key names from JSON using jq How to add multiple values to a dictionary key in python? Initializing a dictionary in python with a key value and no corresponding values How can I sort a std::map first by value, then by key?