[python-3.x] Concatenate strings from several rows using Pandas groupby

I want to merge several strings in a dataframe based on a groupedby in Pandas.

This is my code so far:

import pandas as pd
from io import StringIO

data = StringIO("""
"name1","hej","2014-11-01"
"name1","du","2014-11-02"
"name1","aj","2014-12-01"
"name1","oj","2014-12-02"
"name2","fin","2014-11-01"
"name2","katt","2014-11-02"
"name2","mycket","2014-12-01"
"name2","lite","2014-12-01"
""")

# load string as stream into dataframe
df = pd.read_csv(data,header=0, names=["name","text","date"],parse_dates=[2])

# add column with month
df["month"] = df["date"].apply(lambda x: x.month)

I want the end result to look like this:

enter image description here

I don't get how I can use groupby and apply some sort of concatenation of the strings in the column "text". Any help appreciated!

This question is related to python-3.x pandas pandas-groupby

The answer is


If you want to concatenate your "text" in a list:

df.groupby(['name', 'month'], as_index = False).agg({'text': list})

we can groupby the 'name' and 'month' columns, then call agg() functions of Panda’s DataFrame objects.

The aggregation functionality provided by the agg() function allows multiple statistics to be calculated per group in one calculation.

df.groupby(['name', 'month'], as_index = False).agg({'text': ' '.join})

enter image description here


The answer by EdChum provides you with a lot of flexibility but if you just want to concateate strings into a column of list objects you can also:

output_series = df.groupby(['name','month'])['text'].apply(list)


For me the above solutions were close but added some unwanted /n's and dtype:object, so here's a modified version:

df.groupby(['name', 'month'])['text'].apply(lambda text: ''.join(text.to_string(index=False))).str.replace('(\\n)', '').reset_index()

Examples related to python-3.x

Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation Replace specific text with a redacted version using Python Upgrade to python 3.8 using conda "Permission Denied" trying to run Python on Windows 10 Python: 'ModuleNotFoundError' when trying to import module from imported package What is the meaning of "Failed building wheel for X" in pip install? How to downgrade python from 3.7 to 3.6 I can't install pyaudio on Windows? How to solve "error: Microsoft Visual C++ 14.0 is required."? Iterating over arrays in Python 3 How to upgrade Python version to 3.7?

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float

Examples related to pandas-groupby

Count unique values with pandas per groups Group dataframe and get sum AND count? How do I create a new column from the output of pandas groupby().sum()? How to loop over grouped Pandas dataframe? Concatenate strings from several rows using Pandas groupby pandas dataframe groupby datetime month How to group dataframe rows into list in pandas groupby Renaming Column Names in Pandas Groupby function Get statistics for each group (such as count, mean, etc) using pandas GroupBy? pandas GroupBy columns with NaN (missing) values