[python] Group dataframe and get sum AND count?

I have a dataframe that looks like this:

              Company Name              Organisation Name  Amount
10118  Vifor Pharma UK Ltd  Welsh Assoc for Gastro & Endo 2700.00
10119  Vifor Pharma UK Ltd    Welsh IBD Specialist Group,  169.00
10120  Vifor Pharma UK Ltd             West Midlands AHSN 1200.00
10121  Vifor Pharma UK Ltd           Whittington Hospital   63.00
10122  Vifor Pharma UK Ltd                 Ysbyty Gwynedd   75.93

How do I sum the Amount and count the Organisation Name, to get a new dataframe that looks like this?

              Company Name  Organisation Count   Amount
10118  Vifor Pharma UK Ltd                   5  4207.93

I know how to sum or count:

df.groupby('Company Name').sum()
df.groupby('Company Name').count()

But not how to do both!


Answers:

Try this:

In [110]: (df.groupby('Company Name')
   .....:    .agg({'Organisation Name':'count', 'Amount': 'sum'})
   .....:    .reset_index()
   .....:    .rename(columns={'Organisation Name':'Organisation Count'})
   .....: )
Out[110]:
          Company Name   Amount  Organisation Count
0  Vifor Pharma UK Ltd  4207.93                   5
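If you want to run this end to end, here is a self-contained sketch. It rebuilds the five sample rows from the question as a DataFrame (the construction code is my own; the values are copied from the question) and then applies the same `agg` / `reset_index` / `rename` chain. Note that the exact column order in the printed output can differ between pandas versions.

```python
import pandas as pd

# Sample rows copied from the question, with the same index values
df = pd.DataFrame(
    {
        "Company Name": ["Vifor Pharma UK Ltd"] * 5,
        "Organisation Name": [
            "Welsh Assoc for Gastro & Endo",
            "Welsh IBD Specialist Group,",
            "West Midlands AHSN",
            "Whittington Hospital",
            "Ysbyty Gwynedd",
        ],
        "Amount": [2700.00, 169.00, 1200.00, 63.00, 75.93],
    },
    index=[10118, 10119, 10120, 10121, 10122],
)

# Count organisations and sum amounts per company in a single agg call
result = (
    df.groupby("Company Name")
    .agg({"Organisation Name": "count", "Amount": "sum"})
    .reset_index()  # turn the group key back into a regular column
    .rename(columns={"Organisation Name": "Organisation Count"})
)
print(result)
```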

Or, if you don't need to reset the index:

df.groupby('Company Name')['Amount'].agg(['sum','count'])

or

df.groupby('Company Name').agg({'Amount': ['sum','count']})

Demo:

In [98]: df.groupby('Company Name')['Amount'].agg(['sum','count'])
Out[98]:
                         sum  count
Company Name
Vifor Pharma UK Ltd  4207.93      5

In [99]: df.groupby('Company Name').agg({'Amount': ['sum','count']})
Out[99]:
                      Amount
                         sum count
Company Name
Vifor Pharma UK Ltd  4207.93     5
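The dict-of-lists form returns two-level column labels such as ('Amount', 'sum'). If you'd rather have flat column names, one common approach is to join the two levels into a single string. This is only a sketch: the "Amount_sum" style of naming is my own choice, not something pandas produces by default.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Company Name": ["Vifor Pharma UK Ltd"] * 5,
        "Amount": [2700.00, 169.00, 1200.00, 63.00, 75.93],
    }
)

# Produces MultiIndex columns: ('Amount', 'sum') and ('Amount', 'count')
res = df.groupby("Company Name").agg({"Amount": ["sum", "count"]})

# Iterating over MultiIndex columns yields tuples; join each tuple into one label
res.columns = ["_".join(col) for col in res.columns]
print(res)
```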

In case you're wondering how to rename columns during aggregation, pandas >= 0.25 supports named aggregation:

df.groupby('Company Name')['Amount'].agg(MySum='sum', MyCount='count')

Or,

df.groupby('Company Name').agg(MySum=('Amount', 'sum'), MyCount=('Amount', 'count'))

                       MySum  MyCount
Company Name                       
Vifor Pharma UK Ltd  4207.93        5
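Named aggregation can also reproduce the exact layout asked for in the question, counting one column and summing another. Keyword arguments must be valid Python identifiers, so an output name containing a space ("Organisation Count") has to be passed by unpacking a dict with `**`. The sketch below assumes pandas >= 0.25 and reuses the sample rows from the question.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Company Name": ["Vifor Pharma UK Ltd"] * 5,
        "Organisation Name": [
            "Welsh Assoc for Gastro & Endo",
            "Welsh IBD Specialist Group,",
            "West Midlands AHSN",
            "Whittington Hospital",
            "Ysbyty Gwynedd",
        ],
        "Amount": [2700.00, 169.00, 1200.00, 63.00, 75.93],
    }
)

# Each output name maps to a (source column, aggregation) tuple
result = (
    df.groupby("Company Name")
    .agg(
        **{
            "Organisation Count": ("Organisation Name", "count"),
            "Amount": ("Amount", "sum"),
        }
    )
    .reset_index()  # make "Company Name" a regular column again
)
print(result)
```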

If you have many columns and only one of them needs a different aggregation, you could do:

In[1]: grouper = df.groupby('Company Name')
In[2]: res = grouper.count()
In[3]: res['Amount'] = grouper.Amount.sum()
In[4]: res
Out[4]:
                      Organisation Name   Amount
Company Name                                   
Vifor Pharma UK Ltd                  5  4207.93

Note you can then rename the Organisation Name column as you wish.
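A self-contained sketch of that approach including the rename, using the sample rows from the question. Keep in mind that `count()` counts non-missing values per column, so rows where "Organisation Name" is NaN would not be counted.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Company Name": ["Vifor Pharma UK Ltd"] * 5,
        "Organisation Name": [
            "Welsh Assoc for Gastro & Endo",
            "Welsh IBD Specialist Group,",
            "West Midlands AHSN",
            "Whittington Hospital",
            "Ysbyty Gwynedd",
        ],
        "Amount": [2700.00, 169.00, 1200.00, 63.00, 75.93],
    }
)

grouper = df.groupby("Company Name")
res = grouper.count()                    # non-null counts for every non-key column
res["Amount"] = grouper["Amount"].sum()  # overwrite the one column that needs a sum
res = res.rename(columns={"Organisation Name": "Organisation Count"})
print(res)
```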


To also sort the result, largest count first and then largest sum:

df.groupby('Company Name').agg({'Organisation Name': 'count', 'Amount': 'sum'})\
    .sort_values(['Organisation Name', 'Amount'], ascending=False)
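Here is a runnable sketch of sorting by count and then by sum. After a dict-based `agg` the result columns keep their source names ("Organisation Name", "Amount"), so those are the labels to pass to `DataFrame.sort_values`. The two companies and their values below are made up purely for illustration.

```python
import pandas as pd

# Hypothetical data: "Company A" / "Company B" are invented for this example
df = pd.DataFrame(
    {
        "Company Name": ["Company A", "Company A", "Company B", "Company B", "Company B"],
        "Organisation Name": ["X", "Y", "X", "Y", "Z"],
        "Amount": [500.0, 250.0, 10.0, 20.0, 30.0],
    }
)

summary = (
    df.groupby("Company Name")
    .agg({"Organisation Name": "count", "Amount": "sum"})
    # Sort by count first; the sum only breaks ties between equal counts
    .sort_values(["Organisation Name", "Amount"], ascending=False)
)
print(summary)
```

Company B comes first here because it has more organisations (3 vs 2), even though Company A has the larger total.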
