[python] Plotting histograms from grouped data in a pandas DataFrame

I need some guidance in working out how to plot a block of histograms from grouped data in a pandas dataframe. Here's an example to illustrate my question:

from pandas import DataFrame
import numpy as np
x = ['A']*300 + ['B']*400 + ['C']*300
y = np.random.randn(1000)
df = DataFrame({'Letter':x, 'N':y})
grouped = df.groupby('Letter')

In my ignorance I tried this code command:

df.groupby('Letter').hist()

which failed with the error message "TypeError: cannot concatenate 'str' and 'float' objects"

Any help most appreciated.

This question is related to python pandas histogram

The answer is


With recent version of Pandas, you can do df.N.hist(by=df.Letter)

Just like with the solutions above, the axes will be different for each subplot. I have not solved that one yet.


Your function is failing because the groupby dataframe you end up with has a hierarchical index and two columns (Letter and N) so when you do .hist() it's trying to make a histogram of both columns hence the str error.

This is the default behavior of pandas plotting functions (one plot per column) so if you reshape your data frame so that each letter is a column you will get exactly what you want.

df.reset_index().pivot('index','Letter','N').hist()

The reset_index() is just to shove the current index into a column called index. Then pivot will take your data frame, collect all of the values N for each Letter and make them a column. The resulting data frame as 400 rows (fills missing values with NaN) and three columns (A, B, C). hist() will then produce one histogram per column and you get format the plots as needed.


One solution is to use matplotlib histogram directly on each grouped data frame. You can loop through the groups obtained in a loop. Each group is a dataframe. And you can create a histogram for each one.

from pandas import DataFrame
import numpy as np
x = ['A']*300 + ['B']*400 + ['C']*300
y = np.random.randn(1000)
df = DataFrame({'Letter':x, 'N':y})
grouped = df.groupby('Letter')

for group in grouped:
  figure()
  matplotlib.pyplot.hist(group[1].N)
  show()

I write this answer because I was looking for a way to plot together the histograms of different groups. What follows is not very smart, but it works fine for me. I use Numpy to compute the histogram and Bokeh for plotting. I think it is self-explanatory, but feel free to ask for clarifications and I'll be happy to add details (and write it better).

figures = {
    'Transit': figure(title='Transit', x_axis_label='speed [km/h]', y_axis_label='frequency'),
    'Driving': figure(title='Driving', x_axis_label='speed [km/h]', y_axis_label='frequency')
}

cols = {'Vienna': 'red', 'Turin': 'blue', 'Rome': 'Orange'}
for gr in df_trips.groupby(['locality', 'means']):
    locality = gr[0][0]
    means = gr[0][1]
    fig = figures[means]
    h, b = np.histogram(pd.DataFrame(gr[1]).speed.values)
    fig.vbar(x=b[1:], top=h, width=(b[1]-b[0]), legend_label=locality, fill_color=cols[locality], alpha=0.5)

show(gridplot([
    [figures['Transit']],
    [figures['Driving']],
]))

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float

Examples related to histogram

Why isn't this code to plot a histogram on a continuous value Pandas column working? Make Frequency Histogram for Factor Variables Overlay normal curve to histogram in R Plotting histograms from grouped data in a pandas DataFrame save a pandas.Series histogram plot to file changing default x range in histogram matplotlib How does numpy.histogram() work? Fitting a histogram with python Bin size in Matplotlib (Histogram) Plot two histograms on single chart with matplotlib