Histogram using gnuplot

Question

I know how to create a histogram (just use "with boxes") in gnuplot if my .dat file already has properly binned data. Is there a way to take a list of numbers and have gnuplot provide a histogram based on ranges and bin sizes the user provides?

User · Answer

Different numbers of bins on the same dataset can reveal different features of the data.

Unfortunately, there is no universal best method that can determine the number of bins.

One widely used option, among many alternatives, is the Freedman–Diaconis rule, which chooses the bin width (and hence the number of bins) automatically from the interquartile range and the sample count of a given dataset.

Accordingly, the following can be used to utilise the Freedman–Diaconis rule in a gnuplot script:

Say you have a file containing a single column of samples, samplesFile:

# samples
0.12345
1.23232
...

The following (which is based on ChrisW's answer) may be embedded into an existing gnuplot script:

...
## preceding gnuplot commands
...

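# name of the samples file (replace with the actual path)
samples = "samplesFile"
stats samples using 1 nooutput
N = floor(STATS_records)
samplesMin = STATS_min
samplesMax = STATS_max
# Freedman–Diaconis formula for bin-width size estimation
lowQuartile = STATS_lo_quartile
upQuartile = STATS_up_quartile
IQR = upQuartile - lowQuartile
width = 2*IQR/(N**(1.0/3.0))
# map each sample to the centre of its bin, with bins starting at samplesMin
bin(x) = width*(floor((x-samplesMin)/width)+0.5) + samplesMin

# each sample contributes 1/(N*width), so the curve is a normalised density
# (if this is embedded in a shell here-document, escape the column as \$1)
plot samples using (bin($1)):(1.0/(N*width)) smooth freq with lines lw 1 title "Output"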
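
As a rough sanity check on the rule, the sketch below prints the bin width and the number of bins it implies. It assumes gnuplot 5 or later (for inline datablocks); the datablock name $demo and its values are made up for illustration and are not part of the original answer.

```gnuplot
# Hypothetical inline samples, used only so the sketch is self-contained
$demo << EOD
0.12
0.35
0.41
0.58
0.63
0.77
0.91
1.24
EOD

stats $demo using 1 nooutput
N = STATS_records
IQR = STATS_up_quartile - STATS_lo_quartile
# Freedman-Diaconis bin width
width = 2.0*IQR/(N**(1.0/3.0))
# Number of bins needed to cover the data range; fall back to 1 if IQR is 0
nbins = (width > 0) ? ceil((STATS_max - STATS_min)/width) : 1
print sprintf("bin width = %g, number of bins = %d", width, nbins)
```

If the interquartile range is zero (for example, many repeated values), the rule gives a zero width, so a fallback such as the one above or a manually chosen width is needed.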

User · Answer

With respect to binning functions, I didn't expect the result of the functions offered so far. Namely, if my binwidth is 0.001, these functions were centering the bins on 0.0005 points, whereas I feel it's more intuitive to have the bins centered on 0.001 boundaries.

In other words, I'd like to have

    Bin 0.001 contain data from 0.0005 to 0.0014
    Bin 0.002 contain data from 0.0015 to 0.0024
    ...

The binning function I came up with is

    my_bin(x,width) = width*(floor(x/width+0.5))

Here's a script to compare some of the offered bin functions to this one:

    rint(x) = (x-int(x)>0.9999)?int(x)+1:int(x)
    bin(x,width)       = width*rint(x/width) + width/2.0
    binc(x,width)      = width*(int(x/width)+0.5)
    mitar_bin(x,width) = width*floor(x/width) + width/2.0
    my_bin(x,width)    = width*(floor(x/width+0.5))

    binwidth = 0.001

    data_list = "-0.1386 -0.1383 -0.1375 -0.0015 -0.0005 0.0005 0.0015 0.1375 0.1383 0.1386"

    my_line = sprintf("%7s  %7s  %7s  %7s  %7s","data","bin()","binc()","mitar()","my_bin()")
    print my_line
    do for [i in data_list] {
        iN = i + 0
        my_line = sprintf("%7.4f  %7.4f  %7.4f  %7.4f  %7.4f",iN,bin(iN,binwidth),binc(iN,binwidth),mitar_bin(iN,binwidth),my_bin(iN,binwidth))
        print my_line
    }

and here's the output:

       data    bin()   binc()  mitar()  my_bin()
    -0.1386  -0.1375  -0.1375  -0.1385  -0.1390
    -0.1383  -0.1375  -0.1375  -0.1385  -0.1380
    -0.1375  -0.1365  -0.1365  -0.1375  -0.1380
    -0.0015  -0.0005  -0.0005  -0.0015  -0.0010
    -0.0005   0.0005   0.0005  -0.0005   0.0000
     0.0005   0.0005   0.0005   0.0005   0.0010
     0.0015   0.0015   0.0015   0.0015   0.0020
     0.1375   0.1375   0.1375   0.1375   0.1380
     0.1383   0.1385   0.1385   0.1385   0.1380
     0.1386   0.1385   0.1385   0.1385   0.1390

User · Answer

I have a couple of corrections/additions to Born2Smile's very useful answer:

Empty bins caused the box for the adjacent bin to incorrectly extend into its space; avoid this using

    set boxwidth binwidth

In Born2Smile's version, bins are rendered as centered on their lower bound. Strictly they ought to extend from the lower bound to the upper bound. This can be corrected by modifying the bin function:

    bin(x,width)=width*floor(x/width) + width/2.0

User · Answer

Be very careful: all of the answers on this page implicitly take the decision of where the binning starts (the left-hand edge of the left-most bin, if you like) out of the user's hands. If the user is combining any of these functions for binning data with his/her own decision about where binning starts (as is done on the blog linked in another answer here), the functions above are all incorrect. With an arbitrary starting point for binning, 'Min', the correct function is:

    bin(x) = width*(floor((x-Min)/width)+0.5) + Min

You can see why this is correct sequentially (it helps to draw a few bins and a point somewhere in one of them). Subtract Min from your data point to see how far into the binning range it is. Then divide by binwidth so that you're effectively working in units of 'bins'. Then 'floor' the result to go to the left-hand edge of that bin, add 0.5 to go to the middle of the bin, multiply by the width so that you're no longer working in units of bins but in an absolute scale again, then finally add back on the Min offset you subtracted at the start.

Consider this function in action:

    Min = 0.25 # where binning starts
    Max = 2.25 # where binning ends
    n = 2 # the number of bins
    width = (Max-Min)/n # binwidth; evaluates to 1.0
    bin(x) = width*(floor((x-Min)/width)+0.5) + Min

e.g. the value 1.1 truly falls in the left bin:

    this function correctly maps it to the centre of the left bin (0.75);
    Born2Smile's answer, bin(x)=width*floor(x/width), incorrectly maps it to 1;
    mas90's answer, bin(x)=width*floor(x/width) + binwidth/2.0, incorrectly maps it to 1.5.

Born2Smile's answer is only correct if the bin boundaries occur at (n+0.5)*binwidth (where n runs over integers). mas90's answer is only correct if the bin boundaries occur at n*binwidth.
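
A minimal sketch of how this offset-aware function could be used in an actual plot, assuming a single-column file named data.dat (a hypothetical name) and the same Min, Max and n as in the example in this answer:

```gnuplot
Min = 0.25                 # where binning starts
Max = 2.25                 # where binning ends
n = 2                      # the number of bins
width = (Max - Min)/n      # bin width; real-valued because Min and Max are reals
bin(x) = width*(floor((x - Min)/width) + 0.5) + Min

set boxwidth width         # each box spans exactly one bin
set style fill solid 0.5
set xrange [Min:Max]       # show only the chosen binning range
set yrange [0:*]
# 'data.dat' is a placeholder; each row adds 1.0 to the count of its bin
plot 'data.dat' using (bin($1)):(1.0) smooth freq with boxes notitle
```

If Min and Max were both written as integers, (Max - Min)/n would be an integer division in gnuplot, so it is safer to write at least one of them with a decimal point.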

User · Answer

I have a little modification to Born2Smile's solution.

I know that doesn't make much sense, but you may want it just in case. If your data is integer and you need a float bin size (maybe for comparison with another set of data, or to plot density on a finer grid), you will need to add a random number between 0 and 1 inside floor. Otherwise, there will be spikes due to round-up error. floor(x/width+0.5) will not do because it will create a pattern that's not true to the original data.

    binwidth=0.3
    bin(x,width)=width*floor(x/width+rand(0))

User · Answer

Yes, and it's quick and simple, though very hidden:

    binwidth=5
    bin(x,width)=width*floor(x/width)

    plot 'datafile' using (bin($1,binwidth)):(1.0) smooth freq with boxes

Check out "help smooth freq" to see why the above makes a histogram.

To deal with ranges, just set the xrange variable.

User · Answer

As usual, Gnuplot is a fantastic tool for plotting sweet looking graphs and it can be made to perform all sorts of calculations. However, it is intended to plot data rather than to serve as a calculator, and it is often easier to use an external programme (e.g. Octave) to do the more "complicated" calculations, save this data in a file, then use Gnuplot to produce the graph. For the above problem, check out the "hist" function in Octave using [freq,bins]=hist(data), then plot this in Gnuplot using

    set style histogram rowstacked gap 0
    set style fill solid 0.5 border lt -1
    plot "./data.dat" smooth freq with boxes

User · Answer

I have found this discussion extremely useful, but I have experienced some "rounding off" problems.

More precisely, using a binwidth of 0.05, I have noticed that, with the techniques presented here above, data points which read 0.1 and 0.15 fall in the same bin. This (obviously unwanted behaviour) is most likely due to the "floor" function.

Hereafter is my small contribution to try to circumvent this.

    bin(x,width,n)=x<=n*width? width*(n-1) + 0.5*binwidth:bin(x,width,n+1)
    binwidth = 0.05
    set boxwidth binwidth
    plot "data.dat" u (bin($1,binwidth,1)):(1.0) smooth freq with boxes

This recursive method is for x >= 0; one could generalise this with more conditional statements to obtain something even more general.

User · Answer

We do not need to use a recursive method; it may be slow. My solution is to use a user-defined function rint instead of the intrinsic functions int or floor.

    rint(x)=(x-int(x)>0.9999)?int(x)+1:int(x)

This function will give rint(0.0003/0.0001)=3, while int(0.0003/0.0001)=floor(0.0003/0.0001)=2.

Why? Please look at "Perl int function and padding zeros".
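
The effect is easy to reproduce at the gnuplot prompt. The sketch below prints the quotient with extra digits, so that the floating-point error behind the truncation becomes visible; the %.17g format and the variable name q are just convenient choices for illustration.

```gnuplot
# rint as defined in this answer: round up only when the fractional part is
# within 0.0001 of the next integer
rint(x) = (x - int(x) > 0.9999) ? int(x) + 1 : int(x)

q = 0.0003/0.0001
print sprintf("quotient = %.17g", q)      # shows the stored value, not exactly 3
print sprintf("int(q)   = %d", int(q))
print sprintf("floor(q) = %d", floor(q))
print sprintf("rint(q)  = %d", rint(q))
```

Note that the test x-int(x)>0.9999 can only fire for positive x, because int truncates toward zero; negative data would need a mirrored condition.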

User · Answer

If you want to plot a graph like the one shown in my blog article, have a look at: http://gnuplot-surprising.blogspot.com/2011/09/statistic-analysis-and-histogram.html

Key lines from the code:

    n=100 #number of intervals
    max=3.0 #max value
    min=-3.0 #min value
    width=(max-min)/n #interval width
    #function used to map a value to the intervals
    hist(x,width)=width*floor(x/width)+width/2.0
    set boxwidth width*0.9
    set style fill solid 0.5 #fill style

    #count and plot
    plot "data.dat" u (hist($1,width)):(1.0) smooth freq w boxes lc rgb "green" notitle
