Generate a heatmap in MatPlotLib using a scatter data set

Question

I have a set of X,Y data points (about 10k) that are easy to plot as a scatter plot, but that I would like to represent as a heatmap. I looked through the examples in MatPlotLib and they all seem to already start with heatmap cell values to generate the image. Is there a method that converts a bunch of x,y points, all different, to a heatmap (where zones with a higher frequency of x,y would be "warmer")?

User · Answer

Instead of using np.histogram2d, which in general produces quite ugly histograms, I would like to recycle py-sphviewer, a Python package for rendering particle simulations using an adaptive smoothing kernel, which can be easily installed from pip (see the webpage documentation). Consider the following code, which is based on the example:

    import numpy as np
    import numpy.random
    import matplotlib.pyplot as plt
    import sphviewer as sph

    def myplot(x, y, nb=32, xsize=500, ysize=500):
        xmin = np.min(x)
        xmax = np.max(x)
        ymin = np.min(y)
        ymax = np.max(y)

        x0 = (xmin+xmax)/2.
        y0 = (ymin+ymax)/2.

        # py-sphviewer expects 3D positions; z is left at zero
        pos = np.zeros([len(x), 3])
        pos[:, 0] = x
        pos[:, 1] = y
        w = np.ones(len(x))

        P = sph.Particles(pos, w, nb=nb)
        S = sph.Scene(P)
        S.update_camera(r='infinity', x=x0, y=y0, z=0,
                        xsize=xsize, ysize=ysize)
        R = sph.Render(S)
        R.set_logscale()
        img = R.get_image()
        extent = R.get_extent()
        # Shift the extent from camera-centred to data coordinates
        for i, j in zip(range(4), [x0, x0, y0, y0]):
            extent[i] += j
        print(extent)
        return img, extent

    fig = plt.figure(1, figsize=(10, 10))
    ax1 = fig.add_subplot(221)
    ax2 = fig.add_subplot(222)
    ax3 = fig.add_subplot(223)
    ax4 = fig.add_subplot(224)

    # Generate some test data
    x = np.random.randn(1000)
    y = np.random.randn(1000)

    # Plotting a regular scatter plot
    ax1.plot(x, y, 'k.', markersize=5)
    ax1.set_xlim(-3, 3)
    ax1.set_ylim(-3, 3)

    heatmap_16, extent_16 = myplot(x, y, nb=16)
    heatmap_32, extent_32 = myplot(x, y, nb=32)
    heatmap_64, extent_64 = myplot(x, y, nb=64)

    ax2.imshow(heatmap_16, extent=extent_16, origin='lower', aspect='auto')
    ax2.set_title("Smoothing over 16 neighbors")

    ax3.imshow(heatmap_32, extent=extent_32, origin='lower', aspect='auto')
    ax3.set_title("Smoothing over 32 neighbors")

    # Make the heatmap using a smoothing over 64 neighbors
    ax4.imshow(heatmap_64, extent=extent_64, origin='lower', aspect='auto')
    ax4.set_title("Smoothing over 64 neighbors")

    plt.show()

which produces the following image:

[image: scatter plot and heatmaps smoothed over 16, 32 and 64 neighbors]

As you see, the images look pretty nice, and we are able to identify different substructures in them. These images are constructed by spreading a given weight for every point within a certain domain, defined by the smoothing length, which in turn is given by the distance to the nb-th closest neighbor (I've chosen 16, 32 and 64 for the examples). So, higher density regions are typically spread over smaller regions compared to lower density regions.

The function myplot is just a very simple function that I've written in order to give the x,y data to py-sphviewer to do the magic.

User · Answer

Edit: For a better approximation of Alejandro's answer, see below.

I know this is an old question, but I wanted to add something to Alejandro's answer: if you want a nice smoothed image without using py-sphviewer, you can instead use np.histogram2d and apply a Gaussian filter (from scipy.ndimage) to the heatmap:

    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib.cm as cm
    from scipy.ndimage import gaussian_filter


    def myplot(x, y, s, bins=1000):
        heatmap, xedges, yedges = np.histogram2d(x, y, bins=bins)
        # sigma is measured in bins (pixels), not in data units
        heatmap = gaussian_filter(heatmap, sigma=s)

        extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
        return heatmap.T, extent


    fig, axs = plt.subplots(2, 2)

    # Generate some test data
    x = np.random.randn(1000)
    y = np.random.randn(1000)

    sigmas = [0, 16, 32, 64]

    for ax, s in zip(axs.flatten(), sigmas):
        if s == 0:
            ax.plot(x, y, 'k.', markersize=5)
            ax.set_title("Scatter plot")
        else:
            img, extent = myplot(x, y, s)
            ax.imshow(img, extent=extent, origin='lower', cmap=cm.jet)
            ax.set_title(r"Smoothing with  $\sigma$ = %d" % s)

    plt.show()

Produces:

[image]

The scatter plot and s=16 plotted on top of each other for Agape Gal'lo:

[image]

One difference I noticed between my Gaussian filter approach and Alejandro's approach was that his method shows local structures much better than mine. Therefore I implemented a simple nearest neighbour method at pixel level. This method calculates, for each pixel, the inverse of the sum of the distances to the n closest points in the data. At a high resolution this method is pretty computationally expensive and I think there's a quicker way, so let me know if you have any improvements.

Update: As I suspected, there's a much faster method using SciPy's scipy.spatial.cKDTree. See Gabriel's answer for the implementation.

Anyway, here's my code:

    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib.cm as cm


    def data_coord2view_coord(p, vlen, pmin, pmax):
        # Map data coordinates in [pmin, pmax] to pixel coordinates in [0, vlen]
        dp = pmax - pmin
        dv = (p - pmin) / dp * vlen
        return dv


    def nearest_neighbours(xs, ys, reso, n_neighbours):
        im = np.zeros([reso, reso])
        extent = [np.min(xs), np.max(xs), np.min(ys), np.max(ys)]

        xv = data_coord2view_coord(xs, reso, extent[0], extent[1])
        yv = data_coord2view_coord(ys, reso, extent[2], extent[3])
        for x in range(reso):
            for y in range(reso):
                xp = (xv - x)
                yp = (yv - y)

                # Distance from this pixel to every data point
                d = np.sqrt(xp**2 + yp**2)

                # Inverse of the summed distances to the n closest points
                im[y][x] = 1 / np.sum(d[np.argpartition(d.ravel(), n_neighbours)[:n_neighbours]])

        return im, extent


    n = 1000
    xs = np.random.randn(n)
    ys = np.random.randn(n)
    resolution = 250

    fig, axes = plt.subplots(2, 2)

    for ax, neighbours in zip(axes.flatten(), [0, 16, 32, 64]):
        if neighbours == 0:
            ax.plot(xs, ys, 'k.', markersize=2)
            ax.set_aspect('equal')
            ax.set_title("Scatter Plot")
        else:
            im, extent = nearest_neighbours(xs, ys, resolution, neighbours)
            ax.imshow(im, origin='lower', extent=extent, cmap=cm.jet)
            ax.set_title("Smoothing over %d neighbours" % neighbours)
            ax.set_xlim(extent[0], extent[1])
            ax.set_ylim(extent[2], extent[3])
    plt.show()

Result:

[image]

User · Answer

In Matplotlib lexicon, I think you want a hexbin plot.

If you're not familiar with this type of plot, it's just a bivariate histogram in which the xy-plane is tessellated by a regular grid of hexagons.

So, as with a histogram, you just count the number of points falling in each hexagon: discretize the plotting region as a set of windows, assign each point to one of these windows, and finally map the windows onto a color array, and you've got a hexbin diagram.

Though hexagons are less commonly used than, e.g., circles or squares, it is intuitive that they are a better choice for the geometry of the binning container. First, hexagons have nearest-neighbor symmetry (square bins don't; e.g., the distance from a point on a square's border to a point inside that square is not everywhere equal). Second, the hexagon is the highest n-polygon that gives a regular plane tessellation (i.e., you can safely re-model your kitchen floor with hexagonal tiles because you won't have any void space between the tiles when you are finished; this is not true for any higher-n polygon, n >= 7).

(Matplotlib uses the term hexbin plot; so do, AFAIK, all of the plotting libraries for R. Still, I don't know if this is the generally accepted term for plots of this type, though I suspect it's likely, given that hexbin is short for hexagonal binning, which describes the essential step in preparing the data for display.)

    from matplotlib import pyplot as PLT
    from matplotlib import cm as CM
    import numpy as NP


    def bivariate_normal(X, Y, sigmax, sigmay, mux, muy):
        # Stand-in for matplotlib.mlab.bivariate_normal (uncorrelated case),
        # which is no longer available in recent Matplotlib releases
        z = ((X - mux) / sigmax)**2 + ((Y - muy) / sigmay)**2
        return NP.exp(-z / 2) / (2 * NP.pi * sigmax * sigmay)


    n = 1e5
    x = y = NP.linspace(-5, 5, 100)
    X, Y = NP.meshgrid(x, y)
    Z1 = bivariate_normal(X, Y, 2, 2, 0, 0)
    Z2 = bivariate_normal(X, Y, 4, 1, 1, 1)
    ZD = Z2 - Z1
    x = X.ravel()
    y = Y.ravel()
    z = ZD.ravel()
    gridsize = 30
    PLT.subplot(111)

    # if 'bins=None', then color of each hexagon corresponds directly to its count
    # 'C' is optional--it maps values to x-y coordinates; if 'C' is None (default) then
    # the result is a pure 2D histogram

    PLT.hexbin(x, y, C=z, gridsize=gridsize, cmap=CM.jet, bins=None)
    PLT.axis([x.min(), x.max(), y.min(), y.max()])

    cb = PLT.colorbar()
    cb.set_label('mean value')
    PLT.show()
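For the original question (colouring by point frequency rather than by a value attached to each point), the `C` argument can simply be left out. The following is a minimal sketch under that assumption; the helper name `plot_hexbin_counts`, the seed and the sample size are illustrative and not part of the answer above.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_hexbin_counts(x, y, gridsize=30):
    """Draw a hexagonal 2D histogram of point counts; returns (fig, hexes)."""
    fig, ax = plt.subplots()
    # Without C, each hexagon is coloured by the number of points inside it
    hexes = ax.hexbin(x, y, gridsize=gridsize, cmap="jet")
    cb = fig.colorbar(hexes, ax=ax)
    cb.set_label("count")
    return fig, hexes


# Illustrative scatter data, roughly the size mentioned in the question
rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
y = rng.standard_normal(10_000)

fig, hexes = plot_hexbin_counts(x, y)
# plt.show()  # uncomment to display the figure
```

The per-hexagon counts are available afterwards through `hexes.get_array()`, which can be handy for checking the binning.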

User · Answer

Make a 2-dimensional array that corresponds to the cells in your final image, called say heatmap_cells, and instantiate it as all zeroes.

Choose two scaling factors that define the difference between each array element in real units, for each dimension, say x_scale and y_scale. Choose these such that all your datapoints will fall within the bounds of the heatmap array.

For each raw datapoint with x_value and y_value:

    heatmap_cells[floor(x_value/x_scale), floor(y_value/y_scale)] += 1
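The steps above can be sketched in NumPy as follows. This is a minimal sketch rather than the answerer's own code: the function name `bin_points`, the shift by the minimum value (so that every index is non-negative) and the use of `np.add.at` are assumptions added for illustration.

```python
import numpy as np


def bin_points(x, y, x_scale, y_scale):
    """Count points per cell; each cell spans (x_scale, y_scale) in data units."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Assumption: shift by the minimum so that all indices start at 0
    ix = np.floor((x - x.min()) / x_scale).astype(int)
    iy = np.floor((y - y.min()) / y_scale).astype(int)
    heatmap_cells = np.zeros((ix.max() + 1, iy.max() + 1), dtype=int)
    # np.add.at accumulates correctly when several points share a cell
    np.add.at(heatmap_cells, (ix, iy), 1)
    return heatmap_cells


# Illustrative data: about 10k points, as in the question
rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
y = rng.standard_normal(10_000)
heatmap_cells = bin_points(x, y, x_scale=0.1, y_scale=0.1)
```

Because x runs along the first axis here, the array would be transposed for display, e.g. `plt.imshow(heatmap_cells.T, origin='lower')`.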

User · Answer

If you don't want hexagons, you can use numpy's histogram2d function:

    import numpy as np
    import numpy.random
    import matplotlib.pyplot as plt

    # Generate some test data
    x = np.random.randn(8873)
    y = np.random.randn(8873)

    heatmap, xedges, yedges = np.histogram2d(x, y, bins=50)
    extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]

    plt.clf()
    plt.imshow(heatmap.T, extent=extent, origin='lower')
    plt.show()

This makes a 50x50 heatmap. If you want, say, 512x384, you can put bins=(512, 384) in the call to histogram2d.

Example:

[image]
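As a quick check of the non-square case, the sketch below requests 512x384 bins and inspects the resulting shapes (the variable names and the fixed seed are illustrative). `np.histogram2d` puts x on the first axis, which is why the array is transposed before it is handed to `imshow`.

```python
import numpy as np

# Illustrative test data with a fixed seed
rng = np.random.default_rng(0)
x = rng.standard_normal(8873)
y = rng.standard_normal(8873)

# 512 bins along x, 384 bins along y
heatmap, xedges, yedges = np.histogram2d(x, y, bins=(512, 384))

# Rows of an image correspond to y, so swap the axes for display
image = heatmap.T
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]

print(heatmap.shape, image.shape)
```

`image` can then be shown with `plt.imshow(image, extent=extent, origin='lower')`, as in the answer above.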

User · Answer

Here's Jurgy's great nearest neighbour approach, but implemented using scipy.spatial.cKDTree. In my tests it's about 100x faster.

    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib.cm as cm
    from scipy.spatial import cKDTree


    def data_coord2view_coord(p, resolution, pmin, pmax):
        dp = pmax - pmin
        dv = (p - pmin) / dp * resolution
        return dv


    n = 1000
    xs = np.random.randn(n)
    ys = np.random.randn(n)

    resolution = 250

    extent = [np.min(xs), np.max(xs), np.min(ys), np.max(ys)]
    xv = data_coord2view_coord(xs, resolution, extent[0], extent[1])
    yv = data_coord2view_coord(ys, resolution, extent[2], extent[3])


    def kNN2DDens(xv, yv, resolution, neighbours, dim=2):
        """Inverse summed distance to the nearest data points, per pixel."""
        # Create the tree
        tree = cKDTree(np.array([xv, yv]).T)
        # For every grid point, find the `neighbours` closest data points
        grid = np.mgrid[0:resolution, 0:resolution].T.reshape(resolution**2, dim)
        dists = tree.query(grid, neighbours)
        # Inverse of the sum of distances to each grid point
        inv_sum_dists = 1. / dists[0].sum(1)

        # Reshape
        im = inv_sum_dists.reshape(resolution, resolution)
        return im


    fig, axes = plt.subplots(2, 2, figsize=(15, 15))
    for ax, neighbours in zip(axes.flatten(), [0, 16, 32, 63]):

        if neighbours == 0:
            ax.plot(xs, ys, 'k.', markersize=5)
            ax.set_aspect('equal')
            ax.set_title("Scatter Plot")
        else:
            im = kNN2DDens(xv, yv, resolution, neighbours)

            ax.imshow(im, origin='lower', extent=extent, cmap=cm.Blues)
            ax.set_title("Smoothing over %d neighbours" % neighbours)
            ax.set_xlim(extent[0], extent[1])
            ax.set_ylim(extent[2], extent[3])

    plt.savefig('new.png', dpi=150, bbox_inches='tight')

User · Answer

Seaborn now has the jointplot function, which should work nicely here:

    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Generate some test data
    x = np.random.randn(8873)
    y = np.random.randn(8873)

    sns.jointplot(x=x, y=y, kind='hex')
    plt.show()

User · Answer

If you are using Matplotlib 1.2.x:

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.random.randn(100000)
    y = np.random.randn(100000)
    plt.hist2d(x, y, bins=100)
    plt.show()

User · Answer

Here's one I made on a 1 million point set with 3 categories (colored Red, Green, and Blue). Here's a link to the repository if you'd like to try the function: Github Repo

    histplot(
        X,
        Y,
        labels,
        bins=2000,
        range=((-3, 3), (-3, 3)),
        normalize_each_label=True,
        colors=[
            [1, 0, 0],
            [0, 1, 0],
            [0, 0, 1]],
        gain=50)
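The repository's `histplot` is not reproduced here. The following is only a rough sketch of the general idea (one 2D histogram per category, stacked into the red, green and blue channels of an image). The helper `rgb_histogram` and its parameters are hypothetical; in particular, `gain` is treated here as a plain brightness multiplier, which may differ from what the repository does.

```python
import numpy as np


def rgb_histogram(x, y, labels, bins=200, extent=((-3, 3), (-3, 3)), gain=1.0):
    """Hypothetical helper: per-category 2D histograms stacked as R, G, B."""
    x, y, labels = np.asarray(x), np.asarray(y), np.asarray(labels)
    image = np.zeros((bins, bins, 3))
    for channel in range(3):
        mask = labels == channel
        counts, _, _ = np.histogram2d(x[mask], y[mask], bins=bins, range=extent)
        if counts.max() > 0:
            # Normalise each category separately so none dominates
            counts = counts / counts.max()
        # histogram2d puts x on the first axis; image rows should follow y
        image[:, :, channel] = np.clip(gain * counts.T, 0.0, 1.0)
    return image


# Illustrative data: three labelled Gaussian blobs
rng = np.random.default_rng(0)
n = 30_000
labels = rng.integers(0, 3, size=n)
centres = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
points = centres[labels] + 0.5 * rng.standard_normal((n, 2))

image = rgb_histogram(points[:, 0], points[:, 1], labels, gain=2.0)
```

The result is an RGB array with values in [0, 1], so it can be shown with `plt.imshow(image, origin='lower', extent=(-3, 3, -3, 3))`.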

User · Answer

and the initial question was... how to convert scatter values to grid values, right? histogram2d does count the frequency per cell; however, if you have other data per cell than just the frequency, you'd need some additional work to do.

    x = data_x  # between -10 and 4, log-gamma of an svc
    y = data_y  # between -4 and 11, log-C of an svc
    z = data_z  # between 0 and 0.78, f1-values from a difficult dataset

So, I have a dataset with Z-results for X and Y coordinates. However, I was calculating few points outside the area of interest (large gaps), and heaps of points in a small area of interest.

Yes, here it becomes more difficult but also more fun. Some libraries (sorry):

    from matplotlib import pyplot as plt
    from matplotlib import cm
    import numpy as np
    from scipy.interpolate import griddata

pyplot is my graphic engine today, cm is a range of color maps with some interesting choices, numpy is for the calculations, and griddata is for attaching values to a fixed grid.

The last one is important especially because the frequency of xy points is not equally distributed in my data. First, let's start with some boundaries fitting my data and an arbitrary grid size. The original data also has datapoints outside those x and y boundaries.

    # determine grid boundaries
    gridsize = 500
    x_min = -8
    x_max = 2.5
    y_min = -2
    y_max = 7

So we have defined a grid with 500 pixels between the min and max values of x and y.

In my data, there are lots more than the 500 values available in the area of high interest, whereas in the low-interest area there are not even 200 values in the total grid; between the graphic boundaries of x_min and x_max there are even fewer.

So, for getting a nice picture, the task is to get an average for the high interest values and to fill the gaps elsewhere.

I define my grid now. For each xx-yy pair, I want to have a color.

    xx = np.linspace(x_min, x_max, gridsize)  # array of x values
    yy = np.linspace(y_min, y_max, gridsize)  # array of y values
    grid = np.array(np.meshgrid(xx, yy))
    grid = grid.reshape(2, grid.shape[1]*grid.shape[2]).T

Why the strange shape? scipy.interpolate.griddata wants a shape of (n, D).

griddata calculates one value per point in the grid, by a predefined method. I choose "nearest": empty grid points will be filled with values from the nearest neighbor. This looks as if the areas with less information have bigger cells (even if that is not the case). One could choose to interpolate with "linear" instead; then areas with less information look less sharp. Matter of taste, really.

    points = np.array([x, y]).T  # because griddata wants it that way
    z_grid2 = griddata(points, z, grid, method='nearest')
    # you get a 1D vector as result. Reshape to picture format (rows = y, columns = x)
    z_grid2 = z_grid2.reshape(yy.shape[0], xx.shape[0])

And hop, we hand over to matplotlib to display the plot:

    fig = plt.figure(1, figsize=(10, 10))
    ax1 = fig.add_subplot(111)
    ax1.imshow(z_grid2, extent=[x_min, x_max, y_min, y_max],
               origin='lower', cmap=cm.magma)
    ax1.set_title("SVC: empty spots filled by nearest neighbours")
    ax1.set_xlabel('log gamma')
    ax1.set_ylabel('log C')
    plt.show()

Around the pointy part of the V-shape, you can see I did a lot of calculations during my search for the sweet spot, whereas the less interesting parts almost everywhere else have a lower resolution.
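The averaging step mentioned above ("get an average for the high interest values") can also be done by binning. The sketch below is an alternative to `griddata`, not the author's method: it uses `scipy.stats.binned_statistic_2d` to take the mean of z in each cell. The synthetic x, y, z arrays, the seed and the bin count are stand-ins invented for illustration.

```python
import numpy as np
from scipy.stats import binned_statistic_2d

# Synthetic stand-in for the (x, y, z) data described above
rng = np.random.default_rng(0)
x = rng.uniform(-8, 2.5, size=5000)
y = rng.uniform(-2, 7, size=5000)
z = np.exp(-((x + 3) ** 2 + (y - 2) ** 2) / 8)

# Mean of z in every cell of a 50 x 50 grid over the same boundaries
result = binned_statistic_2d(
    x, y, z, statistic="mean", bins=50, range=[[-8, 2.5], [-2, 7]]
)

# x is on the first axis, so transpose to get image rows along y;
# cells that contain no points come back as NaN
mean_z = result.statistic.T
```

Unlike the "nearest" fill, empty cells stay NaN here, so `imshow` leaves them blank; whether that is preferable depends on how the gaps should be presented.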

User · Answer

Very similar to @Piti's answer, but using 1 call instead of 2 to generate the points:

    import numpy as np
    import matplotlib.pyplot as plt

    pts = 1000000
    mean = [0.0, 0.0]
    cov = [[1.0, 0.0], [0.0, 1.0]]

    x, y = np.random.multivariate_normal(mean, cov, pts).T
    plt.hist2d(x, y, bins=50, cmap=plt.cm.jet)
    plt.show()

Output:

[image]

User · Answer

I'm afraid I'm a little late to the party, but I had a similar question a while ago. The accepted answer (by @ptomato) helped me out, but I'd also like to post this in case it's of use to someone.

I wanted to create a heatmap resembling a football pitch which would show the different actions performed.

    import numpy as np
    import matplotlib.pyplot as plt

    # fixing random state for reproducibility
    np.random.seed(1234324)

    fig = plt.figure(12)
    ax1 = fig.add_subplot(121)
    ax2 = fig.add_subplot(122)

    # Grid of 10 m cells covering the pitch; 7 x 11 so that the largest
    # integer coordinates (6 and 10) still index a valid cell
    hmap = np.full((7, 11), 0)
    # print(hmap)

    xlist = np.random.uniform(low=0.0, high=100.0, size=(20))
    ylist = np.random.uniform(low=0.0, high=100.0, size=(20))

    # UEFA Pitch Standards are 105m x 68m
    xlist = (xlist/100)*10.5
    ylist = (ylist/100)*6.8

    ax1.scatter(xlist, ylist)

    # int of the co-ordinates to populate the array
    xlist_int = xlist.astype(int)
    ylist_int = ylist.astype(int)

    # print(xlist_int, ylist_int)

    for i, j in zip(xlist_int, ylist_int):
        # this populates the array according to the x,y co-ordinate values it encounters
        hmap[j][i] = hmap[j][i] + 1

    # Reversing the rows is necessary
    hmap = hmap[::-1]

    # print(hmap)
    im = ax2.imshow(hmap)
    plt.show()

Here's the result:

[image]
