Binning data in Python with scipy/numpy

Question

Is there a more efficient way to take the average of an array in prespecified bins? For example, I have an array of numbers and an array of bin start and end positions in that array, and I want to take the mean in each of those bins. The code below does this, but I am wondering how it can be cut down and improved. Thanks.

    import numpy as np

    def get_bin_mean(a, b_start, b_end):
        # Keep the values in the half-open interval [b_start, b_end).
        ind_upper = np.nonzero(a >= b_start)[0]
        a_upper = a[ind_upper]
        a_range = a_upper[np.nonzero(a_upper < b_end)[0]]
        mean_val = np.mean(a_range)
        return mean_val

    data = np.random.rand(100)
    bins = np.linspace(0, 1, 10)
    binned_data = []

    for n in range(0, len(bins) - 1):
        b_start = bins[n]
        b_end = bins[n + 1]
        binned_data.append(get_bin_mean(data, b_start, b_end))

    print(binned_data)

User · Answer

I would add, also to answer the question "find mean bin values using histogram2d python", that SciPy also has a function specifically designed to compute a two-dimensional binned statistic for one or more sets of data:

    import numpy as np
    from scipy.stats import binned_statistic_2d

    x = np.random.rand(100)
    y = np.random.rand(100)
    values = np.random.rand(100)
    bin_means = binned_statistic_2d(x, y, values, bins=10).statistic

The function scipy.stats.binned_statistic_dd is a generalization of this function to higher-dimensional datasets.
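Since binned_statistic_dd is mentioned but not shown, here is a minimal sketch of how it might be called. The 3-D sample, the 4 x 4 x 4 grid, and the seed are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.stats import binned_statistic_dd

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# 200 sample points in 3 dimensions, each with an associated value.
points = rng.random((200, 3))
values = rng.random(200)

# Mean of `values` in each cell of a 4 x 4 x 4 grid over the unit cube.
result = binned_statistic_dd(points, values, statistic="mean",
                             bins=4, range=[(0, 1)] * 3)
bin_means = result.statistic  # shape (4, 4, 4); empty cells hold NaN
```

The sample is passed as an (N, D) array, so the same call works for any number of dimensions D.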

User · Answer

The numpy_indexed package (disclaimer: I am its author) contains functionality to efficiently perform operations of this type:

    import numpy as np
    import numpy_indexed as npi

    # data and bins as defined in the question
    print(npi.group_by(np.digitize(data, bins)).mean(data))

This is essentially the same solution as the one I posted earlier, but now wrapped in a nice interface, with tests and all.

User · Answer

Not sure why this thread got necroed, but here is a 2014 approved answer, which should be far faster:

    import numpy as np

    data = np.random.rand(100)
    bins = 10
    # Bin boundaries are positions in the array: 10 consecutive chunks of 10 elements.
    slices = np.linspace(0, 100, bins + 1, True).astype(int)
    counts = np.diff(slices)

    mean = np.add.reduceat(data, slices[:-1]) / counts
    print(mean)
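The reduceat answer above averages consecutive chunks by position in the array. If the bins are instead ranges of values, as in the other answers, one possible adaptation is to sort the data first and locate the bin starts with searchsorted. This is a sketch under stated assumptions, not part of the original answer: the names `edges`, `starts`, and `sums` are illustrative, and at least one sample must lie at or above the last left edge, because np.add.reduceat rejects a start index equal to the array length.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
data = rng.random(100)
edges = np.linspace(0, 1, 11)   # 10 equal-width bins on [0, 1]

sorted_data = np.sort(data)
# Index in sorted_data where each bin starts (left edges only).
starts = np.searchsorted(sorted_data, edges[:-1], side="left")
counts = np.diff(np.append(starts, sorted_data.size))

# For an empty segment reduceat returns the element at the start index
# instead of 0, so empty bins are masked out explicitly with NaN.
sums = np.add.reduceat(sorted_data, starts)
bin_means = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
```

Values at or above the last edge would fall into the last bin here, so the sketch also assumes the data lies inside the range covered by `edges`.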

User · Answer

Another alternative is to use ufunc.at. This method applies a desired operation in place at specified indices. We can get the bin position of each data point using the searchsorted method. Then we can use at to increment the histogram by 1 at the index given by bin_indexes, every time we encounter an index in bin_indexes.

    import numpy as np

    np.random.seed(1)
    data = np.random.random(100) * 100
    bins = np.linspace(0, 100, 10)

    histogram = np.zeros_like(bins)

    bin_indexes = np.searchsorted(bins, data)
    np.add.at(histogram, bin_indexes, 1)
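The snippet above only counts the samples per bin. To get the bin means the question asks for, the same np.add.at idea can accumulate sums alongside the counts. The following is a sketch extending the answer, with `sums`, `counts`, and `bin_means` as illustrative names:

```python
import numpy as np

np.random.seed(1)
data = np.random.random(100) * 100
bins = np.linspace(0, 100, 10)       # 10 edges, i.e. 9 bins

# searchsorted returns i such that bins[i-1] < value <= bins[i],
# so the k-th bin (0-based) is stored at index k + 1.
bin_indexes = np.searchsorted(bins, data)

sums = np.zeros(len(bins))
counts = np.zeros(len(bins))
np.add.at(sums, bin_indexes, data)   # accumulate the values per bin
np.add.at(counts, bin_indexes, 1)    # accumulate the counts per bin

# Slot 0 is only hit by values equal to bins[0], so it is dropped here.
# Empty bins give 0/0, which is left as NaN.
with np.errstate(invalid="ignore", divide="ignore"):
    bin_means = sums[1:] / counts[1:]
```

Unlike `sums[bin_indexes] += data`, np.add.at accumulates correctly when the same index appears more than once.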

User · Answer

It's probably faster and easier to use numpy.digitize():

    import numpy
    data = numpy.random.random(100)
    bins = numpy.linspace(0, 1, 10)
    digitized = numpy.digitize(data, bins)
    bin_means = [data[digitized == i].mean() for i in range(1, len(bins))]

An alternative to this is to use numpy.histogram():

    bin_means = (numpy.histogram(data, bins, weights=data)[0] /
                 numpy.histogram(data, bins)[0])

Try for yourself which one is faster.
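One way to try this is with the standard-library timeit module. The sketch below wraps the two approaches in hypothetical helper functions (`means_digitize` and `means_histogram`) and checks that they agree before timing them. The array size and repeat count are arbitrary choices, and the result will depend on the machine and on the number of samples and bins.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
data = rng.random(100_000)
bins = np.linspace(0, 1, 10)

def means_digitize():
    digitized = np.digitize(data, bins)
    return np.array([data[digitized == i].mean() for i in range(1, len(bins))])

def means_histogram():
    sums, _ = np.histogram(data, bins, weights=data)
    counts, _ = np.histogram(data, bins)
    return sums / counts

# Both approaches should give the same means before they are timed.
assert np.allclose(means_digitize(), means_histogram())

repeats = 20
for func in (means_digitize, means_histogram):
    seconds = timeit.timeit(func, number=repeats)
    print(f"{func.__name__}: {seconds / repeats * 1e3:.2f} ms per call")
```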

User · Answer

The SciPy (>=0.11) function scipy.stats.binned_statistic specifically addresses the above question.

For the same example as in the previous answers, the SciPy solution would be:

    import numpy as np
    from scipy.stats import binned_statistic

    data = np.random.rand(100)
    bin_means = binned_statistic(data, data, bins=10, range=(0, 1))[0]
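In the example above the same array is passed twice, because the values being averaged are also the values that decide the bin. When these differ, the first argument holds the positions and the second the values to aggregate. A minimal sketch, assuming made-up measurements `values` taken at positions `x` and using the median instead of the mean:

```python
import numpy as np
from scipy.stats import binned_statistic

rng = np.random.default_rng(0)               # arbitrary seed
x = rng.random(100)                          # positions that decide the bin
values = 2.0 * x + rng.normal(0, 0.1, 100)   # hypothetical measurements
edges = np.linspace(0, 1, 11)                # explicit bin edges

result = binned_statistic(x, values, statistic="median", bins=edges)
bin_medians = result.statistic   # one value per bin, NaN if a bin is empty
bin_edges = result.bin_edges     # the edges that were used
bin_number = result.binnumber    # 1-based bin index of each sample
```

The `binnumber` array can be reused to select the samples of a given bin, for example `values[bin_number == 3]`.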
