A weighted version of random.choice

Question

I needed to write a weighted version of random.choice (each element in the list has a different probability for being selected). This is what I came up with:

    import random

    def weightedChoice(choices):
        """Like random.choice, but each element can have a different chance of
        being selected.

        choices can be any iterable containing iterables with two items each.
        Technically, they can have more than two items, the rest will just be
        ignored.  The first item is the thing being chosen, the second item is
        its weight.  The weights can be any numeric values, what matters is the
        relative differences between them.
        """
        space = {}
        current = 0
        for choice, weight in choices:
            if weight > 0:
                space[current] = choice
                current += weight
        rand = random.uniform(0, current)
        # list(...) keeps this working on Python 3, where keys() is a view
        for key in sorted(list(space.keys()) + [current]):
            if rand < key:
                return choice
            choice = space[key]
        return None

This function seems overly complex to me, and ugly. I'm hoping everyone here can offer some suggestions on improving it or alternate ways of doing this. Efficiency isn't as important to me as code cleanliness and readability.

User · Answer

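Using numpy:

    import numpy as np

    def choice(items, weights):
        # index of the first cumulative probability that is not below the draw
        return items[np.argmin((np.cumsum(weights) / sum(weights)) < np.random.rand())]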

User · Answer

Here is another version of weighted_choice that uses numpy. Pass in the weights vector and it will return an array of 0's containing a 1 indicating which bin was chosen. The code defaults to just making a single draw, but you can pass in the number of draws to be made and the counts per bin drawn will be returned.

If the weights vector does not sum to 1, it will be normalized so that it does.

    import numpy as np

    def weighted_choice(weights, n=1):
        # accept plain lists as well as arrays
        weights = np.asarray(weights, dtype=float)
        if np.sum(weights) != 1:
            weights = weights / np.sum(weights)

        draws = np.random.random_sample(size=n)

        # bin edges: 0 followed by the cumulative weights
        weights = np.cumsum(weights)
        weights = np.insert(weights, 0, 0.0)

        counts = np.histogram(draws, bins=weights)
        return counts[0]

User · Answer

As of Python v3.6, random.choices can be used to return a list of elements of specified size from the given population with optional weights.

    random.choices(population, weights=None, *, cum_weights=None, k=1)

population: the sequence of items to choose from. (If empty, raises IndexError.)
weights: more precisely, the relative weights used to make selections.
cum_weights: the cumulative weights used to make selections.
k: size (len) of the list to be returned. (Default is 1.)

A few caveats:

1) It uses weighted sampling with replacement, so a drawn item can be drawn again. The values in the weights sequence do not matter in themselves, only their relative ratios do.

Unlike np.random.choice, which only accepts probabilities as weights and requires them to sum to 1, there are no such restrictions here. As long as the weights are numeric (int, float, or Fraction, but not Decimal), this works.

    >>> import random
    # weights being integers
    >>> random.choices(["white", "green", "red"], [12, 12, 4], k=10)
    ['green', 'red', 'green', 'white', 'white', 'white', 'green', 'white', 'red', 'white']
    # weights being floats
    >>> random.choices(["white", "green", "red"], [.12, .12, .04], k=10)
    ['white', 'white', 'green', 'green', 'red', 'red', 'white', 'green', 'white', 'green']
    # weights being fractions
    >>> random.choices(["white", "green", "red"], [12/100, 12/100, 4/100], k=10)
    ['green', 'green', 'white', 'red', 'green', 'red', 'white', 'green', 'green', 'green']

2) If neither weights nor cum_weights is specified, selections are made with equal probability. If a weights sequence is supplied, it must be the same length as the population sequence.

Specifying both weights and cum_weights raises a TypeError.

    >>> random.choices(["white", "green", "red"], k=10)
    ['white', 'white', 'green', 'red', 'red', 'red', 'white', 'white', 'white', 'green']

3) cum_weights are typically the result of itertools.accumulate, which is really handy in such situations.

From the documentation linked:

    Internally, the relative weights are converted to cumulative weights
    before making selections, so supplying the cumulative weights saves
    work.

So supplying either weights=[12, 12, 4] or cum_weights=[12, 24, 28] for our contrived case produces the same outcome, and the latter should be slightly more efficient.
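To make the cum_weights point concrete, here is a minimal sketch that precomputes the running totals once with itertools.accumulate and passes them in. The seed value is arbitrary and is only reset so that both calls consume the same random numbers.

```python
import itertools
import random

population = ["white", "green", "red"]
weights = [12, 12, 4]

# Precompute the running totals once: [12, 24, 28].
cum_weights = list(itertools.accumulate(weights))

# With the same seed, both call styles select the same items, because
# relative weights are converted to cumulative weights internally.
random.seed(42)
by_weights = random.choices(population, weights=weights, k=5)
random.seed(42)
by_cum_weights = random.choices(population, cum_weights=cum_weights, k=5)

print(by_weights == by_cum_weights)
```

If the same distribution is sampled many times, keeping cum_weights around avoids rebuilding the running totals on every call.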

User · Answer

Since version 1.7.0, NumPy has a choice function that supports probability distributions.

    from numpy.random import choice
    draw = choice(list_of_candidates, number_of_items_to_pick,
                  p=probability_distribution)

Note that probability_distribution is a sequence in the same order as list_of_candidates. You can also use the keyword replace=False to change the behavior so that drawn items are not replaced.
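A small sketch of both modes follows. The candidate list and the probabilities are made-up values for illustration.

```python
import numpy as np

candidates = ["a", "b", "c", "d"]      # hypothetical items
probabilities = [0.1, 0.2, 0.3, 0.4]   # same order as candidates, sums to 1

# Default behaviour: sampling with replacement, so repeats are possible.
with_repeats = np.random.choice(candidates, 3, p=probabilities)

# replace=False: each candidate can be drawn at most once.
distinct = np.random.choice(candidates, 3, replace=False, p=probabilities)

print(with_repeats, distinct)
```

With replace=False the requested number of items cannot exceed the number of candidates that have a non-zero probability.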

User · Answer

One way is to randomize on the total of all the weights and then use the cumulative values as the limit points for each key. Here is a crude implementation as a generator (weights is a dict mapping each key to its weight).

    import random

    def rand_weighted(weights):
        """
        Generator which uses the weights to generate
        weighted random values
        """
        sum_weights = sum(weights.values())
        cum_weights = {}
        current_weight = 0
        for key, value in sorted(weights.items()):
            current_weight += value
            cum_weights[key] = current_weight
        while True:
            sel = int(random.uniform(0, 1) * sum_weights)
            # the first key whose cumulative weight exceeds sel is the pick
            for key, value in sorted(cum_weights.items()):
                if sel < value:
                    break
            yield key

User · Answer

I looked at the other thread that was pointed to and came up with this variation in my coding style. It returns the index of the choice for the purpose of tallying, but it is simple to return the string instead (see the commented return alternative):

    import random
    import bisect

    try:
        range = xrange
    except NameError:
        pass

    def weighted_choice(choices):
        total, cumulative = 0, []
        for c, w in choices:
            total += w
            cumulative.append((total, c))
        r = random.uniform(0, total)
        # return index
        return bisect.bisect(cumulative, (r,))
        # return item string
        #return choices[bisect.bisect(cumulative, (r,))][0]

    # define choices and relative weights
    choices = [("WHITE", 90), ("RED", 8), ("GREEN", 2)]

    tally = [0 for item in choices]

    n = 100000
    # tally up n weighted choices
    for i in range(n):
        tally[weighted_choice(choices)] += 1

    print([t/sum(tally)*100 for t in tally])

User · Answer

Crude, but may be sufficient:

    import random
    weighted_choice = lambda s : random.choice(sum(([v]*wt for v,wt in s),[]))

Does it work?

    # define choices and relative weights
    choices = [("WHITE",90), ("RED",8), ("GREEN",2)]

    # initialize tally dict, keyed by the choice values
    tally = dict.fromkeys((v for v, wt in choices), 0)

    # tally up 1000 weighted choices
    for i in range(1000):
        tally[weighted_choice(choices)] += 1

    print(list(tally.items()))

Prints something like:

    [('WHITE', 904), ('RED', 74), ('GREEN', 22)]

Assumes that all weights are integers. They don't have to add up to 100, I just did that to make the test results easier to interpret. (If weights are floating point numbers, multiply them all by 10 repeatedly until all weights >= 1.)

    weights = [.6, .2, .001, .199]
    while any(w < 1.0 for w in weights):
        weights = [w*10 for w in weights]
    weights = list(map(int, weights))

User · Answer

Another way of doing this, assuming we have weights at the same index as the elements in the element array.

    import numpy as np
    weights = [0.1, 0.3, 0.5]  # weights for the items at index 0, 1, 2
    # The sum of weights should be <= 1; you can divide each weight by the sum
    # of all weights to meet that constraint. (NumPy treats the last entry as
    # the remaining probability, so here index 2 effectively gets 0.6.)
    trials = 1    # number of trials
    num_item = 1  # number of items that can be picked in each trial
    selected_item_arr = np.random.multinomial(num_item, weights, trials)
    # gives the number of times an item was selected at a particular index
    # this assumes selection with replacement
    # one possible output:
    # selected_item_arr
    # array([[0, 0, 1]])
    # with trials = 5, a possible output could be:
    # selected_item_arr
    # array([[1, 0, 0],
    #        [0, 0, 1],
    #        [0, 0, 1],
    #        [0, 1, 0],
    #        [0, 0, 1]])

Now let's assume we have to sample 3 items in 1 trial. You can imagine that there are three balls R, G, B present in large quantity, in the ratio of their weights given by the weights array. The following could be a possible outcome:

    num_item = 3
    trials = 1
    selected_item_arr = np.random.multinomial(num_item, weights, trials)
    # selected_item_arr can give output like:
    # array([[1, 0, 2]])

You can also think of the number of items to be selected as the number of binomial/multinomial trials within a set. So the above example still works as:

    num_binomial_trial = 5
    weights = [0.1, 0.9]  # say, an unfair coin with weights for H/T
    num_experiment_set = 1
    selected_item_arr = np.random.multinomial(num_binomial_trial, weights, num_experiment_set)
    # possible output:
    # selected_item_arr
    # array([[1, 4]])
    # i.e. H came up 1 time and T came up 4 times in 5 binomial trials,
    # and one set contains 5 binomial trials.

User · Answer

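If you have a weighted dictionary instead of a list, you can write this:

    import random

    items = { "a": 10, "b": 5, "c": 1 }
    random.choice([k for k in items for dummy in range(items[k])])

Note that [k for k in items for dummy in range(items[k])] produces a list like this (the order follows the dictionary's iteration order):

    ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'c', 'b', 'b', 'b', 'b', 'b']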

User · Answer

1. Arrange the weights into a cumulative distribution.
2. Use random.random() to pick a random float 0.0 <= x < total.
3. Search the distribution using bisect.bisect as shown in the example at http://docs.python.org/dev/library/bisect.html#other-examples.

    from random import random
    from bisect import bisect

    def weighted_choice(choices):
        values, weights = zip(*choices)
        total = 0
        cum_weights = []
        for w in weights:
            total += w
            cum_weights.append(total)
        x = random() * total
        i = bisect(cum_weights, x)
        return values[i]

    >>> weighted_choice([("WHITE",90), ("RED",8), ("GREEN",2)])
    'WHITE'

If you need to make more than one choice, split this into two functions, one to build the cumulative weights and another to bisect to a random point.
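One way the suggested split might look is sketched below. The function names build_cumulative and pick are made up for this example, and the hi bound passed to bisect is an extra guard against floating-point round-off that the answer itself does not include.

```python
from bisect import bisect
from itertools import accumulate
from random import random

def build_cumulative(choices):
    """Return (values, cumulative weights) for a list of (value, weight) pairs."""
    values, weights = zip(*choices)
    return values, list(accumulate(weights))

def pick(values, cum_weights):
    """Bisect to a random point in the precomputed cumulative weights."""
    x = random() * cum_weights[-1]
    # Capping the search at the last index keeps the result in range
    # even if round-off pushes x up to the total.
    return values[bisect(cum_weights, x, 0, len(values) - 1)]

# Build once, then draw as many times as needed.
values, cum_weights = build_cumulative([("WHITE", 90), ("RED", 8), ("GREEN", 2)])
draws = [pick(values, cum_weights) for _ in range(1000)]
print(draws.count("WHITE"))
```

Building the cumulative list is O(n) and each draw is O(log n), so the split pays off as soon as more than a few draws are made from the same weights.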

User · Answer

Provide random.choice() with a pre-weighted list:

Solution & Test:

    import random

    options = ['a', 'b', 'c', 'd']
    weights = [1, 2, 5, 2]

    weighted_options = [[opt]*wgt for opt, wgt in zip(options, weights)]
    weighted_options = [opt for sublist in weighted_options for opt in sublist]
    print(weighted_options)

    # test

    counts = {c: 0 for c in options}
    for x in range(10000):
        counts[random.choice(weighted_options)] += 1

    for opt, wgt in zip(options, weights):
        wgt_r = counts[opt] / 10000 * sum(weights)
        print(opt, counts[opt], wgt, wgt_r)

Output:

    ['a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'd', 'd']
    a 1025 1 1.025
    b 1948 2 1.948
    c 5019 5 5.019
    d 2008 2 2.008

User · Answer

There is a lecture on this by Sebastian Thrun in the free Udacity course AI for Robotics. Basically he makes a circular array of the indexed weights using the mod operator %, sets a variable beta to 0, and randomly chooses an index. He then loops N times, where N is the number of indices, and in the for loop first increments beta by the formula:

beta = beta + uniform sample from {0...2 * Weight_max}

and then, nested in the for loop, runs a while loop as below:

    while w[index] < beta:
        beta = beta - w[index]
        index = (index + 1) % N

    select p[index]

Then on to the next iteration, to resample based on the probabilities (or normalized probabilities in the case presented in the course).

The lecture link: https://classroom.udacity.com/courses/cs373/lessons/48704330/concepts/487480820923

I am logged into Udacity with my school account, so if the link does not work, it is Lesson 8, video number 21 of Artificial Intelligence for Robotics, where he is lecturing on particle filters.
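The steps described above can be put together roughly as follows. This is a sketch of the procedure as summarised in this answer, not code taken from the course; the function name resampling_wheel and the particle labels are made up for illustration.

```python
import random

def resampling_wheel(items, weights):
    """Resample len(items) items with probability proportional to weights."""
    n = len(items)
    index = random.randrange(n)   # random starting position on the wheel
    beta = 0.0
    max_weight = max(weights)
    resampled = []
    for _ in range(n):
        beta += random.uniform(0, 2 * max_weight)
        # Step around the circular array until beta falls inside a slot.
        while weights[index] < beta:
            beta -= weights[index]
            index = (index + 1) % n
        resampled.append(items[index])
    return resampled

particles = ["p0", "p1", "p2", "p3"]   # hypothetical particle labels
print(resampling_wheel(particles, [0.1, 0.2, 0.6, 0.1]))
```

The weights do not need to be normalised, since only their sizes relative to the largest weight matter for how far each step travels around the wheel.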

User · Answer

If your list of weighted choices is relatively static, and you want frequent sampling, you can do one O(N) preprocessing step, and then do the selection in O(1), using the functions in this related answer.

    # run only when `choices` changes.
    preprocessed_data = prep(weight for _,weight in choices)

    # O(1) selection
    value = choices[sample(preprocessed_data)][0]
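The related answer is not reproduced here, so the bodies of prep and sample below are an assumption: a minimal sketch of one well-known technique with O(N) setup and O(1) selection (the alias method), written to match the way the two names are called above. The linked answer may implement them differently.

```python
import random

def prep(weights):
    """Build alias tables in O(N). Assumed implementation, see note above."""
    weights = list(weights)
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]   # average slot size becomes 1
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]      # slot s keeps its own share...
        alias[s] = l             # ...and is topped up from a large slot
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in large + small:      # leftovers are full slots (up to round-off)
        prob[i] = 1.0
    return prob, alias

def sample(table):
    """Return a weighted random index in O(1)."""
    prob, alias = table
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]

choices = [("WHITE", 90), ("RED", 8), ("GREEN", 2)]
preprocessed_data = prep(weight for _, weight in choices)
value = choices[sample(preprocessed_data)][0]
print(value)
```

Each draw costs one random index and one comparison regardless of N, which is why this pays off when the weights rarely change and draws are frequent.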

User · Answer

    import numpy as np
    w = np.array([0.4, 0.8, 1.6, 0.8, 0.4])
    np.random.choice(w, p=w/sum(w))

User · Answer

I'm probably too late to contribute anything useful, but here's a simple, short, and very efficient snippet:

    import random

    def choose_index(probabilities):
        cmf = probabilities[0]
        choice = random.random()
        for k in range(len(probabilities) - 1):
            if choice <= cmf:
                return k
            cmf += probabilities[k + 1]
        # only the last index is left (this also absorbs round-off in cmf)
        return len(probabilities) - 1

No need to sort your probabilities or create a vector with your cmf, and it terminates once it finds its choice. Memory: O(1), time: O(N), with average running time ~ N/2.

If you have weights, simply add one step to normalise them:

    def choose_index(weights):
        total = sum(weights)
        probabilities = [w / total for w in weights]
        cmf = probabilities[0]
        choice = random.random()
        for k in range(len(probabilities) - 1):
            if choice <= cmf:
                return k
            cmf += probabilities[k + 1]
        return len(probabilities) - 1

User · Answer

If you don't mind using numpy, you can use numpy.random.choice.

For example:

    import numpy

    items = [["item1", 0.2], ["item2", 0.3], ["item3", 0.45], ["item4", 0.05]]
    elems = [i[0] for i in items]
    probs = [i[1] for i in items]

    trials = 1000
    results = [0] * len(items)
    for i in range(trials):
        # This is where the item is selected! choice() needs a 1-D sequence,
        # so pass the element names rather than the [name, prob] pairs.
        res = numpy.random.choice(elems, p=probs)
        results[elems.index(res)] += 1
    results = [r / float(trials) for r in results]
    print("item\texpected\tactual")
    for i in range(len(probs)):
        print("%s\t%0.4f\t%0.4f" % (elems[i], probs[i], results[i]))

If you know how many selections you need to make in advance, you can do it without a loop like this:

    numpy.random.choice(elems, trials, p=probs)

User · Answer

I needed to do something like this really fast and really simple. After searching for ideas I finally built this template. The idea is to receive the weighted values in the form of JSON from the API, which here is simulated by the dict.

Then translate it into a list in which each value repeats in proportion to its weight, and just use random.choice to select a value from the list.

I tried running it with 10, 100 and 1000 iterations. The distribution seems pretty solid.

    import random

    def weighted_choice(weighted_dict):
        """Input example: dict(apples=60, oranges=30, pineapples=10)"""
        weight_list = []
        for key in weighted_dict.keys():
            weight_list += [key] * weighted_dict[key]
        return random.choice(weight_list)

User · Answer

A very basic and easy approach for a weighted choice is the following:

    np.random.choice(['A', 'B', 'C'], p=[0.3, 0.4, 0.3])

User · Answer

If you happen to have Python 3, and are afraid of installing numpy or writing your own loops, you could do:

    import itertools, bisect, random

    def weighted_choice(choices):
        weights = list(zip(*choices))[1]
        return choices[bisect.bisect(list(itertools.accumulate(weights)),
                                     random.uniform(0, sum(weights)))][0]

Because you can build anything out of a bag of plumbing adaptors! Although... I must admit that Ned's answer, while slightly longer, is easier to understand.

User · Answer

Here is the version that is being included in the standard library for Python 3.6:

    import random
    import itertools as _itertools
    import bisect as _bisect

    class Random36(random.Random):
        "Show the code included in the Python 3.6 version of the Random class"

        def choices(self, population, weights=None, *, cum_weights=None, k=1):
            """Return a k sized list of population elements chosen with replacement.

            If the relative weights or cumulative weights are not specified,
            the selections are made with equal probability.

            """
            random = self.random
            if cum_weights is None:
                if weights is None:
                    _int = int
                    total = len(population)
                    return [population[_int(random() * total)] for i in range(k)]
                cum_weights = list(_itertools.accumulate(weights))
            elif weights is not None:
                raise TypeError('Cannot specify both weights and cumulative weights')
            if len(cum_weights) != len(population):
                raise ValueError('The number of weights does not match the population')
            bisect = _bisect.bisect
            total = cum_weights[-1]
            return [population[bisect(cum_weights, random() * total)] for i in range(k)]

Source: https://hg.python.org/cpython/file/tip/Lib/random.py#l340

User · Answer

I didn't love the syntax of any of those. I really wanted to just specify what the items were and what the weighting of each was. I realize I could have used random.choices, but instead I quickly wrote the class below.

    import math
    import random
    from numpy import cumsum

    class randomChoiceWithProportions:
        '''
        Accepts a dictionary of choices as keys and weights as values. Example if you want an unfair dice:

        choiceWeightDic = {"1": 0.16666666666666666, "2": 0.16666666666666666, "3": 0.16666666666666666,
                           "4": 0.16666666666666666, "5": .06666666666666666, "6": 0.26666666666666666}
        dice = randomChoiceWithProportions(choiceWeightDic)

        samples = []
        for i in range(100000):
            samples.append(dice.sample())

        # Should be close to .26666
        samples.count("6")/len(samples)

        # Should be close to .16666
        samples.count("1")/len(samples)
        '''
        def __init__(self, choiceWeightDic):
            self.choiceWeightDic = choiceWeightDic
            weightSum = sum(self.choiceWeightDic.values())
            # compare with a tolerance: float weights rarely sum to exactly 1
            assert math.isclose(weightSum, 1), 'Weights sum to ' + str(weightSum) + ', not 1.'
            self.valWeightDict = self._compute_valWeights()

        def _compute_valWeights(self):
            valWeights = list(cumsum(list(self.choiceWeightDic.values())))
            valWeightDict = dict(zip(list(self.choiceWeightDic.keys()), valWeights))
            return valWeightDict

        def sample(self):
            num = random.uniform(0, 1)
            # keys are visited in insertion order, so cumulative weights increase
            for key, val in self.valWeightDict.items():
                if val >= num:
                    return key

User · Answer

Since Python 3.6 there is a method choices from the random module.

    Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
    Type 'copyright', 'credits' or 'license' for more information
    IPython 6.0.0 -- An enhanced Interactive Python. Type '?' for help.

    In [1]: import random

    In [2]: random.choices(
    ...:     population=[['a','b'], ['b','a'], ['c','b']],
    ...:     weights=[0.2, 0.2, 0.6],
    ...:     k=10
    ...: )

    Out[2]:
    [['c', 'b'],
     ['c', 'b'],
     ['b', 'a'],
     ['c', 'b'],
     ['c', 'b'],
     ['b', 'a'],
     ['c', 'b'],
     ['b', 'a'],
     ['c', 'b'],
     ['c', 'b']]

Note that random.choices will sample with replacement, per the docs:

    Return a k sized list of elements chosen from the population with replacement.

Note for completeness of answer:

    When a sampling unit is drawn from a finite population and is returned to that population, after its characteristic(s) have been recorded, before the next unit is drawn, the sampling is said to be "with replacement". It basically means each element may be chosen more than once.

If you need to sample without replacement, then as @ronan-paixão's brilliant answer states, you can use numpy.random.choice, whose replace argument controls such behaviour.

User · Answer

    def weighted_choice(choices):
       total = sum(w for c, w in choices)
       r = random.uniform(0, total)
       upto = 0
       for c, w in choices:
          if upto + w >= r:
             return c
          upto += w
       assert False, "Shouldn't get here"

User · Answer

A general solution:

    import random

    def weighted_choice(choices, weights):
        total = sum(weights)
        threshold = random.uniform(0, total)
        for k, weight in enumerate(weights):
            total -= weight
            if total < threshold:
                return choices[k]

User · Answer

It depends on how many times you want to sample the distribution.

Suppose you want to sample the distribution K times. Then, the time complexity using np.random.choice() each time is O(K(n + log(n))), where n is the number of items in the distribution.

In my case, I needed to sample the same distribution multiple times, of the order of 10^3, where n is of the order of 10^6. I used the code below, which precomputes the cumulative distribution and samples it in O(log(n)). Overall time complexity is O(n + K*log(n)).

    import numpy as np

    n, k = 10**6, 10**3

    # Create dummy distribution
    a = np.array([i+1 for i in range(n)])
    p = np.array([1.0/n]*n)

    cfd = p.cumsum()
    for _ in range(k):
        x = np.random.uniform()
        idx = cfd.searchsorted(x, side='right')
        sampled_element = a[idx]
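As a variation on the same idea, all K draws can be looked up in one searchsorted call instead of a Python loop. This is a sketch under the same dummy distribution; the clamp on idx is an added safeguard, not part of the answer above.

```python
import numpy as np

n, k = 10**6, 10**3

a = np.arange(1, n + 1)          # dummy items, as in the loop version
p = np.full(n, 1.0 / n)          # dummy uniform distribution

cdf = p.cumsum()                 # O(n), computed once
x = np.random.uniform(size=k)    # k uniform draws in [0, 1)
idx = cdf.searchsorted(x, side='right')
# Round-off can leave cdf[-1] slightly below 1, so keep indices in range.
idx = np.minimum(idx, n - 1)
sampled_elements = a[idx]

print(sampled_elements[:5])
```

The asymptotic cost is unchanged at O(n + K*log(n)), but the per-draw overhead moves from the interpreter into NumPy.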
