Statistics combinations in Python

Question

I need to compute combinatorials  nCr  in Python but cannot find the function to do that in math  numpy or stat  libraries  Something like a function of the type   comb   calculate combinations n  r    I need the number of possible combinations  not the actual combinations  so itertools combinations does not interest me   Finally  I want to avoid using factorials  as the numbers I ll be calculating the combinations for can get too big and the factorials are going to be monstrous   This seems like a REALLY easy to answer question  however I am being drowned in questions about generating all the actual combinations  which is not what I want

User · Answer

Why not write it yourself  It s a one-liner or such   from operator import mul      or mul lambda x y x y from fractions import Fraction  def nCk n k      return int  reduce mul   Fraction n-i  i 1  for i in range k    1      Test - printing Pascal s triangle    gt  gt  gt  for n in range 17           print     join   5d  nCk n k  for k in range n 1   center 100                                                              1                                                                                                 1     1                                                                                           1     2     1                                                                                     1     3     3     1                                                                               1     4     6     4     1                                                                         1     5    10    10     5     1                                                                   1     6    15    20    15     6     1                                                             1     7    21    35    35    21     7     1                                                       1     8    28    56    70    56    28     8     1                                                 1     9    36    84   126   126    84    36     9     1                                           1    10    45   120   210   252   210   120    45    10     1                                     1    11    55   165   330   462   462   330   165    55    11     1                               1    12    66   220   495   792   924   792   495   220    66    12     1                         1    13    78   286   715  1287  1716  1716  1287   715   286    78    13     1                   1    14    91   364  1001  2002  3003  3432  3003  2002  1001   364    91    14     1             1    15   105   455  1365  3003  5005  6435  6435  5005  3003  1365   455   105    15     1        1    16   120   560  1820  4368  8008 11440 12870 11440  8008  4368  1820   560   120    16     1  gt  gt  gt     PS  edited to replace int round reduce mul   float n-i   i 1  for i in range k    1    with int reduce mul   Fraction n-i  i 1  for i in range k    1   so it won t err for big N K

User · Answer

Starting Python 3 8  the standard library now includes the math comb function to compute the binomial coefficient      math comb n  k    which is the number of ways to choose k items from n items without repetition n     k   n - k      import math math comb 10  5    252

User · Answer

Using dynamic programming  the time complexity is T n m  and space complexity T m    def binomial n  k        int  int  - gt  int             c n-1  k-1    c n-1  k   if 0  lt  k  lt  n c n k      1                        if n   k            1                        if k   0  Precondition  n  gt  k   gt  gt  gt  binomial 9  2  36      c    0     n   1  c 0    1 for i in range 1  n   1       c i    1     j   i - 1     while j  gt  0          c j     c j - 1          j -  1  return c k

User · Answer

You can write 2 simple functions that actually turns out to be about 5-8 times faster than using scipy special comb  In fact  you don t need to import any extra packages  and the function is quite easily readable  The trick is to use memoization to store previously computed values  and using the definition of nCr    create a memoization dictionary memo      def factorial n               Calculate the factorial of an input using memoization      param n  int      rtype value  int             if n in  1 0           return 1     if n in memo          return memo n      value   n factorial n-1      memo n    value     return value  def ncr n  k               Choose k elements from a set of n elements - n must be larger than or equal to k      param n  int      param k  int      rtype  int             return factorial n   factorial k  factorial n-k     If we compare times  from scipy special import comb  timeit comb 100 48   gt  gt  gt  100000 loops  best of 3  6 78   s per loop   timeit ncr 100 48   gt  gt  gt  1000000 loops  best of 3  1 39   s per loop

User · Answer

That s probably as fast as you can do it in pure python for reasonably large inputs   def choose n  k       if k    n  return 1     if k  gt  n  return 0     d  q   max k  n-k   min k  n-k      num    1     for n in xrange d 1  n 1   num    n     denom   1     for d in xrange 1  q 1   denom    d     return num   denom

User · Answer

This function is very optimized  def nCk n k       m 0     if k  0          m 1     if k  1          m n     if k gt  2          num dem op1 op2 1 1 k n         while op1 gt  1               num  op2             dem  op1             op1- 1             op2- 1         m num  dem     return m

User · Answer

The direct formula produces big integers when n is bigger than 20    So  yet another response   from math import factorial  reduce long   mul    range n-r 1  n 1   1L     factorial r    short  accurate and efficient because this avoids python big integers by sticking with longs    It is more accurate and faster when comparing to scipy special comb     gt  gt  gt  from scipy special import comb   gt  gt  gt  nCr   lambda n r  reduce long   mul    range n-r 1  n 1   1L     factorial r    gt  gt  gt  comb 128 20   1 1965669823265365e 23   gt  gt  gt  nCr 128 20   119656698232656998274400L    accurate  no loss   gt  gt  gt  from timeit import timeit   gt  gt  gt  timeit lambda  comb n r    8 231969118118286   gt  gt  gt  timeit lambda  nCr 128  20    3 885951042175293

User · Answer

A quick search on google code gives  it uses formula from  Mark Byers s answer    def choose n  k               A fast way to calculate binomial coefficients by Andrew Dalke  contrib               if 0  lt   k  lt   n          ntok   1         ktok   1         for t in xrange 1  min k  n - k    1               ntok    n             ktok    t             n -  1         return ntok    ktok     else          return 0   choose   is 10 times faster  tested on all 0  lt    n k   lt  1e3 pairs  than scipy misc comb   if you need an exact answer   def comb N k     from scipy comb    but MODIFIED      if  k  gt  N  or  N  lt  0  or  k  lt  0           return 0L     N k   map long  N k       top   N     val   1L     while  top  gt   N-k            val    top         top -  1     n   1L     while  n  lt  k 1L           val    n         n    1     return val

User · Answer

See scipy special comb  scipy misc comb in older versions of scipy   When exact is False  it uses the gammaln function to obtain good precision without taking much time  In the exact case it returns an arbitrary-precision integer  which might take a long time to compute

User · Answer

If you want an exact result  use sympy binomial  It seems to be the fastest method  hands down   x   1000000 y   234050   timeit scipy misc comb x  y  exact True  1 loops  best of 3  1min 27s per loop   timeit gmpy comb x  y  1 loops  best of 3  1 97 s per loop   timeit int sympy binomial x  y   100000 loops  best of 3  5 06   s per loop

User · Answer

Here is an efficient algorithm for you for i   1     r     p   p     n - i     i  print p   For example nCr 30 7    fact 30      fact 7    fact 23        30   29   28   27   26   25   24      1   2   3   4   5   6   7  So just run the loop from 1 to r can get the result   In python  n r 5 2 p n for i in range 1 r      p   p  n - i  i else     p   p  i 1  print p

User · Answer

Here s another alternative  This one was originally written in C    so it can be backported to C   for a finite-precision integer  e g    int64   The advantage is  1  it involves only integer operations  and  2  it avoids bloating the integer value by doing successive pairs of multiplication and division  I ve tested the result with Nas Banov s Pascal triangle  it gets the correct answer   def choose n r        Computes n     r   n-r    exactly  Returns a python long int       assert n  gt   0   assert 0  lt   r  lt   n    c   1L   denom   1   for  num denom  in zip xrange n n-r -1   xrange 1 r 1 1        c    c   num     denom   return c   Rationale  To minimize the   of multiplications and divisions  we rewrite the expression as      n       n n-1     n-r 1  ---------   ----------------  r  n-r            r    To avoid multiplication overflow as much as possible  we will evaluate in the following STRICT order  from left to right   n   1    n-1    2    n-2    3          n-r 1    r   We can show that integer arithmatic operated in this order is exact  i e  no roundoff error

User · Answer

A literal translation of the mathematical definition is quite adequate in a lot of cases  remembering that Python will automatically use big number arithmetic    from math import factorial  def calculate combinations n  r       return factorial n     factorial r     factorial n-r    For some inputs I tested  e g  n 1000 r 500  this was more than 10 times faster than the one liner reduce suggested in another  currently highest voted  answer  On the other hand  it is out-performed by the snippit provided by  J F  Sebastian

User · Answer

If you want exact results and speed  try gmpy -- gmpy comb should do exactly what you ask for  and it s pretty fast  of course  as gmpy s original author  I am biased -

User · Answer

Using only standard library distributed with Python   import itertools  def nCk n  k       return len list itertools combinations range n   k

User · Answer

If your program has an upper bound to n  say n  lt   N  and needs to repeatedly compute nCr  preferably for   N times   using lru cache can give you a huge performance boost   from functools import lru cache   lru cache maxsize None  def nCr n  r       return 1 if r    0 or r    n else nCr n - 1  r - 1    nCr n - 1  r    Constructing the cache  which is done implicitly  takes up to O N 2  time  Any subsequent calls to nCr will return in O 1

User · Answer

It s pretty easy with sympy   import sympy  comb   sympy binomial n  r

User · Answer

This is  killerT2333 code using the builtin memoization decorator   from functools import lru cache   lru cache   def factorial n               Calculate the factorial of an input using memoization      param n  int      rtype value  int             return 1 if n in  1  0  else n   factorial n-1    lru cache   def ncr n  k               Choose k elements from a set of n elements      n must be greater than or equal to k       param n  int      param k  int      rtype  int             return factorial n     factorial k    factorial n - k    print ncr 6  3

[python] Statistics: combinations in Python

Examples related to python

Examples related to statistics

Examples related to combinations