Why is "1000000000000000 in range(1000000000000001)" so fast in Python 3?

Question

It is my understanding that the range() function, which is actually an object type in Python 3, generates its contents on the fly, similar to a generator.

This being the case, I would have expected the following line to take an inordinate amount of time, because in order to determine whether 1 quadrillion is in the range, a quadrillion values would have to be generated:

    1000000000000000 in range(1000000000000001)

Furthermore, it seems that no matter how many zeroes I add on, the calculation more or less takes the same amount of time (basically instantaneous).

I have also tried things like this, but the calculation is still almost instant:

    1000000000000000000000 in range(0, 1000000000000000000001, 10)  # count by tens

If I try to implement my own range function, the result is not so nice!

    def my_crappy_range(N):
        i = 0
        while i < N:
            yield i
            i += 1
        return

What is the range() object doing under the hood that makes it so fast?

Martijn Pieters' answer was chosen for its completeness, but also see abarnert's first answer for a good discussion of what it means for range to be a full-fledged sequence in Python 3, and some information/warning regarding potential inconsistency for __contains__ function optimization across Python implementations. abarnert's other answer goes into some more detail and provides links for those interested in the history behind the optimization in Python 3 (and the lack of optimization of xrange in Python 2). Answers by poke and by wim provide the relevant C source code and explanations for those who are interested.

User · Answer

Use the source, Luke!

In CPython, range(...).__contains__ (a method wrapper) will eventually delegate to a simple calculation which checks if the value can possibly be in the range. The reason for the speed here is that we're using mathematical reasoning about the bounds, rather than a direct iteration of the range object. To explain the logic used:

1. Check that the number is between start and stop, and
2. Check that the stride value doesn't "step over" our number.

For example, 994 is in range(4, 1000, 2) because:

1. 4 <= 994 < 1000, and
2. (994 - 4) % 2 == 0.

The full C code is included below, which is a bit more verbose because of memory management and reference counting details, but the basic idea is there:

    static int
    range_contains_long(rangeobject *r, PyObject *ob)
    {
        int cmp1, cmp2, cmp3;
        PyObject *tmp1 = NULL;
        PyObject *tmp2 = NULL;
        PyObject *zero = NULL;
        int result = -1;

        zero = PyLong_FromLong(0);
        if (zero == NULL) /* MemoryError in int(0) */
            goto end;

        /* Check if the value can possibly be in the range. */

        cmp1 = PyObject_RichCompareBool(r->step, zero, Py_GT);
        if (cmp1 == -1)
            goto end;
        if (cmp1 == 1) { /* positive steps: start <= ob < stop */
            cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
            cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
        }
        else { /* negative steps: stop < ob <= start */
            cmp2 = PyObject_RichCompareBool(ob, r->start, Py_LE);
            cmp3 = PyObject_RichCompareBool(r->stop, ob, Py_LT);
        }

        if (cmp2 == -1 || cmp3 == -1) /* TypeError */
            goto end;
        if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
            result = 0;
            goto end;
        }

        /* Check that the stride does not invalidate ob's membership. */
        tmp1 = PyNumber_Subtract(ob, r->start);
        if (tmp1 == NULL)
            goto end;
        tmp2 = PyNumber_Remainder(tmp1, r->step);
        if (tmp2 == NULL)
            goto end;
        /* result = ((int(ob) - start) % step) == 0 */
        result = PyObject_RichCompareBool(tmp2, zero, Py_EQ);
      end:
        Py_XDECREF(tmp1);
        Py_XDECREF(tmp2);
        Py_XDECREF(zero);
        return result;
    }

    static int
    range_contains(rangeobject *r, PyObject *ob)
    {
        if (PyLong_CheckExact(ob) || PyBool_Check(ob))
            return range_contains_long(r, ob);

        return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                           PY_ITERSEARCH_CONTAINS);
    }

The "meat" of the idea is mentioned in the comment line:

    /* result = ((int(ob) - start) % step) == 0 */

As a final note, look at the range_contains function at the bottom of the code snippet. If the exact type check fails then we don't use the clever algorithm described, instead falling back to a dumb iteration search of the range using _PySequence_IterSearch! You can check this behaviour in the interpreter (I'm using v3.5.0 here):

    >>> x, r = 1000000000000000, range(1000000000000001)
    >>> class MyInt(int):
    ...     pass
    ...
    >>> x_ = MyInt(x)
    >>> x in r  # calculates immediately :)
    True
    >>> x_ in r  # iterates for ages.. :(
    ^\Quit (core dumped)
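
The same fallback can be observed on a small range without waiting for ages. The sketch below is only an illustration: CountingInt is a hypothetical int subclass (not anything from CPython) that counts how often it is compared for equality, and the behaviour shown assumes CPython 3.2 or later.

```python
class CountingInt(int):
    """Hypothetical int subclass that counts equality comparisons."""
    comparisons = 0

    def __eq__(self, other):
        CountingInt.comparisons += 1
        return int(self) == int(other)

    # Defining __eq__ removes the inherited hash, so restore it.
    __hash__ = int.__hash__


r = range(0, 1000, 2)

# Exact int: answered by the bounds check plus one modulo operation.
exact_result = 994 in r

# int subclass: CPython falls back to comparing against each element in turn.
subclass_result = CountingInt(994) in r

print(exact_result, subclass_result)  # True True
print(CountingInt.comparisons)        # one comparison per element scanned
```

The counter stays at zero for the plain int lookup and grows with the position of the value for the subclass lookup, which is the linear scan described above.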

User · Answer

To add to Martijn's answer, this is the relevant part of the source (in C, as the range object is written in native code):

    static int
    range_contains(rangeobject *r, PyObject *ob)
    {
        if (PyLong_CheckExact(ob) || PyBool_Check(ob))
            return range_contains_long(r, ob);

        return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                           PY_ITERSEARCH_CONTAINS);
    }

So for PyLong objects (which is int in Python 3), it will use the range_contains_long function to determine the result. And that function essentially checks if ob is in the specified range (although it looks a bit more complex in C).

If it's not an int object, it falls back to iterating until it finds the value (or not).

The whole logic could be translated to pseudo-Python like this:

    def range_contains(rangeObj, obj):
        # mirrors PyLong_CheckExact(ob) || PyBool_Check(ob):
        # other int subclasses take the slow path
        if type(obj) is int or isinstance(obj, bool):
            return range_contains_long(rangeObj, obj)

        # default logic by iterating
        return any(obj == x for x in rangeObj)

    def range_contains_long(r, num):
        if r.step > 0:
            # positive step: r.start <= num < r.stop
            cmp2 = r.start <= num
            cmp3 = num < r.stop
        else:
            # negative step: r.start >= num > r.stop
            cmp2 = num <= r.start
            cmp3 = r.stop < num

        # outside of the range boundaries
        if not cmp2 or not cmp3:
            return False

        # num must be on a valid step inside the boundaries
        return (num - r.start) % r.step == 0
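
One way to convince yourself that the arithmetic shortcut gives the same answers as a slow scan is to compare the two over many small ranges. This is a rough sanity check rather than a proof; arithmetic_contains is a name made up for this sketch and simply restates the bounds-plus-modulo logic described above.

```python
def arithmetic_contains(start, stop, step, num):
    """Bounds check plus modulo test, as described in the answer above."""
    if step > 0:
        in_bounds = start <= num < stop
    else:
        in_bounds = stop < num <= start
    return in_bounds and (num - start) % step == 0


mismatches = []
for start in range(-5, 6):
    for stop in range(-5, 6):
        for step in (-3, -2, -1, 1, 2, 3):
            # Materialise the values so that `in` really does a linear scan.
            values = list(range(start, stop, step))
            for num in range(-8, 9):
                if arithmetic_contains(start, stop, step, num) != (num in values):
                    mismatches.append((start, stop, step, num))

print(mismatches)  # an empty list means both approaches agreed everywhere
```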

User · Answer

Try x-1 in (i for i in range(x)) for large x values. This wraps the range in a generator expression, so the in test has to consume values one at a time and never reaches the range.__contains__ optimisation.
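
A quick way to see the difference is to time both forms. The sketch below uses a deliberately modest x so the slow path still finishes quickly; the exact timings depend on your machine and are not meant as a benchmark.

```python
import timeit

x = 10**6  # kept small so the slow path finishes quickly

# Direct membership test: handled arithmetically by range.__contains__.
fast = timeit.timeit(lambda: (x - 1) in range(x), number=1)

# Generator expression: `in` has to pull values out one at a time.
slow = timeit.timeit(lambda: (x - 1) in (i for i in range(x)), number=1)

print(f"range: {fast:.6f}s, generator expression: {slow:.6f}s")
```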

User · Answer

The other answers explained it well already, but I'd like to offer another experiment illustrating the nature of range objects:

    >>> r = range(5)
    >>> for i in r:
    ...     print(i, 2 in r, list(r))
    ...
    0 True [0, 1, 2, 3, 4]
    1 True [0, 1, 2, 3, 4]
    2 True [0, 1, 2, 3, 4]
    3 True [0, 1, 2, 3, 4]
    4 True [0, 1, 2, 3, 4]

As you can see, a range object is an object that remembers its range and can be used many times (even while iterating over it), not just a one-time generator.

User · Answer

TL;DR: a range is an arithmetic series, so it can very easily calculate whether the object is there. It could even get the index of it really quickly, as if it were a list.
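
The index point can be illustrated with range.index, which is a real method on range objects. The numbers below are arbitrary, and the hand calculation is just the arithmetic-series formula, not a quote from the CPython source.

```python
r = range(3, 10**18, 7)
value = 3 + 7 * 10**15

# For exact ints, CPython computes the position without iterating.
position = r.index(value)

# The same position from the arithmetic-series formula.
by_hand = (value - r.start) // r.step

print(position == by_hand)   # True
print(r[position] == value)  # True
```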

User · Answer

TL;DR: The object returned by range() is actually a range object. This object implements the iteration protocol, so you can iterate over its values sequentially, just like a generator, list, or tuple.

But it also implements the __contains__ interface, which is what actually gets called when an object appears on the right-hand side of the in operator. The __contains__() method returns a bool of whether or not the item on the left-hand side of the in is in the object. Since range objects know their bounds and stride, this is very easy to implement in O(1).
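
To see that in really does dispatch to __contains__, here is a minimal sketch with a made-up container. Evens is hypothetical and stores nothing at all; it answers membership purely by calculation, in the same spirit as range.

```python
class Evens:
    """Hypothetical container: 'holds' every even integer without storing any."""

    def __contains__(self, item):
        print(f"__contains__ called with {item!r}")
        return isinstance(item, int) and item % 2 == 0


evens = Evens()

print(10**30 in evens)  # True
print(7 in evens)       # False
```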

User · Answer

Thanks to the optimization, it is very easy to test a given integer against just the min and max of the range. The reason that range() is so fast in Python 3 is that it uses mathematical reasoning about the bounds, rather than a direct iteration of the range object. To explain the logic:

1. Check whether the number is between start and stop.
2. Check whether the step value doesn't step over our number.

For example, 997 is in range(4, 1000, 3) because:

    4 <= 997 < 1000, and (997 - 4) % 3 == 0.

User · Answer

The Python 3 range() object doesn't produce numbers immediately; it is a smart sequence object that produces numbers on demand. All it contains is your start, stop and step values, then as you iterate over the object the next integer is calculated each iteration.

The object also implements the object.__contains__ hook, and calculates if your number is part of its range. Calculating is a (near) constant time operation*. There is never a need to scan through all possible integers in the range.

From the range() object documentation:

    The advantage of the range type over a regular list or tuple is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents (as it only stores the start, stop and step values, calculating individual items and subranges as needed).

So at a minimum, your range() object would do:

    class my_range:
        def __init__(self, start, stop=None, step=1):
            if stop is None:
                start, stop = 0, start
            self.start, self.stop, self.step = start, stop, step
            if step < 0:
                lo, hi, step = stop, start, -step
            else:
                lo, hi = start, stop
            self.length = 0 if lo > hi else ((hi - lo - 1) // step) + 1

        def __iter__(self):
            current = self.start
            if self.step < 0:
                while current > self.stop:
                    yield current
                    current += self.step
            else:
                while current < self.stop:
                    yield current
                    current += self.step

        def __len__(self):
            return self.length

        def __getitem__(self, i):
            if i < 0:
                i += self.length
            if 0 <= i < self.length:
                return self.start + i * self.step
            raise IndexError('my_range object index out of range')

        def __contains__(self, num):
            if self.step < 0:
                if not (self.stop < num <= self.start):
                    return False
            else:
                if not (self.start <= num < self.stop):
                    return False
            return (num - self.start) % self.step == 0

This is still missing several things that a real range() supports (such as the .index() or .count() methods, hashing, equality testing, or slicing), but should give you an idea.

I also simplified the __contains__ implementation to only focus on integer tests; if you give a real range() object a non-integer value (including subclasses of int), a slow scan is initiated to see if there is a match, just as if you use a containment test against a list of all the contained values. This was done to continue to support other numeric types that just happen to support equality testing with integers but are not expected to support integer arithmetic as well. See the original Python issue that implemented the containment test.

* Near constant time because Python integers are unbounded and so math operations also grow in time as N grows, making this an O(log N) operation. Since it's all executed in optimised C code and Python stores integer values in 30-bit chunks, you'd run out of memory before you saw any performance impact due to the size of the integers involved here.
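
The memory claim from the documentation is easy to check. This is a small sketch assuming CPython, where sys.getsizeof reports the size of the range object itself and not the integers it refers to; the particular ranges are arbitrary.

```python
import sys

small = range(10)
huge = range(0, 10**100, 3)

# Same object size regardless of how many values the range represents.
print(sys.getsizeof(small) == sys.getsizeof(huge))  # True on CPython

# Length is computed from start, stop and step, not counted.
print(len(range(0, 10**15, 3)))

# Items are computed on demand from start + index * step.
print(huge[10**50] == 3 * 10**50)  # True
```

Note that len() itself is limited to values that fit in a machine-sized integer, which is why the length example uses a smaller range than the indexing example.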

User · Answer

It's all about a lazy approach to evaluation and some extra optimization of range. Values in ranges don't need to be computed until real use, or even later thanks to the extra optimization.

By the way, your integer is not that big; consider sys.maxsize:

    sys.maxsize in range(sys.maxsize)

is pretty fast. Thanks to the optimization, it's easy to compare the given integer with just the min and max of the range. But:

    Decimal(sys.maxsize) in range(sys.maxsize)

is pretty slow. In this case there is no optimization in range, so if Python receives an unexpected Decimal, it will compare against all the numbers.

You should be aware of this implementation detail, but it should not be relied upon, because it may change in the future.
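
The slow path still gives correct answers; it is only slow. A small sketch with a deliberately tiny range (so the scan is instant) shows non-int numeric types being matched by equality. The specific values are arbitrary.

```python
from decimal import Decimal
from fractions import Fraction

r = range(10)

print(Decimal(5) in r)       # True: found by scanning and comparing with ==
print(5.0 in r)              # True
print(Fraction(10, 2) in r)  # True
print(Decimal("5.5") in r)   # False: the scan reaches the end without a match
```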

User · Answer

If you're wondering why this optimization was added to range.__contains__, and why it wasn't added to xrange.__contains__ in 2.7:

First, as Ashwini Chaudhary discovered, issue 1766304 was opened explicitly to optimize [x]range.__contains__. A patch for this was accepted and checked in for 3.2, but not backported to 2.7 because "xrange has behaved like this for such a long time that I don't see what it buys us to commit the patch this late." (2.7 was nearly out at that point.)

Meanwhile:

Originally, xrange was a not-quite-sequence object. As the 3.1 docs say:

    Range objects have very little behavior: they only support indexing, iteration, and the len function.

This wasn't quite true; an xrange object actually supported a few other things that come automatically with indexing and len,* including __contains__ (via linear search). But nobody thought it was worth making them full sequences at the time.

Then, as part of implementing the Abstract Base Classes PEP, it was important to figure out which builtin types should be marked as implementing which ABCs, and xrange/range claimed to implement collections.Sequence, even though it still only handled the same "very little behavior". Nobody noticed that problem until issue 9213. The patch for that issue not only added index and count to 3.2's range, it also re-worked the optimized __contains__ (which shares the same math with index, and is directly used by count).** This change went in for 3.2 as well, and was not backported to 2.x, because "it's a bugfix that adds new methods". (At this point, 2.7 was already past rc status.)

So, there were two chances to get this optimization backported to 2.7, but they were both rejected.

* In fact, you even get iteration for free with indexing alone, but in 2.3 xrange objects got a custom iterator.

** The first version actually reimplemented it, and got the details wrong: e.g., it would give you MyIntSubclass(2) in range(5) == False. But Daniel Stutzbach's updated version of the patch restored most of the previous code, including the fallback to the generic, slow _PySequence_IterSearch that pre-3.2 range.__contains__ was implicitly using when the optimization doesn't apply.

User · Answer

The fundamental misunderstanding here is in thinking that range is a generator. It's not. In fact, it's not any kind of iterator.

You can tell this pretty easily:

    >>> a = range(5)
    >>> print(list(a))
    [0, 1, 2, 3, 4]
    >>> print(list(a))
    [0, 1, 2, 3, 4]

If it were a generator, iterating it once would exhaust it:

    >>> b = my_crappy_range(5)
    >>> print(list(b))
    [0, 1, 2, 3, 4]
    >>> print(list(b))
    []

What range actually is, is a sequence, just like a list. You can even test this:

    >>> import collections.abc
    >>> isinstance(a, collections.abc.Sequence)
    True

This means it has to follow all the rules of being a sequence:

    >>> a[3]         # indexable
    3
    >>> len(a)       # sized
    5
    >>> 3 in a       # membership
    True
    >>> reversed(a)  # reversible
    <range_iterator at 0x101cd2360>
    >>> a.index(3)   # implements 'index'
    3
    >>> a.count(3)   # implements 'count'
    1

The difference between a range and a list is that a range is a lazy or dynamic sequence; it doesn't remember all of its values, it just remembers its start, stop, and step, and creates the values on demand on __getitem__.

(As a side note, if you print(iter(a)), you'll notice that range does not reuse list's iterator type: CPython gives it a dedicated range_iterator, the same type that reversed(a) returned above. A generic index-based iterator needs nothing from the sequence beyond __getitem__, so one would work fine for range too.)

Now, there's nothing that says that Sequence.__contains__ has to be constant time; in fact, for obvious examples of sequences like list, it isn't. But there's nothing that says it can't be. And it's easier to implement range.__contains__ to just check it mathematically ((val - start) % step, but with some extra complexity to deal with negative steps) than to actually generate and test all the values, so why shouldn't it do it the better way?

But there doesn't seem to be anything in the language that guarantees this will happen. As Ashwini Chaudhary points out, if you give it a non-integral value, instead of converting to integer and doing the mathematical test, it will fall back to iterating all the values and comparing them one by one. And just because CPython 3.2+ and PyPy 3.x versions happen to contain this optimization, and it's an obvious good idea and easy to do, there's no reason that IronPython or NewKickAssPython 3.x couldn't leave it out. (And in fact CPython 3.0-3.1 didn't include it.)

If range actually were a generator, like my_crappy_range, then it wouldn't make sense to test __contains__ this way, or at least the way it makes sense wouldn't be obvious. If you'd already iterated the first 3 values, is 1 still in the generator? Should testing for 1 cause it to iterate and consume all the values up to 1 (or up to the first value >= 1)?
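
Those closing questions have concrete answers for real generators, and they show why membership testing on one is awkward. In the sketch below, count_up is a stand-in written for this example that behaves like my_crappy_range from the question.

```python
def count_up(n):
    """Plain generator, equivalent to my_crappy_range from the question."""
    i = 0
    while i < n:
        yield i
        i += 1


gen = count_up(5)
found_three = 3 in gen   # True, but consumes 0, 1, 2 and 3 along the way
found_one = 1 in gen     # False: 1 was already consumed, and this eats 4 too
leftover = list(gen)     # [] because the generator is now exhausted

r = range(5)
range_results = (3 in r, 1 in r, list(r))  # membership tests consume nothing

print(found_three, found_one, leftover)  # True False []
print(range_results)                     # (True, True, [0, 1, 2, 3, 4])
```

With a generator, a membership test is destructive and its answer depends on what has already been consumed; with a range, it is a pure question about start, stop, and step.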
