Understanding generators in Python

Question

I am reading the Python cookbook at the moment and am currently looking at generators  I m finding it hard to get my head round   As I come from a Java background  is there a Java equivalent  The book was speaking about  Producer   Consumer   however when I hear that I think of threading   What is a generator and why would you use it  Without quoting any books  obviously  unless you can find a decent  simplistic answer direct from a book   Perhaps with examples  if you re feeling generous

User · Answer

There is no Java equivalent   Here is a bit of a contrived example       usr bin python def  mygen n       x   0     while x  lt  n          x   x   1         if x   3    0              yield x  for a in mygen 100       print a   There is a loop in the generator that runs from 0 to n  and if the loop variable is a multiple of 3  it yields the variable   During each iteration of the for loop the generator is executed  If it is the first time the generator executes  it starts at the beginning  otherwise it continues from the previous time it yielded

User · Answer

A generator is effectively a function that returns  data  before it is finished  but it pauses at that point  and you can resume the function at that point    gt  gt  gt  def myGenerator            yield  These          yield  words          yield  come          yield  one          yield  at          yield  a          yield  time    gt  gt  gt  myGeneratorInstance   myGenerator    gt  gt  gt  next myGeneratorInstance  These  gt  gt  gt  next myGeneratorInstance  words   and so on   The  or one  benefit of generators is that because they deal with data one piece at a time  you can deal with large amounts of data  with lists  excessive memory requirements could become a problem    Generators  just like lists  are iterable  so they can be used in the same ways    gt  gt  gt  for word in myGeneratorInstance          print word These words come one at  a  time   Note that generators provide another way to deal with infinity  for example   gt  gt  gt  from time import gmtime  strftime  gt  gt  gt  def myGen            while True              yield strftime   a   d  b  Y  H  M  S  0000   gmtime         gt  gt  gt  myGeneratorInstance   myGen    gt  gt  gt  next myGeneratorInstance  Thu  28 Jun 2001 14 17 15  0000  gt  gt  gt  next myGeneratorInstance  Thu  28 Jun 2001 14 18 02  0000      The generator encapsulates an infinite loop  but this isn t a problem because you only get each answer every time you ask for it

User · Answer

I believe the first appearance of iterators and generators were in the Icon programming language  about 20 years ago   You may enjoy the Icon overview  which lets you wrap your head around them without concentrating on the syntax  since Icon is a language you probably don t know  and Griswold was explaining the benefits of his language to people coming from other languages    After reading just a few paragraphs there  the utility of generators and iterators might become more apparent

User · Answer

Experience with list comprehensions has shown their widespread utility throughout Python  However  many of the use cases do not need to have a full list created in memory  Instead  they only need to iterate over the elements one at a time   For instance  the following summation code will build a full list of squares in memory  iterate over those values  and  when the reference is no longer needed  delete the list   sum  x x for x in range 10     Memory is conserved by using a generator expression instead   sum x x for x in range 10    Similar benefits are conferred on constructors for container objects   s   Set word  for line in page  for word in line split    d   dict   k  func k   for k in keylist    Generator expressions are especially useful with functions like sum    min    and max   that reduce an iterable input to a single value   max len line   for line in file  if line strip      more

User · Answer

I like to describe generators  to those with a decent background in programming languages and computing  in terms of stack frames   In many languages  there is a stack on top of which is the current stack  frame    The stack frame includes space allocated for variables local to the function including the arguments passed in to that function     When you call a function  the current point of execution  the  program counter  or equivalent  is pushed onto the stack  and a new stack frame is created   Execution then transfers to the beginning of the function being called   With regular functions  at some point the function returns a value  and the stack is  popped    The function s stack frame is discarded and execution resumes at the previous location   When a function is a generator  it can return a value without the stack frame being discarded  using the yield statement   The values of local variables and the program counter within the function are preserved   This allows the generator to be resumed at a later time  with execution continuing from the yield statement  and it can execute more code and return another value   Before Python 2 5 this was all generators did   Python 2 5 added the ability to pass values back in to the generator as well   In doing so  the passed-in value is available as an expression resulting from the yield statement which had temporarily returned control  and a value  from the generator   The key advantage to generators is that the  state  of the function is preserved  unlike with regular functions where each time the stack frame is discarded  you lose all that  state    A secondary advantage is that some of the function call overhead  creating and deleting stack frames  is avoided  though this is a usually a minor advantage

User · Answer

This post will use Fibonacci numbers as a tool to build up to explaining the usefulness of Python generators   This post will feature both C   and Python code   Fibonacci numbers are defined as the sequence  0  1  1  2  3  5  8  13  21  34        Or in general   F0   0 F1   1 Fn   Fn-1   Fn-2   This can be transferred into a C   function extremely easily   size t Fib size t n          Fib 0    0     if n    0          return 0         Fib 1    1     if n    1          return 1         Fib N    Fib N-2    Fib N-1      return Fib n-2    Fib n-1       But if you want to print the first six Fibonacci numbers  you will be recalculating a lot of the values with the above function   For example  Fib 3    Fib 2    Fib 1   but Fib 2  also recalculates Fib 1    The higher the value you want to calculate  the worse off you will be   So one may be tempted to rewrite the above by keeping track of the state in main      Not supported for the first two elements of Fib size t GetNextFib size t  amp pp  size t  amp p        int result   pp   p      pp   p      p   result      return result     int main int argc  char  argv          size t pp   0      size t p   1      std  cout  lt  lt   0    lt  lt   1        for size t i   0  i  lt   4    i                size t fibI   GetNextFib pp  p           std  cout  lt  lt  fibI  lt  lt                 return 0      But this is very ugly  and it complicates our logic in main  It would be better to not have to worry about state in our main function   We could return a vector of values and use an iterator to iterate over that set of values  but this requires a lot of memory all at once for a large number of return values   So back to our old approach  what happens if we wanted to do something else besides print the numbers  We d have to copy and paste the whole block of code in main and change the output statements to whatever else we wanted to do  And if you copy and paste code  then you should be shot   You don t want to get shot  do you   To solve these problems  and to avoid getting shot  we may rewrite this block of code using a callback function   Every time a new Fibonacci number is encountered  we would call the callback function   void GetFibNumbers size t max  void  FoundNewFibCallback  size t         if max--    0  return      FoundNewFibCallback 0       if max--    0  return      FoundNewFibCallback 1        size t pp   0      size t p   1      for                   if max--    0  return          int result   pp   p          pp   p          p   result          FoundNewFibCallback result            void foundNewFib size t fibI        std  cout  lt  lt  fibI  lt  lt          int main int argc  char  argv          GetFibNumbers 6  foundNewFib       return 0      This is clearly an improvement  your logic in main is not as cluttered  and you can do anything you want with the Fibonacci numbers  simply define new callbacks   But this is still not perfect  What if you wanted to only get the first two Fibonacci numbers  and then do something  then get some more  then do something else   Well  we could go on like we have been  and we could start adding state again into main  allowing GetFibNumbers to start from an arbitrary point  But this will further bloat our code  and it already looks too big for a simple task like printing Fibonacci numbers   We could implement a producer and consumer model via a couple of threads  But this complicates the code even more   Instead let s talk about generators   Python has a very nice language feature that solves problems like these called generators   A generator allows you to execute a function  stop at an arbitrary point  and then continue again where you left off  Each time returning a value   Consider the following code that uses a generator   def fib        pp  p   0  1     while 1          yield pp         pp  p   p  pp p  g   fib   for i in range 6       g next     Which gives us the results      0   1   1   2   3   5   The yield statement is used in conjuction with Python generators   It saves the state of the function and returns the yeilded value   The next time you call the next   function on the generator  it will continue where the yield left off   This is by far more clean than the callback function code  We have cleaner code  smaller code  and not to mention much more functional code  Python allows arbitrarily large integers    Source

User · Answer

Note  this post assumes Python 3 x syntax  dagger   A generator is simply a function which returns an object on which you can call next  such that for every call it returns some value  until it raises a StopIteration exception  signaling that all values have been generated  Such an object is called an iterator   Normal functions return a single value using return  just like in Java  In Python  however  there is an alternative  called yield  Using yield anywhere in a function makes it a generator  Observe this code    gt  gt  gt  def myGen n           yield n         yield n   1       gt  gt  gt  g   myGen 6   gt  gt  gt  next g  6  gt  gt  gt  next g  7  gt  gt  gt  next g  Traceback  most recent call last     File   lt stdin gt    line 1  in  lt module gt  StopIteration   As you can see  myGen n  is a function which yields n and n   1  Every call to next yields a single value  until all values have been yielded  for loops call next in the background  thus    gt  gt  gt  for n in myGen 6           print n       6 7   Likewise there are generator expressions  which provide a means to succinctly describe certain common types of generators    gt  gt  gt  g    n for n in range 3  5    gt  gt  gt  next g  3  gt  gt  gt  next g  4  gt  gt  gt  next g  Traceback  most recent call last     File   lt stdin gt    line 1  in  lt module gt  StopIteration   Note that generator expressions are much like list comprehensions    gt  gt  gt  lc    n for n in range 3  5    gt  gt  gt  lc  3  4    Observe that a generator object is generated once  but its code is not run all at once  Only calls to next actually execute  part of  the code  Execution of the code in a generator stops once a yield statement has been reached  upon which it returns a value  The next call to next then causes execution to continue in the state in which the generator was left after the last yield  This is a fundamental difference with regular functions  those always start execution at the  top  and discard their state upon returning a value   There are more things to be said about this subject  It is e g  possible to send data back into a generator  reference   But that is something I suggest you do not look into until you understand the basic concept of a generator   Now you may ask  why use generators  There are a couple of good reasons    Certain concepts can be described much more succinctly using generators  Instead of creating a function which returns a list of values  one can write a generator which generates the values on the fly  This means that no list needs to be constructed  meaning that the resulting code is more memory efficient  In this way one can even describe data streams which would simply be too large to fit in memory  Generators allow for a natural way to describe infinite streams  Consider for example the Fibonacci numbers    gt  gt  gt  def fib            a  b   0  1         while True              yield a             a  b   b  a   b       gt  gt  gt  import itertools  gt  gt  gt  list itertools islice fib    10    0  1  1  2  3  5  8  13  21  34    This code uses itertools islice to take a finite number of elements from an infinite stream  You are advised to have a good look at the functions in the itertools module  as they are essential tools for writing advanced generators with great ease       nbsp  nbsp  dagger  About Python  lt  2 6  in the above examples next is a function which calls the method   next   on the given object  In Python  lt  2 6 one uses a slightly different technique  namely o next   instead of next o   Python 2 7 has next   call  next so you need not use the following in 2 7    gt  gt  gt  g    n for n in range 3  5    gt  gt  gt  g next   3

User · Answer

Performance difference  macOS Big Sur 11 1 MacBook Pro  13-inch  M1  2020  Chip Apple M1 Memory 8gb  CASE 1 import random import psutil   pip install psutil import os from datetime import datetime   def memory usage psutil          return the memory usage in MB     process   psutil Process os getpid        mem   process memory info   rss   float 2    20      return     2f  MB  format mem    names     John    Milovan    Adam    Steve    Rick    Thomas   majors     Math    Engineering    CompSci    Arts    Business    print  Memory  Before       format memory usage psutil       def people list num people       result          for i in range num people           person                  id   i               name   random choice names                major   random choice majors                    result append person      return result   t1   datetime now   people   people list 1000000  t2   datetime now     print  Memory  After        format memory usage psutil     print  Took    Seconds  format t2 - t1    output  Memory  Before   50 38 MB Memory  After    1140 41 MB Took 0 00 01 056423 Seconds   Function which returns a list of 1 million results  At the bottom I m printing out the memory usage and the total time  Base memory usage was around 50 38 megabytes and this memory after is after I created that list of 1 million records so you can see here that it jumped up by nearly 1140 41 megabytes and it took 1 1 seconds    CASE 2 import random import psutil   pip install psutil import os from datetime import datetime  def memory usage psutil          return the memory usage in MB     process   psutil Process os getpid        mem   process memory info   rss   float 2    20      return     2f  MB  format mem    names     John    Milovan    Adam    Steve    Rick    Thomas   majors     Math    Engineering    CompSci    Arts    Business    print  Memory  Before       format memory usage psutil      def people generator num people       for i in range num people           person                  id   i               name   random choice names                major   random choice majors                    yield person   t1   datetime now   people   people generator 1000000  t2   datetime now    print  Memory  After        format memory usage psutil     print  Took    Seconds  format t2 - t1    output  Memory  Before   50 52 MB Memory  After    50 73 MB Took 0 00 00 000008 Seconds   After I ran this that the memory is almost exactly the same and that s because the generator hasn t actually done anything yet it s not holding those million values in memory it s waiting for me to grab the next one   Basically it didn t take any time because as soon as it gets to the first yield statement it stops   I think that it is generator a little bit more readable and it also gives you big performance boosts not only with execution time but with memory   As well and you can still use all of the comprehensions and this generator expression here so you don t lose anything in that area  So those are a few reasons why you would use generators and also some of the advantages that come along with that

User · Answer

It helps to make a clear distinction between the function foo  and the generator foo n    def foo n       yield n     yield n 1   foo is a function  foo 6  is a generator object   The typical way to use a generator object is in a loop   for n in foo 6       print n    The loop prints    6   7   Think of a generator as a resumable function   yield behaves like return in the sense that values that are yielded get  returned  by the generator  Unlike return  however  the next time the generator gets asked for a value  the generator s function  foo  resumes where it left off -- after the last yield statement -- and continues to run until it hits another yield statement   Behind the scenes  when you call bar foo 6  the generator object bar is defined for you to have a next attribute   You can call it yourself to retrieve values yielded from foo   next bar       Works in Python 2 6 or Python 3 x bar next       Works in Python 2 5   but is deprecated  Use next   if possible    When foo ends  and there are no more yielded values   calling next bar  throws a StopInteration error

User · Answer

First of all  the term generator originally was somewhat ill-defined in Python  leading to lots of confusion  You probably mean iterators and iterables  see here   Then in Python there are also generator functions  which return a generator object   generator objects  which are iterators  and generator expressions  which are evaluated to a generator object    According to the glossary entry for generator it seems that the official terminology is now that generator is short for  generator function   In the past the documentation defined the terms inconsistently  but fortunately this has been fixed   It might still be a good idea to be precise and avoid the term  generator  without further specification

User · Answer

Generators could be thought of as shorthand for creating an iterator   They behave like a Java Iterator   Example    gt  gt  gt  g    x for x in range 10    gt  gt  gt  g  lt generator object  lt genexpr gt  at 0x7fac1c1e6aa0 gt   gt  gt  gt  g next   0  gt  gt  gt  g next   1  gt  gt  gt  g next   2  gt  gt  gt  list g      force iterating the rest  3  4  5  6  7  8  9   gt  gt  gt  g next      iterator is at the end  calling next again will throw Traceback  most recent call last     File   lt stdin gt    line 1  in  lt module gt  StopIteration   Hope this helps is what you are looking for   Update   As many other answers are showing  there are different ways to create a generator   You can use the parentheses syntax as in my example above  or you can use yield   Another interesting feature is that generators can be  infinite  -- iterators that don t stop    gt  gt  gt  def infinite gen            n   0         while True              yield n             n   n   1       gt  gt  gt  g   infinite gen    gt  gt  gt  g next   0  gt  gt  gt  g next   1  gt  gt  gt  g next   2  gt  gt  gt  g next   3

User · Answer

The only thing I can add to Stephan202 s answer is a recommendation that you take a look at David Beazley s PyCon  08 presentation  Generator Tricks for Systems Programmers   which is the best single explanation of the how and why of generators that I ve seen anywhere   This is the thing that took me from  Python looks kind of fun  to  This is what I ve been looking for    It s at http   www dabeaz com generators

User · Answer

I put up this piece of code which explains 3 key concepts about generators   def numbers        for i in range 10               yield i  gen   numbers    this line only returns a generator object  it does not run the code defined inside numbers  for i in gen   we iterate over the generator and the values are printed     print i    the generator is now empty  for i in gen   so this for block does not print anything     print i

[python] Understanding generators in Python

Examples related to python

Examples related to generator