Python maximum recursion depth exceeded while calling a Python object

Question

I ve built a crawler that had to run on about 5M pages  by increasing the url ID  and then parses the pages which contain the info  I need   after using an algorithm which run on the urls  200K  and saved the good and bad results I found that the I m wasting a lot of time  I could see that there are a a few returning subtrahends which I can use to check the next valid url   you can see the subtrahends quite fast  a little ex  of the few first  good IDs   -  510000011    8 510000029    18 510000037    8 510000045    8 510000052    7 510000060    8 510000078    18 510000086    8 510000094    8 510000102    8 510000110   etc  510000128 510000136 510000144 510000151 510000169 510000177 510000185 510000193 510000201   after crawling about 200K urls which gave me only 14K good results I knew I was wasting my time and need to optimize it  so I run some statistics and built a function that will check the urls while increasing the id with 8 18 17 8  top returning subtrahends   etc    this is the function -   def checkNextID ID       global numOfRuns  curRes  lastResult     while ID  lt  lastResult          try              numOfRuns    1             if numOfRuns   10    0                  time sleep 3    sleep every 10 iterations             if isValid ID   8                   parseHTML curRes                  checkNextID ID   8                  return 0             if isValid ID   18                   parseHTML curRes                  checkNextID ID   18                  return 0             if isValid ID   7                   parseHTML curRes                  checkNextID ID   7                  return 0             if isValid ID   17                   parseHTML curRes                  checkNextID ID   17                  return 0             if isValid ID 6                   parseHTML curRes                  checkNextID ID   6                  return 0             if isValid ID   16                   parseHTML curRes                  checkNextID ID   16                  return 0             else                  checkNextID ID   1                  return 0         except Exception  e              print  somethin went wrong      str e    what is basically does is -checkNextID ID  is getting the first id I know that contain the data minus 8 so the first iteration will match the first  if isValid  clause  isValid ID   8  will return True    lastResult is a variable which saves the last known url id  so we ll run until numOfRuns is  isValid   is a function that gets an ID   one of the subtrahends and returns True if the url contains what I need and saves a soup object of the url to a global varibale named -  curRes   it returns False if the url doesn t contain the data I need   parseHTML is a function that gets the soup object  curRes   parses the data I need and then saves the data to a csv  then returns True   if isValid   returns True  we ll call parseHTML   and then try to check the next ID the subtrahends  by calling checkNextID ID   subtrahends   if none of them will return what I m looking for I ll increase it with 1 and check again until I ll find the next valid url   you can see the rest of the code here  after running the code I got about 950  good results and suddenly an exception had raised -      somethin went wrong  maximum recursion depth exceeded while calling a   Python object    I could see on WireShark that the scipt stuck on id - 510009541  I started my script with 510000003   the script tried getting the url with that ID a few times before I noticed the error and stopped it   I was really exciting to see that I got the same results but 25x-40x times faster then my old script  with fewer HTTP requests  it s very precise  I have missed only 1 result for 1000 good results  which is find by me  it s impossible to rum 5M times  I had my old script running for 30 hours and got 14-15K results when my new script gave me 960  results in 5-10 minutes   I read about stack limitations  but there must be a solution for the algorithm I m trying to implement in Python  I can t go back to my old  algorithm   it will never end    Thanks

User · Answer

Instead of doing recursion  the parts of the code with checkNextID ID   18  and similar could be replaced with ID  18  and then if you remove all instances of return 0  then it should do the same thing but as a simple loop  You should then put a return 0 at the end and make your variables non-global

User · Answer

this turns the recursion in to a loop   def checkNextID ID       global numOfRuns  curRes  lastResult     while ID  lt  lastResult          try              numOfRuns    1             if numOfRuns   10    0                  time sleep 3    sleep every 10 iterations             if isValid ID   8                   parseHTML curRes                  ID   ID   8             elif isValid ID   18                   parseHTML curRes                  ID   ID   18             elif isValid ID   7                   parseHTML curRes                  ID   ID   7             elif isValid ID   17                   parseHTML curRes                  ID   ID   17             elif isValid ID 6                   parseHTML curRes                  ID   ID   6             elif isValid ID   16                   parseHTML curRes                  ID   ID   16             else                  ID   ID   1         except Exception  e              print  somethin went wrong      str e

User · Answer

Python don t have a great support for recursion because of it s lack of TRE  Tail Recursion Elimination    This means that each call to your recursive function will create a function call stack and because there is a limit of stack depth  by default is 1000  that you can check out by sys getrecursionlimit  of course you can change it using sys setrecursionlimit but it s not recommended  your program will end up by crashing when it hits this limit   As other answer has already give you a much nicer way for how to solve this in your case  which is to replace recursion by simple loop  there is another solution if you still want to use recursion which is to use one of the many recipes of implementing TRE in python like this one   N B  My answer is meant to give you more insight on why you get the error  and I m not advising you to use the TRE as i already explained because in your case a loop will be much better and easy to read

User · Answer

You can increase the capacity of the stack by the following    import sys sys setrecursionlimit 10000

[python] Python: maximum recursion depth exceeded while calling a Python object

Examples related to python

Examples related to algorithm

Examples related to recursion

Examples related to web-crawler

Examples related to depth