[performance] Is recursion ever faster than looping?

I know that recursion is sometimes a lot cleaner than looping, and I'm not asking anything about when I should use recursion over iteration, I know there are lots of questions about that already.

What I'm asking is, is recursion ever faster than a loop? To me it seems like, you would always be able to refine a loop and get it to perform more quickly than a recursive function because the loop is absent constantly setting up new stack frames.

I'm specifically looking for whether recursion is faster in applications where recursion is the right way to handle the data, such as in some sorting functions, in binary trees, etc.

This question is related to performance loops recursion iteration

The answer is


Recursion may well be faster where the alternative is to explicitly manage a stack, like in the sorting or binary tree algorithms you mention.

I've had a case where rewriting a recursive algorithm in Java made it slower.

So the right approach is to first write it in the most natural way, only optimize if profiling shows it is critical, and then measure the supposed improvement.


Most of the answers here are wrong. The right answer is it depends. For example, here are two C functions which walks through a tree. First the recursive one:

static
void mm_scan_black(mm_rc *m, ptr p) {
    SET_COL(p, COL_BLACK);
    P_FOR_EACH_CHILD(p, {
        INC_RC(p_child);
        if (GET_COL(p_child) != COL_BLACK) {
            mm_scan_black(m, p_child);
        }
    });
}

And here is the same function implemented using iteration:

static
void mm_scan_black(mm_rc *m, ptr p) {
    stack *st = m->black_stack;
    SET_COL(p, COL_BLACK);
    st_push(st, p);
    while (st->used != 0) {
        p = st_pop(st);
        P_FOR_EACH_CHILD(p, {
            INC_RC(p_child);
            if (GET_COL(p_child) != COL_BLACK) {
                SET_COL(p_child, COL_BLACK);
                st_push(st, p_child);
            }
        });
    }
}

It's not important to understand the details of the code. Just that p are nodes and that P_FOR_EACH_CHILD does the walking. In the iterative version we need an explicit stack st onto which nodes are pushed and then popped and manipulated.

The recursive function runs much faster than the iterative one. The reason is because in the latter, for each item, a CALL to the function st_push is needed and then another to st_pop.

In the former, you only have the recursive CALL for each node.

Plus, accessing variables on the callstack is incredibly fast. It means you are reading from memory which is likely to always be in the innermost cache. An explicit stack, on the other hand, has to be backed by malloc:ed memory from the heap which is much slower to access.

With careful optimization, such as inlining st_push and st_pop, I can reach roughly parity with the recursive approach. But at least on my computer, the cost of accessing heap memory is bigger than the cost of the recursive call.

But this discussion is mostly moot because recursive tree walking is incorrect. If you have a large enough tree, you will run out of callstack space which is why an iterative algorithm must be used.


According to theory its the same things. Recursion and loop with the same O() complexity will work with the same theoretical speed, but of course real speed depends on language, compiler and processor. Example with power of number can be coded in iteration way with O(ln(n)):

  int power(int t, int k) {
  int res = 1;
  while (k) {
    if (k & 1) res *= t;
    t *= t;
    k >>= 1;
  }
  return res;
  }

This is a guess. Generally recursion probably doesn't beat looping often or ever on problems of decent size if both are using really good algorithms(not counting implementation difficulty) , it may be different if used with a language w/ tail call recursion(and a tail recursive algorithm and with loops also as part of the language)-which would probably have very similar and possibly even prefer recursion some of the time.


Tail recursion is as fast as looping. Many functional languages have tail recursion implemented in them.


is recursion ever faster than a loop?

No, Iteration will always be faster than Recursion. (in a Von Neumann Architecture)

Explanation:

If you build the minimum operations of a generic computer from scratch, "Iteration" comes first as a building block and is less resource intensive than "recursion", ergo is faster.

Building a pseudo-computing-machine from scratch:

Question yourself: What do you need to compute a value, i.e. to follow an algorithm and reach a result?

We will establish a hierarchy of concepts, starting from scratch and defining in first place the basic, core concepts, then build second level concepts with those, and so on.

  1. First Concept: Memory cells, storage, State. To do something you need places to store final and intermediate result values. Let’s assume we have an infinite array of "integer" cells, called Memory, M[0..Infinite].

  2. Instructions: do something - transform a cell, change its value. alter state. Every interesting instruction performs a transformation. Basic instructions are:

    a) Set & move memory cells

    • store a value into memory, e.g.: store 5 m[4]
    • copy a value to another position: e.g.: store m[4] m[8]

    b) Logic and arithmetic

    • and, or, xor, not
    • add, sub, mul, div. e.g. add m[7] m[8]
  3. An Executing Agent: a core in a modern CPU. An "agent" is something that can execute instructions. An Agent can also be a person following the algorithm on paper.

  4. Order of steps: a sequence of instructions: i.e.: do this first, do this after, etc. An imperative sequence of instructions. Even one line expressions are "an imperative sequence of instructions". If you have an expression with a specific "order of evaluation" then you have steps. It means than even a single composed expression has implicit “steps” and also has an implicit local variable (let’s call it “result”). e.g.:

    4 + 3 * 2 - 5
    (- (+ (* 3 2) 4 ) 5)
    (sub (add (mul 3 2) 4 ) 5)  
    

    The expression above implies 3 steps with an implicit "result" variable.

    // pseudocode
    
           1. result = (mul 3 2)
           2. result = (add 4 result)
           3. result = (sub result 5)
    

    So even infix expressions, since you have a specific order of evaluation, are an imperative sequence of instructions. The expression implies a sequence of operations to be made in a specific order, and because there are steps, there is also an implicit "result" intermediate variable.

  5. Instruction Pointer: If you have a sequence of steps, you have also an implicit "instruction pointer". The instruction pointer marks the next instruction, and advances after the instruction is read but before the instruction is executed.

    In this pseudo-computing-machine, the Instruction Pointer is part of Memory. (Note: Normally the Instruction Pointer will be a “special register” in a CPU core, but here we will simplify the concepts and assume all data (registers included) are part of “Memory”)

  6. Jump - Once you have an ordered number of steps and an Instruction Pointer, you can apply the "store" instruction to alter the value of the Instruction Pointer itself. We will call this specific use of the store instruction with a new name: Jump. We use a new name because is easier to think about it as a new concept. By altering the instruction pointer we're instructing the agent to “go to step x“.

  7. Infinite Iteration: By jumping back, now you can make the agent "repeat" a certain number of steps. At this point we have infinite Iteration.

                       1. mov 1000 m[30]
                       2. sub m[30] 1
                       3. jmp-to 2  // infinite loop
    
  8. Conditional - Conditional execution of instructions. With the "conditional" clause, you can conditionally execute one of several instructions based on the current state (which can be set with a previous instruction).

  9. Proper Iteration: Now with the conditional clause, we can escape the infinite loop of the jump back instruction. We have now a conditional loop and then proper Iteration

    1. mov 1000 m[30]
    2. sub m[30] 1
    3. (if not-zero) jump 2  // jump only if the previous 
                            // sub instruction did not result in 0
    
    // this loop will be repeated 1000 times
    // here we have proper ***iteration***, a conditional loop.
    
  10. Naming: giving names to a specific memory location holding data or holding a step. This is just a "convenience" to have. We do not add any new instructions by having the capacity to define “names” for memory locations. “Naming” is not a instruction for the agent, it’s just a convenience to us. Naming makes code (at this point) easier to read and easier to change.

       #define counter m[30]   // name a memory location
       mov 1000 counter
    loop:                      // name a instruction pointer location
        sub counter 1
        (if not-zero) jmp-to loop  
    
  11. One-level subroutine: Suppose there’s a series of steps you need to execute frequently. You can store the steps in a named position in memory and then jump to that position when you need to execute them (call). At the end of the sequence you'll need to return to the point of calling to continue execution. With this mechanism, you’re creating new instructions (subroutines) by composing core instructions.

    Implementation: (no new concepts required)

    • Store the current Instruction Pointer in a predefined memory position
    • jump to the subroutine
    • at the end of the subroutine, you retrieve the Instruction Pointer from the predefined memory location, effectively jumping back to the following instruction of the original call

    Problem with the one-level implementation: You cannot call another subroutine from a subroutine. If you do, you'll overwrite the returning address (global variable), so you cannot nest calls.

    To have a better Implementation for subroutines: You need a STACK

  12. Stack: You define a memory space to work as a "stack", you can “push” values on the stack, and also “pop” the last “pushed” value. To implement a stack you'll need a Stack Pointer (similar to the Instruction Pointer) which points to the actual “head” of the stack. When you “push” a value, the stack pointer decrements and you store the value. When you “pop”, you get the value at the actual Stack Pointer and then the Stack Pointer is incremented.

  13. Subroutines Now that we have a stack we can implement proper subroutines allowing nested calls. The implementation is similar, but instead of storing the Instruction Pointer in a predefined memory position, we "push" the value of the IP in the stack. At the end of the subroutine, we just “pop” the value from the stack, effectively jumping back to the instruction after the original call. This implementation, having a “stack” allows calling a subroutine from another subroutine. With this implementation we can create several levels of abstraction when defining new instructions as subroutines, by using core instructions or other subroutines as building blocks.

  14. Recursion: What happens when a subroutine calls itself?. This is called "recursion".

    Problem: Overwriting the local intermediate results a subroutine can be storing in memory. Since you are calling/reusing the same steps, if the intermediate result are stored in predefined memory locations (global variables) they will be overwritten on the nested calls.

    Solution: To allow recursion, subroutines should store local intermediate results in the stack, therefore, on each recursive call (direct or indirect) the intermediate results are stored in different memory locations.

...

having reached recursion we stop here.

Conclusion:

In a Von Neumann Architecture, clearly "Iteration" is a simpler/basic concept than “Recursion". We have a form of "Iteration" at level 7, while "Recursion" is at level 14 of the concepts hierarchy.

Iteration will always be faster in machine code because it implies less instructions therefore less CPU cycles.

Which one is "better"?

  • You should use "iteration" when you are processing simple, sequential data structures, and everywhere a “simple loop” will do.

  • You should use "recursion" when you need to process a recursive data structure (I like to call them “Fractal Data Structures”), or when the recursive solution is clearly more “elegant”.

Advice: use the best tool for the job, but understand the inner workings of each tool in order to choose wisely.

Finally, note that you have plenty of opportunities to use recursion. You have Recursive Data Structures everywhere, you’re looking at one now: parts of the DOM supporting what you are reading are a RDS, a JSON expression is a RDS, the hierarchical file system in your computer is a RDS, i.e: you have a root directory, containing files and directories, every directory containing files and directories, every one of those directories containing files and directories...


Functional programming is more about "what" rather than "how".

The language implementors will find a way to optimize how the code works underneath, if we don't try to make it more optimized than it needs to be. Recursion can also be optimized within the languages that support tail call optimization.

What matters more from a programmer standpoint is readability and maintainability rather than optimization in the first place. Again, "premature optimization is root of all evil".


In general, no, recursion will not be faster than a loop in any realistic usage that has viable implementations in both forms. I mean, sure, you could code up loops that take forever, but there would be better ways to implement the same loop that could outperform any implementation of the same problem via recursion.

You hit the nail on the head regarding the reason; creating and destroying stack frames is more expensive than a simple jump.

However, do note that I said "has viable implementations in both forms". For things like many sorting algorithms, there tends to not be a very viable way of implementing them that doesn't effectively set up its own version of a stack, due to the spawning of child "tasks" that are inherently part of the process. Thus, recursion may be just as fast as attempting to implement the algorithm via looping.

Edit: This answer is assuming non-functional languages, where most basic data types are mutable. It does not apply to functional languages.


Consider what absolutely must be done for each, iteration and recursion.

  • iteration: a jump to beginning of loop
  • recursion: a jump to beginning of called function

You see that there is not much room for differences here.

(I assume recursion being a tail-call and compiler being aware of that optimization).


Most answers here forget the obvious culprit why recursion is often slower than iterative solutions. It's linked with the build up and tear down of stack frames but is not exactly that. It's generally a big difference in the storage of the auto variable for each recursion. In an iterative algorithm with a loop, the variables are often held in registers and even if they spill, they will reside in the Level 1 cache. In a recursive algorithm, all intermediary states of the variable are stored on the stack, meaning they will engender many more spills to memory. This means that even if it makes the same amount of operations, it will have a lot memory accesses in the hot loop and what makes it worse, these memory operations have a lousy reuse rate making the caches less effective.

TL;DR recursive algorithms have generally a worse cache behavior than iterative ones.


In any realistic system, no, creating a stack frame will always be more expensive than an INC and a JMP. That's why really good compilers automatically transform tail recursion into a call to the same frame, i.e. without the overhead, so you get the more readable source version and the more efficient compiled version. A really, really good compiler should even be able to transform normal recursion into tail recursion where that is possible.


Examples related to performance

Why is 2 * (i * i) faster than 2 * i * i in Java? What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism? How to check if a key exists in Json Object and get its value Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly? Most efficient way to map function over numpy array The most efficient way to remove first N elements in a list? Fastest way to get the first n elements of a List into an Array Why is "1000000000000000 in range(1000000000000001)" so fast in Python 3? pandas loc vs. iloc vs. at vs. iat? Android Recyclerview vs ListView with Viewholder

Examples related to loops

How to increment a letter N times per iteration and store in an array? Angular 2 Cannot find control with unspecified name attribute on formArrays What is the difference between i = i + 1 and i += 1 in a 'for' loop? Prime numbers between 1 to 100 in C Programming Language Python Loop: List Index Out of Range JavaScript: Difference between .forEach() and .map() Why does using from __future__ import print_function breaks Python2-style print? Creating an array from a text file in Bash Iterate through dictionary values? C# Wait until condition is true

Examples related to recursion

List all the files and folders in a Directory with PHP recursive function Jquery Ajax beforeSend and success,error & complete Node.js - Maximum call stack size exceeded best way to get folder and file list in Javascript Recursive sub folder search and return files in a list python find all subsets that sum to a particular value jQuery - Uncaught RangeError: Maximum call stack size exceeded Find and Replace string in all files recursive using grep and sed recursion versus iteration Method to get all files within folder and subfolders that will return a list

Examples related to iteration

Is there a way in Pandas to use previous row value in dataframe.apply when previous value is also calculated in the apply? How to loop over grouped Pandas dataframe? How to iterate through a list of dictionaries in Jinja template? How to iterate through an ArrayList of Objects of ArrayList of Objects? Ways to iterate over a list in Java Python list iterator behavior and next(iterator) How to loop through an array containing objects and access their properties recursion versus iteration What is the perfect counterpart in Python for "while not EOF" How to iterate over a JavaScript object?