[multithreading] Recursive Lock (Mutex) vs Non-Recursive Lock (Mutex)

POSIX allows mutexes to be recursive. That means the same thread can lock the same mutex twice and won't deadlock. Of course it also needs to unlock it twice, otherwise no other thread can obtain the mutex. Not all systems supporting pthreads also support recursive mutexes, but if they want to be POSIX conform, they have to.

Other APIs (more high level APIs) also usually offer mutexes, often called Locks. Some systems/languages (e.g. Cocoa Objective-C) offer both, recursive and non recursive mutexes. Some languages also only offer one or the other one. E.g. in Java mutexes are always recursive (the same thread may twice "synchronize" on the same object). Depending on what other thread functionality they offer, not having recursive mutexes might be no problem, as they can easily be written yourself (I already implemented recursive mutexes myself on the basis of more simple mutex/condition operations).

What I don't really understand: What are non-recursive mutexes good for? Why would I want to have a thread deadlock if it locks the same mutex twice? Even high level languages that could avoid that (e.g. testing if this will deadlock and throwing an exception if it does) usually don't do that. They will let the thread deadlock instead.

Is this only for cases, where I accidentally lock it twice and only unlock it once and in case of a recursive mutex, it would be harder to find the problem, so instead I have it deadlock immediately to see where the incorrect lock appears? But couldn't I do the same with having a lock counter returned when unlocking and in a situation, where I'm sure I released the last lock and the counter is not zero, I can throw an exception or log the problem? Or is there any other, more useful use-case of non recursive mutexes that I fail to see? Or is it maybe just performance, as a non-recursive mutex can be slightly faster than a recursive one? However, I tested this and the difference is really not that big.

This question is related to multithreading locking mutex deadlock recursive-mutex

The answer is


IMHO, most arguments against recursive locks (which are what I use 99.9% of the time over like 20 years of concurrent programming) mix the question if they are good or bad with other software design issues, which are quite unrelated. To name one, the "callback" problem, which is elaborated on exhaustively and without any multithreading related point of view, for example in the book Component software - beyond Object oriented programming.

As soon as you have some inversion of control (e.g. events fired), you face re-entrance problems. Independent of whether there are mutexes and threading involved or not.

class EvilFoo {
  std::vector<std::string> data;
  std::vector<std::function<void(EvilFoo&)> > changedEventHandlers;
public:
  size_t registerChangedHandler( std::function<void(EvilFoo&)> handler) { // ...  
  }
  void unregisterChangedHandler(size_t handlerId) { // ...
  }
  void fireChangedEvent() { 
    // bad bad, even evil idea!
    for( auto& handler : changedEventHandlers ) {
      handler(*this);
    }
  }
  void AddItem(const std::string& item) { 
    data.push_back(item);
    fireChangedEvent();
  }
};

Now, with code like the above you get all error cases, which would usually be named in the context of recursive locks - only without any of them. An event handler can unregister itself once it has been called, which would lead to a bug in a naively written fireChangedEvent(). Or it could call other member functions of EvilFoo which cause all sorts of problems. The root cause is re-entrance. Worst of all, this could not even be very obvious as it could be over a whole chain of events firing events and eventually we are back at our EvilFoo (non- local).

So, re-entrance is the root problem, not the recursive lock. Now, if you felt more on the safe side using a non-recursive lock, how would such a bug manifest itself? In a deadlock whenever unexpected re-entrance occurs. And with a recursive lock? The same way, it would manifest itself in code without any locks.

So the evil part of EvilFoo are the events and how they are implemented, not so much a recursive lock. fireChangedEvent() would need to first create a copy of changedEventHandlers and use that for iteration, for starters.

Another aspect often coming into the discussion is the definition of what a lock is supposed to do in the first place:

  • Protect a piece of code from re-entrance
  • Protect a resource from being used concurrently (by multiple threads).

The way I do my concurrent programming, I have a mental model of the latter (protect a resource). This is the main reason why I am good with recursive locks. If some (member) function needs locking of a resource, it locks. If it calls another (member) function while doing what it does and that function also needs locking - it locks. And I don't need an "alternate approach", because the ref-counting of the recursive lock is quite the same as if each function wrote something like:

void EvilFoo::bar() {
   auto_lock lock(this); // this->lock_holder = this->lock_if_not_already_locked_by_same_thread())
   // do what we gotta do
   
   // ~auto_lock() { if (lock_holder) unlock() }
}

And once events or similar constructs (visitors?!) come into play, I do not hope to get all the ensuing design problems solved by some non-recursive lock.


The answer is not efficiency. Non-reentrant mutexes lead to better code.

Example: A::foo() acquires the lock. It then calls B::bar(). This worked fine when you wrote it. But sometime later someone changes B::bar() to call A::baz(), which also acquires the lock.

Well, if you don't have recursive mutexes, this deadlocks. If you do have them, it runs, but it may break. A::foo() may have left the object in an inconsistent state before calling bar(), on the assumption that baz() couldn't get run because it also acquires the mutex. But it probably shouldn't run! The person who wrote A::foo() assumed that nobody could call A::baz() at the same time - that's the entire reason that both of those methods acquired the lock.

The right mental model for using mutexes: The mutex protects an invariant. When the mutex is held, the invariant may change, but before releasing the mutex, the invariant is re-established. Reentrant locks are dangerous because the second time you acquire the lock you can't be sure the invariant is true any more.

If you are happy with reentrant locks, it is only because you have not had to debug a problem like this before. Java has non-reentrant locks these days in java.util.concurrent.locks, by the way.


As written by Dave Butenhof himself:

"The biggest of all the big problems with recursive mutexes is that they encourage you to completely lose track of your locking scheme and scope. This is deadly. Evil. It's the "thread eater". You hold locks for the absolutely shortest possible time. Period. Always. If you're calling something with a lock held simply because you don't know it's held, or because you don't know whether the callee needs the mutex, then you're holding it too long. You're aiming a shotgun at your application and pulling the trigger. You presumably started using threads to get concurrency; but you've just PREVENTED concurrency."


One main reason that recursive mutexes are useful is in case of accessing the methods multiple times by the same thread. For example, say if mutex lock is protecting a bank A/c to withdraw, then if there is a fee also associated with that withdrawal, then the same mutex has to be used.


What are non-recursive mutexes good for?

They are absolutely good when you have to make sure the mutex is unlocked before doing something. This is because pthread_mutex_unlock can guarantee that the mutex is unlocked only if it is non-recursive.

pthread_mutex_t      g_mutex;

void foo()
{
    pthread_mutex_lock(&g_mutex);
    // Do something.
    pthread_mutex_unlock(&g_mutex);

    bar();
}

If g_mutex is non-recursive, the code above is guaranteed to call bar() with the mutex unlocked.

Thus eliminating the possibility of a deadlock in case bar() happens to be an unknown external function which may well do something that may result in another thread trying to acquire the same mutex. Such scenarios are not uncommon in applications built on thread pools, and in distributed applications, where an interprocess call may spawn a new thread without the client programmer even realising that. In all such scenarios it's best to invoke the said external functions only after the lock is released.

If g_mutex was recursive, there would be simply no way to make sure it is unlocked before making a call.


The right mental model for using mutexes: The mutex protects an invariant.

Why are you sure that this is really right mental model for using mutexes? I think right model is protecting data but not invariants.

The problem of protecting invariants presents even in single-threaded applications and has nothing common with multi-threading and mutexes.

Furthermore, if you need to protect invariants, you still may use binary semaphore wich is never recursive.


As written by Dave Butenhof himself:

"The biggest of all the big problems with recursive mutexes is that they encourage you to completely lose track of your locking scheme and scope. This is deadly. Evil. It's the "thread eater". You hold locks for the absolutely shortest possible time. Period. Always. If you're calling something with a lock held simply because you don't know it's held, or because you don't know whether the callee needs the mutex, then you're holding it too long. You're aiming a shotgun at your application and pulling the trigger. You presumably started using threads to get concurrency; but you've just PREVENTED concurrency."


IMHO, most arguments against recursive locks (which are what I use 99.9% of the time over like 20 years of concurrent programming) mix the question if they are good or bad with other software design issues, which are quite unrelated. To name one, the "callback" problem, which is elaborated on exhaustively and without any multithreading related point of view, for example in the book Component software - beyond Object oriented programming.

As soon as you have some inversion of control (e.g. events fired), you face re-entrance problems. Independent of whether there are mutexes and threading involved or not.

class EvilFoo {
  std::vector<std::string> data;
  std::vector<std::function<void(EvilFoo&)> > changedEventHandlers;
public:
  size_t registerChangedHandler( std::function<void(EvilFoo&)> handler) { // ...  
  }
  void unregisterChangedHandler(size_t handlerId) { // ...
  }
  void fireChangedEvent() { 
    // bad bad, even evil idea!
    for( auto& handler : changedEventHandlers ) {
      handler(*this);
    }
  }
  void AddItem(const std::string& item) { 
    data.push_back(item);
    fireChangedEvent();
  }
};

Now, with code like the above you get all error cases, which would usually be named in the context of recursive locks - only without any of them. An event handler can unregister itself once it has been called, which would lead to a bug in a naively written fireChangedEvent(). Or it could call other member functions of EvilFoo which cause all sorts of problems. The root cause is re-entrance. Worst of all, this could not even be very obvious as it could be over a whole chain of events firing events and eventually we are back at our EvilFoo (non- local).

So, re-entrance is the root problem, not the recursive lock. Now, if you felt more on the safe side using a non-recursive lock, how would such a bug manifest itself? In a deadlock whenever unexpected re-entrance occurs. And with a recursive lock? The same way, it would manifest itself in code without any locks.

So the evil part of EvilFoo are the events and how they are implemented, not so much a recursive lock. fireChangedEvent() would need to first create a copy of changedEventHandlers and use that for iteration, for starters.

Another aspect often coming into the discussion is the definition of what a lock is supposed to do in the first place:

  • Protect a piece of code from re-entrance
  • Protect a resource from being used concurrently (by multiple threads).

The way I do my concurrent programming, I have a mental model of the latter (protect a resource). This is the main reason why I am good with recursive locks. If some (member) function needs locking of a resource, it locks. If it calls another (member) function while doing what it does and that function also needs locking - it locks. And I don't need an "alternate approach", because the ref-counting of the recursive lock is quite the same as if each function wrote something like:

void EvilFoo::bar() {
   auto_lock lock(this); // this->lock_holder = this->lock_if_not_already_locked_by_same_thread())
   // do what we gotta do
   
   // ~auto_lock() { if (lock_holder) unlock() }
}

And once events or similar constructs (visitors?!) come into play, I do not hope to get all the ensuing design problems solved by some non-recursive lock.


The only good use case for recursion mutex is when an object contains multiple methods. When any of the methods modify the content of the object, and therefore must lock the object before the state is consistent again.

If the methods use other methods (ie: addNewArray() calls addNewPoint(), and finalizes with recheckBounds()), but any of those functions by themselves need to lock the mutex, then recursive mutex is a win-win.

For any other case (solving just bad coding, using it even in different objects) is clearly wrong!


The answer is not efficiency. Non-reentrant mutexes lead to better code.

Example: A::foo() acquires the lock. It then calls B::bar(). This worked fine when you wrote it. But sometime later someone changes B::bar() to call A::baz(), which also acquires the lock.

Well, if you don't have recursive mutexes, this deadlocks. If you do have them, it runs, but it may break. A::foo() may have left the object in an inconsistent state before calling bar(), on the assumption that baz() couldn't get run because it also acquires the mutex. But it probably shouldn't run! The person who wrote A::foo() assumed that nobody could call A::baz() at the same time - that's the entire reason that both of those methods acquired the lock.

The right mental model for using mutexes: The mutex protects an invariant. When the mutex is held, the invariant may change, but before releasing the mutex, the invariant is re-established. Reentrant locks are dangerous because the second time you acquire the lock you can't be sure the invariant is true any more.

If you are happy with reentrant locks, it is only because you have not had to debug a problem like this before. Java has non-reentrant locks these days in java.util.concurrent.locks, by the way.


The answer is not efficiency. Non-reentrant mutexes lead to better code.

Example: A::foo() acquires the lock. It then calls B::bar(). This worked fine when you wrote it. But sometime later someone changes B::bar() to call A::baz(), which also acquires the lock.

Well, if you don't have recursive mutexes, this deadlocks. If you do have them, it runs, but it may break. A::foo() may have left the object in an inconsistent state before calling bar(), on the assumption that baz() couldn't get run because it also acquires the mutex. But it probably shouldn't run! The person who wrote A::foo() assumed that nobody could call A::baz() at the same time - that's the entire reason that both of those methods acquired the lock.

The right mental model for using mutexes: The mutex protects an invariant. When the mutex is held, the invariant may change, but before releasing the mutex, the invariant is re-established. Reentrant locks are dangerous because the second time you acquire the lock you can't be sure the invariant is true any more.

If you are happy with reentrant locks, it is only because you have not had to debug a problem like this before. Java has non-reentrant locks these days in java.util.concurrent.locks, by the way.


The right mental model for using mutexes: The mutex protects an invariant.

Why are you sure that this is really right mental model for using mutexes? I think right model is protecting data but not invariants.

The problem of protecting invariants presents even in single-threaded applications and has nothing common with multi-threading and mutexes.

Furthermore, if you need to protect invariants, you still may use binary semaphore wich is never recursive.


The answer is not efficiency. Non-reentrant mutexes lead to better code.

Example: A::foo() acquires the lock. It then calls B::bar(). This worked fine when you wrote it. But sometime later someone changes B::bar() to call A::baz(), which also acquires the lock.

Well, if you don't have recursive mutexes, this deadlocks. If you do have them, it runs, but it may break. A::foo() may have left the object in an inconsistent state before calling bar(), on the assumption that baz() couldn't get run because it also acquires the mutex. But it probably shouldn't run! The person who wrote A::foo() assumed that nobody could call A::baz() at the same time - that's the entire reason that both of those methods acquired the lock.

The right mental model for using mutexes: The mutex protects an invariant. When the mutex is held, the invariant may change, but before releasing the mutex, the invariant is re-established. Reentrant locks are dangerous because the second time you acquire the lock you can't be sure the invariant is true any more.

If you are happy with reentrant locks, it is only because you have not had to debug a problem like this before. Java has non-reentrant locks these days in java.util.concurrent.locks, by the way.


The right mental model for using mutexes: The mutex protects an invariant.

Why are you sure that this is really right mental model for using mutexes? I think right model is protecting data but not invariants.

The problem of protecting invariants presents even in single-threaded applications and has nothing common with multi-threading and mutexes.

Furthermore, if you need to protect invariants, you still may use binary semaphore wich is never recursive.


One main reason that recursive mutexes are useful is in case of accessing the methods multiple times by the same thread. For example, say if mutex lock is protecting a bank A/c to withdraw, then if there is a fee also associated with that withdrawal, then the same mutex has to be used.


What are non-recursive mutexes good for?

They are absolutely good when you have to make sure the mutex is unlocked before doing something. This is because pthread_mutex_unlock can guarantee that the mutex is unlocked only if it is non-recursive.

pthread_mutex_t      g_mutex;

void foo()
{
    pthread_mutex_lock(&g_mutex);
    // Do something.
    pthread_mutex_unlock(&g_mutex);

    bar();
}

If g_mutex is non-recursive, the code above is guaranteed to call bar() with the mutex unlocked.

Thus eliminating the possibility of a deadlock in case bar() happens to be an unknown external function which may well do something that may result in another thread trying to acquire the same mutex. Such scenarios are not uncommon in applications built on thread pools, and in distributed applications, where an interprocess call may spawn a new thread without the client programmer even realising that. In all such scenarios it's best to invoke the said external functions only after the lock is released.

If g_mutex was recursive, there would be simply no way to make sure it is unlocked before making a call.


Examples related to multithreading

How can compare-and-swap be used for a wait-free mutual exclusion for any shared data structure? Waiting until the task finishes What is the difference between Task.Run() and Task.Factory.StartNew() Why is setState in reactjs Async instead of Sync? What exactly is std::atomic? Calling async method on button click WAITING at sun.misc.Unsafe.park(Native Method) How to use background thread in swift? What is the use of static synchronized method in java? Locking pattern for proper use of .NET MemoryCache

Examples related to locking

How to detect query which holds the lock in Postgres? How to disable Home and other system buttons in Android? Git 'fatal: Unable to write new index file' Show all current locks from get_lock How to find out what is locking my tables? How to solve SQL Server Error 1222 i.e Unlock a SQL Server table Confused about UPDLOCK, HOLDLOCK How does lock work exactly? How to implement a lock in JavaScript LINK : fatal error LNK1104: cannot open file 'D:\...\MyProj.exe'

Examples related to mutex

What is the Swift equivalent to Objective-C's "@synchronized"? Mutex lock threads When should one use a spinlock instead of mutex? Is there a Mutex in Java? Mutex example / tutorial? When should we use mutex and when should we use semaphore Proper use of mutexes in Python Lock, mutex, semaphore... what's the difference? Example for boost shared_mutex (multiple reads/one write)? What is mutex and semaphore in Java ? What is the main difference?

Examples related to deadlock

await vs Task.Wait - Deadlock? SQL query to get the deadlocks in SQL SERVER 2008 Cause of a process being a deadlock victim How to solve SQL Server Error 1222 i.e Unlock a SQL Server table C++ terminate called without an active exception What's the difference between deadlock and livelock? How to implement a lock in JavaScript Working around MySQL error "Deadlock found when trying to get lock; try restarting transaction" How to avoid mysql 'Deadlock found when trying to get lock; try restarting transaction' Simple Deadlock Examples

Examples related to recursive-mutex

Recursive Lock (Mutex) vs Non-Recursive Lock (Mutex)