[algorithm] Best algorithm for detecting cycles in a directed graph

What is the most efficient algorithm for detecting all cycles within a directed graph?

I have a directed graph representing a schedule of jobs that need to be executed, a job being a node and a dependency being an edge. I need to detect the error case of a cycle within this graph leading to cyclic dependencies.

This question is related to algorithm graph-theory directed-graph

The answer is


As you said, you have set of jobs, it need to be executed in certain order. Topological sort given you required order for scheduling of jobs(or for dependency problems if it is a direct acyclic graph). Run dfs and maintain a list, and start adding node in the beginning of the list, and if you encountered a node which is already visited. Then you found a cycle in given graph.


If DFS finds an edge that points to an already-visited vertex, you have a cycle there.


I had implemented this problem in sml ( imperative programming) . Here is the outline . Find all the nodes that either have an indegree or outdegree of 0 . Such nodes cannot be part of a cycle ( so remove them ) . Next remove all the incoming or outgoing edges from such nodes. Recursively apply this process to the resulting graph. If at the end you are not left with any node or edge , the graph does not have any cycles , else it has.


There is no algorithm which can find all the cycles in a directed graph in polynomial time. Suppose, the directed graph has n nodes and every pair of the nodes has connections to each other which means you have a complete graph. So any non-empty subset of these n nodes indicates a cycle and there are 2^n-1 number of such subsets. So no polynomial time algorithm exists. So suppose you have an efficient (non-stupid) algorithm which can tell you the number of directed cycles in a graph, you can first find the strong connected components, then applying your algorithm on these connected components. Since cycles only exist within the components and not between them.


According to Lemma 22.11 of Cormen et al., Introduction to Algorithms (CLRS):

A directed graph G is acyclic if and only if a depth-first search of G yields no back edges.

This has been mentioned in several answers; here I'll also provide a code example based on chapter 22 of CLRS. The example graph is illustrated below.

enter image description here

CLRS' pseudo-code for depth-first search reads:

enter image description here

In the example in CLRS Figure 22.4, the graph consists of two DFS trees: one consisting of nodes u, v, x, and y, and the other of nodes w and z. Each tree contains one back edge: one from x to v and another from z to z (a self-loop).

The key realization is that a back edge is encountered when, in the DFS-VISIT function, while iterating over the neighbors v of u, a node is encountered with the GRAY color.

The following Python code is an adaptation of CLRS' pseudocode with an if clause added which detects cycles:

import collections


class Graph(object):
    def __init__(self, edges):
        self.edges = edges
        self.adj = Graph._build_adjacency_list(edges)

    @staticmethod
    def _build_adjacency_list(edges):
        adj = collections.defaultdict(list)
        for edge in edges:
            adj[edge[0]].append(edge[1])
        return adj


def dfs(G):
    discovered = set()
    finished = set()

    for u in G.adj:
        if u not in discovered and u not in finished:
            discovered, finished = dfs_visit(G, u, discovered, finished)


def dfs_visit(G, u, discovered, finished):
    discovered.add(u)

    for v in G.adj[u]:
        # Detect cycles
        if v in discovered:
            print(f"Cycle detected: found a back edge from {u} to {v}.")

        # Recurse into DFS tree
        if v not in finished:
            dfs_visit(G, v, discovered, finished)

    discovered.remove(u)
    finished.add(u)

    return discovered, finished


if __name__ == "__main__":
    G = Graph([
        ('u', 'v'),
        ('u', 'x'),
        ('v', 'y'),
        ('w', 'y'),
        ('w', 'z'),
        ('x', 'v'),
        ('y', 'x'),
        ('z', 'z')])

    dfs(G)

Note that in this example, the time in CLRS' pseudocode is not captured because we're only interested in detecting cycles. There is also some boilerplate code for building the adjacency list representation of a graph from a list of edges.

When this script is executed, it prints the following output:

Cycle detected: found a back edge from x to v.
Cycle detected: found a back edge from z to z.

These are exactly the back edges in the example in CLRS Figure 22.4.


If a graph satisfy this property

|e| > |v| - 1

then the graph contains at least on cycle.


Given that this is a schedule of jobs, I suspect that at some point you are going to sort them into a proposed order of execution.

If that's the case, then a topological sort implementation may in any case detect cycles. UNIX tsort certainly does. I think it is likely that it is therefore more efficient to detect cycles at the same time as tsorting, rather than in a separate step.

So the question might become, "how do I most efficiently tsort", rather than "how do I most efficiently detect loops". To which the answer is probably "use a library", but failing that the following Wikipedia article:

http://en.wikipedia.org/wiki/Topological_sorting

has the pseudo-code for one algorithm, and a brief description of another from Tarjan. Both have O(|V| + |E|) time complexity.


If you can't add a "visited" property to the nodes, use a set (or map) and just add all visited nodes to the set unless they are already in the set. Use a unique key or the address of the objects as the "key".

This also gives you the information about the "root" node of the cyclic dependency which will come in handy when a user has to fix the problem.

Another solution is to try to find the next dependency to execute. For this, you must have some stack where you can remember where you are now and what you need to do next. Check if a dependency is already on this stack before you execute it. If it is, you've found a cycle.

While this might seem to have a complexity of O(N*M) you must remember that the stack has a very limited depth (so N is small) and that M becomes smaller with each dependency that you can check off as "executed" plus you can stop the search when you found a leaf (so you never have to check every node -> M will be small, too).

In MetaMake, I created the graph as a list of lists and then deleted every node as I executed them which naturally cut down the search volume. I never actually had to run an independent check, it all happened automatically during normal execution.

If you need a "test only" mode, just add a "dry-run" flag which disables the execution of the actual jobs.


Given that this is a schedule of jobs, I suspect that at some point you are going to sort them into a proposed order of execution.

If that's the case, then a topological sort implementation may in any case detect cycles. UNIX tsort certainly does. I think it is likely that it is therefore more efficient to detect cycles at the same time as tsorting, rather than in a separate step.

So the question might become, "how do I most efficiently tsort", rather than "how do I most efficiently detect loops". To which the answer is probably "use a library", but failing that the following Wikipedia article:

http://en.wikipedia.org/wiki/Topological_sorting

has the pseudo-code for one algorithm, and a brief description of another from Tarjan. Both have O(|V| + |E|) time complexity.


I had implemented this problem in sml ( imperative programming) . Here is the outline . Find all the nodes that either have an indegree or outdegree of 0 . Such nodes cannot be part of a cycle ( so remove them ) . Next remove all the incoming or outgoing edges from such nodes. Recursively apply this process to the resulting graph. If at the end you are not left with any node or edge , the graph does not have any cycles , else it has.


If you can't add a "visited" property to the nodes, use a set (or map) and just add all visited nodes to the set unless they are already in the set. Use a unique key or the address of the objects as the "key".

This also gives you the information about the "root" node of the cyclic dependency which will come in handy when a user has to fix the problem.

Another solution is to try to find the next dependency to execute. For this, you must have some stack where you can remember where you are now and what you need to do next. Check if a dependency is already on this stack before you execute it. If it is, you've found a cycle.

While this might seem to have a complexity of O(N*M) you must remember that the stack has a very limited depth (so N is small) and that M becomes smaller with each dependency that you can check off as "executed" plus you can stop the search when you found a leaf (so you never have to check every node -> M will be small, too).

In MetaMake, I created the graph as a list of lists and then deleted every node as I executed them which naturally cut down the search volume. I never actually had to run an independent check, it all happened automatically during normal execution.

If you need a "test only" mode, just add a "dry-run" flag which disables the execution of the actual jobs.


In my opinion, the most understandable algorithm for detecting cycle in a directed graph is the graph-coloring-algorithm.

Basically, the graph coloring algorithm walks the graph in a DFS manner (Depth First Search, which means that it explores a path completely before exploring another path). When it finds a back edge, it marks the graph as containing a loop.

For an in depth explanation of the graph coloring algorithm, please read this article: http://www.geeksforgeeks.org/detect-cycle-direct-graph-using-colors/

Also, I provide an implementation of graph coloring in JavaScript https://github.com/dexcodeinc/graph_algorithm.js/blob/master/graph_algorithm.js


Given that this is a schedule of jobs, I suspect that at some point you are going to sort them into a proposed order of execution.

If that's the case, then a topological sort implementation may in any case detect cycles. UNIX tsort certainly does. I think it is likely that it is therefore more efficient to detect cycles at the same time as tsorting, rather than in a separate step.

So the question might become, "how do I most efficiently tsort", rather than "how do I most efficiently detect loops". To which the answer is probably "use a library", but failing that the following Wikipedia article:

http://en.wikipedia.org/wiki/Topological_sorting

has the pseudo-code for one algorithm, and a brief description of another from Tarjan. Both have O(|V| + |E|) time complexity.


https://mathoverflow.net/questions/16393/finding-a-cycle-of-fixed-length I like this solution the best specially for 4 length:)

Also phys wizard says u have to do O(V^2). I believe that we need only O(V)/O(V+E). If the graph is connected then DFS will visit all nodes. If the graph has connected sub graphs then each time we run a DFS on a vertex of this sub graph we will find the connected vertices and wont have to consider these for the next run of the DFS. Therefore the possibility of running for each vertex is incorrect.


The way I do it is to do a Topological Sort, counting the number of vertices visited. If that number is less than the total number of vertices in the DAG, you have a cycle.


The way I do it is to do a Topological Sort, counting the number of vertices visited. If that number is less than the total number of vertices in the DAG, you have a cycle.


If you can't add a "visited" property to the nodes, use a set (or map) and just add all visited nodes to the set unless they are already in the set. Use a unique key or the address of the objects as the "key".

This also gives you the information about the "root" node of the cyclic dependency which will come in handy when a user has to fix the problem.

Another solution is to try to find the next dependency to execute. For this, you must have some stack where you can remember where you are now and what you need to do next. Check if a dependency is already on this stack before you execute it. If it is, you've found a cycle.

While this might seem to have a complexity of O(N*M) you must remember that the stack has a very limited depth (so N is small) and that M becomes smaller with each dependency that you can check off as "executed" plus you can stop the search when you found a leaf (so you never have to check every node -> M will be small, too).

In MetaMake, I created the graph as a list of lists and then deleted every node as I executed them which naturally cut down the search volume. I never actually had to run an independent check, it all happened automatically during normal execution.

If you need a "test only" mode, just add a "dry-run" flag which disables the execution of the actual jobs.


If a graph satisfy this property

|e| > |v| - 1

then the graph contains at least on cycle.


The way I do it is to do a Topological Sort, counting the number of vertices visited. If that number is less than the total number of vertices in the DAG, you have a cycle.


If DFS finds an edge that points to an already-visited vertex, you have a cycle there.


The simplest way to do it is to do a depth first traversal (DFT) of the graph.

If the graph has n vertices, this is a O(n) time complexity algorithm. Since you will possibly have to do a DFT starting from each vertex, the total complexity becomes O(n^2).

You have to maintain a stack containing all vertices in the current depth first traversal, with its first element being the root node. If you come across an element which is already in the stack during the DFT, then you have a cycle.


Given that this is a schedule of jobs, I suspect that at some point you are going to sort them into a proposed order of execution.

If that's the case, then a topological sort implementation may in any case detect cycles. UNIX tsort certainly does. I think it is likely that it is therefore more efficient to detect cycles at the same time as tsorting, rather than in a separate step.

So the question might become, "how do I most efficiently tsort", rather than "how do I most efficiently detect loops". To which the answer is probably "use a library", but failing that the following Wikipedia article:

http://en.wikipedia.org/wiki/Topological_sorting

has the pseudo-code for one algorithm, and a brief description of another from Tarjan. Both have O(|V| + |E|) time complexity.


In my opinion, the most understandable algorithm for detecting cycle in a directed graph is the graph-coloring-algorithm.

Basically, the graph coloring algorithm walks the graph in a DFS manner (Depth First Search, which means that it explores a path completely before exploring another path). When it finds a back edge, it marks the graph as containing a loop.

For an in depth explanation of the graph coloring algorithm, please read this article: http://www.geeksforgeeks.org/detect-cycle-direct-graph-using-colors/

Also, I provide an implementation of graph coloring in JavaScript https://github.com/dexcodeinc/graph_algorithm.js/blob/master/graph_algorithm.js


Start with a DFS: a cycle exists if and only if a back-edge is discovered during DFS. This is proved as a result of white-path theorum.


The way I do it is to do a Topological Sort, counting the number of vertices visited. If that number is less than the total number of vertices in the DAG, you have a cycle.


The simplest way to do it is to do a depth first traversal (DFT) of the graph.

If the graph has n vertices, this is a O(n) time complexity algorithm. Since you will possibly have to do a DFT starting from each vertex, the total complexity becomes O(n^2).

You have to maintain a stack containing all vertices in the current depth first traversal, with its first element being the root node. If you come across an element which is already in the stack during the DFT, then you have a cycle.


Start with a DFS: a cycle exists if and only if a back-edge is discovered during DFS. This is proved as a result of white-path theorum.


According to Lemma 22.11 of Cormen et al., Introduction to Algorithms (CLRS):

A directed graph G is acyclic if and only if a depth-first search of G yields no back edges.

This has been mentioned in several answers; here I'll also provide a code example based on chapter 22 of CLRS. The example graph is illustrated below.

enter image description here

CLRS' pseudo-code for depth-first search reads:

enter image description here

In the example in CLRS Figure 22.4, the graph consists of two DFS trees: one consisting of nodes u, v, x, and y, and the other of nodes w and z. Each tree contains one back edge: one from x to v and another from z to z (a self-loop).

The key realization is that a back edge is encountered when, in the DFS-VISIT function, while iterating over the neighbors v of u, a node is encountered with the GRAY color.

The following Python code is an adaptation of CLRS' pseudocode with an if clause added which detects cycles:

import collections


class Graph(object):
    def __init__(self, edges):
        self.edges = edges
        self.adj = Graph._build_adjacency_list(edges)

    @staticmethod
    def _build_adjacency_list(edges):
        adj = collections.defaultdict(list)
        for edge in edges:
            adj[edge[0]].append(edge[1])
        return adj


def dfs(G):
    discovered = set()
    finished = set()

    for u in G.adj:
        if u not in discovered and u not in finished:
            discovered, finished = dfs_visit(G, u, discovered, finished)


def dfs_visit(G, u, discovered, finished):
    discovered.add(u)

    for v in G.adj[u]:
        # Detect cycles
        if v in discovered:
            print(f"Cycle detected: found a back edge from {u} to {v}.")

        # Recurse into DFS tree
        if v not in finished:
            dfs_visit(G, v, discovered, finished)

    discovered.remove(u)
    finished.add(u)

    return discovered, finished


if __name__ == "__main__":
    G = Graph([
        ('u', 'v'),
        ('u', 'x'),
        ('v', 'y'),
        ('w', 'y'),
        ('w', 'z'),
        ('x', 'v'),
        ('y', 'x'),
        ('z', 'z')])

    dfs(G)

Note that in this example, the time in CLRS' pseudocode is not captured because we're only interested in detecting cycles. There is also some boilerplate code for building the adjacency list representation of a graph from a list of edges.

When this script is executed, it prints the following output:

Cycle detected: found a back edge from x to v.
Cycle detected: found a back edge from z to z.

These are exactly the back edges in the example in CLRS Figure 22.4.


https://mathoverflow.net/questions/16393/finding-a-cycle-of-fixed-length I like this solution the best specially for 4 length:)

Also phys wizard says u have to do O(V^2). I believe that we need only O(V)/O(V+E). If the graph is connected then DFS will visit all nodes. If the graph has connected sub graphs then each time we run a DFS on a vertex of this sub graph we will find the connected vertices and wont have to consider these for the next run of the DFS. Therefore the possibility of running for each vertex is incorrect.


There is no algorithm which can find all the cycles in a directed graph in polynomial time. Suppose, the directed graph has n nodes and every pair of the nodes has connections to each other which means you have a complete graph. So any non-empty subset of these n nodes indicates a cycle and there are 2^n-1 number of such subsets. So no polynomial time algorithm exists. So suppose you have an efficient (non-stupid) algorithm which can tell you the number of directed cycles in a graph, you can first find the strong connected components, then applying your algorithm on these connected components. Since cycles only exist within the components and not between them.


As you said, you have set of jobs, it need to be executed in certain order. Topological sort given you required order for scheduling of jobs(or for dependency problems if it is a direct acyclic graph). Run dfs and maintain a list, and start adding node in the beginning of the list, and if you encountered a node which is already visited. Then you found a cycle in given graph.