[algorithm] What's the fastest algorithm for sorting a linked list?

I'm curious if O(n log n) is the best a linked list can do.

This question is related to algorithm sorting linked-list complexity-theory

The answer is


It is reasonable to expect that you cannot do any better than O(N log N) in running time.

However, the interesting part is to investigate whether you can sort it in-place, stably, its worst-case behavior and so on.

Simon Tatham, of Putty fame, explains how to sort a linked list with merge sort. He concludes with the following comments:

Like any self-respecting sort algorithm, this has running time O(N log N). Because this is Mergesort, the worst-case running time is still O(N log N); there are no pathological cases.

Auxiliary storage requirement is small and constant (i.e. a few variables within the sorting routine). Thanks to the inherently different behaviour of linked lists from arrays, this Mergesort implementation avoids the O(N) auxiliary storage cost normally associated with the algorithm.

There is also an example implementation in C that work for both singly and doubly linked lists.

As @Jørgen Fogh mentions below, big-O notation may hide some constant factors that can cause one algorithm to perform better because of memory locality, because of a low number of items, etc.


Here's an implementation that traverses the list just once, collecting runs, then schedules the merges in the same way that mergesort does.

Complexity is O(n log m) where n is the number of items and m is the number of runs. Best case is O(n) (if the data is already sorted) and worst case is O(n log n) as expected.

It requires O(log m) temporary memory; the sort is done in-place on the lists.

(updated below. commenter one makes a good point that I should describe it here)

The gist of the algorithm is:

    while list not empty
        accumulate a run from the start of the list
        merge the run with a stack of merges that simulate mergesort's recursion
    merge all remaining items on the stack

Accumulating runs doesn't require much explanation, but it's good to take the opportunity to accumulate both ascending runs and descending runs (reversed). Here it prepends items smaller than the head of the run and appends items greater than or equal to the end of the run. (Note that prepending should use strict less-than to preserve sort stability.)

It's easiest to just paste the merging code here:

    int i = 0;
    for ( ; i < stack.size(); ++i) {
        if (!stack[i])
            break;
        run = merge(run, stack[i], comp);
        stack[i] = nullptr;
    }
    if (i < stack.size()) {
        stack[i] = run;
    } else {
        stack.push_back(run);
    }

Consider sorting the list (d a g i b e c f j h) (ignoring runs). The stack states proceed as follows:

    [ ]
    [ (d) ]
    [ () (a d) ]
    [ (g), (a d) ]
    [ () () (a d g i) ]
    [ (b) () (a d g i) ]
    [ () (b e) (a d g i) ]
    [ (c) (b e) (a d g i ) ]
    [ () () () (a b c d e f g i) ]
    [ (j) () () (a b c d e f g i) ]
    [ () (h j) () (a b c d e f g i) ]

Then, finally, merge all these lists.

Note that the number of items (runs) at stack[i] is either zero or 2^i and the stack size is bounded by 1+log2(nruns). Each element is merged once per stack level, hence O(n log m) comparisons. There's a passing similarity to Timsort here, though Timsort maintains its stack using something like a Fibonacci sequence where this uses powers of two.

Accumulating runs takes advantage of any already sorted data so that best case complexity is O(n) for an already sorted list (one run). Since we're accumulating both ascending and descending runs, runs will always be at least length 2. (This reduces the maximum stack depth by at least one, paying for the cost of finding the runs in the first place.) Worst case complexity is O(n log n), as expected, for data that is highly randomized.

(Um... Second update.)

Or just see wikipedia on bottom-up mergesort.


As I know, the best sorting algorithm is O(n*log n), whatever the container - it's been proved that sorting in the broad sense of the word (mergesort/quicksort etc style) can't go lower. Using a linked list will not give you a better run time.

The only one algorithm which runs in O(n) is a "hack" algorithm which relies on counting values rather than actually sorting.


You can copy it into an array and then sort it.

  • Copying into array O(n),

  • sorting O(nlgn) (if you use a fast algorithm like merge sort ),

  • copying back to linked list O(n) if necessary,

so it is gonna be O(nlgn).

note that if you do not know the number of elements in the linked list you won't know the size of array. If you are coding in java you can use an Arraylist for example.


The question is LeetCode #148, and there are plenty of solutions offered in all major languages. Mine is as follows, but I'm wondering about the time complexity. In order to find the middle element, we traverse the complete list each time. First time n elements are iterated over, second time 2 * n/2 elements are iterated over, so on and so forth. It seems to be O(n^2) time.

def sort(linked_list: LinkedList[int]) -> LinkedList[int]:
    # Return n // 2 element
    def middle(head: LinkedList[int]) -> LinkedList[int]:
        if not head or not head.next:
            return head
        slow = head
        fast = head.next

        while fast and fast.next:
            slow = slow.next
            fast = fast.next.next

        return slow

    def merge(head1: LinkedList[int], head2: LinkedList[int]) -> LinkedList[int]:
        p1 = head1
        p2 = head2
        prev = head = None

        while p1 and p2:
            smaller = p1 if p1.val < p2.val else p2
            if not head:
                head = smaller
            if prev:
                prev.next = smaller
            prev = smaller

            if smaller == p1:
                p1 = p1.next
            else:
                p2 = p2.next

        if prev:
            prev.next = p1 or p2
        else:
            head = p1 or p2

        return head

    def merge_sort(head: LinkedList[int]) -> LinkedList[int]:
        if head and head.next:
            mid = middle(head)
            mid_next = mid.next
            # Makes it easier to stop
            mid.next = None

            return merge(merge_sort(head), merge_sort(mid_next))
        else:
            return head

    return merge_sort(linked_list)

Comparison sorts (i.e. ones based on comparing elements) cannot possibly be faster than n log n. It doesn't matter what the underlying data structure is. See Wikipedia.

Other kinds of sort that take advantage of there being lots of identical elements in the list (such as the counting sort), or some expected distribution of elements in the list, are faster, though I can't think of any that work particularly well on a linked list.


This is a nice little paper on this topic. His empirical conclusion is that Treesort is best, followed by Quicksort and Mergesort. Sediment sort, bubble sort, selection sort perform very badly.

A COMPARATIVE STUDY OF LINKED LIST SORTING ALGORITHMS by Ching-Kuang Shene

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.9981


Depending on a number of factors, it may actually be faster to copy the list to an array and then use a Quicksort.

The reason this might be faster is that an array has much better cache performance than a linked list. If the nodes in the list are dispersed in memory, you may be generating cache misses all over the place. Then again, if the array is large you will get cache misses anyway.

Mergesort parallelises better, so it may be a better choice if that is what you want. It is also much faster if you perform it directly on the linked list.

Since both algorithms run in O(n * log n), making an informed decision would involve profiling them both on the machine you would like to run them on.

--- EDIT

I decided to test my hypothesis and wrote a C-program which measured the time (using clock()) taken to sort a linked list of ints. I tried with a linked list where each node was allocated with malloc() and a linked list where the nodes were laid out linearly in an array, so the cache performance would be better. I compared these with the built-in qsort, which included copying everything from a fragmented list to an array and copying the result back again. Each algorithm was run on the same 10 data sets and the results were averaged.

These are the results:

N = 1000:

Fragmented list with merge sort: 0.000000 seconds

Array with qsort: 0.000000 seconds

Packed list with merge sort: 0.000000 seconds

N = 100000:

Fragmented list with merge sort: 0.039000 seconds

Array with qsort: 0.025000 seconds

Packed list with merge sort: 0.009000 seconds

N = 1000000:

Fragmented list with merge sort: 1.162000 seconds

Array with qsort: 0.420000 seconds

Packed list with merge sort: 0.112000 seconds

N = 100000000:

Fragmented list with merge sort: 364.797000 seconds

Array with qsort: 61.166000 seconds

Packed list with merge sort: 16.525000 seconds

Conclusion:

At least on my machine, copying into an array is well worth it to improve the cache performance, since you rarely have a completely packed linked list in real life. It should be noted that my machine has a 2.8GHz Phenom II, but only 0.6GHz RAM, so the cache is very important.


Not a direct answer to your question, but if you use a Skip List, it is already sorted and has O(log N) search time.


Mergesort is the best you can do here.


As stated many times, the lower bound on comparison based sorting for general data is going to be O(n log n). To briefly resummarize these arguments, there are n! different ways a list can be sorted. Any sort of comparison tree that has n! (which is in O(n^n)) possible final sorts is going to need at least log(n!) as its height: this gives you a O(log(n^n)) lower bound, which is O(n log n).

So, for general data on a linked list, the best possible sort that will work on any data that can compare two objects is going to be O(n log n). However, if you have a more limited domain of things to work in, you can improve the time it takes (at least proportional to n). For instance, if you are working with integers no larger than some value, you could use Counting Sort or Radix Sort, as these use the specific objects you're sorting to reduce the complexity with proportion to n. Be careful, though, these add some other things to the complexity that you may not consider (for instance, Counting Sort and Radix sort both add in factors that are based on the size of the numbers you're sorting, O(n+k) where k is the size of largest number for Counting Sort, for instance).

Also, if you happen to have objects that have a perfect hash (or at least a hash that maps all values differently), you could try using a counting or radix sort on their hash functions.


Merge sort doesn't require O(1) access and is O ( n ln n ). No known algorithms for sorting general data are better than O ( n ln n ).

The special data algorithms such as radix sort ( limits size of data ) or histogram sort ( counts discrete data ) could sort a linked list with a lower growth function, as long as you use a different structure with O(1) access as temporary storage.

Another class of special data is a comparison sort of an almost sorted list with k elements out of order. This can be sorted in O ( kn ) operations.

Copying the list to an array and back would be O(N), so any sorting algorithm can be used if space is not an issue.

For example, given a linked list containing uint_8, this code will sort it in O(N) time using a histogram sort:

#include <stdio.h>
#include <stdint.h>
#include <malloc.h>

typedef struct _list list_t;
struct _list {
    uint8_t value;
    list_t  *next;
};


list_t* sort_list ( list_t* list )
{
    list_t* heads[257] = {0};
    list_t* tails[257] = {0};

    // O(N) loop
    for ( list_t* it = list; it != 0; it = it -> next ) {
        list_t* next = it -> next;

        if ( heads[ it -> value ] == 0 ) {
            heads[ it -> value ] = it;
        } else {
            tails[ it -> value ] -> next = it;
        }

        tails[ it -> value ] = it;
    }

    list_t* result = 0;

    // constant time loop
    for ( size_t i = 255; i-- > 0; ) {
        if ( tails[i] ) {
            tails[i] -> next = result;
            result = heads[i];
        }
    }

    return result;
}

list_t* make_list ( char* string )
{
    list_t head;

    for ( list_t* it = &head; *string; it = it -> next, ++string ) {
        it -> next = malloc ( sizeof ( list_t ) );
        it -> next -> value = ( uint8_t ) * string;
        it -> next -> next = 0;
    }

    return head.next;
}

void free_list ( list_t* list )
{
    for ( list_t* it = list; it != 0; ) {
        list_t* next = it -> next;
        free ( it );
        it = next;
    }
}

void print_list ( list_t* list )
{
    printf ( "[ " );

    if ( list ) {
        printf ( "%c", list -> value );

        for ( list_t* it = list -> next; it != 0; it = it -> next )
            printf ( ", %c", it -> value );
    }

    printf ( " ]\n" );
}


int main ( int nargs, char** args )
{
    list_t* list = make_list ( nargs > 1 ? args[1] : "wibble" );


    print_list ( list );

    list_t* sorted = sort_list ( list );


    print_list ( sorted );

    free_list ( list );
}

A Radix sort is particularly suited to a linked list, since it's easy to make a table of head pointers corresponding to each possible value of a digit.


Examples related to algorithm

How can I tell if an algorithm is efficient? Find the smallest positive integer that does not occur in a given sequence Efficiently getting all divisors of a given number Peak signal detection in realtime timeseries data What is the optimal algorithm for the game 2048? How can I sort a std::map first by value, then by key? Finding square root without using sqrt function? Fastest way to flatten / un-flatten nested JSON objects Mergesort with Python Find common substring between two strings

Examples related to sorting

Sort Array of object by object field in Angular 6 Sorting a list with stream.sorted() in Java How to sort dates from Oldest to Newest in Excel? how to sort pandas dataframe from one column Reverse a comparator in Java 8 Find the unique values in a column and then sort them pandas groupby sort within groups pandas groupby sort descending order Efficiently sorting a numpy array in descending order? Swift: Sort array of objects alphabetically

Examples related to linked-list

Creating a node class in Java Simple linked list in C++ Insert node at a certain position in a linked list C++ Printing out a linked list using toString How to work with string fields in a C struct? JTable - Selected Row click event When to use HashMap over LinkedList or ArrayList and vice-versa C: How to free nodes in the linked list? Java how to sort a Linked List? C linked list inserting node at the end

Examples related to complexity-theory

Differences between time complexity and space complexity? Determining complexity for recursive functions (Big O notation) How to find time complexity of an algorithm How can building a heap be O(n) time complexity? HashMap get/put complexity Big-oh vs big-theta Is log(n!) = T(n·log(n))? Time complexity of accessing a Python dict What are the differences between NP, NP-Complete and NP-Hard? What's the fastest algorithm for sorting a linked list?