[algorithm] What is stability in sorting algorithms and why is it important?

I'm very curious, why stability is or is not important in sorting algorithms?

This question is related to algorithm sorting language-agnostic stability

The answer is


I know there are many answers for this, but to me, this answer, by Robert Harvey, summarized it much more clearly:

A stable sort is one which preserves the original order of the input set, where the [unstable] algorithm does not distinguish between two or more items.

Source


Stable sort will always return same solution (permutation) on same input.

For instance [2,1,2] will be sorted using stable sort as permutation [2,1,3] (first is index 2, then index 1 then index 3 in sorted output) That mean that output is always shuffled same way. Other non stable, but still correct permutation is [2,3,1].

Quick sort is not stable sort and permutation differences among same elements depends on algorithm for picking pivot. Some implementations pick up at random and that can make quick sort yielding different permutations on same input using same algorithm.

Stable sort algorithm is necessary deterministic.


Some more examples of the reason for wanting stable sorts. Databases are a common example. Take the case of a transaction data base than includes last|first name, date|time of purchase, item number, price. Say the data base is normally sorted by date|time. Then a query is made to make a sorted copy of the data base by last|first name, since a stable sort preserves the original order, even though the inquiry compare only involves last|first name, the transactions for each last|first name will be in data|time order.

A similar example is classic Excel, which limited sorts to 3 columns at a time. To sort 6 columns, a sort is done with the least significant 3 columns, followed by a sort with the most significant 3 columns.

A classic example of a stable radix sort is a card sorter, used to sort by a field of base 10 numeric columns. The cards are sorted from least significant digit to most significant digit. On each pass, a deck of cards is read and separated into 10 different bins according to the digit in that column. Then the 10 bins of cards are put back into the input hopper in order ("0" cards first, "9" cards last). Then another pass is done by the next column, until all columns are sorted. Actual card sorters have more than 10 bins since there are 12 zones on a card, a column can be blank, and there is a mis-read bin. To sort letters, 2 passes per column are needed, 1st pass for digit, 2nd pass for the 12 11 zone.

Later (1937) there were card collating (merging) machines that could merge two decks of cards by comparing fields. The input was two already sorted decks of cards, a master deck and an update deck. The collator merged the two decks into a a new mater bin and an archive bin, which was optionally used for master duplicates so that the new master bin would only have update cards in case of duplicates. This was probably the basis for the idea behind the original (bottom up) merge sort.


Sorting stability means that records with the same key retain their relative order before and after the sort.

So stability matters if, and only if, the problem you're solving requires retention of that relative order.

If you don't need stability, you can use a fast, memory-sipping algorithm from a library, like heapsort or quicksort, and forget about it.

If you need stability, it's more complicated. Stable algorithms have higher big-O CPU and/or memory usage than unstable algorithms. So when you have a large data set, you have to pick between beating up the CPU or the memory. If you're constrained on both CPU and memory, you have a problem. A good compromise stable algorithm is a binary tree sort; the Wikipedia article has a pathetically easy C++ implementation based on the STL.

You can make an unstable algorithm into a stable one by adding the original record number as the last-place key for each record.


If you assume what you are sorting are just numbers and only their values identify/distinguish them (e.g. elements with same value are identicle), then the stability-issue of sorting is meaningless.

However, objects with same priority in sorting may be distinct, and sometime their relative order is meaningful information. In this case, unstable sort generates problems.

For example, you have a list of data which contains the time cost [T] of all players to clean a maze with Level [L] in a game. Suppose we need to rank the players by how fast they clean the maze. However, an additional rule applies: players who clean the maze with higher-level always have a higher rank, no matter how long the time cost is.

Of course you might try to map the paired value [T,L] to a real number [R] with some algorithm which follows the rules and then rank all players with [R] value.

However, if stable sorting is feasible, then you may simply sort the entire list by [T] (Faster players first) and then by [L]. In this case, the relative order of players (by time cost) will not be changed after you grouped them by level of maze they cleaned.

PS: of course the approach to sort twice is not the best solution to the particular problem but to explain the question of poster it should be enough.


It depends on what you do.

Imagine you've got some people records with a first and a last name field. First you sort the list by first name. If you then sort the list with a stable algorithm by last name, you'll have a list sorted by first name AND last name.


A stable sorting algorithm is the one that sorts the identical elements in their same order as they appear in the input, whilst unstable sorting may not satisfy the case. - I thank my algorithm lecturer Didem Gozupek to have provided insight into algorithms.

Stable Sorting Algorithms:

  • Insertion Sort
  • Merge Sort
  • Bubble Sort
  • Tim Sort
  • Counting Sort
  • Block Sort
  • Quadsort
  • Library Sort
  • Cocktail shaker Sort
  • Gnome Sort
  • Odd–even Sort

Unstable Sorting Algorithms:

  • Heap sort
  • Selection sort
  • Shell sort
  • Quick sort
  • Introsort (subject to Quicksort)
  • Tree sort
  • Cycle sort
  • Smoothsort
  • Tournament sort(subject to Hesapsort)

enter image description here


A sorting algorithm is said to be stable if two objects with equal keys appear in the same order in sorted output as they appear in the input unsorted array. Some sorting algorithms are stable by nature like Insertion sort, Merge Sort, Bubble Sort, etc. And some sorting algorithms are not, like Heap Sort, Quick Sort, etc.

However, any given sorting algo which is not stable can be modified to be stable. There can be sorting algo specific ways to make it stable, but in general, any comparison based sorting algorithm which is not stable by nature can be modified to be stable by changing the key comparison operation so that the comparison of two keys considers position as a factor for objects with equal keys.

References: http://www.math.uic.edu/~leon/cs-mcs401-s08/handouts/stability.pdf http://en.wikipedia.org/wiki/Sorting_algorithm#Stability


There's a few reasons why stability can be important. One is that, if two records don't need to be swapped by swapping them you can cause a memory update, a page is marked dirty, and needs to be re-written to disk (or another slow medium).


Examples related to algorithm

How can I tell if an algorithm is efficient? Find the smallest positive integer that does not occur in a given sequence Efficiently getting all divisors of a given number Peak signal detection in realtime timeseries data What is the optimal algorithm for the game 2048? How can I sort a std::map first by value, then by key? Finding square root without using sqrt function? Fastest way to flatten / un-flatten nested JSON objects Mergesort with Python Find common substring between two strings

Examples related to sorting

Sort Array of object by object field in Angular 6 Sorting a list with stream.sorted() in Java How to sort dates from Oldest to Newest in Excel? how to sort pandas dataframe from one column Reverse a comparator in Java 8 Find the unique values in a column and then sort them pandas groupby sort within groups pandas groupby sort descending order Efficiently sorting a numpy array in descending order? Swift: Sort array of objects alphabetically

Examples related to language-agnostic

IOException: The process cannot access the file 'file path' because it is being used by another process Peak signal detection in realtime timeseries data Match linebreaks - \n or \r\n? Simple way to understand Encapsulation and Abstraction How can I pair socks from a pile efficiently? How do I determine whether my calculation of pi is accurate? What is ADT? (Abstract Data Type) How to explain callbacks in plain english? How are they different from calling one function from another function? Ukkonen's suffix tree algorithm in plain English Private vs Protected - Visibility Good-Practice Concern

Examples related to stability

What is stability in sorting algorithms and why is it important?