[python] Finding median of list in Python

How do you find the median of a list in Python? The list can be of any size and the numbers are not guaranteed to be in any particular order.

If the list contains an even number of elements, the function should return the average of the middle two.

Here are some examples (sorted for display purposes):

median([1]) == 1
median([1, 1]) == 1
median([1, 1, 2, 4]) == 1.5
median([0, 2, 5, 6, 8, 9, 9]) == 6
median([0, 0, 0, 0, 4, 4, 6, 8]) == 2

This question is related to python list sorting median

The answer is


Of course you can use build in functions, but if you would like to create your own you can do something like this. The trick here is to use ~ operator that flip positive number to negative. For instance ~2 -> -3 and using negative in for list in Python will count items from the end. So if you have mid == 2 then it will take third element from beginning and third item from the end.

def median(data):
    data.sort()
    mid = len(data) // 2
    return (data[mid] + data[~mid]) / 2

Here's the tedious way to find median without using the median function:

def median(*arg):
    order(arg)
    numArg = len(arg)
    half = int(numArg/2)
    if numArg/2 ==half:
        print((arg[half-1]+arg[half])/2)
    else:
        print(int(arg[half]))

def order(tup):
    ordered = [tup[i] for i in range(len(tup))]
    test(ordered)
    while(test(ordered)):
        test(ordered)
    print(ordered)


def test(ordered):
    whileloop = 0 
    for i in range(len(ordered)-1):
        print(i)
        if (ordered[i]>ordered[i+1]):
            print(str(ordered[i]) + ' is greater than ' + str(ordered[i+1]))
            original = ordered[i+1]
            ordered[i+1]=ordered[i]
            ordered[i]=original
            whileloop = 1 #run the loop again if you had to switch values
    return whileloop

It is very simple;

def median(alist):
    #to find median you will have to sort the list first
    sList = sorted(alist)
    first = 0
    last = len(sList)-1
    midpoint = (first + last)//2
    return midpoint

And you can use the return value like this median = median(anyList)


def median(x):
    x = sorted(x)
    listlength = len(x) 
    num = listlength//2
    if listlength%2==0:
        middlenum = (x[num]+x[num-1])/2
    else:
        middlenum = x[num]
    return middlenum

Here's a cleaner solution:

def median(lst):
    quotient, remainder = divmod(len(lst), 2)
    if remainder:
        return sorted(lst)[quotient]
    return sum(sorted(lst)[quotient - 1:quotient + 1]) / 2.

Note: Answer changed to incorporate suggestion in comments.


def median(array):
    """Calculate median of the given list.
    """
    # TODO: use statistics.median in Python 3
    array = sorted(array)
    half, odd = divmod(len(array), 2)
    if odd:
        return array[half]
    return (array[half - 1] + array[half]) / 2.0

def median(array):
    if len(array) < 1:
        return(None)
    if len(array) % 2 == 0:
        median = (array[len(array)//2-1: len(array)//2+1])
        return sum(median) / len(median)
    else:
        return(array[len(array)//2])

What I did was this:

def median(a):
    a.sort()
    if len(a) / 2 != int:
        return a[len(a) / 2]
    else:
        return (a[len(a) / 2] + a[(len(a) / 2) - 1]) / 2

Explanation: Basically if the number of items in the list is odd, return the middle number, otherwise, if you half an even list, python automatically rounds the higher number so we know the number before that will be one less (since we sorted it) and we can add the default higher number and the number lower than it and divide them by 2 to find the median.


import numpy as np
def get_median(xs):
        mid = len(xs) // 2  # Take the mid of the list
        if len(xs) % 2 == 1: # check if the len of list is odd
            return sorted(xs)[mid] #if true then mid will be median after sorting
        else:
            #return 0.5 * sum(sorted(xs)[mid - 1:mid + 1])
            return 0.5 * np.sum(sorted(xs)[mid - 1:mid + 1]) #if false take the avg of mid
print(get_median([7, 7, 3, 1, 4, 5]))
print(get_median([1,2,3, 4,5]))

Here what I came up with during this exercise in Codecademy:

def median(data):
    new_list = sorted(data)
    if len(new_list)%2 > 0:
        return new_list[len(new_list)/2]
    elif len(new_list)%2 == 0:
        return (new_list[(len(new_list)/2)] + new_list[(len(new_list)/2)-1]) /2.0

print median([1,2,3,4,5,9])

A more generalized approach for median (and percentiles) would be:

def get_percentile(data, percentile):
    # Get the number of observations
    cnt=len(data)
    # Sort the list
    data=sorted(data)
    # Determine the split point
    i=(cnt-1)*percentile
    # Find the `floor` of the split point
    diff=i-int(i)
    # Return the weighted average of the value above and below the split point
    return data[int(i)]*(1-diff)+data[int(i)+1]*(diff)

# Data
data=[1,2,3,4,5]
# For the median
print(get_percentile(data=data, percentile=.50))
# > 3
print(get_percentile(data=data, percentile=.75))
# > 4

# Note the weighted average difference when an int is not returned by the percentile
print(get_percentile(data=data, percentile=.51))
# > 3.04


median Function

def median(midlist):
    midlist.sort()
    lens = len(midlist)
    if lens % 2 != 0: 
        midl = (lens / 2)
        res = midlist[midl]
    else:
        odd = (lens / 2) -1
        ev = (lens / 2) 
        res = float(midlist[odd] + midlist[ev]) / float(2)
    return res

fuction median:

def median(d):
    d=np.sort(d)
    n2=int(len(d)/2)
    r=n2%2
    if (r==0):
        med=d[n2] 
    else:
        med=(d[n2] + data[m+1]) / 2
    return med

In case you need additional information on the distribution of your list, the percentile method will probably be useful. And a median value corresponds to the 50th percentile of a list:

import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9])
median_value = np.percentile(a, 50) # return 50th percentile
print median_value 

A simple function to return the median of the given list:

def median(lst):
    lst.sort()  # Sort the list first
    if len(lst) % 2 == 0:  # Checking if the length is even
        # Applying formula which is sum of middle two divided by 2
        return (lst[len(lst) // 2] + lst[(len(lst) - 1) // 2]) / 2
    else:
        # If length is odd then get middle value
        return lst[len(lst) // 2]

Some examples with the medain function:

>>> median([9, 12, 20, 21, 34, 80])  # Even
20.5
>>> median([9, 12, 80, 21, 34])  # Odd
21

If you want to use library you can just simply do:

>>> import statistics
>>> statistics.median([9, 12, 20, 21, 34, 80])  # Even
20.5
>>> statistics.median([9, 12, 80, 21, 34])  # Odd
21

You can use the list.sort to avoid creating new lists with sorted and sort the lists in place.

Also you should not use list as a variable name as it shadows python's own list.

def median(l):
    half = len(l) // 2
    l.sort()
    if not len(l) % 2:
        return (l[half - 1] + l[half]) / 2.0
    return l[half]

I had some problems with lists of float values. I ended up using a code snippet from the python3 statistics.median and is working perfect with float values without imports. source

def calculateMedian(list):
    data = sorted(list)
    n = len(data)
    if n == 0:
        return None
    if n % 2 == 1:
        return data[n // 2]
    else:
        i = n // 2
        return (data[i - 1] + data[i]) / 2

Just two lines are enough.

def get_median(arr):
    '''
    Calculate the median of a sequence.
    :param arr: list
    :return: int or float
    '''
    arr.sort()
    return arr[len(arr)//2] if len(arr) % 2 else (arr[len(arr)//2] + arr[len(arr)//2-1])/2

I defined a median function for a list of numbers as

def median(numbers):
    return (sorted(numbers)[int(round((len(numbers) - 1) / 2.0))] + sorted(numbers)[int(round((len(numbers) - 1) // 2.0))]) / 2.0

(Works with ):

def median(lst):
    n = len(lst)
    s = sorted(lst)
    return (sum(s[n//2-1:n//2+1])/2.0, s[n//2])[n % 2] if n else None

>>> median([-5, -5, -3, -4, 0, -1])
-3.5

numpy.median():

>>> from numpy import median
>>> median([1, -4, -1, -1, 1, -3])
-1.0

For , use statistics.median:

>>> from statistics import median
>>> median([5, 2, 3, 8, 9, -2])
4.0

Do yourself.

def median(numbers):
    """
    Calculate median of a list numbers.
    :param numbers: the numbers to be calculated.
    :return: median value of numbers.

    >>> median([1, 3, 3, 6, 7, 8, 9])
    6
    >>> median([1, 2, 3, 4, 5, 6, 8, 9])
    4.5
    >>> import statistics
    >>> import random
    >>> numbers = random.sample(range(-50, 50), k=100)
    >>> statistics.median(numbers) == median(numbers)
    True
    """
    numbers = sorted(numbers)
    mid_index = len(numbers) // 2
    return (
        (numbers[mid_index] + numbers[mid_index - 1]) / 2 if mid_index % 2 == 0
        else numbers[mid_index]
    )


if __name__ == "__main__":
    from doctest import testmod

    testmod()

source from


def midme(list1):

    list1.sort()
    if len(list1)%2>0:
            x = list1[int((len(list1)/2))]
    else:
            x = ((list1[int((len(list1)/2))-1])+(list1[int(((len(list1)/2)))]))/2
    return x


midme([4,5,1,7,2])

Python 3.4 has statistics.median:

Return the median (middle value) of numeric data.

When the number of data points is odd, return the middle data point. When the number of data points is even, the median is interpolated by taking the average of the two middle values:

>>> median([1, 3, 5])
3
>>> median([1, 3, 5, 7])
4.0

Usage:

import statistics

items = [6, 1, 8, 2, 3]

statistics.median(items)
#>>> 3

It's pretty careful with types, too:

statistics.median(map(float, items))
#>>> 3.0

from decimal import Decimal
statistics.median(map(Decimal, items))
#>>> Decimal('3')

The sorted() function is very helpful for this. Use the sorted function to order the list, then simply return the middle value (or average the two middle values if the list contains an even amount of elements).

def median(lst):
    sortedLst = sorted(lst)
    lstLen = len(lst)
    index = (lstLen - 1) // 2
   
    if (lstLen % 2):
        return sortedLst[index]
    else:
        return (sortedLst[index] + sortedLst[index + 1])/2.0

You can try the quickselect algorithm if faster average-case running times are needed. Quickselect has average (and best) case performance O(n), although it can end up O(n²) on a bad day.

Here's an implementation with a randomly chosen pivot:

import random

def select_nth(n, items):
    pivot = random.choice(items)

    lesser = [item for item in items if item < pivot]
    if len(lesser) > n:
        return select_nth(n, lesser)
    n -= len(lesser)

    numequal = items.count(pivot)
    if numequal > n:
        return pivot
    n -= numequal

    greater = [item for item in items if item > pivot]
    return select_nth(n, greater)

You can trivially turn this into a method to find medians:

def median(items):
    if len(items) % 2:
        return select_nth(len(items)//2, items)

    else:
        left  = select_nth((len(items)-1) // 2, items)
        right = select_nth((len(items)+1) // 2, items)

        return (left + right) / 2

This is very unoptimised, but it's not likely that even an optimised version will outperform Tim Sort (CPython's built-in sort) because that's really fast. I've tried before and I lost.


I posted my solution at Python implementation of "median of medians" algorithm , which is a little bit faster than using sort(). My solution uses 15 numbers per column, for a speed ~5N which is faster than the speed ~10N of using 5 numbers per column. The optimal speed is ~4N, but I could be wrong about it.

Per Tom's request in his comment, I added my code here, for reference. I believe the critical part for speed is using 15 numbers per column, instead of 5.

#!/bin/pypy
#
# TH @stackoverflow, 2016-01-20, linear time "median of medians" algorithm
#
import sys, random


items_per_column = 15


def find_i_th_smallest( A, i ):
    t = len(A)
    if(t <= items_per_column):
        # if A is a small list with less than items_per_column items, then:
        #
        # 1. do sort on A
        # 2. find i-th smallest item of A
        #
        return sorted(A)[i]
    else:
        # 1. partition A into columns of k items each. k is odd, say 5.
        # 2. find the median of every column
        # 3. put all medians in a new list, say, B
        #
        B = [ find_i_th_smallest(k, (len(k) - 1)/2) for k in [A[j:(j + items_per_column)] for j in range(0,len(A),items_per_column)]]

        # 4. find M, the median of B
        #
        M = find_i_th_smallest(B, (len(B) - 1)/2)


        # 5. split A into 3 parts by M, { < M }, { == M }, and { > M }
        # 6. find which above set has A's i-th smallest, recursively.
        #
        P1 = [ j for j in A if j < M ]
        if(i < len(P1)):
            return find_i_th_smallest( P1, i)
        P3 = [ j for j in A if j > M ]
        L3 = len(P3)
        if(i < (t - L3)):
            return M
        return find_i_th_smallest( P3, i - (t - L3))


# How many numbers should be randomly generated for testing?
#
number_of_numbers = int(sys.argv[1])


# create a list of random positive integers
#
L = [ random.randint(0, number_of_numbers) for i in range(0, number_of_numbers) ]


# Show the original list
#
# print L


# This is for validation
#
# print sorted(L)[int((len(L) - 1)/2)]


# This is the result of the "median of medians" function.
# Its result should be the same as the above.
#
print find_i_th_smallest( L, (len(L) - 1) / 2)

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to list

Convert List to Pandas Dataframe Column Python find elements in one list that are not in the other Sorting a list with stream.sorted() in Java Python Loop: List Index Out of Range How to combine two lists in R How do I multiply each element in a list by a number? Save a list to a .txt file The most efficient way to remove first N elements in a list? TypeError: list indices must be integers or slices, not str Parse JSON String into List<string>

Examples related to sorting

Sort Array of object by object field in Angular 6 Sorting a list with stream.sorted() in Java How to sort dates from Oldest to Newest in Excel? how to sort pandas dataframe from one column Reverse a comparator in Java 8 Find the unique values in a column and then sort them pandas groupby sort within groups pandas groupby sort descending order Efficiently sorting a numpy array in descending order? Swift: Sort array of objects alphabetically

Examples related to median

Finding median of list in Python Finding the median of an unsorted array Find running median from a stream of integers Help needed with Median If in Excel Calculate median in c# Function to Calculate Median in SQL Server Simple way to calculate median with MySQL