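With its default settings, np.percentile does not return the quartiles you get from the common textbook method, where Q1 and Q3 are the medians of the lower and upper halves of the data. It linearly interpolates between order statistics instead. Consider the sorted list below: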
samples = [1, 1, 8, 12, 13, 13, 14, 16, 19, 22, 27, 28, 31]
Running np.percentile(samples, [25, 50, 75]) with the default linear interpolation returns values that, for this list, fall exactly on existing elements (indices 3, 6, and 9):
Out[1]: array([12., 14., 22.])
However, taking Q1 and Q3 as the medians of the lower and upper halves (with the median itself excluded), the quartiles are Q1=10.0, Median=14, Q3=24.5
(you can also use this link to find the quartiles and median online).
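To make the difference between the two conventions concrete, here is a minimal sketch using the standard-library statistics.quantiles (Python 3.8+), which is not part of the original answer. Its "inclusive" method reproduces the np.percentile default on this sample, while its "exclusive" method happens to reproduce the hand-computed quartiles. That agreement holds for this 13-item list; the exclusive method is not the same rule as the median-of-halves method and need not agree with it for other sample sizes.

```python
import statistics

samples = [1, 1, 8, 12, 13, 13, 14, 16, 19, 22, 27, 28, 31]

# "inclusive" treats the data as spanning the 0th to 100th percentile,
# which matches the linear interpolation np.percentile uses by default.
inclusive = statistics.quantiles(samples, n=4, method="inclusive")

# "exclusive" (the default) places the cut points at multiples of (n + 1) / 4.
exclusive = statistics.quantiles(samples, n=4, method="exclusive")

print("inclusive:", inclusive)
print("exclusive:", exclusive)
```

Which result is "right" depends on the quartile definition you need, so it is worth stating the convention explicitly whenever quartiles are reported.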
One can use the code below to calculate the quartiles and median of a list. Because the list is sorted first, this approach takes O(n log n) time, where n is the number of items.
Finding the quartiles and median can also be done in O(n) time using a selection algorithm such as median of medians (order statistics).
samples = sorted([28, 12, 8, 27, 16, 31, 14, 13, 19, 1, 1, 22, 13])

def find_median(sorted_list):
    """Return the median of a sorted list and the index/indices it came from."""
    list_size = len(sorted_list)
    if list_size % 2 == 0:
        # Even length: average the two middle items (indices are 0-based).
        indices = [list_size // 2 - 1, list_size // 2]
        median = (sorted_list[indices[0]] + sorted_list[indices[1]]) / 2
    else:
        # Odd length: the single middle item is the median.
        indices = [list_size // 2]
        median = sorted_list[indices[0]]
    return median, indices

median, median_indices = find_median(samples)
# Odd length: the median is left out of both halves.
# Even length: each half keeps one of the two middle items.
Q1, Q1_indices = find_median(samples[:median_indices[-1]])
Q3, Q3_indices = find_median(samples[median_indices[0] + 1:])
quartiles = [Q1, median, Q3]
print("(Q1, median, Q3): {}".format(quartiles))
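The O(n) remark above can be illustrated without a full sort. The sketch below uses randomized quickselect, which runs in expected (not worst-case) linear time; it is a simpler stand-in for median of medians, not that algorithm itself. The helper name quickselect and the hard-coded positions 2, 3, 9, and 10 are assumptions for this 13-item example only.

```python
import random

def quickselect(values, k):
    """Return the k-th smallest item (0-based) of values without sorting them all."""
    items = list(values)
    while True:
        pivot = random.choice(items)
        lower = [x for x in items if x < pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if k < len(lower):
            # The target lies among the items smaller than the pivot.
            items = lower
        elif k < len(lower) + equal_count:
            # The target is the pivot value itself.
            return pivot
        else:
            # Skip everything up to and including the pivot values.
            k -= len(lower) + equal_count
            items = [x for x in items if x > pivot]

samples = [28, 12, 8, 27, 16, 31, 14, 13, 19, 1, 1, 22, 13]

# 13 items: the median is the item at sorted position 6.
median = quickselect(samples, len(samples) // 2)
# The lower half covers sorted positions 0..5, so Q1 averages positions 2 and 3.
q1 = (quickselect(samples, 2) + quickselect(samples, 3)) / 2
# The upper half covers sorted positions 7..12, so Q3 averages positions 9 and 10.
q3 = (quickselect(samples, 9) + quickselect(samples, 10)) / 2

print("(Q1, median, Q3): {}".format([q1, median, q3]))
```

For small lists like this one, sorting is simpler and just as fast in practice; selection only pays off when the data is large.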