# Is there a numpy builtin to reject outliers from a list?

111

Is there a numpy builtin to do something like the following? That is, take a list `d` and return a list `filtered_d` with any outlying elements removed based on some assumed distribution of the points in `d`.

``````
import numpy as np

def reject_outliers(data):
    m = 2  # threshold, in standard deviations
    u = np.mean(data)
    s = np.std(data)
    # Keep elements within m standard deviations of the mean.
    filtered = [e for e in data if (u - m * s < e < u + m * s)]
    return filtered

>>> d = [2, 4, 5, 1, 6, 5, 40]
>>> filtered_d = reject_outliers(d)
>>> print(filtered_d)
[2, 4, 5, 1, 6, 5]
``````

I say 'something like' because the function might allow for varying distributions (poisson, gaussian, etc.) and varying outlier thresholds within those distributions (like the `m` I've used here).

This question is tagged with `python` `numpy`

~ Asked on 2012-07-27 11:19:17

### The Best Answer is

117

This method is almost identical to yours, just more idiomatic numpy (note that it only works on numpy arrays):

``````
import numpy as np

def reject_outliers(data, m=2):
    return data[abs(data - np.mean(data)) < m * np.std(data)]
``````
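As a quick check, here is a minimal sketch that applies the same function to the data from the question. The `np.array` conversion is an assumption added here, since the boolean-mask indexing needs an array rather than a list.

```python
import numpy as np

def reject_outliers(data, m=2):
    # Keep elements closer than m standard deviations to the mean.
    return data[np.abs(data - np.mean(data)) < m * np.std(data)]

# Convert the list from the question to an array first.
d = np.array([2, 4, 5, 1, 6, 5, 40])
filtered_d = reject_outliers(d)
print(filtered_d.tolist())  # [2, 4, 5, 1, 6, 5]
```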

~ Answered on 2012-07-27 11:22:30

194

Something important when dealing with outliers is that one should try to use estimators that are as robust as possible. The mean of a distribution will be biased by outliers, but the median, for example, will be much less affected.
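To illustrate with the numbers from the question (a small sketch, where the names `clean` and `with_outlier` are just for this example), adding the single value 40 moves the mean a lot while the median barely changes:

```python
import numpy as np

clean = np.array([2, 4, 5, 1, 6, 5])
with_outlier = np.append(clean, 40)

# The mean jumps from about 3.83 to 9.0 once the outlier is included.
print(np.mean(clean), np.mean(with_outlier))

# The median only moves from 4.5 to 5.0.
print(np.median(clean), np.median(with_outlier))
```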

Building on eumiro's answer:

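``````
def reject_outliers(data, m=2.):
    d = np.abs(data - np.median(data))  # distance of each point to the median
    mdev = np.median(d)                 # median of those distances
    # If mdev is 0, fall back to an array of zeros so the mask below keeps its shape.
    s = d / mdev if mdev else np.zeros(len(d))
    return data[s < m]
``````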

Here I have replaced the mean with the more robust median, and the standard deviation with the median absolute distance to the median. I then scaled the distances by their (again) median value so that `m` is on a reasonable relative scale.

Note that for the `data[s<m]` syntax to work, `data` must be a numpy array.
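For a feel of how `m` behaves on this relative scale, here is a sketch of the median-based function run on the data from the question. Two things are assumptions added for this example: the `np.asarray` call, so that a plain list can be passed in, and the all-zeros fallback when the median distance is 0.

```python
import numpy as np

def reject_outliers(data, m=2.0):
    data = np.asarray(data)             # assumption: accept lists as well
    d = np.abs(data - np.median(data))  # distance of each point to the median
    mdev = np.median(d)                 # median of those distances
    # Assumption: if mdev is 0, treat every scaled distance as 0 (keep all points).
    s = d / mdev if mdev else np.zeros(len(d))
    return data[s < m]

d = [2, 4, 5, 1, 6, 5, 40]
print(reject_outliers(d).tolist())         # [4, 5, 6, 5]
print(reject_outliers(d, m=5.0).tolist())  # [2, 4, 5, 1, 6, 5]
```

With this data the median is 5 and the median distance is 1, so `m=2` also drops 1 and 2, while a looser `m=5` removes only 40. The threshold probably needs tuning for the data at hand.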

~ Answered on 2013-05-15 09:58:26