[python] Fast check for NaN in NumPy

I'm looking for the fastest way to check for the occurrence of NaN (np.nan) in a NumPy array X. np.isnan(X) is out of the question, since it builds a boolean array of shape X.shape, which is potentially gigantic.

I tried np.nan in X, but that seems not to work because np.nan != np.nan. Is there a fast and memory-efficient way to do this at all?

(To those who would ask "how gigantic": I can't tell. This is input validation for library code.)

This question is related to python performance numpy nan

The answer is


Even there exist an accepted answer, I'll like to demonstrate the following (with Python 2.7.2 and Numpy 1.6.0 on Vista):

In []: x= rand(1e5)
In []: %timeit isnan(x.min())
10000 loops, best of 3: 200 us per loop
In []: %timeit isnan(x.sum())
10000 loops, best of 3: 169 us per loop
In []: %timeit isnan(dot(x, x))
10000 loops, best of 3: 134 us per loop

In []: x[5e4]= NaN
In []: %timeit isnan(x.min())
100 loops, best of 3: 4.47 ms per loop
In []: %timeit isnan(x.sum())
100 loops, best of 3: 6.44 ms per loop
In []: %timeit isnan(dot(x, x))
10000 loops, best of 3: 138 us per loop

Thus, the really efficient way might be heavily dependent on the operating system. Anyway dot(.) based seems to be the most stable one.


I think np.isnan(np.min(X)) should do what you want.


There are two general approaches here:

  • Check each array item for nan and take any.
  • Apply some cumulative operation that preserves nans (like sum) and check its result.

While the first approach is certainly the cleanest, the heavy optimization of some of the cumulative operations (particularly the ones that are executed in BLAS, like dot) can make those quite fast. Note that dot, like some other BLAS operations, are multithreaded under certain conditions. This explains the difference in speed between different machines.

enter image description here

import numpy
import perfplot


def min(a):
    return numpy.isnan(numpy.min(a))


def sum(a):
    return numpy.isnan(numpy.sum(a))


def dot(a):
    return numpy.isnan(numpy.dot(a, a))


def any(a):
    return numpy.any(numpy.isnan(a))


def einsum(a):
    return numpy.isnan(numpy.einsum("i->", a))


perfplot.show(
    setup=lambda n: numpy.random.rand(n),
    kernels=[min, sum, dot, any, einsum],
    n_range=[2 ** k for k in range(20)],
    logx=True,
    logy=True,
    xlabel="len(a)",
)

If you're comfortable with it allows to create a fast short-circuit (stops as soon as a NaN is found) function:

import numba as nb
import math

@nb.njit
def anynan(array):
    array = array.ravel()
    for i in range(array.size):
        if math.isnan(array[i]):
            return True
    return False

If there is no NaN the function might actually be slower than np.min, I think that's because np.min uses multiprocessing for large arrays:

import numpy as np
array = np.random.random(2000000)

%timeit anynan(array)          # 100 loops, best of 3: 2.21 ms per loop
%timeit np.isnan(array.sum())  # 100 loops, best of 3: 4.45 ms per loop
%timeit np.isnan(array.min())  # 1000 loops, best of 3: 1.64 ms per loop

But in case there is a NaN in the array, especially if it's position is at low indices, then it's much faster:

array = np.random.random(2000000)
array[100] = np.nan

%timeit anynan(array)          # 1000000 loops, best of 3: 1.93 µs per loop
%timeit np.isnan(array.sum())  # 100 loops, best of 3: 4.57 ms per loop
%timeit np.isnan(array.min())  # 1000 loops, best of 3: 1.65 ms per loop

Similar results may be achieved with Cython or a C extension, these are a bit more complicated (or easily avaiable as bottleneck.anynan) but ultimatly do the same as my anynan function.


Related to this is the question of how to find the first occurrence of NaN. This is the fastest way to handle that that I know of:

index = next((i for (i,n) in enumerate(iterable) if n!=n), None)

  1. use .any()

    if numpy.isnan(myarray).any()

  2. numpy.isfinite maybe better than isnan for checking

    if not np.isfinite(prop).all()


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to performance

Why is 2 * (i * i) faster than 2 * i * i in Java? What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism? How to check if a key exists in Json Object and get its value Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly? Most efficient way to map function over numpy array The most efficient way to remove first N elements in a list? Fastest way to get the first n elements of a List into an Array Why is "1000000000000000 in range(1000000000000001)" so fast in Python 3? pandas loc vs. iloc vs. at vs. iat? Android Recyclerview vs ListView with Viewholder

Examples related to numpy

Unable to allocate array with shape and data type How to fix 'Object arrays cannot be loaded when allow_pickle=False' for imdb.load_data() function? Numpy, multiply array with scalar TypeError: only integer scalar arrays can be converted to a scalar index with 1D numpy indices array Could not install packages due to a "Environment error :[error 13]: permission denied : 'usr/local/bin/f2py'" Pytorch tensor to numpy array Numpy Resize/Rescale Image what does numpy ndarray shape do? How to round a numpy array? numpy array TypeError: only integer scalar arrays can be converted to a scalar index

Examples related to nan

Display rows with one or more NaN values in pandas dataframe How to find which columns contain any NaN value in Pandas dataframe How to set a cell to NaN in a pandas dataframe Elegant way to create empty pandas DataFrame with NaN of type float How to check if any value is NaN in a Pandas DataFrame How to replace NaNs by preceding values in pandas DataFrame? Pandas Replace NaN with blank/empty string Convert pandas.Series from dtype object to float, and errors to nans How to filter in NaN (pandas)? Replace None with NaN in pandas dataframe