Counting the number of non-NaN elements in a numpy ndarray in Python

Question

I need to calculate the number of non-NaN elements in a numpy ndarray matrix. How would one efficiently do this in Python? Here is my simple code for achieving this:

import numpy as np

def numberOfNonNans(data):
    count = 0
    for i in data:
        if not np.isnan(i):
            count += 1
    return count

Is there a built-in function for this in numpy? Efficiency is important because I'm doing Big Data analysis.

Thanks for any help!

User · Answer

Quick-to-write alternative

Even though it is not the fastest choice, if performance is not an issue you can use:

sum(~np.isnan(data)).

Performance:

In [7]: %timeit data.size - np.count_nonzero(np.isnan(data))
10 loops, best of 3: 67.5 ms per loop

In [8]: %timeit sum(~np.isnan(data))
10 loops, best of 3: 154 ms per loop

In [9]: %timeit np.sum(~np.isnan(data))
10 loops, best of 3: 140 ms per loop
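
As a small illustration of the three expressions timed above, here is a minimal sketch on a made-up 2-D array (the sample `data` and the variable names are assumptions for the example, not part of the answer). One thing worth knowing: on a 2-D array the built-in `sum` adds the rows together and so returns one count per column, whereas `np.sum` and `np.count_nonzero` return a single total.

```python
import numpy as np

# Hypothetical sample data: a 2x3 array with two NaN entries.
data = np.array([[1.0, np.nan, 3.0],
                 [np.nan, 5.0, 6.0]])

mask = ~np.isnan(data)            # True where the value is not NaN

per_column = sum(mask)            # built-in sum adds the rows: one count per column
total_np_sum = np.sum(mask)       # single total over the whole array
total_fast = data.size - np.count_nonzero(np.isnan(data))

print(per_column)                 # counts per column
print(total_np_sum, total_fast)   # both totals agree
```

If a single number is wanted from the built-in `sum`, the array would need to be flattened first, for example with `mask.ravel()`.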

User · Answer

np.count_nonzero(~np.isnan(data))

~ inverts the boolean matrix returned from np.isnan. np.count_nonzero counts the values that are not 0/False. .sum() should give the same result, but count_nonzero is arguably clearer.

Testing speed:

In [23]: data = np.random.random((10000, 10000))

In [24]: data[[np.random.random_integers(0,10000, 100)],:][:, [np.random.random_integers(0,99, 100)]] = np.nan

In [25]: %timeit data.size - np.count_nonzero(np.isnan(data))
1 loops, best of 3: 309 ms per loop

In [26]: %timeit np.count_nonzero(~np.isnan(data))
1 loops, best of 3: 345 ms per loop

In [27]: %timeit data.size - np.isnan(data).sum()
1 loops, best of 3: 339 ms per loop

data.size - np.count_nonzero(np.isnan(data)) seems to be the fastest here, though only barely. Other data might give different relative speed results.
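
To check that the three timed expressions really return the same count, here is a minimal sketch. It makes a few assumptions that differ from the session above: the array is smaller, `np.random.default_rng` is used rather than the older `np.random.random_integers`, and the NaNs are written with direct index assignment so that they are guaranteed to land in `data` itself.

```python
import numpy as np

rng = np.random.default_rng(0)       # seeded only so the sketch is repeatable
data = rng.random((1000, 1000))      # smaller than the 10000x10000 array timed above

# Write NaN at 100 random (row, column) positions; duplicates are possible,
# so the number of distinct NaN entries is at most 100.
rows = rng.integers(0, data.shape[0], size=100)
cols = rng.integers(0, data.shape[1], size=100)
data[rows, cols] = np.nan

count_a = data.size - np.count_nonzero(np.isnan(data))
count_b = np.count_nonzero(~np.isnan(data))
count_c = data.size - np.isnan(data).sum()

# All three expressions count the same non-NaN elements.
assert count_a == count_b == count_c
```

Wrapping each expression in `%timeit` in an IPython session would reproduce the comparison on your own data, where the ranking may differ.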

User · Answer

To determine if the array is sparse, it may help to get the proportion of NaN values:

np.isnan(ndarr).sum() / ndarr.size

If that proportion exceeds a threshold, then use a sparse array, e.g. https://sparse.pydata.org/en/latest/
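
The proportion check could look like the following minimal sketch. The sample array and the `THRESHOLD` value of 0.5 are hypothetical; the answer does not say what threshold to use.

```python
import numpy as np

# Hypothetical sample data: 4 of the 6 entries are NaN.
ndarr = np.array([[np.nan, np.nan, 1.0],
                  [np.nan, 2.0, np.nan]])

nan_fraction = np.isnan(ndarr).sum() / ndarr.size

THRESHOLD = 0.5                      # hypothetical cut-off, choose one for your data
mostly_nan = nan_fraction > THRESHOLD

print(nan_fraction, mostly_nan)
```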

User · Answer

An alternative, though a bit slower, is to do it with boolean indexing:

np.isnan(data)[np.isnan(data) == False].size

In [30]: %timeit np.isnan(data)[np.isnan(data) == False].size
1 loops, best of 3: 498 ms per loop

The double use of np.isnan(data) and the == operator might be a bit overkill, so I posted this answer only for completeness.
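
For reference, here is a minimal sketch of the indexing approach on a made-up 1-D array. The second form, which computes the mask once and selects the non-NaN values directly, is a variation added for illustration and is not part of the answer.

```python
import numpy as np

# Hypothetical sample data with two NaN entries.
data = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

is_nan = np.isnan(data)                          # compute the mask once and reuse it

count_indexing = is_nan[is_nan == False].size    # the form used in the answer
count_values = data[~is_nan].size                # variation: select the non-NaN values

print(count_indexing, count_values)
```

Both forms build an intermediate array of the selected elements, which is a plausible reason for this approach being slower than a plain count.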
