sklearn error ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

Question

I am using sklearn and having a problem with affinity propagation. I have built an input matrix and I keep getting the following error:

    ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

I have run

    np.isnan(mat.any())     # and gets False
    np.isfinite(mat.all())  # and gets True

I tried using

    mat[np.isfinite(mat) == True] = 0

to remove the infinite values, but this did not work either. What can I do to get rid of the infinite values in my matrix, so that I can use the affinity propagation algorithm?

I am using anaconda and python 2.7.9.

User · Answer

Try mat.sum(). If the sum of your data is infinity (greater than the max float value, which is about 3.402823e+38 for float32 and about 1.797693e+308 for float64), you will get that error.

See the _assert_all_finite function in validation.py from the scikit source code:

    if is_float and np.isfinite(X.sum()):
        pass
    elif is_float:
        msg_err = "Input contains {} or a value too large for {!r}."
        if (allow_nan and np.isinf(X).any() or
                not allow_nan and not np.isfinite(X).all()):
            type_err = 'infinity' if allow_nan else 'NaN, infinity'
            # print(X.sum())
            raise ValueError(msg_err.format(type_err, X.dtype))
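A minimal sketch of what mat.sum() can reveal, using made-up values rather than the asker's data. It shows that a sum can overflow to infinity even when every element is finite (in which case the quoted code falls back to the element-wise test), and that a value above the float32 limit becomes infinity once it is cast to float32.

```python
import numpy as np

# Hypothetical data: every entry is finite, but the total overflows float64.
mat = np.array([1.5e308, 1.5e308])

all_finite = bool(np.isfinite(mat).all())        # element-wise check
with np.errstate(over="ignore"):                 # silence the overflow warning
    sum_finite = bool(np.isfinite(mat.sum()))    # cheap single-number check

# A finite float64 value that is too large for float32 turns into inf on cast.
big = np.array([1e39])
with np.errstate(over="ignore"):
    as32 = big.astype(np.float32)

print(all_finite, sum_finite, as32)
```

If the sum is infinite but the element-wise check passes, the values themselves are fine and only their total overflows.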

User · Answer

The dimensions of my input array were skewed, as my input csv had empty spaces.
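One way empty CSV fields can lead to this error is that pandas reads them as NaN. The sketch below assumes that is what happened here; the CSV text and column names are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV with two empty fields.
csv_text = "a,b\n1.0,2.0\n,3.0\n4.0,\n"
df = pd.read_csv(io.StringIO(csv_text))

# Count the missing cells per column before handing the data to sklearn.
missing_per_column = df.isna().sum()
print(missing_per_column)
```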

User · Answer

Remove all infinite values and replace them with the min or max for that column:

    import numpy as np

    # generate example matrix
    matrix = np.random.rand(5, 5)
    matrix[0, :] = np.inf
    matrix[2, :] = -np.inf

    >>> matrix
    array([[        inf,         inf,         inf,         inf,         inf],
           [ 0.87362809,  0.28321499,  0.7427659 ,  0.37570528,  0.35783064],
           [       -inf,        -inf,        -inf,        -inf,        -inf],
           [ 0.72877665,  0.06580068,  0.95222639,  0.00833664,  0.68779902],
           [ 0.90272002,  0.37357483,  0.92952479,  0.072105  ,  0.20837798]])

    # find min and max values for each column, ignoring nan, -inf, and inf
    mins = [np.nanmin(matrix[:, i][matrix[:, i] != -np.inf]) for i in range(matrix.shape[1])]
    maxs = [np.nanmax(matrix[:, i][matrix[:, i] != np.inf]) for i in range(matrix.shape[1])]

    # go through matrix one column at a time and replace +infinity and -infinity
    # with the max or min for that column
    for i in range(matrix.shape[1]):
        matrix[:, i][matrix[:, i] == -np.inf] = mins[i]
        matrix[:, i][matrix[:, i] == np.inf] = maxs[i]

    >>> matrix
    array([[ 0.90272002,  0.37357483,  0.95222639,  0.37570528,  0.68779902],
           [ 0.87362809,  0.28321499,  0.7427659 ,  0.37570528,  0.35783064],
           [ 0.72877665,  0.06580068,  0.7427659 ,  0.00833664,  0.20837798],
           [ 0.72877665,  0.06580068,  0.95222639,  0.00833664,  0.68779902],
           [ 0.90272002,  0.37357483,  0.92952479,  0.072105  ,  0.20837798]])

User · Answer

This might happen inside scikit, and it depends on what you're doing. I recommend reading the documentation for the functions you're using. You might be using one which depends, e.g., on your matrix being positive definite, and your matrix might not fulfill that criterion.

EDIT: How could I miss that:

    np.isnan(mat.any())     # and gets False
    np.isfinite(mat.all())  # and gets True

is obviously wrong. Right would be:

    np.any(np.isnan(mat))

and

    np.all(np.isfinite(mat))

You want to check whether any of the elements is NaN, and not whether the return value of the any function is a number.
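The difference between the two forms can be seen on a small made-up matrix that contains both a NaN and an infinity. This is only an illustrative sketch; the values are not from the question.

```python
import numpy as np

# Hypothetical matrix with one NaN and one infinity.
mat = np.array([[1.0, np.nan], [np.inf, 4.0]])

# Form used in the question: the array is reduced to a single boolean first,
# and then that boolean is tested, so the bad values go unnoticed.
misleading_nan = bool(np.isnan(mat.any()))
misleading_finite = bool(np.isfinite(mat.all()))

# Element-wise test first, reduction second.
has_nan = bool(np.any(np.isnan(mat)))
all_finite = bool(np.all(np.isfinite(mat)))

print(misleading_nan, misleading_finite, has_nan, all_finite)
```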

User · Answer

None of the answers here worked for me. This was what worked:

    Test_y = np.nan_to_num(Test_y)

It replaces the infinity values with very large finite values and the NaN values with numbers (zero by default).
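A small sketch of what np.nan_to_num does with its default arguments, on an invented array (the name test_y only mirrors the answer above):

```python
import numpy as np

# Hypothetical target vector with a NaN and both infinities.
test_y = np.array([1.0, np.nan, np.inf, -np.inf])

# Defaults: NaN -> 0.0, +inf -> largest float64, -inf -> most negative float64.
cleaned = np.nan_to_num(test_y)
print(cleaned)
```

Note that the substituted values are extreme, so they can still dominate a model; whether that is acceptable depends on the data.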

User · Answer

I had the error after trying to select a subset of rows:

    df = df.reindex(index=my_index)

It turns out that my_index contained values that were not contained in df.index, so the reindex function inserted some new rows and filled them with NaN.
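The behaviour described above can be reproduced on a toy DataFrame. The data and the label 5 are made up; restricting the labels to those present in df.index is one possible way to avoid the inserted rows, not necessarily what the author did.

```python
import pandas as pd

# Hypothetical frame and an index list containing a label (5) that df lacks.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=[0, 1, 2])
my_index = [1, 2, 5]

# reindex creates a row for the unknown label and fills it with NaN.
reindexed = df.reindex(index=my_index)

# Labels that would produce NaN rows.
missing = pd.Index(my_index).difference(df.index)

# Keeping only labels that exist avoids the NaN rows.
safe = df.reindex(index=df.index.intersection(my_index))
print(reindexed, list(missing), safe, sep="\n")
```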

User · Answer

I got the same error. It worked with

    df.fillna(-99999, inplace=True)

before doing any replacement, substitution, etc.

User · Answer

With this version of python 3:

    /opt/anaconda3/bin/python --version
    Python 3.6.0 :: Anaconda 4.3.0 (64-bit)

Looking at the details of the error, I found the lines of code causing the failure:

    /opt/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in _assert_all_finite(X)
         56             and not np.isfinite(X).all()):
         57         raise ValueError("Input contains NaN, infinity"
    ---> 58                          " or a value too large for %r." % X.dtype)
         59
         60

    ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

From this, I was able to extract the correct way to test what was going on with my data, using the same test that fails in the error message: np.isfinite(X).

Then with a quick and dirty loop, I was able to find that my data indeed contains NaNs:

    print(p[:, 0].shape)
    index = 0
    for i in p[:, 0]:
        if not np.isfinite(i):
            print(index, i)
        index += 1

    (367340,)
    4454 nan
    6940 nan
    10868 nan
    12753 nan
    14855 nan
    15678 nan
    24954 nan
    30251 nan
    31108 nan
    51455 nan
    59055 nan
    ...

Now all I have to do is remove the values at these indexes.
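The same search can be written without an explicit loop. This is a sketch on a small invented array standing in for p; it is an alternative to the loop above, not the author's code.

```python
import numpy as np

# Hypothetical stand-in for p: first column has a NaN and an infinity.
p = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [4.0, 5.0],
              [np.inf, 6.0]])

col = p[:, 0]

# Positions of the non-finite entries in the first column.
bad_idx = np.flatnonzero(~np.isfinite(col))

# Keep only the rows whose first column is finite.
p_clean = p[np.isfinite(col)]
print(bad_idx, p_clean.shape)
```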

User · Answer

    dataset = dataset.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

This worked for me.

User · Answer

This is my function (based on this) to clean the dataset of NaN, Inf, and missing cells (for skewed datasets):

    import numpy as np
    import pandas as pd

    def clean_dataset(df):
        assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
        # drop rows with missing cells (modifies df in place)
        df.dropna(inplace=True)
        # keep only rows without NaN or infinite values
        indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
        return df[indices_to_keep].astype(np.float64)

User · Answer

This is the check on which it fails:

https://github.com/scikit-learn/scikit-learn/blob/0.17.X/sklearn/utils/validation.py#L51

Which says:

    def _assert_all_finite(X):
        """Like assert_all_finite, but only for ndarray."""
        X = np.asanyarray(X)
        # First try an O(n) time, O(1) space solution for the common case that
        # everything is finite; fall back to O(n) space np.isfinite to prevent
        # false positives from overflow in sum method.
        if (X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum())
                and not np.isfinite(X).all()):
            raise ValueError("Input contains NaN, infinity"
                             " or a value too large for %r." % X.dtype)

So make sure that you have no NaN values in your input, and that all the values are actually float values. None of the values should be Inf either.

User · Answer

In my case the problem was that many scikit functions return numpy arrays, which carry no pandas index. So there was an index mismatch when I used those numpy arrays to build new DataFrames and then tried to mix them with the original data.
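A sketch of how such an index mismatch can introduce NaN. The multiplication below is only a stand-in for an sklearn transform that returns a plain array, and the column names are made up.

```python
import pandas as pd

# Hypothetical frame; filtering leaves index labels 2 and 3.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0]})
subset = df[df["a"] > 2.0]

# Stand-in for an sklearn call: the result is a bare numpy array.
scaled = subset[["a"]].to_numpy() * 10.0

# A new DataFrame gets a fresh 0..n-1 index, so concat aligns on
# different labels and pads with NaN.
new = pd.DataFrame(scaled, columns=["b"])
mismatched = pd.concat([subset, new], axis=1)

# Reusing the original index keeps the rows aligned.
aligned = pd.concat(
    [subset, pd.DataFrame(scaled, columns=["b"], index=subset.index)],
    axis=1,
)
print(mismatched, aligned, sep="\n")
```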

User · Answer

I had the same error, and in my case X and y were dataframes, so I had to convert them to matrices first:

    X = X.values.astype(np.float64)
    y = y.values.astype(np.float64)

Edit: The originally suggested X.as_matrix() is deprecated.

User · Answer

In most cases, getting rid of infinite and null values solves this problem.

Get rid of infinite values:

    df.replace([np.inf, -np.inf], np.nan, inplace=True)

Get rid of null values the way you like: a specific value such as 999, the mean, or your own function to impute missing values:

    df.fillna(999, inplace=True)
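As a sketch of the "mean" option mentioned above, the following replaces infinities with NaN and then fills each column with its own mean. The data is invented, and column-mean imputation is just one choice among many.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one infinity and one NaN.
df = pd.DataFrame({"a": [1.0, np.inf, 3.0], "b": [np.nan, 2.0, 4.0]})

# Turn infinities into NaN so they are treated as missing.
imputed = df.replace([np.inf, -np.inf], np.nan)

# Fill each column's missing cells with that column's mean (NaN is skipped).
imputed = imputed.fillna(imputed.mean())
print(imputed)
```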

User · Answer

I would like to propose a solution for numpy that worked well for me. The lines

    import numpy as np
    from numpy import inf
    inputArray[inputArray == inf] = np.finfo(np.float64).max

substitute all (positive) infinite values of a numpy array with the maximum float64 number.

User · Answer

I got the same error message when using sklearn with pandas. My solution is to reset the index of my dataframe df before running any sklearn code:

    df = df.reset_index()

I encountered this issue many times when I removed some entries in my df, such as

    df = df[df.label == 'desired_one']
