In my case I got NAN when setting distant integer LABELs. ie:
So, not use a very distant Label.
EDIT You can see the effect in the following simple code:
from keras.models import Sequential
from keras.layers import Dense, Activation
import numpy as np
X=np.random.random(size=(20,5))
y=np.random.randint(0,high=5, size=(20,1))
model = Sequential([
Dense(10, input_dim=X.shape[1]),
Activation('relu'),
Dense(5),
Activation('softmax')
])
model.compile(optimizer = "Adam", loss = "sparse_categorical_crossentropy", metrics = ["accuracy"] )
print('fit model with labels in range 0..5')
history = model.fit(X, y, epochs= 5 )
X = np.vstack( (X, np.random.random(size=(1,5))))
y = np.vstack( ( y, [[8000]]))
print('fit model with labels in range 0..5 plus 8000')
history = model.fit(X, y, epochs= 5 )
The result shows the NANs after adding the label 8000:
fit model with labels in range 0..5
Epoch 1/5
20/20 [==============================] - 0s 25ms/step - loss: 1.8345 - acc: 0.1500
Epoch 2/5
20/20 [==============================] - 0s 150us/step - loss: 1.8312 - acc: 0.1500
Epoch 3/5
20/20 [==============================] - 0s 151us/step - loss: 1.8273 - acc: 0.1500
Epoch 4/5
20/20 [==============================] - 0s 198us/step - loss: 1.8233 - acc: 0.1500
Epoch 5/5
20/20 [==============================] - 0s 151us/step - loss: 1.8192 - acc: 0.1500
fit model with labels in range 0..5 plus 8000
Epoch 1/5
21/21 [==============================] - 0s 142us/step - loss: nan - acc: 0.1429
Epoch 2/5
21/21 [==============================] - 0s 238us/step - loss: nan - acc: 0.2381
Epoch 3/5
21/21 [==============================] - 0s 191us/step - loss: nan - acc: 0.2381
Epoch 4/5
21/21 [==============================] - 0s 191us/step - loss: nan - acc: 0.2381
Epoch 5/5
21/21 [==============================] - 0s 188us/step - loss: nan - acc: 0.2381