[python] Keras model.summary() result - Understanding the # of Parameters

I have a simple NN model for detecting hand-written digits from a 28x28px image written in python using Keras (Theano backend):

model0 = Sequential()

#number of epochs to train for
nb_epoch = 12
#amount of data each iteration in an epoch sees
batch_size = 128

model0.add(Flatten(input_shape=(1, img_rows, img_cols)))
model0.add(Dense(nb_classes))
model0.add(Activation('softmax'))
model0.compile(loss='categorical_crossentropy', 
         optimizer='sgd',
         metrics=['accuracy'])

model0.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
      verbose=1, validation_data=(X_test, Y_test))

score = model0.evaluate(X_test, Y_test, verbose=0)

print('Test score:', score[0])
print('Test accuracy:', score[1])

This runs well and I get ~90% accuracy. I then perform the following command to get a summary of my network's structure by doing print(model0.summary()). This outputs the following:

Layer (type)         Output Shape   Param #     Connected to                     
=====================================================================
flatten_1 (Flatten)   (None, 784)     0           flatten_input_1[0][0]            
dense_1 (Dense)     (None, 10)       7850        flatten_1[0][0]                  
activation_1        (None, 10)          0           dense_1[0][0]                    
======================================================================
Total params: 7850

I don't understand how they get to 7850 total params and what that actually means?

This question is related to python machine-learning neural-network keras theano

The answer is


Number of parameters is the amount of numbers that can be changed in the model. Mathematically this means number of dimensions of your optimization problem. For you as a programmer, each of this parameters is a floating point number, which typically takes 4 bytes of memory, allowing you to predict the size of this model once saved.

This formula for this number is different for each neural network layer type, but for Dense layer it is simple: each neuron has one bias parameter and one weight per input: N = n_neurons * ( n_inputs + 1).


The number of parameters is 7850 because with every hidden unit you have 784 input weights and one weight of connection with bias. This means that every hidden unit gives you 785 parameters. You have 10 units so it sums up to 7850.

The role of this additional bias term is really important. It significantly increases the capacity of your model. You can read details e.g. here Role of Bias in Neural Networks.


I feed a 514 dimensional real-valued input to a Sequential model in Keras. My model is constructed in following way :

    predictivemodel = Sequential()
    predictivemodel.add(Dense(514, input_dim=514, W_regularizer=WeightRegularizer(l1=0.000001,l2=0.000001), init='normal'))
    predictivemodel.add(Dense(257, W_regularizer=WeightRegularizer(l1=0.000001,l2=0.000001), init='normal'))
    predictivemodel.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])

When I print model.summary() I get following result:

Layer (type)    Output Shape  Param #     Connected to                   
================================================================
dense_1 (Dense) (None, 514)   264710      dense_input_1[0][0]              
________________________________________________________________
activation_1    (None, 514)   0           dense_1[0][0]                    
________________________________________________________________
dense_2 (Dense) (None, 257)   132355      activation_1[0][0]               
================================================================
Total params: 397065
________________________________________________________________ 

For the dense_1 layer , number of params is 264710. This is obtained as : 514 (input values) * 514 (neurons in the first layer) + 514 (bias values)

For dense_2 layer, number of params is 132355. This is obtained as : 514 (input values) * 257 (neurons in the second layer) + 257 (bias values for neurons in the second layer)


For Dense Layers:

output_size * (input_size + 1) == number_parameters 

For Conv Layers:

output_channels * (input_channels * window_size + 1) == number_parameters

Consider following example,

model = Sequential([
Conv2D(32, (3, 3), activation='relu', input_shape=input_shape),
Conv2D(64, (3, 3), activation='relu'),
Conv2D(128, (3, 3), activation='relu'),
Dense(num_classes, activation='softmax')
])

model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 222, 222, 32)      896       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 220, 220, 64)      18496     
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 218, 218, 128)     73856     
_________________________________________________________________
dense_9 (Dense)              (None, 218, 218, 10)      1290      
=================================================================

Calculating params,

assert 32 * (3 * (3*3) + 1) == 896
assert 64 * (32 * (3*3) + 1) == 18496
assert 128 * (64 * (3*3) + 1) == 73856
assert num_classes * (128 + 1) == 1290

The "none" in the shape means it does not have a pre-defined number. For example, it can be the batch size you use during training, and you want to make it flexible by not assigning any value to it so that you can change your batch size. The model will infer the shape from the context of the layers.

To get nodes connected to each layer, you can do the following:

for layer in model.layers:
    print(layer.name, layer.inbound_nodes, layer.outbound_nodes)

The easiest way to calculate number of neurons in one layer is: Param value / (number of units * 4)

  • Number of units is in predictivemodel.add(Dense(514,...)
  • Param value is Param in model.summary() function

For example in Paul Lo's answer , number of neurons in one layer is 264710 / (514 * 4 ) = 130


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to machine-learning

Error in Python script "Expected 2D array, got 1D array instead:"? How to predict input image using trained model in Keras? What is the role of "Flatten" in Keras? How to concatenate two layers in keras? How to save final model using keras? scikit-learn random state in splitting dataset Why binary_crossentropy and categorical_crossentropy give different performances for the same problem? What is the meaning of the word logits in TensorFlow? Can anyone explain me StandardScaler? Can Keras with Tensorflow backend be forced to use CPU or GPU at will?

Examples related to neural-network

How to initialize weights in PyTorch? Keras input explanation: input_shape, units, batch_size, dim, etc What is the role of "Flatten" in Keras? How to concatenate two layers in keras? Why binary_crossentropy and categorical_crossentropy give different performances for the same problem? What is the meaning of the word logits in TensorFlow? How to return history of validation loss in Keras Keras model.summary() result - Understanding the # of Parameters Where do I call the BatchNormalization function in Keras? How to interpret "loss" and "accuracy" for a machine learning model

Examples related to keras

Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation How to fix 'Object arrays cannot be loaded when allow_pickle=False' for imdb.load_data() function? Tensorflow 2.0 - AttributeError: module 'tensorflow' has no attribute 'Session' What is the use of verbose in Keras while validating the model? Save and load weights in keras How to import keras from tf.keras in Tensorflow? How to check which version of Keras is installed? Can I run Keras model on gpu? How to check if keras tensorflow backend is GPU or CPU version? Keras input explanation: input_shape, units, batch_size, dim, etc

Examples related to theano

Deep-Learning Nan loss reasons Keras, how do I predict after I trained a model? Keras model.summary() result - Understanding the # of Parameters How do I install Keras and Theano in Anaconda Python on Windows?