[machine-learning] Why do we have to normalize the input for an artificial neural network?

At a high level, if you look at where normalization/standardization is most often used, you will notice that whenever magnitude differences enter the model-building process, it becomes necessary to standardize the inputs. This ensures that important inputs with small magnitudes don't lose their significance partway through training.

Example:

√((3-1)^2 + (1000-900)^2) ≈ √((1000-900)^2)

Here, (3-1) contributes almost nothing to the result, so the input corresponding to these values is effectively ignored by the model.
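A minimal sketch of this arithmetic in NumPy (the feature values are just the illustrative numbers from the example above, and the per-feature spreads used for scaling are assumed for illustration):

    import numpy as np

    # Two points: one small-scale feature and one large-scale feature
    a = np.array([3.0, 1000.0])
    b = np.array([1.0, 900.0])

    # Raw Euclidean distance: the large-scale feature dominates
    raw = np.sqrt(np.sum((a - b) ** 2))
    print(raw)              # ~100.02, almost entirely from the second feature
    print(abs(a[1] - b[1])) # 100.0

    # After dividing each feature by an assumed per-feature spread,
    # both features contribute comparably to the distance
    spread = np.array([2.0, 100.0])  # hypothetical per-feature ranges
    scaled = np.sqrt(np.sum(((a - b) / spread) ** 2))
    print(scaled)           # ~1.41: both features now matter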

Consider the following:

  1. Clustering uses Euclidean or other distance measures.
  2. Neural networks use an optimization algorithm to minimise a cost function (e.g., MSE).

Both the distance measure (clustering) and the cost function (neural networks) depend on magnitude differences in some way, so standardization ensures that features with large magnitudes don't drown out the other important inputs and that the algorithm works as expected, as sketched below.
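For instance, a common way to do this is to standardize each column to zero mean and unit variance before fitting. The snippet below is only a sketch: the toy feature matrix X and the use of scikit-learn's StandardScaler and KMeans are assumptions for illustration, not something prescribed by the answer above.

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    # Toy data: column 0 has small magnitudes, column 1 has large magnitudes
    X = np.array([[3.0, 1000.0],
                  [1.0,  900.0],
                  [2.5,  950.0],
                  [1.2,  905.0]])

    # Standardize each column so no single feature dominates the Euclidean
    # distances used by KMeans (or the gradients driving a network's cost)
    X_std = StandardScaler().fit_transform(X)

    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_std)
    print(X_std)
    print(labels)

The same standardized inputs would be what you feed to a neural network; without the scaling step, the optimizer mostly "sees" the large-magnitude column.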
