[machine-learning] What is the role of the bias in neural networks?

A layer in a neural network without a bias is nothing more than the multiplication of an input vector with a matrix. (The output vector might be passed through a sigmoid function for normalisation and for use in multi-layered ANN afterwards but that’s not important.)

This means that you’re using a linear function and thus an input of all zeros will always be mapped to an output of all zeros. This might be a reasonable solution for some systems but in general it is too restrictive.

Using a bias, you’re effectively adding another dimension to your input space, which always takes the value one, so you’re avoiding an input vector of all zeros. You don’t lose any generality by this because your trained weight matrix needs not be surjective, so it still can map to all values previously possible.

2d ANN:

For a ANN mapping two dimensions to one dimension, as in reproducing the AND or the OR (or XOR) functions, you can think of a neuronal network as doing the following:

On the 2d plane mark all positions of input vectors. So, for boolean values, you’d want to mark (-1,-1), (1,1), (-1,1), (1,-1). What your ANN now does is drawing a straight line on the 2d plane, separating the positive output from the negative output values.

Without bias, this straight line has to go through zero, whereas with bias, you’re free to put it anywhere. So, you’ll see that without bias you’re facing a problem with the AND function, since you can’t put both (1,-1) and (-1,1) to the negative side. (They are not allowed to be on the line.) The problem is equal for the OR function. With a bias, however, it’s easy to draw the line.

Note that the XOR function in that situation can’t be solved even with bias.

Examples related to machine-learning

Error in Python script "Expected 2D array, got 1D array instead:"? How to predict input image using trained model in Keras? What is the role of "Flatten" in Keras? How to concatenate two layers in keras? How to save final model using keras? scikit-learn random state in splitting dataset Why binary_crossentropy and categorical_crossentropy give different performances for the same problem? What is the meaning of the word logits in TensorFlow? Can anyone explain me StandardScaler? Can Keras with Tensorflow backend be forced to use CPU or GPU at will?

Examples related to neural-network

How to initialize weights in PyTorch? Keras input explanation: input_shape, units, batch_size, dim, etc What is the role of "Flatten" in Keras? How to concatenate two layers in keras? Why binary_crossentropy and categorical_crossentropy give different performances for the same problem? What is the meaning of the word logits in TensorFlow? How to return history of validation loss in Keras Keras model.summary() result - Understanding the # of Parameters Where do I call the BatchNormalization function in Keras? How to interpret "loss" and "accuracy" for a machine learning model

Examples related to artificial-intelligence

How to get Tensorflow tensor dimensions (shape) as int values? How to compute precision, recall, accuracy and f1-score for the multiclass case with scikit learn? What is the optimal algorithm for the game 2048? Epoch vs Iteration when training neural networks What's is the difference between train, validation and test set, in neural networks? What is the role of the bias in neural networks? What is the difference between supervised learning and unsupervised learning? What are good examples of genetic algorithms/genetic programming solutions? source of historical stock data What algorithm for a tic-tac-toe game can I use to determine the "best move" for the AI?

Examples related to backpropagation

What is the role of the bias in neural networks?