If you follow the principle of Occam's razor, you might think setting all the weights to 0 or 1 would be the best solution. This is not the case.
With every weight the same, all the neurons at each layer are producing the same output. This makes it hard to decide which weights to adjust.
# initialize two NN's with 0 and 1 constant weights
model_0 = Net(constant_weight=0)
model_1 = Net(constant_weight=1)
Validation Accuracy
9.625% -- All Zeros
10.050% -- All Ones
Training Loss
2.304 -- All Zeros
1552.281 -- All Ones
A uniform distribution has the equal probability of picking any number from a set of numbers.
Let's see how well the neural network trains using a uniform weight initialization, where low=0.0
and high=1.0
.
Below, we'll see another way (besides in the Net class code) to initialize the weights of a network. To define weights outside of the model definition, we can:
- Define a function that assigns weights by the type of network layer, then
- Apply those weights to an initialized model using
model.apply(fn)
, which applies a function to each model layer.
# takes in a module and applies the specified weight initialization
def weights_init_uniform(m):
classname = m.__class__.__name__
# for every Linear layer in a model..
if classname.find('Linear') != -1:
# apply a uniform distribution to the weights and a bias=0
m.weight.data.uniform_(0.0, 1.0)
m.bias.data.fill_(0)
model_uniform = Net()
model_uniform.apply(weights_init_uniform)
Validation Accuracy
36.667% -- Uniform Weights
Training Loss
3.208 -- Uniform Weights
The general rule for setting the weights in a neural network is to set them to be close to zero without being too small.
Good practice is to start your weights in the range of [-y, y] where
y=1/sqrt(n)
(n is the number of inputs to a given neuron).
# takes in a module and applies the specified weight initialization
def weights_init_uniform_rule(m):
classname = m.__class__.__name__
# for every Linear layer in a model..
if classname.find('Linear') != -1:
# get the number of the inputs
n = m.in_features
y = 1.0/np.sqrt(n)
m.weight.data.uniform_(-y, y)
m.bias.data.fill_(0)
# create a new model with these weights
model_rule = Net()
model_rule.apply(weights_init_uniform_rule)
below we compare performance of NN, weights initialized with uniform distribution [-0.5,0.5) versus the one whose weight is initialized using general rule
Validation Accuracy
75.817% -- Centered Weights [-0.5, 0.5)
85.208% -- General Rule [-y, y)
Training Loss
0.705 -- Centered Weights [-0.5, 0.5)
0.469 -- General Rule [-y, y)
The normal distribution should have a mean of 0 and a standard deviation of
y=1/sqrt(n)
, where n is the number of inputs to NN
## takes in a module and applies the specified weight initialization
def weights_init_normal(m):
'''Takes in a module and initializes all linear layers with weight
values taken from a normal distribution.'''
classname = m.__class__.__name__
# for every Linear layer in a model
if classname.find('Linear') != -1:
y = m.in_features
# m.weight.data shoud be taken from a normal distribution
m.weight.data.normal_(0.0,1/np.sqrt(y))
# m.bias.data should be 0
m.bias.data.fill_(0)
below we show the performance of two NN one initialized using uniform-distribution and the other using normal-distribution
Validation Accuracy
85.775% -- Uniform Rule [-y, y)
84.717% -- Normal Distribution
Training Loss
0.329 -- Uniform Rule [-y, y)
0.443 -- Normal Distribution