What is logits, softmax and softmax_cross_entropy_with_logits?

Question

I was going through the TensorFlow API docs. In the TensorFlow documentation, they use a keyword called logits. What is it? In a lot of methods in the API docs it is written like

tf.nn.softmax(logits, name=None)

If those logits are just Tensors, why keep a different name like logits?

Another thing is that there are two methods I could not differentiate. They are

tf.nn.softmax(logits, name=None)
tf.nn.softmax_cross_entropy_with_logits(logits, labels, name=None)

What are the differences between them? The docs are not clear to me. I know what tf.nn.softmax does, but not the other. An example would be really helpful.

User · Accepted Answer

Logits simply means that the function operates on the unscaled output of earlier layers and that the relative scale to understand the units is linear. It means, in particular, the sum of the inputs may not equal 1, that the values are not probabilities (you might have an input of 5).

tf.nn.softmax produces just the result of applying the softmax function to an input tensor. The softmax "squishes" the inputs so that sum(output) = 1: it's a way of normalizing. The shape of the output of a softmax is the same as the input: it just normalizes the values. The outputs of softmax can be interpreted as probabilities.

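import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (uses tf.Session)
s = tf.Session()
a = tf.constant(np.array([[.1, .3, .5, .9]]))
print(s.run(tf.nn.softmax(a)))
[[ 0.16838508  0.205666    0.25120102  0.37474789]]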
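
For readers without a TensorFlow session at hand, the same normalization can be sketched in plain Python. This only illustrates the softmax formula and is not TensorFlow's implementation; the helper name `softmax` is made up for this example.

```python
import math

def softmax(logits):
    """Exponentiate each logit, then divide by the sum of the exponentials."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [0.1, 0.3, 0.5, 0.9]   # same values as the tensor `a` above
probs = softmax(logits)

print(probs)        # approximately [0.168, 0.206, 0.251, 0.375]
print(sum(probs))   # 1.0 up to floating-point rounding
```

The output has as many entries as the input, and only the scale changes.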

In contrast, tf.nn.softmax_cross_entropy_with_logits computes the cross entropy of the result after applying the softmax function (but it does it all together in a more mathematically careful way). It's similar to the result of:

sm = tf.nn.softmax(x)
ce = cross_entropy(sm, labels)  # pseudocode: cross_entropy stands in for a hand-written cross-entropy step

The cross entropy is a summary metric: it sums across the elements. The output of tf.nn.softmax_cross_entropy_with_logits on a shape [2,5] tensor is of shape [2] (the first dimension is treated as the batch), that is, one loss value per row.
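
To make the "sums across the elements" point concrete, here is a rough plain-Python sketch for a batch of 2 examples and 5 classes. The logits and labels are invented for illustration, and `softmax_cross_entropy` is a hypothetical helper, not the TensorFlow function.

```python
import math

def softmax_cross_entropy(logits_row, labels_row):
    """Cross entropy between a label distribution and softmax(logits) for one row."""
    exps = [math.exp(z) for z in logits_row]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Entries with a zero label contribute nothing, so skip them.
    return -sum(y * math.log(p) for y, p in zip(labels_row, probs) if y > 0)

# Hypothetical batch: 2 examples (rows), 5 classes (columns).
logits = [[1.0, 2.0, 0.5, 0.0, -1.0],
          [0.0, 0.0, 0.0, 0.0, 0.0]]
labels = [[0, 1, 0, 0, 0],
          [0, 0, 0, 0, 1]]

losses = [softmax_cross_entropy(z, y) for z, y in zip(logits, labels)]
print(len(losses))  # 2: one loss per row of the batch
print(losses[1])    # log(5), about 1.609, because the second row is uniform
```

Each row of five scores collapses to a single number, which is why the class dimension disappears from the output.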

If you want to do optimization to minimize the cross entropy AND you're softmaxing after your last layer, you should use tf.nn.softmax_cross_entropy_with_logits instead of doing it yourself, because it covers numerically unstable corner cases in the mathematically right way. Otherwise, you'll end up hacking it by adding little epsilons here and there.
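
One such corner case can be sketched in plain Python. The example below assumes the standard log-sum-exp rewrite, loss = logsumexp(z) - z[true]; whether TensorFlow's kernel is written exactly this way is an assumption, and both function names are made up for illustration.

```python
import math

def naive_loss(logits, true_index):
    """Softmax first, then log: breaks when a probability underflows to 0."""
    exps = [math.exp(z) for z in logits]
    probs = [e / sum(exps) for e in exps]
    return -math.log(probs[true_index])

def stable_loss(logits, true_index):
    """Same quantity via log-sum-exp, with the maximum logit subtracted first."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[true_index]

logits = [0.0, -800.0]   # hypothetical, deliberately extreme scores
true_index = 1           # the correct class is the one with the tiny score

try:
    naive = naive_loss(logits, true_index)
except ValueError:
    naive = None         # exp(-800) underflowed to 0.0, so log(0.0) failed

stable = stable_loss(logits, true_index)
print(naive)    # None
print(stable)   # 800.0
```

For ordinary logits the two functions agree; they only diverge when an exponential underflows or overflows, which is the situation the "little epsilons" are usually meant to patch.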

Edited 2016-02-07: If you have single-class labels, where an object can only belong to one class, you might now consider using tf.nn.sparse_softmax_cross_entropy_with_logits so that you don't have to convert your labels to a dense one-hot array. This function was added after release 0.6.0.
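
The difference between the dense and sparse label formats can be sketched as follows. This is plain Python under the assumption that each example belongs to exactly one class; `dense_loss` and `sparse_loss` are hypothetical helpers, not the TensorFlow functions.

```python
import math

def log_softmax(logits):
    """Log-probabilities computed with the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def dense_loss(logits, one_hot):
    """Cross entropy with a one-hot label vector."""
    return -sum(y * lp for y, lp in zip(one_hot, log_softmax(logits)))

def sparse_loss(logits, class_index):
    """Cross entropy with an integer class index: pick one log-probability."""
    return -log_softmax(logits)[class_index]

logits = [2.2, 1.3, 1.7]                 # hypothetical scores for 3 classes
dense = dense_loss(logits, [0, 0, 1])    # label given as a one-hot vector
sparse = sparse_loss(logits, 2)          # same label given as an index
print(abs(dense - sparse) < 1e-12)       # True
```

With one-hot labels every term but one is multiplied by zero, so passing the index directly gives the same loss without building the dense array.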

User · Answer

One more thing that I would definitely like to highlight: a logit is just a raw output, generally the output of the last layer. It can be a negative value as well. If we use it as is for the cross-entropy evaluation, as in

-tf.reduce_sum(y_true * tf.log(logits))

then it won't work, because the log of a negative number is not defined. Applying a softmax activation first overcomes this problem.

This is my understanding; please correct me if I'm wrong.
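
A small plain-Python check of this point, with invented scores. Python's `math.log` raises an exception on a negative input; as far as I know, TensorFlow's log op would yield NaN instead of raising, so treat the exception here as a stand-in.

```python
import math

logits = [-1.2, 0.4, 2.0]   # hypothetical raw scores; the first one is negative

try:
    math.log(logits[0])
    log_of_logit_ok = True
except ValueError:
    log_of_logit_ok = False  # the real logarithm of a negative number is undefined

# Softmax maps every score to a strictly positive probability first.
exps = [math.exp(z) for z in logits]
probs = [e / sum(exps) for e in exps]
log_probs = [math.log(p) for p in probs]

print(log_of_logit_ok)              # False
print(all(p > 0 for p in probs))    # True
```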

User · Answer

Whatever goes into softmax is a logit; this is what G. Hinton repeats in the Coursera videos all the time.

User · Answer

tf.nn.softmax computes the forward propagation through a softmax layer. You use it during evaluation of the model, when you compute the probabilities that the model outputs.

tf.nn.softmax_cross_entropy_with_logits computes the cost for a softmax layer. It is only used during training.

The logits are the unnormalized log probabilities output by the model (the values output before the softmax normalization is applied to them).
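
"Unnormalized log probabilities" can be read quite literally: the logits and the log of the softmax output differ only by one shared constant, so adding a constant to every logit leaves the probabilities unchanged. A minimal plain-Python sketch with invented scores (the `softmax` helper is hypothetical):

```python
import math

def softmax(logits):
    m = max(logits)                       # subtracting the max avoids overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [0.5, 1.5, 0.1]                  # hypothetical scores
shifted = [z + 10.0 for z in logits]      # same scores plus a constant

p = softmax(logits)
q = softmax(shifted)
print(all(abs(a - b) < 1e-12 for a, b in zip(p, q)))   # True

# log(p) differs from the logits only by one shared constant (the log-normalizer).
offsets = [z - math.log(pi) for z, pi in zip(logits, p)]
print(max(offsets) - min(offsets) < 1e-12)             # True
```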

User · Answer

TensorFlow 2.0 compatible answer: the explanations by dga and stackoverflowuser2010 cover logits and the related functions in detail.

All of those functions work fine in TensorFlow 1.x, but if you migrate your code from 1.x (1.14, 1.15, etc.) to 2.x (2.0, 2.1, etc.), using them can result in errors.

Hence, for the benefit of the community, here are the 2.0-compatible calls for all the functions discussed above.

Functions in 1.x:

tf.nn.softmax
tf.nn.softmax_cross_entropy_with_logits
tf.nn.sparse_softmax_cross_entropy_with_logits

Respective functions when migrated from 1.x to 2.x:

tf.compat.v2.nn.softmax
tf.compat.v2.nn.softmax_cross_entropy_with_logits
tf.compat.v2.nn.sparse_softmax_cross_entropy_with_logits

For more information about migrating from 1.x to 2.x, please refer to the Migration Guide.

User · Answer

The answers above describe the question well enough.

Adding to that, TensorFlow has optimised the operation of applying the activation function and then calculating the cost by providing its own fused activation-plus-cost functions. Hence it is good practice to use tf.nn.softmax_cross_entropy_with_logits rather than tf.nn.softmax followed by a separate cross-entropy computation.

You can see a prominent difference between the two in a resource-intensive model.

User · Answer

Short version:

Suppose you have two tensors, where y_hat contains computed scores for each class (for example, from y = W*x + b) and y_true contains one-hot encoded true labels.

y_hat  = ... # Predicted label, e.g. y = tf.matmul(X, W) + b
y_true = ... # True label, one-hot encoded

If you interpret the scores in y_hat as unnormalized log probabilities, then they are logits.

Additionally, the total cross-entropy loss computed in this manner:

y_hat_softmax = tf.nn.softmax(y_hat)
total_loss = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_hat_softmax), [1]))

is essentially equivalent to the total cross-entropy loss computed with the function softmax_cross_entropy_with_logits():

total_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_hat, labels=y_true))

Long version:

In the output layer of your neural network, you will probably compute an array that contains the class scores for each of your training instances, such as from a computation y_hat = W*x + b. To serve as an example, below I've created a y_hat as a 2 x 3 array, where the rows correspond to the training instances and the columns correspond to classes. So here there are 2 training instances and 3 classes.

import tensorflow as tf  # TensorFlow 1.x API (uses tf.Session)
import numpy as np

sess = tf.Session()

# Create example y_hat.
y_hat = tf.convert_to_tensor(np.array([[0.5, 1.5, 0.1], [2.2, 1.3, 1.7]]))
sess.run(y_hat)
# array([[ 0.5,  1.5,  0.1],
#        [ 2.2,  1.3,  1.7]])

Note that the values are not normalized (i.e. the rows don't add up to 1). In order to normalize them, we can apply the softmax function, which interprets the input as unnormalized log probabilities (aka logits) and outputs normalized linear probabilities.

y_hat_softmax = tf.nn.softmax(y_hat)
sess.run(y_hat_softmax)
# array([[ 0.227863  ,  0.61939586,  0.15274114],
#        [ 0.49674623,  0.20196195,  0.30129182]])

It's important to fully understand what the softmax output is saying. The table below represents the output above more clearly. It can be seen that, for example, the probability of training instance 1 being "Class 2" is 0.619. The class probabilities for each training instance are normalized, so the sum of each row is 1.0.

                      Pr(Class 1)  Pr(Class 2)  Pr(Class 3)
                    ,--------------------------------------
Training instance 1 | 0.227863   | 0.61939586 | 0.15274114
Training instance 2 | 0.49674623 | 0.20196195 | 0.30129182

So now we have class probabilities for each training instance, where we can take the argmax() of each row to generate a final classification. From the table, we would predict that training instance 1 belongs to "Class 2" and training instance 2 belongs to "Class 1".

Are these classifications correct? We need to measure against the true labels from the training set. You will need a one-hot encoded y_true array, where again the rows are training instances and the columns are classes. Below I've created an example y_true one-hot array where the true label for training instance 1 is "Class 2" and the true label for training instance 2 is "Class 3".

y_true = tf.convert_to_tensor(np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]))
sess.run(y_true)
# array([[ 0.,  1.,  0.],
#        [ 0.,  0.,  1.]])

Is the probability distribution in y_hat_softmax close to the probability distribution in y_true? We can use cross-entropy loss to measure the error.

We can compute the cross-entropy loss on a row-wise basis and see the results. Below we can see that training instance 1 has a loss of 0.479, while training instance 2 has a higher loss of 1.200. This result makes sense because in our example above, y_hat_softmax showed that training instance 1's highest probability was for "Class 2", which matches training instance 1 in y_true; however, the prediction for training instance 2 showed a highest probability for "Class 1", which does not match the true class "Class 3".

loss_per_instance_1 = -tf.reduce_sum(y_true * tf.log(y_hat_softmax), reduction_indices=[1])
sess.run(loss_per_instance_1)
# array([ 0.4790107 ,  1.19967598])

What we really want is the total loss over all the training instances. So we can compute:

total_loss_1 = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_hat_softmax), reduction_indices=[1]))
sess.run(total_loss_1)
# 0.83934333897877944

Using softmax_cross_entropy_with_logits()

We can instead compute the total cross-entropy loss using the tf.nn.softmax_cross_entropy_with_logits() function, as shown below. The arguments are passed by name because TensorFlow releases from 1.0 onward reject positional arguments for this function.

loss_per_instance_2 = tf.nn.softmax_cross_entropy_with_logits(logits=y_hat, labels=y_true)
sess.run(loss_per_instance_2)
# array([ 0.4790107 ,  1.19967598])

total_loss_2 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_hat, labels=y_true))
sess.run(total_loss_2)
# 0.83934333897877922

Note that total_loss_1 and total_loss_2 produce essentially equivalent results, with some small differences in the very final digits. However, you might as well use the second approach: it takes one less line of code and accumulates less numerical error because the softmax is done for you inside of softmax_cross_entropy_with_logits().
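
The numbers in this walkthrough can be re-derived without TensorFlow. The following is a plain-Python sketch using the same y_hat and y_true values; the list-based helpers are written for this example only and say nothing about how TensorFlow computes the result internally.

```python
import math

def softmax(row):
    exps = [math.exp(z) for z in row]
    total = sum(exps)
    return [e / total for e in exps]

y_hat = [[0.5, 1.5, 0.1], [2.2, 1.3, 1.7]]    # logits from the example above
y_true = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # one-hot labels from the example above

y_hat_softmax = [softmax(row) for row in y_hat]

# Predicted class per instance (0-based index of the largest probability).
predictions = [row.index(max(row)) for row in y_hat_softmax]

# Per-instance cross entropy, then the mean over the batch.
loss_per_instance = [
    -sum(t * math.log(p) for t, p in zip(t_row, p_row))
    for t_row, p_row in zip(y_true, y_hat_softmax)
]
total_loss = sum(loss_per_instance) / len(loss_per_instance)

print(predictions)         # [1, 0], i.e. "Class 2" and "Class 1"
print(loss_per_instance)   # approximately [0.479, 1.200]
print(total_loss)          # approximately 0.839
```

The second instance is misclassified (predicted index 0, true index 2), which is why its loss is the larger of the two.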
