What is the meaning of the word logits in TensorFlow?

Question

In the following TensorFlow function, we must feed the activation of artificial neurons in the final layer. That I understand. But I don't understand why it is called logits. Isn't that a mathematical function?

loss_function = tf.nn.softmax_cross_entropy_with_logits(
    logits=last_layer,
    labels=target_output
)

User · Answer

Just adding this clarification so that anyone who scrolls down this far can at least get it right, since there are so many wrong answers upvoted.

Diansheng's answer and JakeJ's answer get it right.
A new answer posted by Shital Shah is an even better and more complete answer.

Yes, logit is a mathematical function in statistics, but the logit used in the context of neural networks is different. The statistical logit doesn't even make sense here.

I couldn't find a formal definition anywhere, but logit basically means:

The raw predictions which come out of the last layer of the neural network.
1. This is the very tensor on which you apply the argmax function to get the predicted class.
2. This is the very tensor which you feed into the softmax function to get the probabilities for the predicted classes.

Also, from a tutorial on the official TensorFlow website:

Logits Layer

The final layer in our neural network is the logits layer, which will return the raw values for our predictions. We create a dense layer with 10 neurons (one for each target class 0–9), with linear activation (the default):
logits = tf.layers.dense(inputs=dropout, units=10)

If you are still confused, the situation is like this:

raw_predictions = neural_net(input_layer)
predicted_class_index_by_raw = argmax(raw_predictions)
probabilities = softmax(raw_predictions)
predicted_class_index_by_prob = argmax(probabilities)

where predicted_class_index_by_raw and predicted_class_index_by_prob will be equal, because softmax preserves the ordering of its inputs.

Another name for raw_predictions in the above code is logit.
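The situation above can be checked with a minimal runnable sketch that uses only the Python standard library. The values in raw_predictions are made up for illustration, and softmax and argmax here are small hand-written helpers, not any library's API.

```python
import math

def softmax(values):
    # Subtract the maximum before exponentiating for numerical stability.
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    # Index of the largest element.
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical raw outputs (logits) of a 4-class network.
raw_predictions = [2.0, -1.0, 0.5, 3.2]

probabilities = softmax(raw_predictions)
predicted_class_index_by_raw = argmax(raw_predictions)
predicted_class_index_by_prob = argmax(probabilities)
```

Both indices point at the same class, so argmax can be applied directly to the logits when only the predicted class is needed.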

~~As for the why logit... I have no idea. Sorry.~~
[Edit: See this answer for the historical motivations behind the term.]

Trivia

If you want to, though, you can apply the statistical logit to the probabilities that come out of the softmax function.

If the probability of a certain class is p,
Then the log-odds of that class is L = logit(p).

Also, the probability of that class can be recovered as p = sigmoid(L), using the sigmoid function.

Not very useful to calculate log-odds though.
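A small sketch of the logit/sigmoid round trip described in this trivia, written with the standard library only. The probability 0.8 is an arbitrary example value, and logit and sigmoid are hand-written helpers.

```python
import math

def logit(p):
    # Statistical logit: the log-odds of a probability p in (0, 1).
    return math.log(p / (1.0 - p))

def sigmoid(x):
    # Logistic sigmoid, the inverse of logit.
    return 1.0 / (1.0 + math.exp(-x))

p = 0.8                    # illustrative probability of some class
L = logit(p)               # log-odds: log(0.8 / 0.2) = log(4)
p_recovered = sigmoid(L)   # maps the log-odds back to the probability
```

Note that for a softmax output with more than two classes, logit(p) of one class is in general not the same number as the raw prediction the network produced for that class.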

User · Answer

logits: The vector of raw (non-normalized) predictions that a classification model generates, which is ordinarily then passed to a normalization function. If the model is solving a multi-class classification problem, logits typically become an input to the softmax function. The softmax function then generates a vector of (normalized) probabilities with one value for each possible class.

In addition, logits sometimes refer to the element-wise inverse of the sigmoid function. For more information, see tf.nn.sigmoid_cross_entropy_with_logits.

(official TensorFlow documentation)

User · Answer

(FOMOsapiens). If you check the math logit function, it maps the interval [0, 1] to the whole real line [-inf, inf]. Sigmoid and softmax do exactly the opposite: they map the [-inf, inf] real space to the [0, 1] real space. This is why, in machine learning, we may use logit before the sigmoid and softmax functions (since they match).

And this is why "we may call" anything in machine learning that goes in front of a sigmoid or softmax function the logit.

Here is a J. Hinton video using this term.

User · Answer

Summary

In the context of deep learning, the logits layer means the layer that feeds into softmax (or another such normalization). The output of the softmax is the probabilities for the classification task, and its input is the logits layer. The logits layer typically produces values from -infinity to +infinity, and the softmax layer transforms them to values from 0 to 1.

Historical Context

Where does this term come from? In the 1930s and 40s, several people were trying to adapt linear regression to the problem of predicting probabilities. However, linear regression produces output from -infinity to +infinity, while for probabilities our desired output is 0 to 1. One way to do this is by somehow mapping the probabilities 0 to 1 to -infinity to +infinity and then using linear regression as usual. One such mapping is the cumulative normal distribution, which was used by Chester Ittner Bliss in 1934; he called this the "probit" model, short for "probability unit". However, this function is computationally expensive while lacking some of the desirable properties for multi-class classification. In 1944 Joseph Berkson used the function log(p/(1-p)) to do this mapping and called it logit, short for "logistic unit". The term logistic regression derived from this as well.

The Confusion

Unfortunately, the term logits is abused in deep learning. From a purely mathematical perspective, logit is a function that performs the above mapping. In deep learning, people started calling the layer that feeds into the logit function the "logits layer". Then people started calling the output values of this layer "logit", creating confusion with logit the function.

TensorFlow Code

Unfortunately, TensorFlow code adds further to the confusion with names like tf.nn.softmax_cross_entropy_with_logits. What does logits mean here? It just means the input of the function is supposed to be the output of the last neuron layer as described above. The _with_logits suffix is redundant, confusing and pointless. Functions should be named without regard to such very specific contexts because they are simply mathematical operations that can be performed on values derived from many other domains. In fact, TensorFlow has another similar function, sparse_softmax_cross_entropy, where they fortunately forgot to add the _with_logits suffix, creating inconsistency and adding to the confusion. PyTorch, on the other hand, simply names its functions without these kinds of suffixes.

Reference

The Logit/Probit lecture slides are one of the best resources for understanding logit. I have also updated the Wikipedia article with some of the above information.

User · Answer

Logits are often the values of the Z function of the output layer in TensorFlow.

User · Answer

The logit (/ˈloʊdʒɪt/ LOH-jit) function is the inverse of the sigmoidal "logistic" function or logistic transform used in mathematics, especially in statistics. When the function's variable represents a probability p, the logit function gives the log-odds, or the logarithm of the odds p/(1 - p).

See here: https://en.wikipedia.org/wiki/Logit

User · Answer

Here is a concise answer for future readers. TensorFlow's logit is defined as the output of a neuron without applying the activation function:

logit = w*x + b

x: input, w: weight, b: bias. That's it.

The following is irrelevant to this question.

For historical lectures, read other answers. Hats off to TensorFlow's "creatively" confusing naming convention. In PyTorch, there is only one CrossEntropyLoss and it accepts un-activated outputs. Convolutions, matrix multiplications and activations are same-level operations. The design is much more modular and less confusing. This is one of the reasons why I switched from TensorFlow to PyTorch.
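The definition logit = w*x + b from the start of this answer can be sketched for a single neuron as follows. The function name neuron_logit and all numeric values are hypothetical and chosen only for illustration.

```python
def neuron_logit(x, w, b):
    # Weighted sum of the inputs plus the bias, with no activation applied.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

x = [1.0, 2.0, 3.0]     # input (made-up values)
w = [0.5, -1.0, 0.25]   # weights (made-up values)
b = 0.1                 # bias (made-up value)

# 0.5*1.0 + (-1.0)*2.0 + 0.25*3.0 + 0.1
logit = neuron_logit(x, w, b)
```

The result is an unbounded real number; it only becomes a probability after an activation such as sigmoid or softmax is applied.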

User · Answer

Logit is a function that maps probabilities [0, 1] to [-inf, +inf].

Softmax is a function that maps [-inf, +inf] to [0, 1], similar to sigmoid. But softmax also normalizes the sum of the values (the output vector) to be 1.

TensorFlow "with logit": it means that you are applying a softmax function to logit numbers to normalize them. The input vector (logit) is not normalized and can scale from [-inf, inf].

This normalization is used for multiclass classification problems. For multilabel classification problems, sigmoid normalization is used, i.e. tf.nn.sigmoid_cross_entropy_with_logits.
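To make the multiclass versus multilabel distinction concrete, here is a rough sketch that applies both normalizations to the same logits. It uses hand-written helpers and made-up logit values, and does not call TensorFlow.

```python
import math

def softmax(logits):
    # Normalizes the whole vector so that the outputs sum to 1.
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    # Squashes a single logit into (0, 1), independently of the others.
    return 1.0 / (1.0 + math.exp(-z))

logits = [1.5, -0.3, 4.0]  # illustrative, unnormalized values

multiclass_probs = softmax(logits)               # one class wins; sums to 1
multilabel_probs = [sigmoid(z) for z in logits]  # each label judged separately
```

With softmax the classes compete for a total probability of 1, while with element-wise sigmoid several labels can each be close to 1 at the same time.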

User · Answer

They are basically the fullest learned model you can get from the network, before it's been squashed down to apply to only the number of classes we are interested in. Check out how some researchers use them to train a shallow neural net based on what a deep network has learned: https://arxiv.org/pdf/1312.6184.pdf

It's kind of like how, when learning a subject in detail, you will learn a great many minor points, but then when teaching a student, you will try to compress it to the simplest case. If the student now tried to teach, it would be quite difficult, but they would be able to describe it just well enough to use the language.

User · Answer

Personal understanding: in the TensorFlow domain, logits are the values to be used as input to softmax. I came to this understanding based on this TensorFlow tutorial:

https://www.tensorflow.org/tutorials/layers

Although it is true that logit is a function in maths (especially in statistics), I don't think that's the same "logit" you are looking at. In the book Deep Learning by Ian Goodfellow, he mentioned:

"The function σ⁻¹(x) is called the logit in statistics, but this term is more rarely used in machine learning. σ⁻¹(x) stands for the inverse function of logistic sigmoid function."

In TensorFlow, it is frequently seen as the name of the last layer. In Chapter 10 of the book Hands-on Machine Learning with Scikit-learn and TensorFlow by Aurélien Géron, I came across this paragraph, which stated the logits layer clearly:

"note that logits is the output of the neural network before going through the softmax activation function: for optimization reasons, we will handle the softmax computation later."

That is to say, although we use softmax as the activation function in the last layer in our design, for ease of computation we take out the logits separately. This is because it is more efficient to calculate softmax and cross-entropy loss together. Remember that cross-entropy is a cost function, not used in forward propagation.
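As a rough illustration of why softmax and cross-entropy are often computed together from the logits, the sketch below compares a fused computation with a naive two-step one. This is not TensorFlow's implementation; the function names are hypothetical, the label is assumed to be a class index, and the point about numerical stability is a commonly cited benefit that goes beyond the efficiency argument made above.

```python
import math

def softmax_cross_entropy_from_logits(logits, label_index):
    # Fused form: -log(softmax(logits)[label]) = logsumexp(logits) - logits[label].
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[label_index]

def naive_cross_entropy(logits, label_index):
    # Two-step form: compute softmax first, then take the negative log.
    exps = [math.exp(z) for z in logits]
    return -math.log(exps[label_index] / sum(exps))

small_logits = [2.0, 1.0, 0.1]  # made-up values
loss_fused = softmax_cross_entropy_from_logits(small_logits, 0)
loss_naive = naive_cross_entropy(small_logits, 0)

# With large logits the fused form still works, whereas the naive form
# would overflow inside math.exp(1000.0).
large_logits = [1000.0, 0.0]
loss_large = softmax_cross_entropy_from_logits(large_logits, 1)
```

For moderate logits the two forms agree; for extreme logits only the fused form stays finite, which is one reason to hand raw logits to the loss function and leave the softmax to it.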

User · Answer

Logits is an overloaded term which can mean many different things.

In math, logit is a function that maps probabilities ([0, 1]) to R ((-inf, inf)). A probability of 0.5 corresponds to a logit of 0. Negative logits correspond to probabilities less than 0.5, positive ones to probabilities greater than 0.5.

In ML, it can be:

"the vector of raw (non-normalized) predictions that a classification model generates, which is ordinarily then passed to a normalization function. If the model is solving a multi-class classification problem, logits typically become an input to the softmax function. The softmax function then generates a vector of (normalized) probabilities with one value for each possible class."

Logits also sometimes refer to the element-wise inverse of the sigmoid function.
