[tensorflow] What is the meaning of the word logits in TensorFlow?

Summary

In the context of deep learning, the logits layer is the layer that feeds into softmax (or another such normalization). The output of the softmax is the set of probabilities for the classification task, and its input is the logits layer. The logits layer typically produces values from -infinity to +infinity, and the softmax layer transforms them to values from 0 to 1.
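For example, a minimal sketch of this relationship (the values are illustrative, not from the original answer):

    import tensorflow as tf

    # Raw outputs of the last layer ("logits") can be any real numbers.
    logits = tf.constant([[2.0, -1.0, 0.5]])   # one example, three classes

    # Softmax maps the logits to probabilities in (0, 1) that sum to 1.
    probs = tf.nn.softmax(logits)
    print(probs.numpy())   # roughly [[0.79, 0.04, 0.18]]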

Historical Context

Where does this term come from? In the 1930s and 40s, several people were trying to adapt linear regression to the problem of predicting probabilities. However, linear regression produces output from -infinity to +infinity, while for probabilities the desired output is 0 to 1. One way to do this is to somehow map the probabilities in 0 to 1 onto -infinity to +infinity and then use linear regression as usual. One such mapping is the cumulative normal distribution, used by Chester Ittner Bliss in 1934; he called this the "probit" model, short for "probability unit". However, this function is computationally expensive and lacks some of the desirable properties for multi-class classification. In 1944, Joseph Berkson used the function log(p/(1-p)) to do this mapping and called it logit, short for "logistic unit". The term logistic regression derives from this as well.

The Confusion

Unfortunately, the term logits is abused in deep learning. From a purely mathematical perspective, logit is a function that performs the above mapping. In deep learning, people started calling the layer that feeds into the logit function the "logits layer". Then people started calling the output values of this layer "logits", creating confusion with logit the function.
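As a quick illustration of the mapping (a minimal NumPy sketch, not part of the original answer): the logit function takes a probability in (0, 1) to an unbounded real value, and the sigmoid undoes it.

    import numpy as np

    def logit(p):
        # Maps a probability in (0, 1) to (-infinity, +infinity).
        return np.log(p / (1 - p))

    def sigmoid(x):
        # Inverse of logit: maps any real number back to (0, 1).
        return 1 / (1 + np.exp(-x))

    p = np.array([0.1, 0.5, 0.9])
    x = logit(p)            # [-2.197, 0.0, 2.197]  -- unbounded values
    print(sigmoid(x))       # recovers [0.1, 0.5, 0.9]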

TensorFlow Code

Unfortunately, TensorFlow code adds further to the confusion with names like tf.nn.softmax_cross_entropy_with_logits. What does logits mean here? It just means that the input of the function is supposed to be the output of the last neuron layer as described above. The _with_logits suffix is redundant, confusing and pointless. Functions should be named without regard to such very specific contexts, because they are simply mathematical operations that can be performed on values derived from many other domains. In fact, TensorFlow has another similar function, sparse_softmax_cross_entropy, where they fortunately forgot to add the _with_logits suffix, creating inconsistency and adding to the confusion. PyTorch, on the other hand, simply names its function without these kinds of suffixes.
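To make the naming concrete, here is a hedged sketch of how such a function is typically called: the raw, pre-softmax outputs are passed in directly, and the softmax is applied internally (which is numerically more stable than normalizing first).

    import tensorflow as tf

    logits = tf.constant([[2.0, -1.0, 0.5]])   # raw scores from the last layer
    labels = tf.constant([[1.0, 0.0, 0.0]])    # one-hot target

    # The function expects logits, not probabilities; it applies softmax itself.
    loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

    # Equivalent but less stable: normalize first, then compute cross-entropy.
    probs = tf.nn.softmax(logits)
    manual = -tf.reduce_sum(labels * tf.math.log(probs), axis=-1)
    print(loss.numpy(), manual.numpy())   # the two values agree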

Reference

The Logit/Probit lecture slides are one of the best resources for understanding logit. I have also updated the Wikipedia article with some of the above information.
