[math] What is "entropy and information gain"?

To begin with, it would be best to understand the measure of information.

How do we measure the information?

When something unlikely happens, we say it's a big news. Also, when we say something predictable, it's not really interesting. So to quantify this interesting-ness, the function should satisfy

  • if the probability of the event is 1 (predictable), then the function gives 0
  • if the probability of the event is close to 0, then the function should give high number
  • if probability 0.5 events happens it give one bit of information.

One natural measure that satisfy the constraints is

I(X) = -log_2(p)

where p is the probability of the event X. And the unit is in bit, the same bit computer uses. 0 or 1.

Example 1

Fair coin flip :

How much information can we get from one coin flip?

Answer : -log(p) = -log(1/2) = 1 (bit)

Example 2

If a meteor strikes the Earth tomorrow, p=2^{-22} then we can get 22 bits of information.

If the Sun rises tomorrow, p ~ 1 then it is 0 bit of information.

Entropy

So if we take expectation on the interesting-ness of an event Y, then it is the entropy. i.e. entropy is an expected value of the interesting-ness of an event.

H(Y) = E[ I(Y)]

More formally, the entropy is the expected number of bits of an event.

Example

Y = 1 : an event X occurs with probability p

Y = 0 : an event X does not occur with probability 1-p

H(Y) = E[I(Y)] = p I(Y==1) + (1-p) I(Y==0) 
     = - p log p - (1-p) log (1-p)

Log base 2 for all log.

Examples related to math

How to do perspective fixing? How to pad a string with leading zeros in Python 3 How can I use "e" (Euler's number) and power operation in python 2.7 numpy max vs amax vs maximum Efficiently getting all divisors of a given number Using atan2 to find angle between two vectors How to calculate percentage when old value is ZERO Finding square root without using sqrt function? Exponentiation in Python - should I prefer ** operator instead of math.pow and math.sqrt? How do I get the total number of unique pairs of a set in the database?

Examples related to text

Difference between opening a file in binary vs text How do I center text vertically and horizontally in Flutter? How to `wget` a list of URLs in a text file? Convert txt to csv python script Reading local text file into a JavaScript array Python: How to increase/reduce the fontsize of x and y tick labels? How can I insert a line break into a <Text> component in React Native? How to split large text file in windows? Copy text from nano editor to shell Atom menu is missing. How do I re-enable

Examples related to computer-science

HTML5 Canvas background image What exactly does big ? notation represent? Fixed point vs Floating point number What are the differences between a program and an application? What do we mean by Byte array? How to determine the longest increasing subsequence using dynamic programming? What is "entropy and information gain"? What are the differences between NP, NP-Complete and NP-Hard? What is the difference between statically typed and dynamically typed languages? What is “2's Complement”?

Examples related to nltk

re.sub erroring with "Expected string or bytes-like object" how to check which version of nltk, scikit learn installed? Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP) NLTK and Stopwords Fail #lookuperror Resource u'tokenizers/punkt/english.pickle' not found How do I download NLTK data? Stopword removal with NLTK n-grams in python, four, five, six grams? pip issue installing almost any library How to get rid of punctuation using NLTK tokenizer?

Examples related to text-mining

How do I search for a pattern within a text file using Python combining regex & string/file operations and store instances of the pattern? What is "entropy and information gain"?