Uniformly distributed data (high entropy):
s=range(0,256)
Shannon entropy calculation step by step:
import collections
import math
# calculate probability for each byte as number of occurrences / array length
probabilities = [n_x/len(s) for x,n_x in collections.Counter(s).items()]
# [0.00390625, 0.00390625, 0.00390625, ...]
# calculate per-character entropy fractions
e_x = [-p_x*math.log(p_x,2) for p_x in probabilities]
# [0.03125, 0.03125, 0.03125, ...]
# sum fractions to obtain Shannon entropy
entropy = sum(e_x)
>>> entropy
8.0
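As an optional cross-check (a minimal sketch assuming SciPy is installed; it is not required anywhere else here), scipy.stats.entropy computes the same quantity directly from the probability list:
from scipy.stats import entropy as scipy_entropy
# base=2 reports the result in bits; for the uniform 256-symbol case this
# should agree with the hand-rolled sum above (8.0, up to floating-point rounding)
print(scipy_entropy(probabilities, base=2))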
One-liner (assuming import collections and import math):
def H(s): return sum([-p_x*math.log(p_x,2) for p_x in [n_x/len(s) for x,n_x in collections.Counter(s).items()]])
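An equivalent, slightly more compact variant uses math.log2 and generator expressions instead of list comprehensions (purely a style choice, same imports assumed):
def H(s): return sum(-p_x * math.log2(p_x) for p_x in (n_x/len(s) for x, n_x in collections.Counter(s).items()))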
A proper function:
import collections
import math
def H(s):
    probabilities = [n_x/len(s) for x,n_x in collections.Counter(s).items()]
    e_x = [-p_x*math.log(p_x,2) for p_x in probabilities]
    return sum(e_x)
Test cases (the English-text example is taken from CyberChef's entropy estimator):
>>> H(range(0,256))
8.0
>>> H(range(0,64))
6.0
>>> H(range(0,128))
7.0
>>> H([0,1])
1.0
>>> H('Standard English text usually falls somewhere between 3.5 and 5')
4.228788210509104
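To see the function separate high-entropy from low-entropy data in practice, here is a minimal sketch (os.urandom is just one convenient source of random bytes; the value for the random case will vary slightly from run to run):
import os
# 4096 random bytes come close to the 8-bit maximum, while a constant
# byte string has zero entropy because a single symbol has probability 1
print(H(os.urandom(4096)))   # typically just below 8.0, e.g. ~7.95
print(H(b'\x00' * 4096))     # 0.0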