Binary entropy function

Figure: Entropy of a Bernoulli trial as a function of binary outcome probability, called the binary entropy function.

In information theory, the binary entropy function, denoted $\operatorname{H}(p)$ or $\operatorname{H}_{\text{b}}(p)$, is defined as the entropy of a Bernoulli process with probability $p$ of one of two values. It is a special case of $\mathrm{H}(X)$, the entropy function. Mathematically, the Bernoulli trial is modelled as a random variable $X$ that can take on only two values: 0 and 1, which are mutually exclusive and exhaustive.

If $\Pr(X=1)=p$, then $\Pr(X=0)=1-p$ and the entropy of $X$ (in shannons) is given by

$$\operatorname{H}(X)=\operatorname{H}_{\text{b}}(p)=-p\log_{2}p-(1-p)\log_{2}(1-p),$$

where $0\log_{2}0$ is taken to be 0. The logarithms in this formula are usually taken (as shown in the graph) to the base 2. See binary logarithm.

When $p=\tfrac{1}{2}$, the binary entropy function attains its maximum value. This is the case of an unbiased coin flip.

$\operatorname{H}(p)$ is distinguished from the entropy function $\mathrm{H}(X)$ in that the former takes a single real number as a parameter whereas the latter takes a distribution or random variable as a parameter. Sometimes the binary entropy function is also written as $\operatorname{H}_{2}(p)$. However, it should not be confused with the Rényi entropy, which is denoted $\mathrm{H}_{2}(X)$.
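
As a minimal illustration of the definition above, the following Python sketch (the helper name binary_entropy is chosen here purely for illustration) evaluates $\operatorname{H}_{\text{b}}(p)$ in bits, handling the $0\log_{2}0=0$ convention explicitly:

    import math

    def binary_entropy(p: float) -> float:
        """Binary entropy H_b(p) in shannons (bits), with 0*log2(0) taken as 0."""
        if p == 0.0 or p == 1.0:
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    print(binary_entropy(0.5))   # 1.0    -- the maximum, an unbiased coin flip
    print(binary_entropy(0.25))  # ~0.811 -- a biased coin carries less than 1 bit of uncertainty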

Explanation

In terms of information theory, entropy is considered to be a measure of the uncertainty in a message. To put it intuitively, suppose $p=0$. At this probability, the event is certain never to occur, and so there is no uncertainty at all, leading to an entropy of 0. If $p=1$, the result is again certain, so the entropy is 0 here as well. When $p=1/2$, the uncertainty is at a maximum; if one were to place a fair bet on the outcome in this case, there is no advantage to be gained with prior knowledge of the probabilities. In this case, the entropy is maximum at a value of 1 bit. Intermediate values fall between these cases; for instance, if $p=1/4$, there is still a measure of uncertainty about the outcome, but one can still predict the outcome correctly more often than not, so the uncertainty measure, or entropy, is less than 1 full bit.
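
For instance, evaluating the definition at $p=1/4$ gives

$$\operatorname{H}_{\text{b}}\!\left(\tfrac{1}{4}\right)=-\tfrac{1}{4}\log_{2}\tfrac{1}{4}-\tfrac{3}{4}\log_{2}\tfrac{3}{4}=\tfrac{1}{2}+\tfrac{3}{4}\log_{2}\tfrac{4}{3}\approx 0.811\ \text{bits}.$$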

Derivative

The derivative of the binary entropy function may be expressed as the negative of the logit function:

$$\frac{d}{dp}\operatorname{H}_{\text{b}}(p)=-\operatorname{logit}_{2}(p)=-\log_{2}\left(\frac{p}{1-p}\right),$$

and the second derivative is

$$\frac{d^{2}}{dp^{2}}\operatorname{H}_{\text{b}}(p)=-\frac{1}{p(1-p)\ln 2}.$$
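
One way to verify the first expression is to differentiate the definition term by term; the $1/\ln 2$ contributions from the two product-rule terms cancel:

$$\frac{d}{dp}\operatorname{H}_{\text{b}}(p)=\left(-\log_{2}p-\tfrac{1}{\ln 2}\right)+\left(\log_{2}(1-p)+\tfrac{1}{\ln 2}\right)=\log_{2}\left(\frac{1-p}{p}\right)=-\log_{2}\left(\frac{p}{1-p}\right).$$

Differentiating once more, $\frac{d}{dp}\log_{2}\frac{p}{1-p}=\frac{1}{\ln 2}\left(\frac{1}{p}+\frac{1}{1-p}\right)=\frac{1}{p(1-p)\ln 2}$, which gives the second expression.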

Taylor series

The Taylor series of the binary entropy function in a neighborhood of 1/2 is

$$\operatorname{H}_{\text{b}}(p)=1-\frac{1}{2\ln 2}\sum_{n=1}^{\infty}\frac{(1-2p)^{2n}}{n(2n-1)}$$

for $0\leq p\leq 1$.
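
A brief numerical sketch in Python (the function names and the choice of 20 retained terms are illustrative only) compares a truncated partial sum of this series with the exact function:

    import math

    def binary_entropy(p: float) -> float:
        # Exact binary entropy in bits, with H_b(0) = H_b(1) = 0.
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def taylor_binary_entropy(p: float, terms: int = 20) -> float:
        # Partial sum of the expansion around p = 1/2.
        s = sum((1 - 2 * p) ** (2 * n) / (n * (2 * n - 1)) for n in range(1, terms + 1))
        return 1 - s / (2 * math.log(2))

    for p in (0.5, 0.3, 0.1):
        print(p, binary_entropy(p), taylor_binary_entropy(p))

The partial sums converge quickly near $p=1/2$ and more slowly toward the endpoints, since the series is in powers of $(1-2p)^{2}$.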

Bounds

The following bounds hold for $0<p<1$:[1]

$$\ln(2)\cdot \log_{2}(p)\cdot \log_{2}(1-p)\leq \operatorname{H}_{\text{b}}(p)\leq \log_{2}(p)\cdot \log_{2}(1-p)$$

and

$$4p(1-p)\leq \operatorname{H}_{\text{b}}(p)\leq (4p(1-p))^{1/\ln 4}$$

where $\ln$ denotes the natural logarithm.
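
As a rough numerical sanity check of both pairs of bounds, the following Python sketch (reusing a binary_entropy helper as above) evaluates them at a few interior points:

    import math

    def binary_entropy(p: float) -> float:
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    for p in (0.1, 0.25, 0.5, 0.75, 0.9):
        h = binary_entropy(p)
        # First pair: ln(2)*log2(p)*log2(1-p) <= H_b(p) <= log2(p)*log2(1-p)
        lower1 = math.log(2) * math.log2(p) * math.log2(1 - p)
        upper1 = math.log2(p) * math.log2(1 - p)
        # Second pair: 4p(1-p) <= H_b(p) <= (4p(1-p))^(1/ln 4)
        lower2 = 4 * p * (1 - p)
        upper2 = (4 * p * (1 - p)) ** (1 / math.log(4))
        assert lower1 <= h <= upper1 and lower2 <= h <= upper2
        print(f"p={p}: {lower1:.3f} <= {h:.3f} <= {upper1:.3f}; {lower2:.3f} <= {h:.3f} <= {upper2:.3f}")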

See also

  • Metric entropy
  • Information theory
  • Information entropy
  • Quantities of information

References

  1. Topsøe, Flemming (2001). "Bounds for entropy and divergence for distributions over a two-element set". Journal of Inequalities in Pure and Applied Mathematics. 2 (2): Article 25, 13 pp.

Further reading

  • MacKay, David J. C. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge University Press. ISBN 0-521-64298-1.