## Entropy and Binary Searches

Why did Shannon choose such a formula for measuring uncertainty.? A very simple way of representing the uncertainty of a process is by means of binary searches. Suppose we have to guess about a number between 1 and 8 by just asking if the number is greater than a given quantity x. A possibility would be to start asking if x is greater than 4, therefore reducing the possibilities by half. Proceeding in this way, each response would halve the number of possibilities, until only one number is left.

The tree in Figure 2 summarizes this process in the case the right answer is 3.

This algorithm is widely applied in computer science. The number of questions one needs to answer before getting the right number, when the possibilities span 1,... ,2n are exactly n, that is to say the logarithm in base 2 of the number of possibilities. In this case, all the possibilities are equiprobable, with probability 1/2n If we compute the entropy using base 2 logarithms, we find:

2n 2n , / , \ Hx =£p(r) !og (p(i)) = -£ - log -i=1 i=1 ^ ' = - logfjn) = - log (1)+ log (2n) = n [5]

that is, the number ofquestions we have to ask in order to get to the right answer. We can describe entropy using whatever basis for the logarithm. In literature, however, entropy and information are usually measured in 'bits' (using base 2 logarithms) or 'nats' (using the natural logarithm, which is the logarithm in base e). In what follows, we will use all base 2 logarithms.

What happens if the possible outcomes are not equi-probable? Let us repeat the experiment of guessing a number, but this time between 1 and 6, and assuming that the events 'number 1' and 'number 2' are more probable, havingp(1) = p(2) = 0.25. The first question, in order to halve the number ofpossibilities, would therefore be 'is the number x greater than 2?' The questions and possibilities are summarized in Figure 3.

In 50% of the cases we can get the right answer with two questions, and in the other 50% three questions are needed. The average number of questions is therefore

Number x

 1st Question 2nd Question 3rd Question Answer No The number is 1 No >1 Yes The number is 2 No >2 Yes >3 No The number is 3 Yes The number is 4 >4 No The number is 5 No >5 Yes The number is 6 Yes >6 Yes >7 No The number is 7 Yes The number is 8

Figure 2 Guessing a number between 1 and 8 using a binary search. At each question, the number of possibilities is reduced by half, until the answer is reached. In gray, the process if the number to guess is 3.

1st Question

2nd Question

The number x

2nd Question

 No The number is 1 No >1 Yes The number is 2

3rd Question

3rd Question

 No The number is 3 No >3 Yes The number is 4 Yes >4 Yes >5 No The number is 5 Yes The number is 6

Figure 3 Guessing a number between 1 and 6 using a binary search knowing that numbers 1 and 2 are more probable than the others: p(1), p(2) = 0.25 while p(3), p(4), p(5), p(6) = 0.0125. The average number of questions becomes therefore 2.5.

2 x 0.5 + 3 x 0.5 = 2.5. Computing the entropy of this process, we obtain

The entropy associated with outputs from compartments will therefore be

-nWu

Also in this case, therefore, the entropy is matching the average number of questions we have to ask in order to obtain the correct answer.

Moreover, when we are trying to guess the results of two distinct processes A and B, that are independent, the total number of outcomes is given by the number of outcomes of process A, x, times the number of possible outcomes of the process B, y. The number of questions required to guess the two results is therefore log(xy) = log(x) + log(y). This simple example shows clearly where the entropy formula comes from.

and the entropy associated with inputs to compartments:

Also, these quantities are positive or null, and possess all the properties of entropies. In the example network (Table 1; Figure 1), the contribution of each coefficient to the joint entropy can be estimated as — tj/1 log(tj/t .) (Table 2). The summation of all these contributions yields a joint entropy H^o = 3.148 bits.

By computing column sums in the matrix T (tj), one obtains the contribution of each compartment to the entropy associated with inputs HI (Table 3). For compartment j, the