# Information and entropy

Chapters 1 and 2 have shown that the ecological information available about the objects under study is usually (or may be reformulated as) a set of biological and/or physical characteristics, which correspond to as many descriptors. Searching for groups of descriptors that behave similarly across the set of objects, or that may be used to forecast one another (R analysis, Section 7.1), requires measuring the amount of information that these descriptors have in common. In the simplest case of two descriptors a and b (called y1 and y2 in previous chapters), one must assess how much of the information provided by the distribution of the objects among the states of a could be used to forecast their distribution among the states of b. This approach is central to the analysis of relationships among ecological phenomena.

In 1968, Ludwig von Bertalanffy wrote, in his General System Theory (p. 32): "Thus, there exist models, principles, and laws that apply to generalized systems or their subclasses, irrespective of their particular kind, the nature of their component elements, and the relations or 'forces' between them". This is the case with information, which can be viewed and measured in the same manner for all systems. Some authors, including Pielou (1975), think that the concepts derived from information theory are, in ecology, a model and not a homology. Notwithstanding this opinion, the following sections will discuss how to measure information for biological descriptors in terms of information to be acquired, because such a presentation provides a better understanding of the nature of information in ecological systems.

The problem thus consists in measuring the amount of information contained in each descriptor and, further, the amount of information that two (or several) descriptors have in common. If, for example, two descriptors share 100% of their information, then they obviously carry the same information. Since descriptors are constructed so as to partition the objects under study into a number of states, two descriptors have 100% of their information in common when they partition a set of objects in exactly the same way, i.e. into two equal and corresponding sets of states. When descriptors are qualitative, this correspondence does not need to follow any ordering of the states of the two descriptors. For ordered descriptors, the ordering of the correspondence between states is important and the techniques for analysing the information in common belong to correlation analysis (Chapters 4 and 5).
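The idea that two qualitative descriptors share all of their information when they partition the objects in exactly the same way, whatever the labels or ordering of their states, can be illustrated with a short Python sketch. The function name `same_partition` and the example labels are hypothetical, chosen only for illustration; the check assumes that both descriptors are recorded for the same objects in the same order.

```python
def same_partition(a, b):
    """Return True if descriptors a and b split the objects into the same groups.

    a and b are sequences of state labels, one label per object.
    The labels themselves (and their order) do not matter: only the
    grouping of the objects does.
    """
    if len(a) != len(b):
        raise ValueError("both descriptors must describe the same objects")
    pairs = set(zip(a, b))
    # The state correspondence is one-to-one exactly when the number of
    # distinct (a, b) pairs equals the number of states of each descriptor.
    return len(pairs) == len(set(a)) == len(set(b))


# Hypothetical data: four objects, three states per descriptor.
a = ["x", "x", "y", "z"]
b_same = [3, 3, 1, 2]   # same grouping as a, different labels
b_diff = [1, 2, 3, 3]   # splits the two "x" objects, merges "y" and "z"

print(same_partition(a, b_same))
print(same_partition(a, b_diff))
```

Here `b_same` carries the same grouping as `a` under different labels, whereas `b_diff` does not, so only the first comparison succeeds.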

The mathematical theory of information is based on the concept of entropy. Its mathematical formulation was developed by Shannon (Bell Laboratories) who proposed, in 1948, the well-known equation:

$$H = -\sum_{i} p_i \log p_i$$

where $H$ is a measure of the uncertainty or choice associated with a frequency distribution (vector) $\mathbf{p}$; $p_i$ is the probability that an observation belongs to state $i$ of the descriptor (Fig. 1.1). In practice, $p_i$ is the proportion (or relative frequency, on a 0-1 scale) of observations in state $i$. Shannon recognized that his equation is similar to the equation of entropy, published in 1898 by the physicist Boltzmann as a quantitative formulation of the second law of thermodynamics, which concerns the degree of disorganization in closed physical systems. He thus concluded that $H$ corresponds to the entropy of information systems.
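Shannon's measure of uncertainty can be computed directly from a vector of relative frequencies. The following is a minimal Python sketch, not a reference implementation: the function name `shannon_entropy` is hypothetical, the default base 2 (entropy in bits) is an assumption since the text above does not fix the base of the logarithm, and states with zero probability are taken to contribute nothing to the sum (the usual convention 0 log 0 = 0).

```python
import math


def shannon_entropy(p, base=2.0):
    """Shannon entropy H of a probability vector p.

    p    : sequence of probabilities (relative frequencies on a 0-1 scale)
    base : logarithm base; 2 gives bits (an assumption of this sketch)
    """
    if any(pi < 0 for pi in p):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(p), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # States with p_i = 0 are skipped (convention: 0 log 0 = 0).
    return sum(-pi * math.log(pi, base) for pi in p if pi > 0)


# Illustrative vectors (hypothetical): maximum and minimum uncertainty
# for a descriptor with four states.
h_uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])
h_certain = shannon_entropy([1.0, 0.0, 0.0, 0.0])
print(h_uniform, h_certain)
```

With four equiprobable states the uncertainty is at its maximum (log of 4 in the chosen base, i.e. about 2 bits), whereas a descriptor whose observations all fall in a single state has zero entropy.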

Table 6.1 Contingency table (numerical example). Distribution of 120 objects on descriptors a and b.

| | b1 | b2 | b3 | b4 |
|---|---|---|---|---|
| Column totals | 30 | 30 | 30 | 30 |


### Information

Numerical example. In order to facilitate the understanding of the presentation up to Section 6.4, a small numerical example will be used in which 120 objects are described by two descriptors (a and b) with 4 states each. The question is to determine to what extent one descriptor can be used to forecast the other. The data in the numerical example could be, for example, the benthos sampled at 120 sites of an estuary, or the trees observed in 120 vegetation quadrats. Descriptor a might be the dominant species at each sampling site and descriptor b, some environmental variable with 4 states. The following discussion is valid for any type of qualitative descriptor and also for ordered descriptors divided into classes.

Assume that the 120 observations are distributed as 60, 30, 15 and 15 among the 4 states of descriptor a and that there are 30 observations in each of the 4 states of descriptor b. The states of the observations (objects), for the two descriptors combined, are given in Table 6.1.

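For each descriptor, the probability of a state is estimated by the relative frequency with which the state is found in the set of observations. Thus, the probability distributions associated with descriptors a and b are:

$$p(a_1) = \frac{60}{120} = 0.5, \quad p(a_2) = \frac{30}{120} = 0.25, \quad p(a_3) = p(a_4) = \frac{15}{120} = 0.125$$

$$p(b_1) = p(b_2) = p(b_3) = p(b_4) = \frac{30}{120} = 0.25$$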
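The relative frequencies can be obtained from the counts stated above for the numerical example (60, 30, 15 and 15 objects in the states of a; 30 objects in each state of b). The sketch below is illustrative only: the helper names `relative_frequencies` and `entropy_bits` are hypothetical, and the use of base-2 logarithms (entropy expressed in bits) is an assumption of this example.

```python
from fractions import Fraction
from math import log2

# Counts of the 120 objects in the four states of each descriptor,
# as given in the numerical example.
counts_a = [60, 30, 15, 15]
counts_b = [30, 30, 30, 30]


def relative_frequencies(counts):
    """Estimate state probabilities as exact fractions of the total count."""
    total = sum(counts)
    return [Fraction(c, total) for c in counts]


def entropy_bits(probs):
    """Shannon entropy in bits (base 2 is an assumption of this sketch)."""
    return sum(-float(p) * log2(float(p)) for p in probs if p > 0)


p_a = relative_frequencies(counts_a)   # 1/2, 1/4, 1/8, 1/8
p_b = relative_frequencies(counts_b)   # 1/4, 1/4, 1/4, 1/4

print([str(p) for p in p_a], entropy_bits(p_a))
print([str(p) for p in p_b], entropy_bits(p_b))
```

Under these assumptions, descriptor b, whose states are equally frequent, has the higher entropy (2 bits, against 1.75 bits for descriptor a), which reflects the greater uncertainty about the state of an object drawn at random.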

Note that the entropy of information theory is actually the negative entropy of physicists. In thermodynamics, an increase in entropy corresponds to an increase in disorder, which is accompanied by a decrease of information. Strictly speaking, information is negative entropy and it is only for convenience that it is simply called entropy. In information theory, entropy and information are taken as synonymous.