## The system as a text and its information entropy

Ludwig Wittgenstein said that any physical object or system could be represented as a text, written in a special language with a proper alphabet and grammar.

Let the text be a single word of length $N$. If the alphabet contains $n$ symbols (for English $n = 29$: 26 letters, one blank, one comma and one full stop), then the $i$th symbol is repeated in the word $N_1, N_2, \ldots, N_n$ times ($\sum_{i=1}^{n} N_i = N$). The total number of different words of $N$ symbols in this $n$-symbol language is equal to $W = N!/\prod_{i=1}^{n} N_i!$. Then the total information contained in the word (text) is equal to $I = \log_2 W \approx -N \sum_{i=1}^{n} p_i \log_2 p_i$ (in bits), where $p_i = N_i/N$. Comparing this formula with Eq. (1.2) we see that they coincide (to within units of measurement). The specific information, i.e. the information per symbol, will be equal to

$$I = -\sum_{i=1}^{n} p_i \log_2 p_i.$$

This is the so-called Shannon measure of information, or the information entropy (Shannon and Weaver, 1963; Pierce, 1980). Since receiving information reduces uncertainty, the Shannon concept can be formulated as

Information per symbol = mean value of uncertainty per symbol.
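As a quick numerical check, the information per symbol can be computed directly from the symbol counts of a text. The following Python sketch (an illustration, not part of the original text) evaluates $I = -\sum_i p_i \log_2 p_i$ with $p_i = N_i/N$ for short example strings:

```python
from collections import Counter
from math import log2

def shannon_entropy(text: str) -> float:
    """Information per symbol: I = -sum_i p_i * log2(p_i), with p_i = N_i / N."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Two symbols used with equal frequency carry log2(2) = 1 bit per symbol;
# a text of one repeated symbol carries no uncertainty at all.
print(shannon_entropy("abab"))  # -> 1.0
print(shannon_entropy("aaaa"))  # -> 0.0
```

For a uniform distribution over all $n$ symbols the value reaches its maximum $\log_2 n$, which is why the formula measures the mean uncertainty per symbol.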

We now show how changing the level of uncertainty leads to a gain in information. Let us pass from the probability distribution $p^0 = \{p_1^0, \ldots, p_n^0\}$ to the distribution $p = \{p_1, \ldots, p_n\}$. Since the distribution of probability changes, the uncertainty changes too. How can we estimate the change of information that results from this transition? If the probability of some event changes from $p^0$ to 1, the change of information is equal to $\Delta I = \log_2(1/p^0) = -\log_2 p^0$; if $p^0 \to p$, then $\Delta I = \log_2(p/p^0)$. If the entire distribution changes, $p^0 \to p$, then the change of information is equal to the sum of the partial changes $\Delta I_i = \log_2(p_i/p_i^0)$ multiplied by the final probabilities $p_i$:

$$\Delta I = \sum_{i=1}^{n} p_i \log_2 \frac{p_i}{p_i^0}.$$

This value, also called Kullback's measure for the increment of information (Kullback, 1959), is always non-negative (it is equal to zero only if $p^0 = p$); therefore, in this case we can speak of a gain in information. So, knowledge about the transition $p^0 \to p$ decreases uncertainty and yields a gain in information. Later on, Kullback's measure $\Delta I$ will be denoted as $K = K(p^0, p)$.
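Kullback's measure is easy to evaluate numerically. In the Python sketch below the two distributions are invented for illustration: a uniform prior and a sharper posterior, as after receiving a message that makes one outcome much more likely.

```python
from math import log2

def kullback_gain(p0, p):
    """K(p0, p) = sum_i p_i * log2(p_i / p0_i): the gain of information
    when the distribution changes from p0 to p (assumes p0_i > 0)."""
    return sum(pi * log2(pi / p0i) for pi, p0i in zip(p, p0) if pi > 0)

p0 = [0.5, 0.5]  # prior: complete uncertainty between two outcomes
p = [0.9, 0.1]   # posterior: one outcome is now far more likely

print(kullback_gain(p0, p))   # positive: uncertainty decreased
print(kullback_gain(p0, p0))  # -> 0.0 (no change, no gain)
```

The first call returns a positive value and the second returns zero, illustrating that $K(p^0, p) \ge 0$ with equality only for $p^0 = p$.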

In the Shannon concept, symbols (letters) are considered the primary elements of a language. However, a text can consist of separate words, so that the words (not the letters) could as such be considered the primary elements. For instance, if the alphabet contains $n$ letters, then $n^r$ words of $r$ letters can be constructed from the alphabet. If $p_{ij\ldots k}$ is the probability of formation of the $r$-word, then the information entropy of $r$th order will be equal to (Yaglom and Yaglom, 1973)

$$I^{(r)} = -\frac{1}{r} \sum_{i,j,\ldots,k} p_{ij\ldots k} \log_2 p_{ij\ldots k}.$$

It is obvious that $I^{(1)} = I$, i.e. Shannon's entropy. Under the assumption that the source of information is stationary and generates an ergodic Markov sequence, Khinchin (1953, 1957) proved that the limit

$$\lim_{r \to \infty} I^{(r)}$$

exists.
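The $r$th-order entropy can be estimated from a concrete text by counting $r$-letter blocks. The sketch below uses overlapping $r$-grams as a naive frequency estimate of $p_{ij\ldots k}$ (an assumption for illustration, not a method from the text); the sample string is likewise invented.

```python
from collections import Counter
from math import log2

def block_entropy(text: str, r: int) -> float:
    """Estimate I^(r) = -(1/r) * sum p * log2(p) over r-letter blocks,
    with the probabilities p taken from overlapping r-grams of the text."""
    blocks = [text[i:i + r] for i in range(len(text) - r + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum((c / total) * log2(c / total) for c in counts.values()) / r

text = "the cat sat on the mat"
for r in (1, 2, 3):
    # For this sample the estimate decreases with r, since longer blocks
    # capture the correlations between neighbouring letters.
    print(r, block_entropy(text, r))
```

With real texts long enough for reliable statistics, these estimates decrease toward the limit above as $r$ grows.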

Let us consider a text written in the English language. At the zero level of perception (or description) we know only the number of symbols ($n = 29$). Then the information per symbol is $I^{(0)} = \log_2 29 \approx 4.85$ bits. At the first level of perception we take into account the frequencies of the symbols (letters); then $I^{(1)} \approx 4.03$ bits. At the next levels, when we take into account double, triple, etc. correlations, i.e. words of two letters, three letters, etc., we get the following values of information per symbol (Ebeling et al., 1990):