## Probability distributions

In a deterministic world everything would be predictable. If speciation rates were deterministic we would be able to predict exactly the number of species at time t + 1 given the numbers of species at time t and a knowledge of the underlying processes governing speciation. This notion of a deterministic and therefore predictable world is upset by two important phenomena. First, and most obviously, many environmental phenomena are not deterministic! Randomly occurring, or generally unpredictable, events make an important contribution to ecological and evolutionary processes. In these cases we use the term stochastic. The second issue is that, even when processes are deterministic, the results may appear to be random. Thus chaotic phenomena, generated by strictly deterministic processes, produce apparently random output (Chapter 5).

Many unpredictable phenomena have a set of possible outcomes. In some cases there may be only two possibilities, such as whether or not it rains on a given day. Similarly, we may consider whether or not a species will go extinct in a given time period. Other phenomena will have more than two outcomes. The probability of a particular outcome can be estimated from observations made over different temporal or spatial scales. The probability that it rains tomorrow could be judged on how many days it has rained in the last month; for example, 28 out of 30 days. We might wish to contrast this probability (28/30) with an equal probability (1/2) of rain or no rain. The much higher probability of rain during that month may indicate that we are in a wet season or simply in an area of high rainfall.

Let us assume that the probability that it rains on a given day is 0.75, based on past events over several years. This might suggest that we can predict the weather (!) but we cannot be certain whether it will rain on a given day: we have a probability of 0.75 that it rains and 0.25 that it does not. Assuming that rain on a given day is independent of rain on any other day (an important assumption, and one we will modify later), we can generate a binomial distribution of events as follows.

Over 2 days, the probability that it rains on both days is 0.75 × 0.75; the probability that it rains on exactly one day is 2 × 0.75 × 0.25 (the factor of 2 arises because the rainy day can be either the first or the second); and the probability that it rains on neither day is 0.25 × 0.25. The distribution of probabilities that it will rain or not over different numbers of days builds in the following manner:

2 days: 0.25 × 0.25 (no rain), 2 × 0.75 × 0.25 (rain on 1 day), 0.75 × 0.75 (rain on both days);

3 days: 0.25 × 0.25 × 0.25 (no rain), 3 × 0.75 × 0.25 × 0.25 (rain on 1 day), 3 × 0.75 × 0.75 × 0.25 (rain on 2 days), 0.75 × 0.75 × 0.75 (rain on 3 days).
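The two- and three-day cases can be checked by brute force. The following is a minimal sketch that enumerates every possible sequence of wet and dry days, assuming independent days and a daily probability of rain of 0.75; the function name `rain_distribution` is ours and is used only for illustration.

```python
from itertools import product

def rain_distribution(n_days, p_rain=0.75):
    """Return [P(0 rainy days), ..., P(n_days rainy days)] by enumerating
    every possible sequence of wet (True) and dry (False) days."""
    probs = [0.0] * (n_days + 1)
    for sequence in product([True, False], repeat=n_days):
        # Independence: the probability of a whole sequence is the
        # product of the daily probabilities.
        p_sequence = 1.0
        for wet in sequence:
            p_sequence *= p_rain if wet else 1.0 - p_rain
        # sum(sequence) counts the rainy days in this sequence.
        probs[sum(sequence)] += p_sequence
    return probs

for n in (1, 2, 3):
    dist = rain_distribution(n)
    print(n, "days:", dist, "sum =", sum(dist))
```

Enumeration makes the role of the multipliers (2, 3, ...) visible: they count the sequences that share the same number of rainy days. The number of sequences doubles with each extra day, so this approach is only practical for small numbers of days.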

In each case the probabilities sum to 1 (1 day, 0.25 + 0.75 = 1; 2 days, 0.0625 + 0.375 + 0.5625 = 1; and so on). The distribution of probabilities rapidly becomes complex as the number of days increases, even though we are only dealing with two events (rain or not). For this reason statisticians have devised shorthand algebra to summarize the probability distributions. In the case of the binomial distribution, let p equal the probability of one event and q equal the probability of the other (p + q = 1). If n is the number of days, we can determine the probabilities of rain on zero up to n days by expansion of $(p + q)^n$; the following are the expansions for n = 1 to 4:

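$$
\begin{aligned}
n = 1:&\quad p + q \\
n = 2:&\quad p^2 + 2pq + q^2 \\
n = 3:&\quad p^3 + 3p^2q + 3pq^2 + q^3 \\
n = 4:&\quad p^4 + 4p^3q + 6p^2q^2 + 4pq^3 + q^4
\end{aligned}
$$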

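Notice that the coefficients (the number of ways in which p and q can be combined in each term) increase in a predictable manner; the pattern is known as Pascal's triangle, shown here for n = 1 to 4:

$$
\begin{array}{ccccccccc}
& & & 1 & & 1 & & & \\
& & 1 & & 2 & & 1 & & \\
& 1 & & 3 & & 3 & & 1 & \\
1 & & 4 & & 6 & & 4 & & 1
\end{array}
$$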

Here each coefficient is the sum of the two coefficients above it in the previous row. The coefficient can be generalized by the formula $n!/(s!\,(n - s)!)$, where s is the number of events with probability p and n! is n factorial, the product of the integers from n down to 1; for example, 3! = 3 × 2 × 1 = 6 (by convention, 0! = 1). So, the coefficient for three rainy days out of four is:

$$
\frac{4!}{3!\,(4 - 3)!} = \frac{4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1) \times 1} = 4
$$
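The coefficient formula and the row-by-row construction of Pascal's triangle can be compared directly. The sketch below is illustrative only: the function names are ours, and `math.comb` (Python 3.8 or later) is used as a shortcut for the same coefficient.

```python
from math import comb, factorial

def coefficient(n, s):
    """Binomial coefficient n! / (s! (n - s)!), written out as in the text."""
    return factorial(n) // (factorial(s) * factorial(n - s))

def pascal_row(n):
    """Coefficients of the expansion of (p + q)^n, built from the previous
    row by adding each pair of neighbouring coefficients."""
    row = [1]
    for _ in range(n):
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row

def binomial_probability(n, s, p):
    """Probability of exactly s events of probability p in n independent trials."""
    return comb(n, s) * p**s * (1 - p) ** (n - s)

print(coefficient(4, 3))                 # three rainy days out of four
print(pascal_row(4))                     # all coefficients for n = 4
print(binomial_probability(4, 3, 0.75))  # P(rain on exactly 3 of 4 days)
```

Both routes give the same coefficients; the factorial formula is the more convenient one when only a single term of a long expansion is needed.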

With p = q = 0.5 the two outcomes of a single event are equally likely, which is an example of a uniform distribution. Uniform distributions also occur with more than two outcomes; for example, in the roll of a fair die, where each of the values 1 to 6 has the same probability (1/6) of occurring.

A probability density function (pdf) is a set of mathematical statements that tell us the probability that a variable will take a given value. The probabilities in a pdf sum to 1 (for a continuous pdf, the total area under the curve is 1). Pdfs can be discrete, such as the binomial example or the uniform distribution of rolls of a die, or continuous, such as the normal distribution (Fig. 3.1). For a continuous distribution we cannot say that a variable will have a certain value but instead we say that it can lie between different values with a certain probability. For a normal distribution, the probability that a variable will lie within one standard deviation either side of the mean is 0.6827, whereas the probability of it lying within two standard deviations either side of the mean is 0.9545 (Fig. 3.1). So, if a variable is normally distributed we expect 68.27% of the values to lie within one standard deviation of the mean.

Fig. 3.1 Areas under the normal probability density function, showing the percentage of events occurring within one, two or three standard deviations (σ) either side of the mean (μ).
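The probabilities quoted for the normal distribution can be reproduced from the error function in Python's standard library. This is a minimal sketch; the function name `prob_within` is ours, and the calculation relies on the standard identity that the probability of lying within k standard deviations of the mean is erf(k/√2), whatever the mean and standard deviation are.

```python
from math import erf, sqrt

def prob_within(k):
    """Probability that a normally distributed variable lies within
    k standard deviations either side of its mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, "standard deviation(s):", round(prob_within(k), 4))
```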

A process by which events occur at random in space or time is known as a Poisson process. The distribution of those events (the number of events occurring per unit of time or space) is described by the Poisson distribution. The Poisson distribution is an example of a discrete pdf as it is concerned with counts of events. A Poisson process is recognized by its properties of homogeneity and independence. By homogeneity, we mean that the probability of an event occurring per unit time or space remains constant. By independence, we mean that the occurrence of one event does not alter the probability of any other event. The assumptions of independence and homogeneity mean that the Poisson distribution is a useful null model in ecology and evolution. For example, we might hypothesize that the distribution of plants in a field is clumped or aggregated because the plant reproduces asexually from its roots. This hypothesis can be tested against the null model of a random distribution in space, which can be modelled with the Poisson distribution. If the mean number of plants per square metre is given as x, then the terms of the Poisson distribution are:

$$
e^{-x},\quad x e^{-x},\quad \frac{x^2 e^{-x}}{2!},\quad \frac{x^3 e^{-x}}{3!},\quad \ldots
$$

The first term gives the probability of 1 m² of ground containing zero plants, the second term gives the probability of 1 m² containing one plant, and so on.
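The successive Poisson terms follow the standard formula $x^r e^{-x}/r!$ for r = 0, 1, 2, and so on. The sketch below evaluates the first few terms; the mean density of 0.5 plants per square metre is a hypothetical value chosen only for illustration, and the function name `poisson_term` is ours.

```python
from math import exp, factorial

def poisson_term(r, x):
    """Probability that a unit of area holds exactly r plants when the
    mean number of plants per unit area is x."""
    return x**r * exp(-x) / factorial(r)

mean_density = 0.5  # hypothetical mean number of plants per square metre

for r in range(6):
    print(r, "plants:", round(poisson_term(r, mean_density), 4))

# The terms form a convergent series; a long enough partial sum is close to 1.
print("sum of first 30 terms:", sum(poisson_term(r, mean_density) for r in range(30)))
```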

The fact that the terms sum to 1 means that we can determine the probability that a square metre contains at least one plant by calculating $1 - e^{-x}$. Note that the Poisson distribution is concerned with relatively rare events. In this case, it requires that the mean number of plants (x) per square metre is small compared with the maximum possible number of plants in that area. The number of samples predicted to contain 0, 1, 2, 3, etc. plants can be found by multiplying the probabilities in the Poisson terms by the total number of samples. The observed distribution can then be tested against this predicted Poisson distribution. A suitable significance test can determine whether any difference between the observed and predicted distributions is just chance or a significant departure from random. Note that inspection of the data is important here, as the distribution could depart from random by being regular rather than clumped. The same principles of null hypothesis testing apply to clumping in time.
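The comparison of observed and predicted counts can be sketched as follows. Everything specific here is an assumption made for illustration: the quadrat counts are invented, and the chi-square goodness-of-fit statistic is one common choice of test that the text itself does not name.

```python
from math import exp, factorial

def poisson_term(r, x):
    """Poisson probability of exactly r plants when the mean is x."""
    return x**r * exp(-x) / factorial(r)

# Hypothetical data, invented for illustration only: observed[r] is the
# number of 1 m^2 samples containing r plants. The last class means
# "3 or more" and is treated as exactly 3 when estimating the mean.
observed = [112, 48, 24, 16]

n_samples = sum(observed)
mean = sum(r * count for r, count in enumerate(observed)) / n_samples

# Predicted number of samples in each class under the Poisson null model.
expected = [n_samples * poisson_term(r, mean) for r in range(len(observed) - 1)]
# Pool the upper tail into the last class so the predictions sum to n_samples.
expected.append(n_samples - sum(expected))

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# Number of classes minus 1, minus 1 more because the mean was estimated
# from the same data.
degrees_of_freedom = len(observed) - 2

print("mean plants per sample:", round(mean, 3))
print("P(at least one plant):", round(1 - exp(-mean), 3))
print("expected counts:", [round(e, 1) for e in expected])
print("chi-square:", round(chi_square, 2), "on", degrees_of_freedom, "df")
```

The statistic would then be compared with a tabulated chi-square critical value for the stated degrees of freedom. A large value shows only that the counts depart from the Poisson prediction; looking at which classes are over- or under-represented is still needed to tell clumping from regularity.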

An alternative to testing for clumping against a Poisson process is to fit a distribution that allows for clumping. The negative binomial is an example of such a distribution, with an extra parameter k which reflects the degree of clumping: small values of k indicate strong clumping and, as k increases, the negative binomial approaches the Poisson distribution.
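The convergence of the negative binomial on the Poisson can be illustrated numerically. The sketch below assumes the parameterisation commonly used in ecology, in which the distribution is described by its mean m and the clumping parameter k; the text does not give a formula, so this form, the function names and the mean of 2 are all assumptions made for illustration.

```python
from math import exp, factorial, lgamma, log

def poisson_term(r, m):
    """Poisson probability of exactly r events when the mean is m."""
    return m**r * exp(-m) / factorial(r)

def negative_binomial_term(r, m, k):
    """Negative binomial probability of exactly r events, with mean m and
    clumping parameter k (assumed ecological parameterisation)."""
    # Work on the log scale so that large k does not overflow the gamma terms.
    log_prob = (lgamma(k + r) - lgamma(k) - lgamma(r + 1)
                + r * log(m / (m + k)) + k * log(k / (m + k)))
    return exp(log_prob)

m = 2.0  # hypothetical mean number of events per sample
for k in (0.5, 5.0, 50.0, 5000.0):
    # Largest difference between the two distributions over r = 0..19.
    gap = max(abs(negative_binomial_term(r, m, k) - poisson_term(r, m))
              for r in range(20))
    print("k =", k, "largest difference from Poisson:", gap)
```

The printed differences shrink as k grows, which is the numerical counterpart of the statement that a large k means little clumping.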

Any set of environmental dynamics is likely to be composed of deterministic and stochastic elements. A major issue in modelling is to tease apart these two elements and determine their relative importance. The regression analogy is helpful here in that we seek to quantify the relative amount of explained (deterministic) and unexplained (stochastic) variation: these two components are sometimes referred to as the signal and the noise. Just as there may be several components of the deterministic variation (as revealed by multiple regression) the unexplained variation may have several sources. In population dynamics the unexplained variance is composed of extrinsic random events (environmental stochasticity), variation between individuals in survival and fecundity, sampling error and non-significant deterministic factors. Although the overall levels of variation in survival and fecundity may be predictable, in smaller populations they combine with sampling error to generate essentially random mixes of individuals, a phenomenon termed demographic stochasticity. In this chapter we will consider how to begin modelling stochastic events.