## Random walks and evolution

### 3.2.1 Generating a random walk

A random walk proceeds from a start point and travels a fixed distance from one point to the next in a random direction (no one direction is more likely than another; Fig. 3.2). Random walks can be in space or time, and both kinds are useful in developing null models in ecology and evolution. For example, a study of foraging in insects may wish to compare observed foraging distances with those generated from a random walk. Similarly, we may contrast the evolutionary change through time in a particular character, or in the number of species, against that expected at random.

Fig. 3.2 The result of a random-walk model in two dimensions. The walk starts at (0,0) and proceeds a fixed distance of 1 unit in a random direction at each step. Note the difference in scale of the x and y directions.
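
The spatial walk described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used to produce Fig. 3.2: the function name `random_walk_2d`, the number of steps and the seed are all assumptions made for the example.

```python
import math
import random

def random_walk_2d(n_steps, step_length=1.0, seed=None):
    """Return the list of (x, y) positions visited, starting at (0, 0)."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        # No direction is more likely than another:
        # the angle is drawn uniformly between 0 and 2*pi.
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x += step_length * math.cos(theta)
        y += step_length * math.sin(theta)
        path.append((x, y))
    return path

# Hypothetical run: 1000 steps of length 1 from the origin.
path = random_walk_2d(n_steps=1000, seed=1)
```

Each entry of `path` is one point of the walk, so observed foraging distances could be compared with, for example, the distance of the final point from the origin across many such walks.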

Let us imagine a random walk as a model of change in number of species over time. We start at time 0 with 100 species. At each time step the number of species can increase or decrease. For a purely random model we would assume that the mean increase is 0 (there will be situations in which we will want to change this assumption, giving clades or populations net rates of increase or decrease). Therefore, at each step in the time series we choose a value at random from a probability distribution. An important consideration is whether the model is additive or multiplicative. This also applies to similar models of change in population size over time (Lewontin & Cohen 1969). In a simple additive model the number of species (or individuals in the case of a population model) to be added is independent of the number of species or individuals at that time. This is likely to be an oversimplification. A better model is that used in equation 2.2; that is, the number of species or individuals in the next time period is a multiple of the number now:

$$N_{t+1} = \lambda N_t$$

If we want no mean change in size with time, λ is set to 1. In the new model we include a variable ε that can take values at random:

$$N_{t+1} = \lambda \varepsilon_t N_t$$

As the model is multiplicative it is best to transform to log values so that increases and decreases of the same proportion are displayed as equal in size. For example, in a multiplicative model multiplying by 10 and multiplying by 1/10 are equal but opposite changes. To express these multiples as the same size (magnitude) they are transformed to log values; for example, $\log_{10} 10 = 1$ and $\log_{10}(1/10) = -1$. The new model (equation 3.2) can be expressed with log values:

$$\log N_{t+1} = \log \lambda + \log N_t + \log \varepsilon_t$$

Using this transformation will also help when we come to estimate parameter values with linear regression techniques. Mean increase in the multiplicative model over time is seen to be 0 when using log values.
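
A minimal sketch of the log-transformed model in Python is given here. It assumes base-10 logs, a start of 100 species and a random term on the log scale that is normally distributed with mean 0. The function name `log_random_walk` and its parameter names are illustrative and do not come from the text.

```python
import math
import random

def log_random_walk(n_steps, n0=100.0, lam=1.0, sd=1.0, seed=None):
    """Simulate log10 N(t+1) = log10(lam) + log10 N(t) + z(t).

    z(t) is the random term on the log scale, drawn from Normal(0, sd).
    Returns the series of log10 N values and the list of random terms.
    """
    rng = random.Random(seed)
    log_n = [math.log10(n0)]
    shocks = []
    for _ in range(n_steps):
        z = rng.gauss(0.0, sd)
        shocks.append(z)
        log_n.append(math.log10(lam) + log_n[-1] + z)
    return log_n, shocks

# Hypothetical run: lam = 1 gives no mean change on the log scale.
log_n, shocks = log_random_walk(n_steps=1000, seed=1)
```

With `lam = 1` the term `log10(lam)` is 0, so the log number of species at any time is simply the starting value plus the sum of the random terms so far.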

Consider an example in which the random term on the log scale, log ε, is a value taken at random from a normal distribution with a mean of 0 and standard deviation of 1 (Fig. 3.3a). We can use a series of these random numbers to generate a time series with different values of λ. In this case we illustrate λ = 1 (Fig. 3.3b). Notice the relatively smooth shape of this walk compared with the spiky nature of the random values (Fig. 3.3c).

The contrast between the two time series in Figs 3.3b and 3.3c can be revealed by correlating the numbers at time t + 1 with those at time t (Fig. 3.4). This shows that there is a strong positive correlation for the random-walk model (Fig. 3.4a) compared with no correlation for the random time series (Fig. 3.4b). This is expected because the random numbers are drawn independently from a normal distribution whereas the speciation model (equation 3.1 or 3.2) makes $N_{t+1}$ a function of $N_t$. Correlations between the values of a time series and earlier values of the same series are referred to as autocorrelations. In this case we have considered an autocorrelation of lag 1; that is, a difference of one time step. Autocorrelations with lags of more than one can also be studied and may be expected to occur when different species interact (Chapter 6). In general, autocorrelation is a useful technique for starting to explore signals in a time series with some level of stochasticity.
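
The lag-1 comparison can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the analysis behind Fig. 3.4: the helper names `pearson` and `lag1_autocorrelation`, the seed and the series length of 1000 are all chosen for the example, and the walk is built on the log scale with λ = 1.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def lag1_autocorrelation(series):
    """Correlate the values at time t with the values at time t + 1."""
    return pearson(series[:-1], series[1:])

rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # independent random values

walk = [0.0]  # random walk: running sum of the same random values
for z in noise:
    walk.append(walk[-1] + z)

r_walk = lag1_autocorrelation(walk)    # expected to be close to 1
r_noise = lag1_autocorrelation(noise)  # expected to be close to 0
```

Passing `series[:-k]` and `series[k:]` to `pearson` instead would give the autocorrelation at lag k.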

Random walks have been used as null models in studies of change in marine fossil diversity with time (Cornette & Lieberman 2004). This study made use of Sepkoski's compendium of fossil marine genera and showed that changes in diversity over the last 540 Myr are consistent with a random walk. This does not necessarily mean that the underlying processes are stochastic, but that the net result of the processes causing change appears to be random. Indeed, the same data set has also been used to address periodicity in the fossil record. Most famously, it was the basis of Raup and Sepkoski's analysis that led to the idea of mass extinction events, such as the end-Permian and end-Cretaceous (Raup & Sepkoski 1982), which, in turn, were linked with periodicity of approximately 26 Myr and a galactic cause of extinction (Raup & Sepkoski 1984). More recently, Sepkoski's fossil data set has been reanalysed to reveal 62 and 140 Myr periodicity (Rohde & Muller 2005). Therefore, this fossil data set, spanning the entire duration of the Phanerozoic, illustrates both deterministic (periodic) and stochastic (random-walk) processes.

Fig. 3.3 (a) Frequency distribution of 1000 random numbers drawn from a normal distribution with mean of 0 and standard deviation of 1. The curve shows the normal density function drawn from the mean and standard deviation of the observed data (mean of -0.0002 and standard deviation of 0.991). (b) Random-walk time series with λ = 1 and equation 3.2. (c) Time series of random numbers used in (b).

Fig. 3.4 (a) Autocorrelation between numbers at time t + 1 and t in the time series of Fig. 3.3b (equation 3.1 model) and (b) the random numbers used in that model (Fig. 3.3c).

### 3.2.2 Birth/death processes in evolution

We have a number of options in constructing models of diversification. We can use discrete or continuous time models with deterministic processes as in Chapter 2. Alternatively we can employ stochastic models in discrete or continuous time (Nee 2006). The random-walk model above is an example of a discrete time process. Running this simulation many times would produce a set of possible values of clade richness which could serve as a null model for diversification. A key difference to the deterministic model is that there is no single outcome; instead the set of possible outcomes is defined by a particular distribution. These outcomes include complete extinction of the clade. Understanding and quantifying that distribution of possibilities is an important goal in evolutionary study.
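
The idea of running the random-walk simulation many times to obtain a distribution of outcomes can be sketched as follows. The number of replicates, the number of time steps and the standard deviation of 0.1 are hypothetical values chosen for illustration. Note that this log-scale walk cannot reach zero species, so the sketch does not capture clade extinction.

```python
import random

def final_log_richness(n_steps, log_n0, sd, rng):
    """Run one random walk on the log scale and return the final value."""
    log_n = log_n0
    for _ in range(n_steps):
        log_n += rng.gauss(0.0, sd)
    return log_n

rng = random.Random(7)
n_reps, n_steps, sd = 2000, 50, 0.1  # hypothetical settings
log_n0 = 2.0                         # log10 of 100 starting species

finals = sorted(final_log_richness(n_steps, log_n0, sd, rng)
                for _ in range(n_reps))

mean_final = sum(finals) / n_reps
lower = finals[int(0.025 * n_reps)]      # approximate 2.5% quantile
upper = finals[int(0.975 * n_reps) - 1]  # approximate 97.5% quantile
```

An observed log richness falling outside the interval from `lower` to `upper` would be unusual under this null model.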

The simplest and earliest model of diversification was the pure birth model in which each species has a constant probability of producing a new species and there is no extinction. This leads to the prediction of exponential growth, as with the deterministic analogue. The pure birth process is also known as the Yule process (Yule 1924). Yule's paper is a rich source of information on the ways in which evolution can be modelled. He shows how a mathematical model of evolution can be constructed from first principles and the various predictions that can be made. This includes examples of the probability distributions that arise from iterations of all possible outcomes of the model.

In Fig. 3.5 the first two steps of the simplest model are shown. Here $p$ is the probability of a species producing a new species (Yule referred to these as mutations) and $q$ is the probability of that event not happening ($p + q = 1$). Yule was interested in the distribution of species within genera. Row 1 in Fig. 3.5 shows the probability of a genus containing one species after two time intervals. The time intervals were considered to be sufficiently small that two $p$ events were highly unlikely. Rows 2 and 3 are then two outcomes that result in two species per genus. Notice that in all cases the values of $p$ and $q$ are multiplied together as they are independent events. After two time steps the probabilities associated with one, two, three or four species per genus are as follows: $q^2$, $pq + pq^2$, $2p^2q$ and $p^3$ respectively.

These terms sum to 1 as required (you can check this by substituting 1 - p for q). Yule extended the process to a large number of time steps, demonstrating that the terms formed a geometric series. The process was also developed for different-aged clades and tested against different data sets.
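
The two-step probabilities can be checked numerically with a short Python sketch. It assumes, as in the description above, that each species independently produces at most one new species per time interval with probability p. The function name `yule_step` and the value p = 1/10 are illustrative choices, and exact fractions are used so that the comparison is free of rounding error.

```python
from fractions import Fraction
from math import comb

def yule_step(dist, p):
    """Advance the distribution of species number by one time interval.

    dist maps a number of species n to its probability. Each of the n
    species independently produces one new species with probability p.
    """
    q = 1 - p
    new_dist = {}
    for n, prob in dist.items():
        for k in range(n + 1):  # k of the n species speciate
            weight = comb(n, k) * p**k * q**(n - k)
            new_dist[n + k] = new_dist.get(n + k, 0) + prob * weight
    return new_dist

p = Fraction(1, 10)  # hypothetical speciation probability per interval
q = 1 - p

dist = {1: Fraction(1)}  # start with a single species
for _ in range(2):
    dist = yule_step(dist, p)

total = sum(dist.values())
```

After the loop, `dist[1]` to `dist[4]` hold the probabilities of one to four species, and calling `yule_step` more times extends the calculation to further intervals.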

A natural development of the model is to include an extinction term. The characteristics of the stochastic birth/death process are as follows (Magallon & Sanderson 2001). Speciation and extinction are assumed to occur at constant rates, b and d respectively, which produces exponentially increasing (b > d) or declining (b < d) expected diversity. The diversification rate is defined as b - d whereas the relative extinction rate is d/b. The probability distribution for the number of lineages at a given time, t, is also known, as are the confidence limits for diversification rates. The fact that clades may go extinct before they are sampled is just one of several problems that face the interpretation of birth and birth/death models, especially when using phylogenies based on extant data. The example from Nee (2006) in Chapter 2 (Fig. 2.10) illustrates how, under a stochastic birth/death model, the slope of the logarithm of cumulative lineage numbers against time is expected to approach b with increasing time (towards the present). Rates of diversification are generally presented under different extinction scenarios, for example relative extinction rates of 0 and 0.9 (Magallon & Sanderson 2001).
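
A constant-rate birth/death process can be simulated in continuous time with a short Python sketch. This is one simple way to do it under stated assumptions, not the method of the cited studies: the rates b = 0.2 and d = 0.1, the time span, the number of replicates and the function name `simulate_birth_death` are all hypothetical. The simulated mean is compared with the expected number of lineages, exp((b - d)t) for a clade starting from one lineage, and the simulated extinction frequency with the standard result for the probability of extinction by time t.

```python
import math
import random

def simulate_birth_death(b, d, t_max, n0=1, rng=None):
    """Return the number of lineages alive at t_max (0 if the clade is extinct)."""
    rng = rng or random.Random()
    n, t = n0, 0.0
    while n > 0:
        # With n lineages, events occur at total rate n * (b + d),
        # so the waiting time to the next event is exponential.
        t += rng.expovariate(n * (b + d))
        if t > t_max:
            break
        if rng.random() < b / (b + d):
            n += 1  # speciation
        else:
            n -= 1  # extinction
    return n

b, d, t_max = 0.2, 0.1, 10.0  # hypothetical rates and time span
r = b - d                     # diversification rate
rel_ext = d / b               # relative extinction rate

rng = random.Random(3)
finals = [simulate_birth_death(b, d, t_max, rng=rng) for _ in range(5000)]

mean_n = sum(finals) / len(finals)
expected_n = math.exp(r * t_max)
frac_extinct = sum(1 for n in finals if n == 0) / len(finals)
p_extinct = d * (math.exp(r * t_max) - 1) / (b * math.exp(r * t_max) - d)
```

A substantial fraction of replicates end with no lineages even though b exceeds d, which illustrates why clades that went extinct before sampling complicate the interpretation of data from extant species.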