## Nonlinear regression and power laws

To illustrate nonlinear regression let us return to the familiar example of the Fibonacci sequence. Imagine that the Fibonacci sequence is used to model change in population size over time. According to this model the population size at any point in time can be predicted given the starting values and the simple mathematical operation of adding the two previous values. In Fig. 1.1 we saw the increase in size over 19 time points. The striking result is that the values appear to increase slowly at first but then increasingly rapidly. This is typical of geometric or exponential increase, in which the value at one time point is some multiple of the value at the previous time point. Geometric sequences contrast with arithmetic sequences, in which a term is produced by adding a constant value to the previous term. In the Fibonacci sequence, the value at the current time is the sum of the values at the previous two time points, which simulates geometric growth.

Geometric growth or geometric sequences in general can be summarized using the mathematical notation of powers. For example, the geometric sequence 1, 2, 4, 8, 16, 32, ... is produced by starting with 1 and multiplying it by 2; this multiplication is then repeated. If the terms of the sequence are themselves numbered 0, 1, 2, 3, 4, ... then the geometric sequence is seen as $2^0$, $2^1$, $2^2$, etc. Furthermore, if the term number is represented by the letter $t$, then the sequence value for term $t$ is simply $2^t$. Population and evolution models of this form will be considered in Chapter 2.
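The contrast between arithmetic, geometric and Fibonacci sequences can be seen in a few lines of Python. This is only a sketch: the function name `fibonacci` and the choice of an arithmetic sequence with step 2 are illustrative assumptions, and the sequence is assumed to start 1, 1.

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers, assuming the start values 1, 1."""
    values = [1, 1]
    while len(values) < n:
        values.append(values[-1] + values[-2])
    return values[:n]

# Arithmetic sequence: add a constant (here 2) to the previous term.
arithmetic = [1 + 2 * t for t in range(6)]

# Geometric sequence: multiply the previous term by a constant (here 2),
# which is the same as raising 2 to the power of the term number t.
geometric = [2 ** t for t in range(6)]

# Fibonacci sequence over 19 time points.
fib = fibonacci(19)

# Ratio of each term to the previous one. For a geometric sequence this
# ratio is constant; for the Fibonacci sequence it settles near 1.618.
ratios = [later / earlier for earlier, later in zip(fib, fib[1:])]

print(arithmetic)
print(geometric)
print(fib)
print([round(r, 3) for r in ratios])
```

The ratios fluctuate over the first few terms and then become almost constant, which is why the Fibonacci sequence behaves like a geometric sequence after its early values.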

In Fig. 1.1 a curve is fitted through the points with the following equation:

$$N_t = 0.4676\,e^{0.4778t} \qquad (1.5)$$

where $t$ is time and $N_t$ is the population size at time $t$. This is an example of nonlinear regression, in which a curve is fitted through the points. The best fit is determined in a similar manner to a linear fit, for example by minimizing the squared differences between the observed and predicted values, the technique known as least squares which we encountered with linear regression. Comparison of equation 1.5 with $2^t$ suggests that it is a mathematical summary of a geometric sequence. Equation 1.5 is slightly more complex in that it uses the number $e$ instead of 2, $t$ is multiplied by the value 0.4778 and $e^{0.4778t}$ is multiplied by 0.4676.
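The least squares criterion can be made concrete with a short sketch. It assumes that the 19 time points are numbered 1 to 19 and that the sequence starts 1, 1; the helper names `fibonacci` and `sum_of_squares` are illustrative and do not come from the text. The code evaluates the sum of squared differences for two candidate curves, the curve of equation 1.5 and the plain geometric curve $2^t$.

```python
import math

def fibonacci(n):
    """Return the first n Fibonacci numbers, assuming the start values 1, 1."""
    values = [1, 1]
    while len(values) < n:
        values.append(values[-1] + values[-2])
    return values[:n]

def sum_of_squares(predict, times, observed):
    """Least squares criterion: sum of squared (observed - predicted) values."""
    return sum((obs - predict(t)) ** 2 for t, obs in zip(times, observed))

# Assumption: time points are numbered 1..19.
times = list(range(1, 20))
observed = fibonacci(19)

# Candidate 1: the curve quoted as equation 1.5.
sse_eq_1_5 = sum_of_squares(lambda t: 0.4676 * math.exp(0.4778 * t), times, observed)

# Candidate 2: the simple geometric sequence 2**t.
sse_base_2 = sum_of_squares(lambda t: 2.0 ** t, times, observed)

print(f"equation 1.5: {sse_eq_1_5:.1f}")
print(f"2**t:         {sse_base_2:.1f}")
```

A fitting routine searches over the two parameters for the pair giving the smallest such sum; here the criterion is only evaluated for two fixed candidates.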

The number $e$ is a special (natural) number with particular properties and an illustrious history involving Napier, Mercator, Bernoulli, Leibniz and finally Euler, who gave a full description of the value of $e$ to 18 decimal places in 1748 (www-history.mcs.st-andrews.ac.uk/HistTopics/e.html). For example, if we plot the curve $y = e^x$ we discover that the gradient of the curve at any point $x$ is equal to $y$ (Fig. 1.10). Mathematically this is stated as $dy/dx = e^x$ (this type of equation is explained in detail in later chapters). $e$ is an irrational number (like $\pi$) with a value of 2.718...

Fig. 1.10 Examples of exponential functions. Notice the effect of a negative power.
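The gradient property of $y = e^x$ can be checked numerically. The sketch below estimates the slope with a central difference; the function name `gradient`, the step size and the chosen points are assumptions made for illustration.

```python
import math

def gradient(f, x, h=1e-6):
    """Central-difference estimate of the slope of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

points = (-1.0, 0.0, 1.0, 2.0)

# For y = e**x the estimated slope should match the value of y itself.
for x in points:
    print(x, round(math.exp(x), 4), round(gradient(math.exp, x), 4))

# For comparison, base 2 does not have this property: the slope of 2**x
# is ln(2) * 2**x, which is smaller than 2**x.
slope_base_2 = gradient(lambda x: 2.0 ** x, 1.0)
print(round(slope_base_2, 4), 2.0 ** 1.0)
```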

Instead of fitting a curve through the points we can employ a mathematical transformation to convert an exponential curve to a straight line and then use linear regression. The appropriate transformation for exponential functions involves logarithms (abbreviated to logs). If $y = 10^x$ then taking the logarithm to the base 10 (written as $\log_{10}$) of both sides gives:

$$\log_{10} y = x$$

Similarly, if $y = 10^{ax}$ then $\log_{10} y = ax$. If the exponential function is of the form $y = b\,10^{ax}$ then taking logs gives $\log_{10} y = \log_{10} b + ax$. Therefore we can see that taking logs transforms a nonlinear function to a linear function. Plotting $\log_{10} y$ against $x$ will give a straight line with $\log_{10} b$ as the intercept and a gradient of $a$.

Logarithms can be given to any base number. The base number $e$ provides an important type of logarithm called natural or Napierian logarithms. Instead of writing $\log_e$ we use the abbreviation ln. So if we take the natural log (ln) of both sides of equation 1.5 we can produce a linear equation (i.e. one with an equation $y = mx + c$; Fig. 1.11):

$$\ln N_t = \ln(0.4676) + 0.4778t$$

This is a useful result as it allows us to use linear regression to estimate the parameters of the exponential equation. The straight line equation is $y = -0.7601 + 0.4778x$. We can see that 0.4778 agrees with the exponent or power value in equation 1.5; $\ln(0.4676)$ is also equal to $-0.7601$. The fit of the regression is very good: $r^2$ has a value of 0.999, indicating that 0.999 or 99.9% of the variation in the dependent variable is explained by variation in the independent variable. This fit can be improved further by noting that the first two values have large residuals (1 and 1 were fixed start points). Removing those two values and fitting through the remainder gives $y = -0.7938 + 0.4804x$ with an $r^2$ of 1. Actually, it is not quite a perfect fit ($r^2 = 0.9999$)!

Fig. 1.11 Logarithmic transformation of a Fibonacci population model to produce a linear relationship.
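The log-transformed regression can be sketched in plain Python. The code assumes the 19 time points are numbered 1 to 19 and that the sequence starts 1, 1; under those assumptions the estimates should land close to the values quoted in the text. The helper `fit_line` is a hypothetical name for an ordinary least squares fit written out by hand.

```python
import math

def fibonacci(n):
    """Return the first n Fibonacci numbers, assuming the start values 1, 1."""
    values = [1, 1]
    while len(values) < n:
        values.append(values[-1] + values[-2])
    return values[:n]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r2).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    return intercept, slope, r2

# Assumption: time points are numbered 1..19.
times = list(range(1, 20))
log_sizes = [math.log(v) for v in fibonacci(19)]

# Fit through all 19 points, then again without the two fixed start values.
all_points = fit_line(times, log_sizes)
without_start = fit_line(times[2:], log_sizes[2:])

for label, (intercept, slope, r2) in (("all points", all_points),
                                      ("without first two", without_start)):
    print(f"{label}: ln N = {intercept:.4f} + {slope:.4f} t, r2 = {r2:.4f}")
    # Back-transform the intercept to recover the multiplier of the exponential.
    print(f"  N = {math.exp(intercept):.4f} * exp({slope:.4f} t)")
```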

Let us recap what we have done here. We have made a model of population dynamics with the Fibonacci series. This model can be described mathematically by either a nonlinear function ($y = ae^{bx}$) or an equivalent linear equation ($\ln y = \ln(a) + bx$). This means that we could predict the numbers of rabbits at any time. In doing this we have introduced several important methods: linear and nonlinear regression, exponential or geometric growth and logarithmic transformation. All of these methods will be used and explained in further detail later in the book.

Just as e was identified as a natural number it is clear that many biological and other scientific phenomena are naturally nonlinear. Let us consider a classic ecological example of a nonlinear relationship and then step back to consider why such relationships might arise. It has long been known that (other things being equal) habitats with larger areas contain more species. These areas may have natural boundaries, such as islands within an ocean, or be different sample areas within a larger area of suitable habitat. Whereas the increase of species number with habitat area is perhaps intuitive, the precise form of the relationship and interpretations of the mechanisms are not so straightforward (Fig. 1.12).

Fig. 1.12 Mathematical relationship of species on islands with island area. The data (a) are shown on a log-transformed linear plot. The shape of the curve without the logarithmic transformation is shown in (b). Note that the graph in (b) does not cover the full range of values in graph (a). Graph from Lonsdale (1999) reprinted in Williamson et al. (2001).

In general it has been shown that species/area relationships can be modelled as a power function:

$$S = cA^z \qquad (1.7)$$

where $A$ is area (independent) and $S$ is the number of species (dependent). Note the difference between power functions as exemplified by equation 1.7 and the exponential function such as equation 1.5. In the case of the exponential function the independent variable ($x$) is the power, e.g. $y = a\,10^{bx}$, and the linear transformation is achieved by plotting $\log y$ against $x$. In the case of the power function the independent variable is the base (e.g. $y = ax^b$), with the corresponding linear transformation of $\log y$ against $\log x$. So in the case of $S = cA^z$ taking the logarithms of both sides yields:

$$\log S = \log c + z \log A$$

With this transformation we could regress $\log S$ against $\log A$ and determine $z$ and $\log c$ from the gradient and intercept respectively. In Fig. 1.12a, the axes are log-transformed (the original untransformed values are given on logarithmic axes; equally one could show the log values, e.g. $-1$ to 8 on the area axis), showing a linear relationship between numbers of native species and island area. The fitted line is:

$$\log_{10} S = 1.96 + 0.27 \log_{10} A$$

Thus $z = 0.27$ and $\log_{10} c = 1.96$ (therefore $c = 10^{1.96} = 91.2$).
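As a small worked example, the fitted values can be converted back into the power function and used for prediction. The function name `species_from_area` and the example areas are illustrative assumptions; only the gradient 0.27 and intercept 1.96 come from the text.

```python
import math

LOG10_C = 1.96  # intercept of the fitted log-log line
Z = 0.27        # gradient of the fitted log-log line

# Back-transform the intercept to obtain c in S = c * A**z.
c = 10 ** LOG10_C
print(round(c, 1))  # 91.2

def species_from_area(area_km2):
    """Predicted number of species, S = c * A**z, for a given area in km2."""
    return c * area_km2 ** Z

# Hypothetical example areas, each ten times larger than the last.
for area in (1, 10, 100, 1000):
    print(area, round(species_from_area(area), 1))

# On the log scale the same prediction is a straight line.
log_prediction = math.log10(species_from_area(100))
print(round(log_prediction, 2))  # 1.96 + 0.27 * 2 = 2.5

# Each tenfold increase in area multiplies the species number by 10**z.
tenfold_factor = species_from_area(10) / species_from_area(1)
print(round(tenfold_factor, 2))
```

With $z = 0.27$ a tenfold increase in area less than doubles the predicted number of species, which is the sense in which the relationship is strongly nonlinear on untransformed axes.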

Species/area relationships fitted to power functions have been shown to occur across a variety of plant and animal groups, leading to the suggestion that this is a natural ecological 'law'. Describing it as a law as opposed to an empirical generalization depends on whether there is a consistent underlying mechanism. The semantics of these debates are beyond the scope of this book but it is certainly the case that power functions of the general form $y = cx^b$ not only describe species/area relationships but also arise frequently across a range of scientific phenomena. These include various relationships with body mass, including respiration (metabolic) rate, population density and generation time; variance/mean relationships applied to populations and numbers of cells in bodies; and evolutionary processes such as species interactions through time.

Why should power functions be so prevalent? One answer is that power functions can occur when two variables are linked by a third common variable. To illustrate this answer we will consider a simple physical example.

Imagine a set of spheres of different sizes. The size of the sphere is given by the radius ($r$). We can calculate the surface area and the volume of each sphere (surface area $= 4\pi r^2$ and volume $= (4/3)\pi r^3$). If we then plot the surface area against the volume for each sphere we obtain a power function (Fig. 1.13a), which when log-transformed gives a linear function (Fig. 1.13b). The equation of the power function is:

$$\text{surface area} = 4.836 \times \text{volume}^{2/3}$$

which when log-transformed is:

$$\log(\text{surface area}) = \log(4.836) + \tfrac{2}{3}\log(\text{volume})$$
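The sphere example is easy to reproduce. The sketch below uses an arbitrary, hypothetical set of radii and a hand-written least squares helper (`fit_line`, an illustrative name) to regress log surface area on log volume; the gradient and back-transformed intercept should recover the exponent 2/3 and the coefficient 4.836.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical radii; any positive values would do.
radii = [1, 2, 5, 10, 15, 20]
areas = [4 * math.pi * r ** 2 for r in radii]
volumes = [(4 / 3) * math.pi * r ** 3 for r in radii]

# Regress log10(surface area) on log10(volume).
log_volumes = [math.log10(v) for v in volumes]
log_areas = [math.log10(a) for a in areas]
intercept, slope = fit_line(log_volumes, log_areas)

# Back-transform the intercept to get the coefficient of the power function.
coefficient = 10 ** intercept

print(f"exponent    = {slope:.4f}")        # 2/3
print(f"coefficient = {coefficient:.3f}")  # 4.836
```

The coefficient is $(36\pi)^{1/3}$, which follows from eliminating $r$ between the two formulas.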

The volume and surface area of a sphere are related through a common third variable (radius). We can see this from the ratio of surface area to volume:

$$\frac{\text{surface area of sphere}}{\text{volume of sphere}} = \frac{4\pi r^2}{(4/3)\pi r^3} = \frac{3}{r}$$

As surface area is proportional to $r^2$ and volume is proportional to $r^3$, surface area is proportional to volume raised to the power 2/3, the ratio of the two exponents. So the value of the power is 2/3. If volume is plotted against surface area the power value would be 3/2.

Fig. 1.13 Relationship of surface area and volume of spheres of different radius. (a) Power function and (b) corresponding log-transformed plots.

This reasoning can be applied to biological phenomena. Imagine that we are plotting the rate of heat loss of different organisms against their mass. We might expect that heat loss will be proportional to the surface area of the organism. As mass will be proportional to volume, a power function with an exponent of about 2/3 would be expected. As organisms are generally not perfect spheres and have various physiological contraptions that, for example, reduce heat loss, deviations from 2/3 may occur. Similarly there will be errors in measurement which will contribute to variation around the regression line. However, you can see that power laws may be expected simply as a consequence of scaling of body size. These phenomena have been widely studied and are referred to as allometric processes.

The principle of a common third variable generating power functions through scaling can be applied to other ecological and evolutionary processes. Take the $-3/2$ thinning rule (or law) as an example. The $-3/2$ power law has been much discussed among plant ecologists as it was considered as one of the few 'laws' in ecology. The phenomenon is observed when plants of a particular species are grown or occur naturally at different densities (Fig. 1.14). If the mean masses of plants are plotted against plant density then, as the plants grow, they reach and then follow a boundary which has a gradient of approximately $-3/2$. Following the boundary (and therefore falling in density) indicates that plants are being lost, that is, that thinning is occurring.

Fig. 1.14 Example of self-thinning rule with a marine alga (Phyllariopsis purpurascens). A significant frond mean mass ($m$) to density ($d$, shoots m$^{-2}$) relationship was obtained ($\log_{10} m = 0.6 - 1.4 \log_{10} d$) where the slope of $-1.4$ was not significantly different from the expected $-1.5$ (Flores-Moya et al. 1996).

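An explanation of this rule is that mean plant mass is proportional to volume ($l^3$) whereas density is inversely proportional to area ($l^2$), where $l$ is a characteristic length of the plant. Therefore density $N = k/l^2$ and mean plant mass $m = pl^3$, where $k$ and $p$ are constants. These two equations can be rearranged to make equations in terms of $l$:

$$l = \left(\frac{k}{N}\right)^{1/2} \qquad \text{and} \qquad l = \left(\frac{m}{p}\right)^{1/3}$$

This shows that:

$$\left(\frac{m}{p}\right)^{1/3} = \left(\frac{k}{N}\right)^{1/2} \qquad (1.10)$$

A power of 1/2 means a square root of a number whilst a power of 1/3 means the cube root. If we cube both sides of equation 1.10 we obtain:

$$\frac{m}{p} = \left(\frac{k}{N}\right)^{3/2}, \quad \text{that is,} \quad m = p\,k^{3/2} N^{-3/2}$$

Combine $p$ and $k^{3/2}$ into a new constant, $s$:

$$m = sN^{-3/2}$$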

So, mean plant mass (m) is related to density (N) to the power -3/2. In fact, the -3/2 rule has been subsumed within a more general -4/3 rule relating average mass to maximum density (Enquist et al. 1998).
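The scaling argument behind the $-3/2$ rule can be checked numerically. In this sketch the constants `K` and `P` and the list of plant lengths are hypothetical values chosen only for illustration, and `fit_line` is a hand-written least squares helper. Density and mean mass are both generated from the common variable $l$, and the log-log regression of mass on density should return a gradient of $-3/2$.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

K = 4.0  # hypothetical constant in N = K / l**2
P = 0.5  # hypothetical constant in m = P * l**3

# Hypothetical plant lengths: plants get larger as the stand develops.
lengths = [0.1, 0.2, 0.5, 1.0, 2.0]
densities = [K / l ** 2 for l in lengths]  # density falls as plants grow
masses = [P * l ** 3 for l in lengths]     # mean mass rises as plants grow

# Regress log10(mean mass) on log10(density).
intercept, slope = fit_line([math.log10(d) for d in densities],
                            [math.log10(m) for m in masses])

# The derivation predicts m = s * N**(-3/2) with s = P * K**(3/2).
s_expected = P * K ** 1.5
s_fitted = 10 ** intercept

print(f"gradient = {slope:.4f}")  # -1.5
print(f"s fitted = {s_fitted:.4f}, s expected = {s_expected:.4f}")
```

Because the data here are generated without error the fit is exact; real measurements scatter around the boundary line, as in Fig. 1.14.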