Multiple regression

In the previous section we had two possible independent variables which were correlated. This meant that either could be used and, indeed, it was only appropriate to use one. However, it is often the case that there are two or more independent variables that are not correlated and we wish to understand their overall contribution to the regression. This is the domain of multiple regression. With two independent variables, instead of fitting a line through a set of points as in linear regression, we are now fitting a two-dimensional plane through a set of points. With three independent variables it becomes difficult to visualize the process, but the mathematical principle of reducing the unexplained variation still applies.

We will now consider the application of multiple regression to an ecosystem analysis. The hypothesis was that methane production from wetlands was being suppressed by acid rain. As methane is a powerful greenhouse gas the reduction may have important consequences for global climate change. An experiment was undertaken involving treatments simulating acid-rain deposition on a peat bog with the levels of methane flux recorded (Gauci et al. 2002). A further analysis considered the relationship of the extent of methane flux with peat temperature and water-table depth; that is, two naturally occurring variables. This required use of multiple regression and is considered here.

The dependent variable is methane flux and the independent variables are peat temperature and water-table depth. The data can be plotted as two separate graphs (Fig. 1.7) or combined in a three-dimensional graph in which the two independent axes are at right angles to each other. Three-dimensional graphs may look impressive, but it is often very difficult to read values from them, so one is not included here.

The separate plots of the dependent variable against each independent variable show that the percentage of methane flux increases with temperature and with water-table depth. Notice that the r² values of these two regressions are quite low (30 and 24% of variance explained, respectively) although the regressions are significant (P = 0.0045 and 0.012). There is also a suggestion of a problem with the distribution of the residuals (Fig. 1.8). There are ways of dealing with these issues without further sampling, such as transformations of data, which were undertaken by the authors of the paper: we will consider this later. Here we will focus on the raw data and ignore the odd patterns of the residuals.
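As a reminder of what r² and the residuals measure in each simple regression, the following minimal Python sketch fits a straight line by least squares and reports both. The function name `simple_regression` and the five data points are invented purely for illustration; they are not the peat-bog data.

```python
def simple_regression(x, y):
    """Fit y = m*x + c by least squares; return (m, c, r2, residuals)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    c = mean_y - m * mean_x
    # Residual = observed value minus value predicted by the line
    residuals = [yi - (m * xi + c) for xi, yi in zip(x, y)]
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum(r ** 2 for r in residuals)
    # r2 is the proportion of the total variation explained by the line
    r2 = 1 - ss_residual / ss_total
    return m, c, r2, residuals


# Invented illustrative data (NOT from Gauci et al. 2002)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
m, c, r2, residuals = simple_regression(x, y)
print(f"slope={m:.2f} intercept={c:.2f} r2={r2:.2f}")
print("residuals:", [round(r, 2) for r in residuals])
```

For these invented points the line explains 60% of the variation (r² = 0.60). Plotting the residuals against x is the usual way to look for the kind of pattern referred to above.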

Fig. 1.7 Percentage of methane flux plotted against the two independent variables (peat temperature and water-table depth).

First, we check that there is no relationship between the (assumed) independent variables. This is true in this case, although we note that the data have a curious shape, with little variation in the water table at high and low temperatures but large variation at intermediate values (Fig. 1.9). Whereas it is possible to analyse the relationship between each independent variable and the dependent variable separately, we suspect that a combination of factors contributes to the methane flux and we wish to capture that information in one mathematical statement. Rather than y = mx + c we now have two independent variables, so the overall equation will look like this:

Methane flux = (a × temperature) + (b × water-table depth) + c    (1.3)

Or in general terms:

y = ax₁ + bx₂ + c
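To make the idea of fitting a plane concrete, here is a minimal sketch of how a, b and c in the general form y = a*x1 + b*x2 + c could be estimated by least squares. It assumes NumPy is available and uses `numpy.linalg.lstsq`; the helper name `fit_plane` and the data points are invented for illustration and are not the methane data.

```python
import numpy as np


def fit_plane(x1, x2, y):
    """Least-squares estimates (a, b, c) for y = a*x1 + b*x2 + c."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: one column per independent variable, plus a column
    # of ones so that the intercept c is estimated as well
    design = np.column_stack([x1, x2, np.ones_like(x1)])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    a, b, c = (float(v) for v in coeffs)
    return a, b, c


# Invented points that lie exactly on the plane y = 2*x1 - 0.5*x2 + 3
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 0.0, 3.0, 2.0, 5.0, 4.0]
y = [2.0 * u - 0.5 * v + 3.0 for u, v in zip(x1, x2)]

a, b, c = fit_plane(x1, x2, y)
print(f"a={a:.3f} b={b:.3f} c={c:.3f}")
```

Because the invented points contain no scatter, the fit recovers the plane they were generated from. With real data the same call returns the plane that minimises the sum of squared residuals.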

Fig. 1.8 Residuals from the linear regressions in Fig. 1.7.

Fig. 1.9 Relationship between temperature and water-table depth. The correlation is not significant (r = 0.02, P > 0.9).
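The check on the two independent variables amounts to computing their correlation coefficient r. The sketch below shows one way to do this in plain Python. The function name `pearson_r` and the values are invented for illustration, chosen so that the two variables have no linear relationship; they are not the published measurements.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Invented illustrative values (NOT the data behind Fig. 1.9)
temperature = [10.0, 12.0, 14.0, 16.0, 18.0]
water_table_depth = [-8.0, -2.0, -5.0, -2.0, -8.0]

r = pearson_r(temperature, water_table_depth)
print(f"r = {r:.2f}")
```

An r close to 0, as here, supports treating the two variables as independent predictors. An r close to 1 or -1 would suggest that only one of them should be used.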

Like the simple regression counterparts these equations allow prediction of the dependent variable with given values of independent variables. Equation 1.3 indicates that for given values of temperature and water-table depth a value for methane flux can be predicted. The estimates of a, b and c are generated from the multiple regression analysis. Methane flux can then be modelled under a range of scenarios of temperature and water-table depth.

The multiple regression significance results are given in Table 1.1a, with the parameter estimates (regression coefficients) detailed in Table 1.1b. The estimates are 1.99 and 2.65 for temperature (a in equation 1.3) and water-table depth (b in equation 1.3), respectively; both are positive, in agreement with the predictions from the simple regressions. Notice that all the estimates are highly significant; that is, they are all highly significantly different from 0. This is not always the case in multiple regression. If variables are not significant then they are dropped from the analysis until we are left with the model (regression) that contains only significant components. The r² of this model (0.556) is also improved over those of the simple linear regressions (0.30 and 0.24). This is expected as we have combined two significant elements which are independent. Thus 55.6% of the variation in methane flux is explained by variation in temperature and water-table depth. The full predictive equation is therefore:

Methane flux = -44.8 + (1.99 × temperature) + (2.65 × water-table depth)
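The predictive equation can be applied directly to explore scenarios. In the sketch below the coefficients are those of the equation above; the function name `predict_methane_flux` and the three scenarios are hypothetical, chosen only to show the arithmetic, and have not been checked against the range of the sampled data.

```python
# Coefficients taken from the fitted equation in the text
INTERCEPT = -44.8
COEF_TEMPERATURE = 1.99
COEF_WATER_TABLE = 2.65


def predict_methane_flux(temperature, water_table_depth):
    """Predicted methane flux, in the units of the original data."""
    return (INTERCEPT
            + COEF_TEMPERATURE * temperature
            + COEF_WATER_TABLE * water_table_depth)


# Hypothetical (temperature, water-table depth) scenarios
scenarios = [(10.0, -10.0), (15.0, -5.0), (20.0, 0.0)]
predictions = [predict_methane_flux(t, d) for t, d in scenarios]
for (t, d), flux in zip(scenarios, predictions):
    print(f"temperature={t:5.1f}  depth={d:6.1f}  predicted flux={flux:7.2f}")
```

Each unit increase in temperature raises the prediction by 1.99, and each unit increase in water-table depth raises it by 2.65, with the other variable held constant. Predictions outside the range of the measured values should be treated with caution.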

In summary, multiple regression can be used as a tool for reducing complex models to their statistically significant components and for exploring the interplay between different explanatory variables. In this example we considered two linear relationships. The next section will address nonlinear regression methods and from there move into an important class of mathematical models which has been used to describe a wide range of scientific phenomena.

Table 1.1 (a) Regression model statistics and (b) parameter estimates and significance in methane flux regression model (n = 25).

(a)

|            | Sum of squares (SS) | Degrees of freedom (df) | Mean squares (MS; calculated as SS/df) | F (regression MS/residual MS) | Probability (P) |
|------------|---------------------|-------------------------|----------------------------------------|-------------------------------|-----------------|
| Regression | 1656.311            | 2                       | 828.1554                               | 13.799                        | 0.000131        |
| Residual   | 1320.312            | 22                      | 60.0142                                |                               |                 |
| Total      | 2976.623            |                         |                                        |                               |                 |

(b)

|                   | Regression coefficient | SE    | t-test value | P        |
|-------------------|------------------------|-------|--------------|----------|
| Intercept         | -44.802                | 5.846 | -7.664       | <0.00001 |
| Temperature       | 1.993                  | 0.505 | 3.946        | 0.00069  |
| Water-table depth | 2.653                  | 0.746 | 3.556        | 0.0018   |
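The entries in Table 1.1 are linked by simple arithmetic, which can be checked with a few lines of Python. The sketch below uses only the published values from the table. The variable names are arbitrary, and it assumes the standard definitions MS = SS/df, F = regression MS/residual MS, r² = regression SS/total SS, and t = coefficient/SE.

```python
# Values copied from Table 1.1a
ss_regression, df_regression = 1656.311, 2
ss_residual, df_residual = 1320.312, 22
ss_total = 2976.623

ms_regression = ss_regression / df_regression  # mean square = SS / df
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual          # F = regression MS / residual MS
r_squared = ss_regression / ss_total           # proportion of variation explained

# Values copied from Table 1.1b: (regression coefficient, standard error)
coefficients = {
    "Intercept": (-44.802, 5.846),
    "Temperature": (1.993, 0.505),
    "Water-table depth": (2.653, 0.746),
}
# t-test value = coefficient / standard error
t_values = {name: est / se for name, (est, se) in coefficients.items()}

print(f"MS regression = {ms_regression:.4f}")
print(f"MS residual   = {ms_residual:.4f}")
print(f"F             = {f_ratio:.3f}")
print(f"r squared     = {r_squared:.3f}")
for name, t in t_values.items():
    print(f"t ({name}) = {t:.3f}")
```

Because the published values are rounded, the recomputed figures may differ from the table in the last decimal place.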
