From the AICs and likelihood ratio test, we can conclude that we are violating the independence assumption in the linear regression model. So the remaining question is now whether adding any of these spatial correlation structures can solve the independence problem. The commands

> Vario1E <- Variogram(B1E, form =~ x + y, robust = TRUE, maxDist = 2000, resType = "pearson")

> plot(Vario1E, smooth = FALSE)

will show the experimental variogram with the fitted spatial correlation (results are not shown here), and the following code

> Vario2E <- Variogram(B1E, form =~ x + y, robust = TRUE, maxDist = 2000, resType = "normalized")

> plot(Vario2E, smooth = FALSE)

does the same for the normalised residuals. The later ones should no longer show a spatial correlation (you should see a horizontal band of points). Results are not presented here, but the experimental variogram of the normalised residuals indeed form a horizontal band of points, indicating spatial independence.

Note that we should apply the same 10-step protocol we used in Chapters 4 and 5. First determine the optimal random structure using REML estimation, using as many fixed covariates as possible. (However, here all covariates are highly collinear; so there is effectively only one variable.) Once the optimal random structure has been found, the optimal fixed structure can be found using the tools described in Chapters 4 and 5. So, the whole REML and ML process used earlier also applies here.

For this chapter, we used the GLS model. If a random effects model is used, the spatial correlation structure is applied within the deepest level of the data. See also Chapters 16 and 17 where we impose a correlation structure on nested data.

Now we return to the Hawaiian bird data set, which we left with an AR1 autocorrelation structure. In the previous section, we used the form =~ x + y argument in the correlation option. If included in the gls, lme, or gamm function, it ensures that R calculates distances between the sampling points with coordinates given by x and y. The default option to calculate distances is Euclidean distances (using Pythagoras) and alternatives are Manhattan and maximum distances (Pinheiro and Bates, 2000). In the Hawaiian data, we used form =~ Time | ID in the corAR1 function. Nothing stops us using for example a spatial correlation function like corSpher for time series. It can cope better with missing values and irregularly spaced data. In fact, the corExp structure is closely related to the corAR1 (Diggle et al., 2002). The following code applies the model with the corAR1 structure and all four spatial correlation functions. We copied and pasted the code from Chapter 6 to access the data.

> library(AED); data(Hawaii)

> Birds <- c(Hawaii$Stilt.Oahu, Hawaii$Stilt.Maui,

Hawaii$Coot.Oahu, Hawaii$Coot.Maui)

> ID <- factor(rep(c("Stilt.Oahu", "Stilt.Maui",

"Coot.Oahu", "Coot.Maui"), each = length(Hawaii$Year)))

> library(mgcv); library(nlme)

> #Define the fixed part of the model

s(Time, by = as.numeric(ID == "Stilt.Oahu"))+ s(Time, by = as.numeric(ID == "Stilt.Maui"))+ s(Time, by = as.numeric(ID == "Coot.Oahu"))+ s(Time, by = as.numeric(ID == "Coot.Maui")))

> HawA <- gamm(f1, method = "REML", correlation =

corAR1(form =~ Time | ID), weights = varIdent(form =~ 1 | ID))

> HawB <- gamm(f1, method = "REML", correlation =

corLin(form =~ Time | ID, nugget = TRUE), weights = varIdent(form =~ 1| ID))

> HawC <- gamm(f1, method = "REML", correlation =

corGaus(form =~ Time | ID, nugget = TRUE) weights = varIdent(form =~ 1| ID))

> HawD <- gamm(f1, method = "REML", correlation =

corExp(form =~ Time | ID, nugget = TRUE), weights = varIdent(form =~ 1 | ID))

> HawE <- gamm(f1, method = "REML", correlation =

corSpher(form =~ Time | ID, nugget = TRUE), weights = varIdent(form =~ 1| ID))

> #Compare the models

> AIC(HawA$lme, HawB$lme, HawC$lme, HawD$lme, HawE$lme)

df AIC

HawA$lme 18 2277.677 HawB$lme 19 2281.336 HawC$lme 19 2279.182 HawD$lme 19 2279.677 HawE$lme 19 2278.898

The results of the AIC command indicate that the model with the corAR1 structure should be chosen.

In this section, we analyse the nitrogen isotopic data of teeth growth layers of 11 whales. We start with one whale and then analyse the data from all whales.

In Chapter 2, we applied linear regression on the nitrogen isotope values of a whale nicknamed Moby, and we discussed two potential sources of violating the independence assumption. The first was a potential improper model specification (a linear relationship when the real relationship may be non-linear). The second one was due to the nature of the data; nitrogen concentrations at a certain age s may depend on the concentrations at age s - 1, s - 2, s - 3, etc. To deal with the first problem, we applied a Gaussian additive model on the data for Moby:

ys = a + f (ages) + es e ~ N(0, a2 x V), where e = (e1,e2..., eT)'

The index s represents year and runs from 3 to 44 for Moby. The variable ys contains the isotopic value in year s, a is the intercept, ages is the age in year s, f(ages) is the smoothing function of age, and es are the residuals. In an ordinary Gaussian additive model (or linear regression model), we assume that the residuals are independent and normally distributed with mean 0 and variance a2. This means that V is a 42-by-42 identity matrix. (This is matrix full of zeros, except for the diagonal; these are all equal to 1.)

To allow for a dependence structure between the observations, we can use any of the correlation structures discussed earlier in Chapter 6 or in this chapter. Instead of temporal or geographical coordinates, age is now the variable that we use to set up the variogram. As a consequence, V is no longer a diagonal matrix. Its off-diagonal elements give the residual covariance at different ages. The key question is now, how we should parameterise this matrix. Clearly, using a completely unspecified matrix results in too many unknown parameters. We can use the variogram or the AR1 residual correlation structures. These will specify that observations that are separated by an age of k years have a correlation as specified by, for example, the linear, spherical, exponential, or Gaussian variogram structure. All we have to do is to apply models with different covariance structures and assess which one is the most appropriate using, for example, the AIC.

The model selection process is identical to mixed modelling; (i) start with a model that contains as many explanatory variables as possible, (ii) find the optimal random structure, and (iii) find the optimal fixed structure. If we have data on only one whale, the first step is rather simple: use age. The following code imports the data, extracts the data from Moby, and applies the models.

> library(AED); data(TeethNitrogen)

> TN <- TeethNitrogen

> N.Moby <- TN$X15N[TN$Tooth == "Moby"]

> Age.Moby <- TN$Age[TN$Tooth == "Moby"]

> library(mgcv); library(nlme)

> #Apply gamm models

> Mob1 <- gamm(f, method = "REML", cor =

corSpher(form =~ Age.Moby, nugget = TRUE)

> Mob2 <- gamm(f, method = "REML", cor =

corLin(form =~

corGaus(form =~

corExp(form =~

corRatio(form =

- Age.Moby, nugget = TRUE)) "'REML", cor = Age.Moby, nugget = TRUE)) "'REML", cor =

Age.Moby, nugget = TRUE)

> Mob6 <- gamm(f, method = "REML", cor =

> AIC(Mob0$lme, Mob1$lme, Mob1$lme, Mob4$lme, Mob5$lme,

Mob6$lme)

Mob0$lme Mob1$lme Mob2$lme Mob4$lme Mob5$lme Mob6$lme

4 64.52995 6 67.02795 6 67.02795 6 63.38405 6 63.09320

The model with the corGaus correlation structure did not converge and is therefore not used in the AIC command. Except for the corSpher correlation structure, all AlCs are similar; hence, we might as well choose the simplest model, which is the one without a correlation structure (the linear regression model, Mob0). Comparing model Mob0 with Mob5 (Mob0 is the model without a correlation structure and Mob5 is the best potential model with respect to spatial correlation) using a likelihood ratio test gave a p-value of 0.06 (just type: anova(Mob0$lme, Mob5$lme)). Hence, there is no convincing evidence to use a correlation structure for the data of this whale. Furthermore, the estimated smoother in model Mob5 is a straight line. This indicates that for the Moby data, the linear regression model that was presented in Chapter 2 suffices. This is rather confusing as the model did have residual patterns!

Was this article helpful?

## Post a comment