## Info

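$$
\begin{pmatrix}
1 & a_{12} & a_{13} & a_{14} & 0 & 0 & 0 & 0 & \cdots \\
a_{12} & 1 & a_{23} & a_{24} & 0 & 0 & 0 & 0 & \cdots \\
a_{13} & a_{23} & 1 & a_{34} & 0 & 0 & 0 & 0 & \cdots \\
a_{14} & a_{24} & a_{34} & 1 & 0 & 0 & 0 & 0 & \cdots \\
0 & 0 & 0 & 0 & 1 & a_{12} & a_{13} & a_{14} & \cdots \\
0 & 0 & 0 & 0 & a_{12} & 1 & a_{23} & a_{24} & \cdots \\
0 & 0 & 0 & 0 & a_{13} & a_{23} & 1 & a_{34} & \cdots \\
0 & 0 & 0 & 0 & a_{14} & a_{24} & a_{34} & 1 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}
$$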

The upper left 4 × 4 block is the correlation matrix for the first patient, the second block is the correlation matrix for the second patient, and so on. In each block, $a_{st}$ is the correlation between observations s and t. We use the different blocks to estimate these parameters. No relationship is assumed among the parameters $a_{12}$, $a_{13}$, $a_{14}$, $a_{23}$, $a_{24}$, and $a_{34}$; they are estimated completely independently. Note that this is based on four observations per subject. The number of independent parameters to be estimated increases rapidly with the number of within-subject observations, with all the attendant problems related to matrix inversion.
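The block structure described above can be sketched in a few lines of plain Python. This is only an illustration, not the way GEE software stores the matrix: the function names `unstructured_block` and `block_diagonal` and the six correlation values are hypothetical, chosen to show how one 4 × 4 block is filled from the pairwise parameters and then repeated along the diagonal for each subject.

```python
def unstructured_block(params, n_obs):
    """Build an n_obs x n_obs symmetric correlation block.

    params maps a pair (s, t) with s < t (1-based observation numbers)
    to the correlation a_st between observations s and t.
    """
    block = [[1.0 if s == t else 0.0 for t in range(n_obs)]
             for s in range(n_obs)]
    for (s, t), value in params.items():
        block[s - 1][t - 1] = value  # upper triangle
        block[t - 1][s - 1] = value  # mirror into the lower triangle
    return block


def block_diagonal(block, n_subjects):
    """Repeat the same block along the diagonal, with zeros elsewhere."""
    n_obs = len(block)
    size = n_obs * n_subjects
    full = [[0.0] * size for _ in range(size)]
    for k in range(n_subjects):
        offset = k * n_obs
        for s in range(n_obs):
            for t in range(n_obs):
                full[offset + s][offset + t] = block[s][t]
    return full


# Hypothetical values for a12, a13, a14, a23, a24, a34 (illustration only).
example_params = {
    (1, 2): 0.5, (1, 3): 0.3, (1, 4): 0.1,
    (2, 3): 0.4, (2, 4): 0.2, (3, 4): 0.6,
}
block = unstructured_block(example_params, n_obs=4)
full = block_diagonal(block, n_subjects=2)

for row in full:
    print(row)
```

Observations from different subjects always get a zero entry, which is why only the within-subject parameters need to be estimated.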

This is the most general correlation model and perhaps the least intuitively appealing. Essentially, all correlations between within-subject observations are estimated independently; thus a lot more parameters need to be estimated. Because of this complexity, the GEE algorithm can break down because the correlation matrix cannot be inverted. However, it can be a useful approach if no obvious correlation structure suggests itself and can be a useful exploratory step to help arrive at a final choice of correlation structure.

Let us discuss the applicability of the unstructured correlation for the response variables in the California bird data set, owl data set, and deer data set. For the moment, we ignore that these response variables are not continuous. For the California bird data, each block of correlation in the matrix above is for a field; for the owl data, each block is a nest; and for the deer data, each block is a farm.

For the deer data, it does not make sense to use the unstructured correlation because there is no relationship between animals 1 and 2 at farm 1, and animals 1 and 2 at farm 2.

For the California bird data, it may be an option to use this correlation structure, as observations 1 and 2 in field 1 and observations 1 and 2 in field 2 both tell us something about the temporal relationship at the start of the experiment. On the down side, 10 temporal observations per field mean that we have to estimate 10 × 9/2 = 45 correlation parameters, which is a lot! The unstructured correlation may be an option for these data if you have hundreds of fields, but not with only 12 fields.

For the owl data, it is a bit more complicated. If we just analyse the number of calls sampled at the nests without a time order, then the set-up of the data is similar to that of the deer data. Hence, in this case, we cannot use the unstructured correlation. But we also know the arrival time of the parents at the nest, which, unfortunately, is irregularly spaced. However, in Chapter 6, we argued that, based on biology, we could assume that owl parents choose the arrival time, and therefore, from their point of view, the data are regularly spaced. Hence, if we use the unstructured correlation, then $a_{12}$ represents the correlation between arrivals 1 and 2, $a_{13}$ the correlation between arrivals 1 and 3, etc. This would make sense, but unfortunately, this approach requires an enormous number of correlation parameters, as some nests contain more than 50 observations. Hence, it is not practical.
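The parameter counts quoted above follow from the number of distinct pairs of within-subject observations, n(n − 1)/2. A minimal sketch of that arithmetic is given below; the helper name `n_unstructured_params` is hypothetical, and 50 is used only as a lower bound for the largest owl nests.

```python
def n_unstructured_params(n_obs):
    """Number of distinct pairwise correlations among n_obs observations."""
    return n_obs * (n_obs - 1) // 2


# 4 observations per patient, 10 per field, and at least 50 for some nests.
for n_obs in (4, 10, 50):
    print(n_obs, "observations ->", n_unstructured_params(n_obs), "parameters")
```

With 10 observations per field this gives the 45 parameters mentioned for the California bird data, and with 50 observations it already gives 1225.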

Option 2: AR-1 Correlation

Another option for continuous data is to say that the correlation between two observations s and t from the same patient, field, nest, or farm i is

$$
\operatorname{cor}(Y_{is}, Y_{it}) = a^{|s - t|}
$$
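As a rough illustration, the sketch below builds one within-subject block under the standard AR-1 assumption that the correlation between observations s and t is the single parameter a raised to the power |s − t|. The function name `ar1_block` and the value a = 0.5 are hypothetical and chosen only for the example.

```python
def ar1_block(a, n_obs):
    """Return an n_obs x n_obs AR-1 correlation block with entries a**|s - t|."""
    return [[a ** abs(s - t) for t in range(n_obs)] for s in range(n_obs)]


# Hypothetical parameter value and 4 observations per subject.
block = ar1_block(0.5, 4)
for row in block:
    print(row)
```

Only one parameter has to be estimated, however many observations there are per subject, in contrast to the n(n − 1)/2 parameters of the unstructured correlation.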

This type of autoregressive correlation structure was also used in Chapter 6 (using the corAR1 function). Autoregressive correlation arises when the correlation between within-subject observations can be modelled directly as a function of the 'distance' between the observations in question. Using the same example as above, the following correlation matrix is used.