Violation of independence is the most serious problem, as it invalidates important tests such as the F-test and the t-test. A key question is then how to identify a lack of independence and how to deal with it. You have violation of independence if the Y value at X_i is influenced by the Y values at other X_j (j ≠ i) (Quinn and Keough, 2002). There are two ways this can happen: through an improper model, or through a dependence structure arising from the nature of the data itself.

Suppose you fit a straight line to a data set whose scatterplot shows a clear non-linear pattern between Y and X. If you plot the residuals versus X, you will see a clear pattern in the residuals: the residuals of samples with similar X values tend to be all positive or all negative. So, an improper model formulation may cause violation of independence. The solution requires improving the model, or applying a transformation to 'linearise' the relationship.

Other causes of violation of independence lie in the nature of the data itself. What you eat now depends on what you were eating 1 minute ago. If it rains at 100 m in the air, it will also rain at 200 m in the air. If we have large numbers of birds at time t, then it is likely that there were also large numbers of birds at time t - 1. The same holds for spatial locations close to each other and for pelagic bioluminescence sampled along a depth gradient. This type of violation of independence can be taken care of by incorporating a temporal or spatial dependence structure between the observations (or residuals) in the model.
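To illustrate the first mechanism (a straight line fitted to a clearly non-linear pattern), here is a minimal sketch in plain Python. The data (Y = X² for X from -5 to 5, without noise) and the helper names `fit_line` and `count_sign_runs` are hypothetical choices for illustration only. The sketch fits an ordinary least-squares line and counts runs of residuals with the same sign; independent residuals would switch sign often, whereas here they cluster into a few long runs along X.

```python
# Minimal sketch (hypothetical data): a straight line fitted to a curved
# pattern leaves residuals whose signs cluster along X.

def fit_line(xs, ys):
    """Ordinary least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b

def count_sign_runs(values):
    """Number of runs of consecutive values with the same sign (0 counts as negative)."""
    signs = [v > 0 for v in values]
    return 1 + sum(1 for s, t in zip(signs, signs[1:]) if s != t)

xs = list(range(-5, 6))           # X values, already in increasing order
ys = [x ** 2 for x in xs]         # clearly non-linear relationship
a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"intercept={a:.2f}, slope={b:.2f}")
print("residual signs:", "".join("+" if r > 0 else "-" for r in residuals))
print("sign runs:", count_sign_runs(residuals))
```

Here the residual signs come out as `++-------++`: only three runs, because neighbouring X values share the same sign. Adding a quadratic term (i.e. improving the model) would remove this pattern.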
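For dependence that comes from the data themselves, such as bird numbers at time t resembling those at time t - 1, one common diagnostic is the lag-1 autocorrelation of the residuals taken in time order. The sketch below is an illustration under stated assumptions: the residuals are simulated as a first-order autoregressive (AR(1)) series, and the coefficient `phi = 0.8`, the series length and the seed are arbitrary choices, not values from the text. It only detects the dependence; dealing with it still requires a model with a temporal (or spatial) dependence structure.

```python
import random

def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation: association between e_t and e_(t-1)."""
    n = len(values)
    m = sum(values) / n
    num = sum((values[t] - m) * (values[t - 1] - m) for t in range(1, n))
    den = sum((v - m) ** 2 for v in values)
    return num / den

def simulate_ar1(n, phi, seed):
    """Hypothetical residual series: e_t = phi * e_(t-1) + white noise."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        e.append(phi * e[-1] + rng.gauss(0.0, 1.0))
    return e

independent = simulate_ar1(500, phi=0.0, seed=1)  # phi = 0: no dependence
dependent = simulate_ar1(500, phi=0.8, seed=1)    # strong temporal dependence

print(f"independent: r1 = {lag1_autocorrelation(independent):.2f}")
print(f"dependent:   r1 = {lag1_autocorrelation(dependent):.2f}")
```

For the independent series the lag-1 autocorrelation should be close to zero, while for the dependent series it should be close to the chosen `phi`.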
The case studies later in the book contain various examples of both scenarios, but for now we look at a series of examples where some of these important assumptions have been violated.