## Linear regression, least squares, and total least squares: measurement errors in both x and y

Before moving on, I want to briefly discuss linear regression and least squares (which most readers are probably familiar with) and total least squares (which they probably are not). The setup is that we have a set of data consisting of observations of a putative causative variable $X$ and a response variable $Y$; the data are thus pairs $\{X_i, Y_i\}$ for $i = 1, \ldots, n$. The question is this: how do we characterize the relationship between $X$ and $Y$?

The first possible answer is that there is no relationship between the two. If we were to write a formal statistical model, a natural choice is

$$Y_i = a + Z_i \qquad (3.79)$$

where the parameter $a$ is to be determined and we assume that $Z_i$ is normally distributed with mean 0 and variance $\sigma^2$. We already know that the maximum likelihood estimate of $a$ is the same as the value that minimizes the sum of squared deviations $\sum_{i=1}^{n} (Y_i - a)^2$, and that this estimate is the sample average. Let us denote that estimate by $\hat{a}$. We can then compute the residual sum of squared errors, $\sum_{i=1}^{n} (Y_i - \hat{a})^2$. This quantity can be thought of as the amount of variability in the data that is not explained by the model in Eq. (3.79).
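The claim that the sample average minimizes the sum of squared deviations, and the resulting residual sum of squares, can be checked numerically. The following is a minimal Python sketch; the function names `ssq_constant` and `fit_constant_model` and the data are hypothetical, chosen only for illustration.

```python
from statistics import fmean


def ssq_constant(a, ys):
    """Sum of squared deviations sum_i (Y_i - a)^2 for the model Y_i = a + Z_i."""
    return sum((y - a) ** 2 for y in ys)


def fit_constant_model(ys):
    """Return (a_hat, residual SSQ) for the model Y_i = a + Z_i.

    With normally distributed Z_i, the maximum likelihood estimate of a
    coincides with the least squares estimate, which is the sample average.
    """
    a_hat = fmean(ys)
    return a_hat, ssq_constant(a_hat, ys)


if __name__ == "__main__":
    ys = [1.0, 2.0, 3.0, 6.0]  # illustrative data, not from the text
    a_hat, rss = fit_constant_model(ys)
    print(a_hat, rss)  # 3.0 14.0
    # Moving a away from the sample average only increases the SSQ.
    assert all(ssq_constant(a_hat + d, ys) > rss for d in (-0.5, -0.1, 0.1, 0.5))
```

Because $\sum_i (Y_i - a)^2 = \sum_i (Y_i - \hat{a})^2 + n(a - \hat{a})^2$, any shift away from the sample average adds $n(a - \hat{a})^2$ to the sum, which is what the final assertion checks.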

Second, we might assume that there is a linear relationship between the causative and response variables, so that the formal statistical model is

$$Y_i = a + bX_i + Z_i$$

where we must now determine the parameters $a$ and $b$, and (this is crucial) we assume that the causative variable $X_i$ is measured with certainty. Proceeding as before, we can compute either the sum of squared deviations $\mathrm{SSQ}(a, b)$ or the log-likelihood $L(a, b)$. They are
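
$$\mathrm{SSQ}(a, b) = \sum_{i=1}^{n} (Y_i - a - bX_i)^2$$

$$L(a, b) = -n \log\left(\sqrt{2\pi}\,\sigma\right) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - a - bX_i)^2$$
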
and we see that once again maximizing the likelihood is the same as minimizing the sum of squared deviations. To do that, we take the derivatives of $\mathrm{SSQ}(a, b)$, first with respect to $a$ and then with respect to $b$, set them equal to 0, and thereby obtain equations for the maximum likelihood estimates of $a$ and $b$.
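Carrying out those derivatives gives $\partial \mathrm{SSQ}/\partial a = -2\sum_i (Y_i - a - bX_i) = 0$ and $\partial \mathrm{SSQ}/\partial b = -2\sum_i X_i (Y_i - a - bX_i) = 0$. Solving this pair of linear equations yields the familiar ordinary least squares estimates

$$\hat{b} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad \hat{a} = \bar{Y} - \hat{b}\bar{X},$$

where $\bar{X}$ and $\bar{Y}$ are the sample averages (a solution requires that not all $X_i$ are equal). Below is a minimal Python sketch of this computation; the function names are hypothetical. As in the model above, the $X_i$ are treated as error-free, so this is ordinary least squares, not total least squares.

```python
from statistics import fmean


def ssq_linear(a, b, xs, ys):
    """SSQ(a, b) = sum_i (Y_i - a - b X_i)^2."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


def fit_linear_model(xs, ys):
    """Ordinary least squares for Y_i = a + b X_i + Z_i, with X_i assumed error-free.

    Solves the two equations obtained by setting dSSQ/da = 0 and dSSQ/db = 0.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (X, Y) pairs of equal length")
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X_i are equal, so b cannot be estimated")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


if __name__ == "__main__":
    # Noise-free illustrative data lying on Y = 1 + 2X.
    print(fit_linear_model([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```

The first normal equation forces the residuals $Y_i - \hat{a} - \hat{b}X_i$ to sum to zero, and because $\mathrm{SSQ}(a, b)$ is a convex quadratic whenever the $X_i$ are not all equal, any small change to $\hat{a}$ or $\hat{b}$ increases it.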
