## Factor analysis and structural equation models

'Factor analysis' adds the concept of 'latent variables' to regression models, supplementing the 'measured variables' that have been discussed up to this point. This is done to represent underlying model constructs that are not directly observable, or to model subgroups of measured variables that have substantial correlation. In factor analysis, each measured variable is conceived as being dependent on some set of latent variables, and the regression parameters describing such dependences are known as 'factor loadings'. The inclusion of appropriate error terms continues to be an important part of model specification.
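To make the role of factor loadings and error terms concrete, the following minimal sketch computes the covariance matrix implied by a single-factor model. It assumes a model of the form x_i = loading_i * f + e_i with uncorrelated errors; the function name `implied_covariance` and all numerical values are hypothetical, chosen only for illustration.

```python
def implied_covariance(loadings, error_variances, factor_variance=1.0):
    """Covariance matrix implied by a one-factor model.

    Assumed model (illustrative): x_i = loadings[i] * f + e_i, where the
    latent factor f has variance `factor_variance` and the error terms
    e_i are uncorrelated with f and with one another.
    """
    p = len(loadings)
    # The shared latent factor produces every covariance between measured variables.
    sigma = [
        [loadings[i] * loadings[j] * factor_variance for j in range(p)]
        for i in range(p)
    ]
    # Each error term adds variance to its own measured variable only.
    for i in range(p):
        sigma[i][i] += error_variances[i]
    return sigma


# Made-up loadings; error variances chosen so that each variable has unit variance.
loadings = [0.8, 0.7, 0.6]
error_variances = [1.0 - lam ** 2 for lam in loadings]
sigma = implied_covariance(loadings, error_variances)

for row in sigma:
    print([round(value, 2) for value in row])
```

Under these assumptions, the correlation between any two measured variables is just the product of their loadings, which is how a single latent variable can account for a subgroup of substantially correlated measurements.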

While factor analysis permits the inclusion of latent variables as predictors, structural equation models (SEMs) permit both predictor and response variables to be latent variables. This allows the full set of anticipated causal relationships to be modeled simultaneously. For this reason, such models are also referred to as 'causal models'.

SEM is generally used as a confirmatory or synthetic, rather than exploratory, technique. This is because it usually starts with a hypothesized set of causal relations between latent and measured variables, drawn as a graphical 'structural model' (or 'path diagram') (Figure 3). This graphical model is then translated into a set of equations involving the observed variables that can be used for statistical analysis. This involves making some assumptions concerning the form of the functional relationships and the distributions of variables. In most SEMs, the relationships are assumed to be linear, and the variables are assumed to be multivariate normal.
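As an illustration of that translation, a linear SEM is often written in the matrix form below. The notation is one common (LISREL-style) convention and is an assumption here, since the text above does not fix any symbols: η and ξ are vectors of latent response and latent predictor variables, y and x are their measured indicators, and ζ, ε, and δ are error terms.

```latex
\begin{aligned}
\eta &= B\,\eta + \Gamma\,\xi + \zeta
  && \text{(structural relations among latent variables)} \\
y &= \Lambda_y\,\eta + \varepsilon
  && \text{(measurement model for the latent responses)} \\
x &= \Lambda_x\,\xi + \delta
  && \text{(measurement model for the latent predictors)}
\end{aligned}
```

In this form, each arrow in the path diagram corresponds to one entry of B, Γ, Λ_y, or Λ_x, and an absent arrow corresponds to an entry fixed at zero.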

The SEM is next tested against measurements. In this comparison, some of the relationships may be considered 'fixed', with parameters determined from previous studies, while others may be considered 'free', with parameters to be estimated from the current data. Structural confirmation and parameter estimation occur by comparing the observed covariances among the measured variables against the covariances implied by the best-fitting model (assuming the specified relations). This is done using either weighted least squares or maximum likelihood methods.
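The covariance comparison can be sketched with the maximum likelihood discrepancy function commonly used for this purpose, F = log|Σ| + tr(SΣ⁻¹) − log|S| − p, where S is the observed covariance matrix, Σ is the model-implied one, and p is the number of measured variables. The example below is a minimal sketch rather than a full estimation routine: it only evaluates F for two candidate parameter sets of an assumed one-factor model, and the helper names and all numbers are hypothetical.

```python
import numpy as np


def ml_discrepancy(S, Sigma):
    """Maximum likelihood discrepancy between observed S and implied Sigma.

    F = log|Sigma| + tr(S Sigma^-1) - log|S| - p.
    F is zero when Sigma equals S and positive otherwise.
    """
    S = np.asarray(S, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    # tr(Sigma^-1 S) equals tr(S Sigma^-1); solve() avoids an explicit inverse.
    trace_term = np.trace(np.linalg.solve(Sigma, S))
    return logdet_sigma + trace_term - logdet_s - p


def one_factor_sigma(loadings, error_variances):
    """Implied covariance of an assumed one-factor model (factor variance fixed at 1)."""
    lam = np.asarray(loadings, dtype=float).reshape(-1, 1)
    return lam @ lam.T + np.diag(error_variances)


# Made-up observed covariance matrix for three measured variables.
S = np.array([
    [1.00, 0.56, 0.48],
    [0.56, 1.00, 0.42],
    [0.48, 0.42, 1.00],
])

# Two candidate sets of free parameters (loadings and error variances).
good = one_factor_sigma([0.8, 0.7, 0.6], [0.36, 0.51, 0.64])
poor = one_factor_sigma([0.3, 0.3, 0.3], [0.91, 0.91, 0.91])

f_good = ml_discrepancy(S, good)
f_poor = ml_discrepancy(S, poor)
print(f"discrepancy, well-fitting parameters:   {f_good:.4f}")
print(f"discrepancy, poorly fitting parameters: {f_poor:.4f}")
```

In an actual analysis, an optimizer would search over the free parameters to minimize this discrepancy while holding the fixed parameters constant; the factor variance fixed at 1 plays the role of a fixed parameter in this sketch.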

The final tested model can then be used to predict response variables, employing all available causal knowledge relating predictor, response, and latent variables. However, because the causal claims of an SEM still rest only on correlations among uncontrolled observed variables, they should be treated with caution until controlled experimentation has been performed.