# Parameter Estimation

The maximum likelihood (ML) estimates of the parameters in θ and Σ are those values that jointly maximize the likelihood function L(θ, Σ). This is equivalent to maximizing the log-likelihood function ln L(θ, Σ) appearing in (2.17). The ML estimates of the parameters have several desirable statistical properties. As the sample size increases, they are asymptotically efficient (i.e., the variances approach the theoretically lowest bound), unbiased (i.e., the bias approaches zero), and normally distributed (which allows for the construction of approximate confidence intervals). For nonlinear time series models, the sample size is the number of observations in the time series. Theorems about these and other ML properties generally require the stochastic model to have a stationary distribution. A nonlinear autoregressive model of the form (2.13) typically has a stationary distribution when every trajectory of the underlying deterministic model wₜ₊₁ = h(wₜ) has a bounded attractor, which we saw in Section 2.3 is the case for the deterministic LPA model (2.4).
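As a sketch (not the book's own code), the log-likelihood of a nonlinear autoregressive model of the form wₜ₊₁ = h(wₜ) + noise, with additive Gaussian noise, is the sum of normal log-densities of the one-step residuals. Here `h` and `Sigma` are placeholders for the deterministic map and the noise covariance Σ of whatever model is at hand:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(w, h, Sigma):
    """Gaussian log-likelihood of the one-step transitions w[t] -> w[t+1].

    w     : (n, d) array of observed state vectors (the time series)
    h     : deterministic one-step map, h(w_t) -> predicted w_{t+1}
    Sigma : (d, d) noise covariance matrix
    """
    # Residuals e_t = w_{t+1} - h(w_t), one row per transition.
    residuals = w[1:] - np.array([h(wt) for wt in w[:-1]])
    # Sum of zero-mean multivariate normal log-densities of the residuals.
    return multivariate_normal(cov=Sigma).logpdf(residuals).sum()
```

Maximizing this function over the model parameters (which enter through `h` and `Sigma`) yields the ML estimates.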

No general formulas for ML parameter estimates exist.⁶ Neither do such formulas exist for the particular case of the LPA model. Therefore, maximization of the log-likelihood function ln L(θ, Σ) must be done numerically. For example, we have found the Nelder-Mead simplex algorithm [138, 148] convenient, reliable, and easy to program for this purpose.
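To illustrate the numerical approach, here is a hedged sketch using SciPy's Nelder-Mead implementation on a hypothetical one-parameter Ricker-type model (not the LPA model itself); the data, the parameter `a`, and the noise level are all invented for the example:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical example: simulate w[t+1] = a * w[t] * exp(-w[t]) + noise,
# then recover a by maximizing the Gaussian log-likelihood numerically.
rng = np.random.default_rng(0)
a_true = 2.0
w = np.empty(100)
w[0] = 1.0
for t in range(99):
    w[t + 1] = a_true * w[t] * np.exp(-w[t]) + rng.normal(0, 0.05)

def neg_log_lik(theta):
    # Minimizing the negative log-likelihood maximizes the likelihood.
    a, log_sigma = theta
    sigma = np.exp(log_sigma)          # log-parameterization keeps sigma > 0
    resid = w[1:] - a * w[:-1] * np.exp(-w[:-1])
    return 0.5 * np.sum(resid**2) / sigma**2 + len(resid) * np.log(sigma)

result = minimize(neg_log_lik, x0=[1.0, 0.0], method="Nelder-Mead")
a_hat = result.x[0]
```

The simplex algorithm needs only function evaluations, no derivatives, which is what makes it easy to apply to log-likelihoods like (2.17).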

There is, of course, uncertainty associated with ML parameter estimates. However, with likelihood methods it is straightforward to compute confidence intervals for the individual parameters and joint confidence regions for sets of parameters (we use 95% confidence intervals and regions). Among the methods we use for constructing confidence intervals and regions is the method based on the "profile likelihood." Profile likelihood intervals require a great deal of computation, but they can be applied to many different types of statistical models [127, 189]. They are approximate in that their coverage frequencies converge to 95% only asymptotically, as the sample size (time series length) increases without bound. The intervals are usually asymmetric and typically have better small-sample coverage frequencies than do symmetric confidence intervals arising from the matrix of second derivatives of the log-likelihood function.
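The profile-likelihood recipe can be sketched as follows, again on a hypothetical one-parameter Ricker-type model rather than the LPA model: fix the parameter of interest on a grid, re-maximize the log-likelihood over the nuisance parameters at each grid value, and keep the grid values whose profile log-likelihood lies within half the χ²₁ 95% quantile (≈ 1.92) of the overall maximum:

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

# Hypothetical data: w[t+1] = a * w[t] * exp(-w[t]) + noise.
rng = np.random.default_rng(0)
a_true = 2.0
w = np.empty(100)
w[0] = 1.0
for t in range(99):
    w[t + 1] = a_true * w[t] * np.exp(-w[t]) + rng.normal(0, 0.05)

def neg_log_lik(a, log_sigma):
    sigma = np.exp(log_sigma)
    resid = w[1:] - a * w[:-1] * np.exp(-w[:-1])
    return 0.5 * np.sum(resid**2) / sigma**2 + len(resid) * np.log(sigma)

# Full ML fit over (a, sigma).
full = minimize(lambda th: neg_log_lik(th[0], th[1]),
                x0=[1.0, 0.0], method="Nelder-Mead")
max_ll = -full.fun

def profile_ll(a):
    # For fixed a, re-maximize over the nuisance parameter sigma.
    res = minimize_scalar(lambda ls: neg_log_lik(a, ls),
                          bounds=(-10, 5), method="bounded")
    return -res.fun

# 95% profile interval: chi-square(1) 95% quantile is 3.841, half is 1.92.
grid = np.linspace(1.5, 2.5, 201)
inside = [a for a in grid if max_ll - profile_ll(a) <= 1.92]
ci = (min(inside), max(inside))
```

Because the profile is re-maximized at every grid point, the resulting interval is asymmetric whenever the log-likelihood surface is, which is the small-sample advantage mentioned above.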

Profile likelihood intervals and regions are calculated as follows. Suppose β is a parameter, or a vector of η ≥ 1 parameters, of interest in the

⁶ However, the ML estimates of the parameters in the matrix Σ can be written in terms of the ML estimates of the parameters in θ. Specifically, Σ̂ = RRᵀ/q, where R = [e₁, e₂, ..., e_q] is a matrix with the residual vectors (2.21) as columns and Rᵀ is its transpose.
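The footnote's formula is a one-liner in NumPy; here `residuals` stands for the q residual vectors of (2.21), supplied one per row and stacked as columns of R:

```python
import numpy as np

def ml_covariance(residuals):
    """ML estimate of the noise covariance: Sigma = R @ R.T / q,
    where R holds the q residual vectors as columns."""
    R = np.asarray(residuals, dtype=float).T   # shape (d, q)
    q = R.shape[1]
    return R @ R.T / q
```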

[Table: Parameter | ML estimate | 95% confidence interval]