Estimators of N appropriate for Mh were developed first by Burnham (1972), who considered a beta-binomial mixture model and a 'non-parametric jackknife' estimator (Burnham and Overton, 1978, 1979). The jackknife estimator is in widespread use, and probably is the de facto standard because it has been implemented in popular software packages, such as MARK (White and Burnham, 1999) and COMDYN (Hines et al., 1999). As such, there is a casual equivalence made between 'Model Mh' and the application of the jackknife estimator to data. In fact, the so-called 'Model Mh' is a very broad class of models for describing variation in detection probability among individuals. At this time, a large number of different parametric models have been applied to the problem of modeling heterogeneity. For example, Coull and Agresti (1999) developed a model in which the logit-transformed p's have a normal distribution. Pledger (2000) considered a model in which variation in p is described by a finite-mixture of point masses. Dorazio and Royle (2003) considered a beta-binomial mixture and compared various classes using case studies and simulations. We will describe several of these models subsequently.

Technically, model Mh is only a slight extension of the basic null model M0, to include an individual random effect. The key idea is the existence of variation in p among individuals, and so models are constructed by modeling pi as a random effect or latent variable. For a population of size N subjected to J samples, then, supposing that N were known, the heterogeneity model is described by

yi | pi ~ Bin(J, pi)

and pi ~ g(p), where g is some probability density function for p. The various flavors of model Mh correspond to the particular choice of g. Some of the more prominent versions are described subsequently.

Using standard likelihood methods of inference, we can attack this problem by integrated likelihood. In this case, as with basically every closed population model, the observation model that arises under any choice of g is a structured multinomial. In particular, in a study consisting of J survey periods, the observed values of yi are the integers 1, 2, ..., J. Conditioning on N introduces n0 = N - n as the unobserved detection frequency, and the resulting pmf for the vector n = (n1, n2, ..., nJ) of observed detection frequencies (where nk = Σi I(yi = k)) is the multinomial:

[n | θ, N] = ( (n + n0)! / (n0! n1! ··· nJ!) ) π0^n0 π1^n1 ··· πJ^nJ.

This is basically where we started with M0 as well. The difference is that the cell probabilities are not simple binomial cell probabilities, but rather they are the marginal or average probabilities, where the averaging takes place over the prescribed random effects distribution g, as follows:

πk = ∫ Bin(k | J, p) g(p) dp.

The technical framework here is precisely equivalent to that of the abundance-induced heterogeneity models described in Chapter 4, where we considered integrated likelihood in several contexts. In the present context, p is a random effect, and in the integrated likelihood, the random effect is removed from the conditional likelihood (the likelihood that is conditional on the random effect) by integration.

These cell probabilities, πk, are available analytically in a few cases, including the beta mixture and the so-called finite mixture, both of which will be described subsequently. For other cases, e.g., the logit-normal, we have to do this integration numerically, which is not difficult, as we will demonstrate below.

Burnham's jackknife estimator is still the most widely used estimator for Model Mh. This procedure is viewed as being non-parametric because no particular g(p) is explicitly prescribed. However, the construction of the estimator implies a particular relationship among moments and, unfortunately, there may not be any g that possesses this relationship, or at least the existence of such a class is not obvious (Link, 2006). Thus, its interpretation as a non-parametric procedure may be questionable.

Alternatively, it is possible to construct a number of model-based estimators by prescribing a g(p) and applying standard methods of parametric inference. The model-based framework described in the preceding section provides a precise mathematical rendering of the individual heterogeneity model. A number of specific classes of g have been considered in some detail in the literature, and we review some of these now.

After Burnham's jackknife estimator, it was quite a few years before anyone devised an alternative mousetrap. Norris and Pollock (1996) devised what they referred to as a 'non-parametric MLE' of N in the presence of heterogeneity. What they proposed is commonly referred to as a finite-mixture or latent-class model for p. Under this model, each individual pi may belong to one of C classes, but the class membership is unknown. That is, the potential values of pi, the support points, are pi ∈ {pc : c = 1, 2, ..., C}. They have mass gc = Pr(p = pc), where Σc gc = 1. Pledger (2000) gives a general treatment of these models.

For example, suppose the existence of two latent classes. In this case, the marginal probability of encountering a bird k times is, by the law of total probability,

πk = Pr(y = k | p1, p2, g1) = Bin(k | J, p1) g1 + Bin(k | J, p2)(1 - g1).

This is the discrete analog of the marginalization operation that we mentioned in the previous section. There is only a minor conceptual tweak here as we went from a continuous random variable to a discrete random variable. As in similar applications that we have encountered previously, the pmf of the observations is a structured multinomial with cell probabilities nk. This model is easy to analyze because of the simple form of the cell probabilities. In particular, all cell probabilities can be computed in one R instruction:

cellprobs <- dbinom(0:J, J, p1)*g1 + dbinom(0:J, J, p2)*(1 - g1)

The likelihood can be completely described and maximized in only 1 or 2 more instructions, given the basic capability to maximize a multinomial likelihood (see Chapter 5).
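To make this concrete, here is a minimal sketch of that maximization, assuming a two-class mixture with n0 treated as a free parameter; the function name Mh2.nll, the transformations used for the parameters, and the starting values are our own illustrative choices, not the code from the text:

```r
# Negative log-likelihood of the two-class finite-mixture model.
# nvec[k] = number of individuals captured k times (k = 1, ..., J).
Mh2.nll <- function(par, nvec, J) {
  p1 <- plogis(par[1]); p2 <- plogis(par[2]); g1 <- plogis(par[3])
  n0 <- exp(par[4])                       # unobserved frequency, kept positive
  cellprobs <- dbinom(0:J, J, p1)*g1 + dbinom(0:J, J, p2)*(1 - g1)
  N <- sum(nvec) + n0
  # multinomial log-likelihood, including the unobserved k = 0 cell
  -(lgamma(N + 1) - lgamma(n0 + 1) - sum(lgamma(nvec + 1)) +
      n0*log(cellprobs[1]) + sum(nvec*log(cellprobs[-1])))
}

# Example using the lizard capture frequencies from Section 5.3.2
nvec <- c(34, 16, 10, 4, 2, 2, rep(0, 8))  # frequencies for k = 1, ..., 14
fit  <- optim(c(-1, -2, 0, log(10)), Mh2.nll, nvec = nvec, J = 14)
Nhat <- sum(nvec) + exp(fit$par[4])
```

Note that lgamma handles the non-integer N that arises when n0 is estimated on a continuous scale.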

The finite-mixture models represent g(p) as a discrete pmf with arbitrary (but discrete) support. This is the sense in which the model is 'non-parametric'. However, all of the support points and their masses are estimated, and the number of support points is unknown. Thus, the model is highly parameterized and so it is unclear whether there are advantages that arise from being 'non-parametric' in this case.

Pledger (2000) generalized the basic framework and formalized likelihood inference across broad classes of models under an encounter history formulation of the model. For example, a model with time effects and individual heterogeneity can be described by distinct encounter histories, with detection probability specified by logit(pij) = αi + βj.

In this case, the βj are fixed time effects, whereas the αi are assumed to vary according to a latent class model.

Note the relationship between this and the abundance-induced heterogeneity models described in Chapter 4. The latter have a large number of classes (theoretically an infinite number of classes), but the support points and their masses are constrained by the assumption of an abundance distribution.

A continuous mixture that is in widespread use for mixed models in many contexts throughout applied statistics is the logit-normal mixture, in which ηi = logit(pi) ~ Normal(μ, σ²). Coull and Agresti (1999) adapted this model for N estimation problems. As before, classical analysis can be achieved by integrated likelihood, in which the random effect is removed from the conditional likelihood (the likelihood that is conditional on the random effect) by integration. In the present case, this requires that we calculate the following integral:

πk = ∫ Bin(k | J, logit⁻¹(η)) g(η | μ, σ) dη,

where g(η | μ, σ) is a normal density. This would have to be carried out numerically, but this is not difficult using R (and most software), as we demonstrate in the subsequent example.
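As a minimal sketch of that numerical integration (the function name logitnormal.cellprobs is ours; the computation relies only on the standard R functions dbinom, dnorm, plogis, and integrate):

```r
# Marginal cell probabilities under the logit-normal mixture,
# computed by one-dimensional numerical integration over eta.
logitnormal.cellprobs <- function(J, mu, sigma) {
  sapply(0:J, function(k) {
    integrate(function(eta) dbinom(k, J, plogis(eta)) * dnorm(eta, mu, sigma),
              lower = -Inf, upper = Inf)$value
  })
}

# Evaluated at the estimates reported below for the lizard data
cp <- logitnormal.cellprobs(J = 14, mu = -2.56, sigma = 0.795)
```

Because the integrand is a mixture of probabilities, the J + 1 values should sum to 1, which provides a quick sanity check on the integration.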

Burnham (1972) initially considered models in which p had a beta distribution, denoted by pi ~ Be(a, b). This is an interesting model because the beta prior is conjugate for the binomial parameter p, and hence it yields some analytic tractability (Dorazio and Royle, 2003) under a model where the only source of variation in detection probability is that due to individual heterogeneity. In this case, the detection frequencies are multinomial, where the cell probabilities are structured according to a beta-binomial pmf. That is, if y | p ~ Bin(J, p) with p ~ Be(a, b) then, marginally, y is beta-binomial with probabilities

Pr(y) = [Γ(J + 1) / (Γ(y + 1) Γ(J - y + 1))] × [Γ(a + y) Γ(J + b - y) / Γ(a + b + J)] × [Γ(a + b) / (Γ(a) Γ(b))],

and thus, conditional on N, the observed frequencies nk have a multinomial distribution with cell probabilities {π0, π1, ..., πJ}.

The multinomial log-likelihood derived from the beta model for p can be described and maximized in only a few lines of R code. For this, we make use of the lgamma function in R for computing log Γ(·).
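As a sketch, the beta-binomial cell probabilities can be computed on the log scale with lgamma and then exponentiated (the function name betabin.cellprobs is our own):

```r
# Beta-binomial cell probabilities for y = 0, ..., J, computed on the
# log scale for numerical stability (illustrative sketch).
betabin.cellprobs <- function(J, a, b) {
  y <- 0:J
  exp(lgamma(J + 1) - lgamma(y + 1) - lgamma(J - y + 1) +
      lgamma(a + y) + lgamma(J + b - y) - lgamma(a + b + J) +
      lgamma(a + b) - lgamma(a) - lgamma(b))
}
```

With a = b = 1 the mixing distribution is uniform on (0, 1), and the marginal distribution of y is discrete uniform on {0, ..., J}, a convenient check on the implementation.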

6.2.0.4 So many Models Mh, so little data

An interesting aspect of the N-estimation problem in the presence of heterogeneity is that conventional goodness-of-fit statistics, such as deviance, cannot be relied on for selecting a model. For example, Coull and Agresti (1999) showed that, if capture probabilities are relatively low or vary greatly among individuals, the logistic-normal and latent-class models both may fit the observed data reasonably well, but provide substantially different inferences about N. This basic result was also demonstrated by Dorazio and Royle (2003) across several classes of models. The reason for this behavior is that model-based estimates of N actually correspond to extrapolations for the capture histories of unobserved individuals (Fienberg, 1972), so it is not surprising that the extrapolated values are sensitive to model structure. These results suggest that prior knowledge of potential sources of heterogeneity in individual rates of capture and data-based criteria must both be considered when selecting models for estimating N. Our view, which we stated in Dorazio and Royle (2003, 2005b), is that continuous mixtures should normally be favored on biological grounds, because the causes of heterogeneity are many and varied (unless there is some a priori belief in latent classes, which is not implausible). Secondly, continuous mixtures have the advantage of parsimony. That is, they cost relatively less (in terms of parameters) to model heterogeneity than finite-mixture models.

The problem of selecting among different classes of heterogeneity models was further elaborated by Link (2003). He demonstrated that, in some cases, different choices of g can fit the data identically (or very nearly so), but imply wildly different values of N. Essentially, when you broaden the class of models to include all possible distributions for p, then N is not identifiable across classes. The effect of misspecification of heterogeneity models is related to the degree of heterogeneity and the mean detection probability. As Link noted, differences among mixtures are more pronounced as the mass of g(p) is concentrated near zero. One might view low mean detection or high levels of heterogeneity as suggesting that the species of interest cannot be reliably, or effectively, sampled. This should be viewed as a biological sampling issue to be considered in survey design prior to data collection, not a statistical issue to rectify after the fact by considering complex models of the detection process or by engaging in conventional model selection strategies aimed at choosing among different mixture distributions (Dorazio and Royle, 2005b; Royle, 2006). Consistent with the broader philosophical framework within which we operate (i.e., parametric inference), we should not be overly apprehensive of picking a model and carrying out inference under that model, provided that the model is useful for its intended purpose.

That N is non-identifiable in this general sense, across all possible classes of models, does not diminish its central importance in ecology and conservation biology where estimation of N or density is important in certain problems. We know heterogeneity must arise in certain situations, due to biological mechanisms that we can describe. Thus, there will be cases where some effort should be made to account for it explicitly by a model. In this regard, there are active attempts to get around this problem by broadening the class of models. For example, Morgan and Ridout (2008) considered an extended class of models by mixing distributions of continuous and discrete support. Dorazio et al. (2008) developed a class of models based on the Dirichlet process (DP) prior, which is designed to account for a modeler's uncertainty in the latent distribution of detectability. Estimates of N based on the DP prior therefore should be robust to errors in model specification. Mao (2004) developed a purely non-parametric procedure for estimating a lower bound for N that generally exceeds the trivial bound of N = n. However, the applicability of estimates of lower bounds in ecological problems seems unclear.

6.2.1 Example: Flat-tailed Horned Lizard Data

We return to the flat-tailed horned lizard data introduced in Section 5.3.2. Recall that these data originate from samples carried out on a 9 ha plot of dimension 300 m x 300 m, with 14 capture occasions over 17 days. A total of 68 individuals were captured 134 times. The distribution of capture frequencies was (34, 16, 10, 4, 2, 2) for 1 to 6 captures, respectively, and no individual was captured more than 6 times. We fitted the basic null model ('M0') in which p is constant, yielding N = 82.02 (SE = 5.414) and p = 0.117 (SE = 0.0122); the AIC for this model was -242.661.

We noted two general problems that arise in spatial capture-recapture surveys. First, the definition of N is rendered somewhat ambiguous by movement of individuals. That is, while it may be that N represents the size of some actual population of individuals that were exposed to sampling (a 'super-population'; Kendall et al., 1997; Kendall, 1999), we do not know the area from which those individuals were drawn. Secondly, the proximity of an individual's home range to the area being surveyed induces heterogeneity in detection probability. This provides the heuristic justification for consideration of heterogeneity models for such data. It still doesn't provide a firm basis for interpretation, but it does suggest the mechanism that will yield heterogeneous detection probabilities.

For this purpose, we fitted the logit-normal version of the heterogeneity model (Coull and Agresti, 1999). The R code is shown in Panel 6.1, along with instructions to execute it for these data. The function Mhlik computes the integrated binomial cell probabilities, making use of the R functions for computing the binomial and normal densities, as well as the function integrate. The estimates obtained under this model were N = 106.8 (SE = 20.8), μ = -2.56, and σ = 0.795. The corresponding AIC was -247.3902. This model is somewhat favored by AIC, relative to the null model. The interested reader should be able to modify this R code to fit the beta-binomial or finite-mixture model.

We see that the heterogeneity model yields quite a substantial increase in the estimated abundance, from about 82 under M0 to about 107 under the logit-normal mixture version of Mh. One view of this difference is that we believe the latter estimate to be more relevant because heterogeneity must be induced by the spatial nature of the system - the variable exposure of individuals to detection by the trapping apparatus. We do pay a price for entertaining this more complex model - that being the increased uncertainty in N that results from fitting the heterogeneity model.

We return now to the tiger camera trapping data introduced in Section 5.3.3. Recall that 45 individuals were captured and, under the null model, we have N = 70.4 and p = 0.0805. This model has an AIC score of -160.095.

We fitted the logit-normal flavor of Model Mh as implemented in Panel 6.1, which yields N = 111.69, μ = -3.27, and σ = 0.89. The AIC for this model is -159.650. The estimate of N is somewhat imprecise, as indicated by the very flat profile likelihood (Figure 6.1). The profile likelihood can be constructed by repeated calls to a modified version of the function in Panel 6.1, in which N is fixed. As N is varied, and the negative log-likelihood minimized as a function of the remaining two parameters, the resulting minimum traced out as a function of N is the profile likelihood. The 95 percent profile likelihood interval has end points on either side of the MLE of N that yield an increase of 1.92 negative log-likelihood units. For the tiger data, the 95 percent profile likelihood lower bound is in the vicinity of 60, but the profile likelihood does not achieve a difference of 1.92 to the right of the MLE within the range considered in Figure 6.1.
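A hedged sketch of this construction follows; the names nll.fixedN and profile.N are ours (in the text, a modified Mhlik from Panel 6.1 plays the role of the inner objective function):

```r
# Illustrative profile likelihood for N under the logit-normal Mh model.
# For fixed N, minimize the negative log-likelihood over (mu, log sigma);
# tracing that minimum over a grid of N values gives the profile.
nll.fixedN <- function(par, N, nvec, J) {
  mu <- par[1]; sigma <- exp(par[2])
  cellprobs <- sapply(0:J, function(k)
    integrate(function(eta) dbinom(k, J, plogis(eta)) * dnorm(eta, mu, sigma),
              lower = -Inf, upper = Inf)$value)
  n0 <- N - sum(nvec)                     # unobserved individuals, fixed by N
  -(lgamma(N + 1) - lgamma(n0 + 1) - sum(lgamma(nvec + 1)) +
      n0*log(cellprobs[1]) + sum(nvec*log(cellprobs[-1])))
}

profile.N <- function(Nvals, nvec, J) {
  sapply(Nvals, function(N)
    optim(c(-2, 0), nll.fixedN, N = N, nvec = nvec, J = J)$value)
}
```

The interval endpoints are read off where the profile rises 1.92 units above its minimum; a flat profile to the right of the MLE, as for the tiger data, means no finite upper bound is reached within the grid.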

We note that the heterogeneity model is only slightly better than the constant-p model, judging by the AIC scores. The very low detection probability for tigers, and the resulting sparse data, makes it difficult to detect heterogeneity due to movement. The effect of movement is to shrink individual p's toward the origin by a variable degree depending on an individual's relative exposure to traps. We might improve on our estimate of N by considering the movement mechanism explicitly. We'll revisit these data in the next chapter where we will consider models that are based on explicit exposure/movement mechanisms.
