of survey) vary among quadrats, as they are selected by individual observers. In addition to these variables that might explain the likelihood of observing a y = 1, we have auxiliary covariates of biological relevance, elevation (Figure 3.1) and forest cover (Figure 3.2) which we expect to be important determinants of distribution and abundance for most breeding birds. In particular, the elevational gradient in Switzerland is severe, and this substantially affects climate/weather and vegetation.

To illustrate some basic concepts and motivate subsequent developments, we focus presently on modeling the effect of these landscape covariates on occurrence to derive a distribution map. However, we note that other covariates that describe deviations from a rigorously standardized protocol are also of interest to model the effects due to non-standardization. Such nuisance effects have been noted to be important in a number of large-scale wildlife surveys, including the North American Breeding Bird Survey (Sauer et al., 1994), Christmas Bird Count (Link and Sauer, 1999), the North American Amphibian Monitoring Program (NAAMP; Weir et al.,

Figure 3.1. Elevation, m above sea level. Due to the extreme elevational gradient and its influence on climate/weather and vegetation, we expect that it should be an important factor leading to variation in abundance and occurrence of most bird species in Switzerland.
Figure 3.2. Forest cover, percent of quadrat that is forested. Forest should be an important indicator of habitat suitability for many bird species.

2005), and Swiss butterfly (Kéry and Plattner, 2007) and bird (Kéry and Schmid, 2004, 2006) surveys.

Some consideration needs to be given to how the replicate observations should be integrated into the analysis. Presently, they provide more information than is necessary to carry out a simple logistic regression. We will begin with an analysis based only on the data from the first survey period, since the species was found to be more detectable early in the breeding season (Kery et al., 2005), and to carry out an ordinary logistic regression for a species of concern, we only require a single sample. In general, the manner in which one deals with such replicates depends largely on the formalization of a model for the underlying process of interest, and for the observation mechanism (i.e., detectability). We are intentionally avoiding these issues for the time being in order to focus on the basic logistic regression problem at hand.

We consider models containing forest cover and linear or quadratic elevation effects. Most species should respond to both of these factors, and the response to elevation is likely to be quadratic simply because of the severe environmental gradient that results from increasing elevation. Many species may have some intermediate 'optimum' elevation of occurrence that should be evident in the data. It is simple to carry out a logistic regression in all popular software packages, and R is no exception, having a fairly versatile routine (called glm) for fitting generalized linear models, of which logistic regression is a particular case. The glm function represents a reasonably user-friendly method for analysis, but one that obscures the technical details of how such models are analyzed. Thus, while the use of glm (or similar functionality) is sufficient for many purposes, we will do this the hard way to foster learning, and also because the basic construct becomes essential for some important generalizations to be described subsequently. As we have claimed previously, if you do the easy problems the hard way then, ultimately, the hard problems will be easy.

The basic strategy is to write an R function (or script) that evaluates the log-likelihood function, and then pass this to one of the available optimizers (e.g., nlm). The code to evaluate the log-likelihood and the call to nlm are given in Panel 3.1. The key elements of this function are the lines probs<-expit(b0 + b1*elev + b2*(elev"2) + b3*forest)

which computes the binomial probabilities according to Eq. (3.1.1), as a function of covariates elevation (a quadratic) and forest cover. The other main calculation is the contribution of each observation to the binomial log-likelihood. This is done for the vector of observations simultaneously by the instruction:


The log-likelihood function allows for the possibility of missing values by defaulting the log-likelihood to 0 wherever a missing value occurs. Thus, missing values contribute nothing to the objective function (the negative of the summed log-likelihood). The log-likelihood is minimized and the output saved to an object out by executing the following R instruction:

The second argument to this function call (the vector of 4 zeros) are the starting values. In some cases, especially for very complex models or sparse data, numerical optimization routines can be sensitive to starting values. The object out is a list having several elements which are described subsequently (and see Chapter 2, especially Section 2.3.2). In order to execute this function, the various objects referenced within the function must be loaded into the R workspace. These include M, elev, forest and the data vector z. These are available in the Web Supplement, and we provide an R script for carrying out these analyses.

A slightly more complicated version of the function (Panel 3.2) allows one to easily fit a suite of models by repeated function calls. Note that it constructs the log-likelihood for the maximal model, but defaults all coefficients to zero unless their names are passed to the function using the vars = option. We can execute this by issuing a command such as:

out<- nlm(lik,c(0,0,0,0), vars=c("const","elev1","elev2","forest"),hessian=TRUE)

This fits a model with an intercept (which is named const in the R log-likelihood definition), a quadratic response to elevation and a linear response to forest cover.

lik<-function(parms){ b0<-parms[1] b1<-parms[2] b2<-parms[3] b3<-parms[4] ones<-rep(1,M)

### Compute binomial success probabilities probs<-expit(b0*ones+b1*elev+b2*(elev"2)+b3*forest)

### evaluate log of binomial pmf tmp<-log(dbinom(z,1,probs))

### substitute 0 for missing values lik[!] <- tmp[!]


Panel 3.1. Construction of the likelihood for a logistic regression model in R for the willow tit data. Binomial models of this type can also be fitted using the R function glm.

The object out has a number of elements which can be manipulated using standard R conventions. In the present example, several of the elements are:

$minimum [1] 100.8872


[1] -0.5422133 1.8473435 -1.0608642 0.6469108 $gradient

[1] 2.349054e-05 3.146269e-05 6.520951e-05 4.888534e-05

Main interest is in the minimized negative log-likelihood (100.8872) and the 4 parameter estimates (in order of the vars specification). The numerical gradient should be close to 0 if the algorithm has converged (it has). Although some output is provided by default, the numerical hessian which is useful to obtain estimates of precision (see Section 2.3.2) is provided if we specify hessian=TRUE in the call to nlm.

The fitted response function (coefficient estimates rounded) is:

logit(nj) = -0.542 + 1.847elev, - 1.061 elev2 +0.647forest,.


probs<-expit(tmp[1]*ones+tmp[2]*elev+ tmp[3]*(elev~2)+tmp[4]*forest)



Panel 3.2. A slightly generalized R formulation of the logistic regression likelihood to facilitate fitting multiple models. The likelihood is specified in terms of the most complex model of interest, and coefficients are set to 0 by default. Covariates are included by specifying them with the vars = option.

Forest cover Elevation

Figure 3.3. Fitted response in occurrence probability for the willow tit to forest cover (percent of quadrat covered in forest) and elevation (meters above sea level).

Forest cover Elevation

Figure 3.3. Fitted response in occurrence probability for the willow tit to forest cover (percent of quadrat covered in forest) and elevation (meters above sea level).

One implementation of this in R is ests<-out$estimate logitp <- ests[1] + ests[2]*elev + ests[3]*elev~2 + ests[4]*forest where the new object logitp is a vector of length M.

There is a positive response in Pr(z = 1) to forest cover. i.e., a unit increase in forest cover (that would be a standard deviation - which is about 0.27) yields an increase of 0.647 in logit(n). Because of the scaling (logit scale) and standardization of covariates, it is most convenient to view the response to forest and elevation graphically, on the probability scale, and for unstandardized covariates. The response of occurrence probability to forest cover and elevation is depicted in Figure 3.3. The response function to elevation is concave, and the optimum probability of occurrence (taking the first derivative of the quadratic response function, setting that to 0, and solving) is calculated according to elevopt = —(1.847)/(2 * -1.061) = 0.870.

As before, this is in units of standard deviations from the mean. The mean elevation was 1182.57 m and the standard deviation was 646.33 m. Thus, the maximum occurrence probability occurs at an elevation of 1182.57 + 0.870 x 646.33 = 1744.87 m. The effect of elevation on n is shown in the right panel of Figure 3.3. We also used the estimated model parameters to generate estimates of n for all 41365 1 km quadrats in Switzerland. These estimates are mapped in Figure 3.4.

We might typically consider whether one, both, or either of elevation and forest cover are important or significant. If we made a distinction between biological and

Figure 3.4. Estimated probabilities of occurrence of the willow tit for every 1 km quadrat in Switzerland.

statistical significance, then we might formally evaluate the latter using hypothesis testing or model selection ideas, such as Akaike's Information Criterion (AIC) (Burnham and Anderson, 2002, also see Chapter 2). Results obtained by fitting a number of different models are summarized in Table 3.2. We see that the model previously described (the model containing all effects) is favored by AIC.

3.1.2 Observation Covariates

For the Swiss BBS data, we noted (Section 3.1.1) that the survey is based on 3 samples during the breeding season and that a number of covariates should affect the response. However, some of these covariates might not affect whether a species occurs on a particular quadrat but, rather, may influence variation in observed presence - our ability to detect the species given that it is there. These covariates include (at least) sample duration and also perhaps the date of the survey (which might influence bird behavior). In practice, we also might consider certain environmental covariates such as temperature, rainfall, and wind - factors that influence the behavior of many species and hence their detectability. Such covariates influence the efficacy of sampling to some extent, but not the underlying occurrence process. That is, the presence of willow tit at a site should not depend on such things as sample effort, duration, sampling intensity, or environmental conditions.

Table 3.2. Results of fitting several logistic regression models to the willow tit data from the Swiss Breeding Bird Survey. The estimated effects under each model are given in columns labeled by the effect name. 'np' is the number of parameters in the model.

model rank






Was this article helpful?

0 0

Post a comment