a burn-in period of 150 samples (that is, 150 x 50 = 7,500 iterations) is enough, but we used a burn-in of 10,000 iterations to be sure.

More formal tests to assess chain convergence are also available and are provided in the R package CODA (Plummer et al., 2008). One such test is the Gelman-Rubin statistic, which compares the variation between chains to the variation within a chain. Initially, the value of the Gelman-Rubin statistic will be large, but once convergence has been reached, it will have decreased to a value close to 1. Gelman (1996) suggests that a value below 1.1 or 1.2 is acceptable. This test should be applied to each of the model parameters. For our data, the Gelman-Rubin statistic was less than 1.2 for all parameters after 10,000 iterations, implying that from iteration 10,000 onwards, the draws come from the posterior distribution. Other tests (not shown here; the full code can be found at the book web site) reached similar conclusions.
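To make the between- versus within-chain comparison concrete, here is an illustrative sketch of the Gelman-Rubin statistic (this is not the CODA implementation; the chains below are synthetic):

```python
# Sketch of the Gelman-Rubin (potential scale reduction) statistic:
# it compares between-chain variance to within-chain variance, and
# approaches 1 as the chains converge to the same distribution.
import numpy as np

def gelman_rubin(chains):
    """chains: array of shape (m, n) -- m chains of n samples each."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_hat / W)              # potential scale reduction

rng = np.random.default_rng(1)
converged = rng.normal(0.0, 1.0, size=(3, 5000))      # same target
print(gelman_rubin(converged))                         # close to 1
shifted = converged + np.array([[0.0], [2.0], [4.0]])  # chains disagree
print(gelman_rubin(shifted))                           # well above 1.2
```

In CODA itself this diagnostic is available as `gelman.diag`, applied to each monitored parameter.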

Having decided on a burn-in of 10,000 iterations, we now formally store the realised samples from iteration 10,000 onwards. This is achieved with the command samplesSet. The command dicSet switches on monitoring of the so-called Deviance Information Criterion (DIC), which can be used for model selection. Using the modelUpdate command, the chains are run for another 100,000 iterations. The R command samplesStats then provides the following output.

> samplesStats("alpha")

      mean  sd      MC_error  val2.5pc median val97.5pc start sample
alpha 5.568 0.02176 0.0001381 5.526    5.568  5.61      201   30000

The last two columns indicate when monitoring started (from the 201st sample onwards; recall that the first 200 stored samples were discarded as burn-in) and how many samples have been stored since monitoring started (10,000 samples for each of three chains). As a consequence, the summary statistics are based on 30,000 samples. To save space, the last two columns (start and sample) are not printed in the following output, as they are the same as above.

> samplesStats("b")

     mean     sd       MC_error  val2.5pc  median   val97.5pc
b[1] -0.01670 0.009474 5.184e-05 -0.035220 -0.01668  0.001873

b[2] -0.20830 0.007988 4.446e-05 -0.224100 -0.20830 -0.192700

b[3] 0.01235 0.007782 4.579e-05 -0.002963 0.01238 0.027670

b[4] -0.05423 0.006975 3.980e-05 -0.067870 -0.05421 -0.040630

b[5] 0.10880 0.011200 6.361e-05 0.086900 0.10880 0.130800

> samplesStats("W")

     mean sd MC_error val2.5pc median val97.5pc

W[2] -0.22850 0.02725 0.0001631 -0.28210 -0.22860 -0.175000

W[3] -0.05374 0.02324 0.0001417 -0.09905 -0.05379 -0.008206

W[4] -0.05954 0.02170 0.0001434 -0.10190 -0.05969 -0.016920

> samplesStats("S")

mean sd MC_error val2.5pc median val97.5pc

S[2] -0.1339 0.01535 9.53e-05 -0.164 -0.1339 -0.1036

We also list the DIC as an overall indicator of model fit (it is similar to the AIC and is discussed in detail at the end of the chapter):

      Dbar Dhat DIC  pD
Abun  1890 1880 1900 9.948
total 1890 1880 1900 9.948

The samples from the posterior distribution are summarised by the mean, median and the 2.5 and 97.5 percentiles. Note that the mean values are similar to those obtained by the glm command. The standard deviation of the posterior distribution, given under the header sd, is the Bayesian equivalent of the standard error of the mean (recall that the standard error of the mean is defined as the standard deviation of the mean values if the study were to be repeated many times). Again, in this case, the values are similar to those obtained from glm.
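The samplesStats-style summary is simply a set of descriptive statistics over the stored draws. A hedged sketch, using synthetic draws as a stand-in for alpha's 30,000 stored samples:

```python
# Sketch: computing mean, sd, median and the 2.5/97.5 percentiles
# from stored posterior draws, as in the samplesStats output above.
import numpy as np

rng = np.random.default_rng(42)
draws = rng.normal(5.568, 0.022, size=30000)  # stand-in for alpha's draws

summary = {
    "mean": draws.mean(),
    "sd": draws.std(ddof=1),            # Bayesian analogue of the SE
    "val2.5pc": np.percentile(draws, 2.5),
    "median": np.median(draws),
    "val97.5pc": np.percentile(draws, 97.5),
}
print(summary)
```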

The MCMC output contains thousands of realisations of each model parameter, and these can be used to calculate various quantities of interest. For example, the correlations between the parameters can be obtained, and these are shown in Table 23.1. High correlation is commonly expected for the intercept and factor effects only. If

(Table 23.1. Correlations among alpha, b1, b2, b3, b4 and b5.)
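The correlation calculation described above can be sketched directly from the stored draws; the numbers below are synthetic stand-ins, not the values from Table 23.1:

```python
# Sketch: with the stored draws arranged as a (samples x parameters)
# matrix, the pairwise posterior correlations are just the sample
# correlation matrix of the columns.
import numpy as np

rng = np.random.default_rng(0)
n = 30000
alpha = rng.normal(0, 1, n)
b1 = -0.8 * alpha + rng.normal(0, 0.6, n)  # constructed to correlate
b2 = rng.normal(0, 1, n)                   # roughly independent
draws = np.column_stack([alpha, b1, b2])
corr = np.corrcoef(draws, rowvar=False)
print(np.round(corr, 2))
```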
