The beta density and patch leaving

Recall the end of our discussion of the binomial distribution: if we had data of only k successes, then the maximum likelihood estimates for N and p are N = k and p = 1, but these are just plain silly. We will now show how to get around this problem by using the beta probability density. The beta density will be defined below, but first let's examine a slightly different motivation for it, based on a question in foraging theory.

Imagine a forager moving in a patchy environment, seeking food that comes in discrete units, such as seeds. This same example also applies to a foraging parasitoid, seeking hosts in which to place its eggs; we will discuss parasitoids in great detail in the next chapter. When the forager enters a new patch, the probability of finding food, P, will be unknown, although there might be a prior distribution for it, perhaps described by the environmental average or the forager's history until now.

In the current patch, the forager collects data that consist of having found S = s items in A = a attempts at finding food. Clearly, the natural estimate of P is s/a. If P = p were known, the probability of the data (s, a) would be given by the binomial distribution

$$\Pr\{S = s \mid p, a\} = \binom{a}{s} p^s (1-p)^{a-s} \qquad (3.93)$$
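As a quick numerical check of this binomial probability, here is a minimal sketch; the values s = 3, a = 10, and p = 0.3 are illustrative choices, not from the text:

```python
from math import comb

def binomial_prob(s, a, p):
    """Pr{S = s | p, a}: probability of s successes in a attempts, Eq. (3.93)."""
    return comb(a, s) * p**s * (1 - p)**(a - s)

# Illustrative numbers: 3 items found in 10 attempts when p = 0.3.
print(round(binomial_prob(3, 10, 0.3), 4))  # -> 0.2668
```

Summing this probability over s = 0, ..., a returns 1, which is a handy sanity check on any implementation.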

We now ask: ''given the data (s, a), what can be said about P?''. That is, we want to know the probability that P falls in the interval [p, p + dp] given the data (s, a) of s successes in a attempts at finding food. We followed a similar line of reasoning when discussing the Poisson process in which the rate parameter was unknown. Recall that the idea is we begin with a prior probability density for the unknown parameter and use the data and Bayes's Theorem to update the prior and construct the posterior probability density for the parameter, given the data.

If we denote the prior by $f_0(p)$, with the interpretation that $f_0(p)\,dp = \Pr\{p < P < p + dp\}$, the posterior density is computed from Bayes's Theorem:

$$f(p \mid \text{data})\,dp = \frac{\Pr\{p < P < p + dp,\ \text{data}\}}{\Pr\{\text{data}\}} = \frac{\Pr\{\text{data} \mid p\}\, f_0(p)\,dp}{\Pr\{\text{data}\}} \qquad (3.94)$$

Using Eq. (3.93) and dividing by dp gives

$$f(p \mid \text{data}) = \frac{\binom{a}{s} p^s (1-p)^{a-s}\, f_0(p)}{\Pr\{\text{data}\}} \qquad (3.95)$$
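Equation (3.95) can be evaluated numerically for any prior by normalizing on a grid of p values. The sketch below assumes a uniform prior, $f_0(p) = 1$, and illustrative data s = 3, a = 10 (neither choice comes from the text):

```python
from math import comb

s, a = 3, 10   # illustrative data: 3 items found in 10 attempts
n = 10_000     # number of grid points for p
dp = 1.0 / n

# Midpoint grid on (0, 1); uniform prior f0(p) = 1 (an assumption for this sketch).
ps = [(i + 0.5) * dp for i in range(n)]
unnorm = [comb(a, s) * p**s * (1 - p)**(a - s) * 1.0 for p in ps]

# Pr{data} is the normalizing constant in Eq. (3.95).
pr_data = sum(unnorm) * dp
posterior = [u / pr_data for u in unnorm]

post_mean = sum(p * f for p, f in zip(ps, posterior)) * dp
print(round(post_mean, 4))  # -> 0.3333
```

Note that the posterior mean is pulled from the raw estimate s/a = 0.3 toward the prior mean of 0.5, as one would expect from averaging the data against a flat prior.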

As before, we need to pick a prior before we can go any further. The classical answer to this choice is to use the beta density with parameters $\alpha$ and $\beta$, defined according to

$$f_0(p) = \frac{1}{B(\alpha, \beta)}\, p^{\alpha-1} (1-p)^{\beta-1} \qquad (3.96)$$

The reason for this choice will become clear momentarily. Just a few details about it. First, the beta function $B(\alpha, \beta)$ is defined by

$$B(\alpha, \beta) = \int_0^1 p^{\alpha-1} (1-p)^{\beta-1}\, dp = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$

so that $f_0(p)$ in Eq. (3.96) integrates to 1 (which also tells us how to relate the beta and gamma functions). Second, the mean and variance of P are given by

$$\mathrm{E}\{P\} = \frac{\alpha}{\alpha+\beta}, \qquad \mathrm{Var}\{P\} = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$

Third, the shape of the beta density varies according to the choices of the parameters (Figure 3.13). Fourth (Abramowitz and Stegun 1974, p. 944), if $\chi^2_{\nu_1}$ and $\chi^2_{\nu_2}$ are chi-squared random variables with $\nu_1$ and $\nu_2$ degrees of freedom respectively, then the ratio $\chi^2_{\nu_1} / (\chi^2_{\nu_1} + \chi^2_{\nu_2})$ has a beta probability density with parameters $\alpha = \nu_1/2$ and $\beta = \nu_2/2$.
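These facts are straightforward to check by simulation. The sketch below uses the illustrative choices $\nu_1 = 4$ and $\nu_2 = 6$ (so $\alpha = 2$, $\beta = 3$), draws chi-squared variates as sums of squared standard normals, and compares the sample mean and variance of the ratio against the formulas above:

```python
import random

random.seed(0)

nu1, nu2 = 4, 6                   # illustrative degrees of freedom
alpha, beta = nu1 / 2, nu2 / 2    # ratio should be Beta(2, 3)

def chi2(nu):
    """A chi-squared variate with nu degrees of freedom: sum of nu squared N(0,1)s."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(nu))

ratios = []
for _ in range(100_000):
    x1, x2 = chi2(nu1), chi2(nu2)
    ratios.append(x1 / (x1 + x2))

m = sum(ratios) / len(ratios)
v = sum((r - m) ** 2 for r in ratios) / len(ratios)

# Compare with E{P} = alpha/(alpha+beta) = 0.4 and
# Var{P} = alpha*beta / ((alpha+beta)^2 (alpha+beta+1)) = 0.04.
print(round(m, 2), round(v, 2))
```

With 100,000 samples the estimates agree with the formulas to two decimal places.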

Why pick the beta probability density? The following exercise should answer the question.
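As a numerical hint at where the exercise is headed: multiplying the beta prior of Eq. (3.96) by the binomial likelihood of Eq. (3.93) gives something proportional to another beta density, now with parameters $\alpha + s$ and $\beta + a - s$. The sketch below checks this proportionality; all parameter values are illustrative:

```python
from math import comb, gamma

def beta_pdf(p, al, be):
    """Beta density, using B(al, be) = gamma(al) * gamma(be) / gamma(al + be)."""
    B = gamma(al) * gamma(be) / gamma(al + be)
    return p ** (al - 1) * (1 - p) ** (be - 1) / B

alpha, beta = 2.0, 3.0   # illustrative prior parameters
s, a = 3, 10             # illustrative data

# likelihood * prior should be proportional to Beta(alpha + s, beta + a - s),
# so the ratio below must be the same constant for every p.
ratios = []
for p in [0.1, 0.25, 0.5, 0.75, 0.9]:
    numer = comb(a, s) * p**s * (1 - p)**(a - s) * beta_pdf(p, alpha, beta)
    denom = beta_pdf(p, alpha + s, beta + a - s)
    ratios.append(numer / denom)

print(max(ratios) / min(ratios))  # -> 1.0 (up to floating-point error)
```

Because the posterior stays in the beta family, updating after each foraging attempt amounts to simple bookkeeping on the two parameters, which is what makes this prior so convenient.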
