# Info

Only the left hand side - the interpretation - differs. For both historical (i.e. mathematical elegance) and computational (i.e. likelihoods often involve small numbers) reasons, it is common to work with the logarithm of the likelihood (called the log-likelihood, which we denote by L). In this case, of inference about p given k and N, the log-likelihood is

$$L(p \mid k, N) = \log \binom{N}{k} + k \log(p) + (N - k) \log(1 - p) \tag{3.31}$$

Now, if we think of this as a function of p, the first term on the right hand side is a constant - it depends upon the data but it does not depend upon p. We can use the log-likelihood in inference to find the most likely value of p, given the data. We call this the maximum likelihood estimate (MLE) of the parameter and usually denote it by $\hat{p}$. To find the MLE for p, we take the derivative of L(p|k, N) with respect to p, set the derivative equal to 0, and solve the resulting equation for p.

Show that the MLE for p is $\hat{p} = k/N$. Does this accord with your intuition?
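A numerical check can complement the analytical derivation asked for here. The sketch below evaluates the log-likelihood of Eq. (3.31) on a grid of p values and reports the grid point at which it is largest. The function names `log_likelihood` and `grid_mle`, the grid resolution, and the use of natural logarithms are assumptions made for illustration; they are not part of the text.

```python
import math

def log_likelihood(p, k, N):
    """Binomial log-likelihood of Eq. (3.31), constant term included.

    Natural logarithms are assumed; valid for 0 < p < 1.
    """
    return math.log(math.comb(N, k)) + k * math.log(p) + (N - k) * math.log(1 - p)

def grid_mle(k, N, steps=1000):
    """Return the grid value of p in (0, 1) that maximizes the log-likelihood."""
    grid = [i / steps for i in range(1, steps)]  # excludes p = 0 and p = 1
    return max(grid, key=lambda p: log_likelihood(p, k, N))

for k, N in [(4, 10), (40, 100), (7, 20)]:
    print(f"k={k}, N={N}: grid maximum at p={grid_mle(k, N):.3f}, k/N={k / N:.3f}")
```

A grid search is used only because it is transparent; it finds the maximum to within the grid spacing, whereas setting the derivative to zero gives the exact answer.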

Since the likelihood is a function of p, we ask about its shape. In Figure 3.4, I show L(p|k, N), without the constant term (the first term on the right hand side of Eq. (3.31)), for k = 4 and N = 10 and for k = 40 and N = 100. These curves are peaked at p = 0.4, as the MLE tells us they should be, and are roughly symmetric around that value. Note that although the ordinates both have the same range (10 likelihood units), the magnitudes differ considerably. This makes sense: both p and 1 - p are less than 1, with logarithms less than 0, so for the case of 100 trials we are multiplying negative numbers by a factor of 10 more than for the case of 10 trials.
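The difference in magnitude can be seen by tabulating a few values. The following sketch evaluates the log-likelihood without the constant term at several values of p for both data sets, together with the drop from the value at p = 0.4. The function name `log_lik_no_const`, the chosen p values, and the use of natural logarithms are assumptions for illustration.

```python
import math

def log_lik_no_const(p, k, N):
    """Log-likelihood without the constant term: k log(p) + (N - k) log(1 - p)."""
    return k * math.log(p) + (N - k) * math.log(1 - p)

for k, N in [(4, 10), (40, 100)]:
    peak = log_lik_no_const(0.4, k, N)  # value at the MLE, p = k/N = 0.4
    print(f"k={k}, N={N}: value at p=0.4 is {peak:.3f}")
    for p in (0.2, 0.3, 0.4, 0.5, 0.6):
        value = log_lik_no_const(p, k, N)
        print(f"  p={p:.1f}  L={value:9.3f}  drop from peak={value - peak:8.3f}")
```

Because k and N - k are both ten times larger in the second data set, every entry in its table is ten times the corresponding entry in the first.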

The most impressive thing about the two curves is the way that they move downward from the MLE. When N = 10, the curve around the MLE is very broad, while for N = 100 it is much sharper. Now, we could think of each value of p as a hypothesis. The log-likelihood curve is then telling us something about the relative likelihood of a particular value of p. Indeed, the mathematical geneticist A. W. F. Edwards (Edwards 1992) calls the log-likelihood function the "support for different values of p given the data" for this very reason (Bayesian methods show how to use the support to combine prior and observed information).
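One way to put a number on "broad" versus "sharp" is to ask which values of p have a log-likelihood within a fixed distance of the maximum. The sketch below does this on a grid. The cutoff of 2 log-likelihood units, the function name `interval_within`, the grid resolution, and the use of natural logarithms are all assumptions chosen for illustration, and the code requires 0 < k < N so that the MLE lies strictly inside (0, 1).

```python
import math

def log_lik_no_const(p, k, N):
    """Log-likelihood without the constant term: k log(p) + (N - k) log(1 - p)."""
    return k * math.log(p) + (N - k) * math.log(1 - p)

def interval_within(k, N, drop=2.0, steps=10000):
    """Smallest and largest grid values of p whose log-likelihood lies within
    `drop` units of the maximum (assumes 0 < k < N)."""
    peak = log_lik_no_const(k / N, k, N)  # value at the MLE
    inside = [i / steps for i in range(1, steps)
              if log_lik_no_const(i / steps, k, N) >= peak - drop]
    return min(inside), max(inside)

for k, N in [(4, 10), (40, 100)]:
    low, high = interval_within(k, N)
    print(f"k={k}, N={N}: p from {low:.3f} to {high:.3f} (width {high - low:.3f})")
```

The range for N = 100 sits inside the range for N = 10, which is the sharpening described above: with more data, fewer hypotheses about p remain close to the best-supported one.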