In probability theory, we are concerned with outcomes of "experiments," broadly defined. We let S be all the possible outcomes (often called the sample space) and A, B, etc., particular outcomes that might interest us (Figure 3.1a). We then define the probability that A occurs, denoted by Pr{A}, by

Pr{A} = Area of A / Area of S    (3.1)

Figuring out how to measure the Area of A or the Area of S is where the hard work of probability theory occurs, and we will delay that hard work until the next sections. (Actually, in more advanced treatments, we replace the word "Area" with the word "Measure," but the fundamental notion remains the same.) Let us now explore the implications of this definition.

Figure 3.1. (a) The general set up of theoretical probability consists of a set of all possible outcomes S, and the events A, B, etc., within it. (b) Two helpful metaphors for discrete and continuous random variables: the fair die and a ruler on which a needle is dropped, constrained to fall between 1 cm and 6 cm. (c) The set up for understanding Bayes's theorem.

In Figure 3.1a, I show a schematic of S and two events in it, A and B. To help make the discussion in this chapter a bit more concrete, in Figure 3.1b, I show a die and a ruler. With a standard and fair die, the set of outcomes is 1, 2, 3, 4, 5, or 6, each with equal proportion. If we attribute an "area" of 1 unit to each, then the "area" of S is 6 and the probability of a 3, for example, then becomes 1/6. With the ruler, if we "randomly" drop a needle, constraining it to fall between 1 cm and 6 cm, the set of outcomes is any number between 1 and 6. In this case, the "area" of S is the 5 cm length of the interval, and an event might be something like the needle falls between 1.5 cm and 2.5 cm, with an "area" of 1 cm, so that the probability that the needle falls in the range 1.5-2.5 cm is 1 cm/5 cm = 1/5.
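The die and needle metaphors lend themselves to a quick numerical check. The following sketch (mine, not from the text) computes the die probability by counting outcomes and estimates the needle probability by simulation:

```python
import random

# Fair die: six equally likely outcomes, each carrying an "area" of 1 unit.
S_die = [1, 2, 3, 4, 5, 6]
p_three = sum(1 for o in S_die if o == 3) / len(S_die)
print(p_three)  # 1/6 ~ 0.1667

# Needle on the ruler: drop it uniformly between 1 cm and 6 cm and
# estimate Pr{1.5 <= x <= 2.5} by relative frequency.
random.seed(1)
n = 100_000
hits = sum(1 for _ in range(n) if 1.5 <= random.uniform(1.0, 6.0) <= 2.5)
print(hits / n)  # ratio of the event's length to the interval's length
```

The empirical frequency settles near the ratio of lengths, which is exactly the "area" definition at work.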

Suppose we now ask the question: what is the probability that either A or B occurs? To apply the definition in Eq. (3.1), we need the total area of the events A and B (see Figure 3.1a). This is Area of A + Area of B - overlap area (because otherwise we count that area twice). The overlap area represents the event that both A and B occur; we denote this probability by

Pr{A, B} = Area common to A and B / Area of S    (3.2)

so that if we want the probability of A or B occurring we have

Pr{A or B} = Pr{A} + Pr{B} - Pr{A, B}    (3.3)

and we note that if A and B share no common area (we say that they are mutually exclusive events) then the probability of either A or B is the sum of the probabilities of each (as in the case of the die).
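On the die, the addition rule can be verified directly by counting outcomes. Here is a small sketch; the particular events A and B are my own illustrative choices, not from the text:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}             # sample space for a fair die
A = {o for o in S if o % 2 == 0}   # illustrative event: an even number
B = {o for o in S if o >= 4}       # illustrative event: at least a 4

def pr(event):
    # The "area" of an event is its count of equally likely outcomes.
    return Fraction(len(event), len(S))

lhs = pr(A | B)                    # Pr{A or B}, counted directly
rhs = pr(A) + pr(B) - pr(A & B)    # Pr{A} + Pr{B} - Pr{A, B}
print(lhs, rhs)  # 2/3 2/3
```

Because A and B overlap on {4, 6}, the overlap must be subtracted once; for mutually exclusive events the subtracted term is zero.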

Now suppose we are told that B has occurred. We may then ask, what is the probability that A has also occurred? The answer to this question is called the conditional probability of A given B and is denoted by Pr{A|B}. If we know that B has occurred, the collection of all possible outcomes is no longer S, but is B. Applying the definition in Eq. (3.1) to this situation (Figure 3.1a), we must have

Pr{A|B} = Area common to A and B / Area of B    (3.4)

and if we divide numerator and denominator by the area of S, the right hand side of Eq. (3.4) involves Pr{A, B} in the numerator and Pr{B} in the denominator. We thus have shown that

Pr{A|B} = Pr{A, B}/Pr{B}    (3.5)

This definition turns out to be extremely important, for a number of reasons. First, suppose we know that whether A occurs or not does not depend upon B occurring. In that case, we say that A is independent of B and write Pr{A|B} = Pr{A}, because knowing that B has occurred does not affect the probability of A occurring. Thus, if A is independent of B, we conclude that Pr{A, B} = Pr{A}Pr{B} (by multiplying both sides of Eq. (3.5) by Pr{B}). Second, note that A and B are fully interchangeable in the argument that I have just made, so that if B is independent of A, Pr{B|A} = Pr{B}, and following the same line of reasoning we determine that Pr{B, A} = Pr{B}Pr{A}. Since the order in which we write A and B does not matter when they both occur, we conclude that if A and B are independent events

Pr{A, B} = Pr{A}Pr{B}    (3.6)

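Independence can likewise be checked by enumeration. The following sketch uses two dice and illustrative events of my own choosing to confirm that the joint probability factors:

```python
from fractions import Fraction
from itertools import product

# Two fair dice rolled together: 36 equally likely outcomes.
S = list(product(range(1, 7), range(1, 7)))

def pr(pred):
    # Count the outcomes satisfying the event's predicate.
    return Fraction(sum(1 for o in S if pred(o)), len(S))

is_A = lambda o: o[0] % 2 == 0   # illustrative event A: first die is even
is_B = lambda o: o[1] >= 5       # illustrative event B: second die is >= 5

p_joint = pr(lambda o: is_A(o) and is_B(o))
print(p_joint, pr(is_A) * pr(is_B))  # both 1/6: Pr{A, B} = Pr{A}Pr{B}
```

The two dice share no physical connection, so events defined on different dice factor exactly; events defined on the same die generally do not.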
Let us now rewrite Eq. (3.5) in its most general form as

Pr{A, B} = Pr{A|B}Pr{B} = Pr{B|A}Pr{A}    (3.7)

and manipulate the middle and right hand expressions to conclude that

Pr{B|A} = Pr{A|B}Pr{B}/Pr{A}    (3.8)

Equation (3.8) is called Bayes's Theorem, after the Reverend Thomas Bayes (see Connections). Bayes's Theorem becomes especially useful when there are multiple possible events B1, B2, ..., Bn which are themselves mutually exclusive. Now, Pr{A} = Σi Pr{A, Bi}, where the sum runs from i = 1 to n, because the Bi are mutually exclusive (this is called the law of total probability). Suppose now that the Bi may depend upon the event A (as in

Figure 3.1c; it always helps to draw pictures when thinking about this material). We are then interested in the conditional probability Pr{Bi|A}. The generalization of Eq. (3.8) is

Pr{Bi|A} = Pr{A|Bi}Pr{Bi} / Σj Pr{A|Bj}Pr{Bj}    (3.9)

Note that when writing Eq. (3.9), I used a different index (j) for the summation in the denominator. This is helpful to do, because it reminds us that the denominator is independent of the numerator and the left hand side of the equation.
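To make Eq. (3.9) concrete, here is a small numerical sketch; the three events and all of the probabilities are hypothetical, chosen only for illustration:

```python
from fractions import Fraction as F

# Hypothetical set up for Eq. (3.9): three mutually exclusive events
# B1, B2, B3 (say, which of three urns was chosen), and A the event of
# drawing a white ball. All numbers below are invented for illustration.
prior = [F(1, 2), F(1, 3), F(1, 6)]   # Pr{B1}, Pr{B2}, Pr{B3}; they sum to 1
like  = [F(1, 4), F(1, 2), F(3, 4)]   # Pr{A|B1}, Pr{A|B2}, Pr{A|B3}

# Law of total probability: Pr{A} = sum over j of Pr{A|Bj} Pr{Bj}
pr_A = sum(l * p for l, p in zip(like, prior))

# Eq. (3.9): Pr{Bi|A} = Pr{A|Bi} Pr{Bi} / Pr{A}
posterior = [l * p / pr_A for l, p in zip(like, prior)]
print(pr_A)        # 5/12
print(posterior)   # the three posterior probabilities sum to 1
```

Note that the denominator is the same for every Bi, which is why it can be computed once with its own summation index j.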

Conditional probability is a tricky subject. In The Ecological Detective (Hilborn and Mangel 1997), we discuss two examples that are somewhat counterintuitive and I encourage you to look at them (pp. 43-47).
