## Definition of BNs

A BN is a directed acyclic graph (DAG) that leads to a compact representation, or factorization, of the joint probability distribution of a set of variables in a system of interest. Graphically, variables are represented by nodes and dependences between nodes are represented by directed edges. It is important to note that a BN illustrates patterns of probabilistic dependence (i.e., statistical correlation, causal influence, or diagnostic reasoning) and not the flow of mass or process control. Therefore, variables need not be in compatible units and states of variables need not be discrete.

The absence of a directly connecting edge between any two nodes in a BN implies that the two variables are independent given the values of any intermediate nodes. This means that the probability distribution of any variable Xj in any state of the network can be determined by knowing only the values of its immediate predecessors (called its parents, PA), without regard to the values of any other variables. This is referred to as the parental Markov property. In this way, the joint probability distribution for the entire network can be written as the product of a limited number of conditional distributions using the chain rule of probability calculus:

Nodes without any parents are called roots and are specified by marginal (i.e., unconditional) distributions. (Following the notation of Pearl 2000, lowercase symbols

Figure 1 A simple BN representing dependences among five variables characterizing eutrophication of a water body. Nutrient concentration (N) is assumed to influence algal density (A), which in turn influences the probability of algal toxins and water column hypoxia (H). These two variables then influence the probability of a fishkill (K).

Figure 1 A simple BN representing dependences among five variables characterizing eutrophication of a water body. Nutrient concentration (N) is assumed to influence algal density (A), which in turn influences the probability of algal toxins and water column hypoxia (H). These two variables then influence the probability of a fishkill (K).

are used to indicate particular realizations of the corresponding uppercase variables.)

Figure 1 illustrates a simple BN representing the relationships between nutrient concentration in a water body (N), algal density (A), the presence of algal toxins (T), water column hypoxia (H), and the occurrence of a fishkill (K). Such a graphical network can be drawn based on causal knowledge of the system. For example, the absence of a direct link between N and H captures our understanding that nutrient inputs to a water body only cause hypoxia via the stimulation of algae and not through any other direct or indirect means. In probabilistic terms, knowing A renders H and N independent. Together with the remainder of the relationships expressed in Figure 1, this implies that the joint distribution of all variables can be written in the mathematical form of eqn [1] as

P(k) = P(n) ■ P(a\n) ■ P(t\a) ■ P(h\a) ■ P(k\t, h) [2]

The recognition that causal assertions expressed in graphical form have practical implications for determining the probabilistic relationships among variables significantly facilitates the handling of uncertainty in complex systems. For example, generating a prediction for the probability of a fishkill given a particular nutrient concentration can proceed by decomposing the full causal chain connecting these two variables into the conditional relationships contained in eqn [2]. These local relationships can be quantified independently using the data, expert knowledge, or mechanistic models that are directly relevant. The parts can then be reassembled in a way that makes logical sense based on the causal assertions embedded in the graphical model.

The implications for forward prediction are perhaps the most obvious benefit of the BN approach. More subtle is the benefit to performance of probabilistic inference, or diagnosis. This will be discussed later in this article, but it is worth noting here that the use of Bayes's theorem for conducting such inference is one reason why BNs are called Bayesian. The other reason is that the subjective interpretation of probabilities as degrees of belief rather than as long run frequencies is consistent with the Bayesian philosophy. This will also be discussed in more detail below.