As described above, the conditional probabilities of a BN can be derived from data using statistical methods, and almost any approach is appropriate as long as results can be represented probabilistically. Linear and nonlinear regression, quantile regression, logistic regression, generalized additive models, and classification and regression trees are all suitable tools for conditional probability determination. The fact that a BN can be easily decomposed into independent substructures means that statistical methods can be chosen that are optimal for the nature of the variables in those subnetworks, without regard to the larger set of variables comprising the full network.
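As an illustration of one of these tools, the sketch below fits a logistic regression by plain gradient descent so that the fitted model can serve directly as a conditional probability for a binary child node with two continuous parents. The node and parent names ("flooded", "rainfall", "river_level") and the data are hypothetical, chosen only to make the example concrete.

```python
import math

# Hypothetical example: a binary node "flooded" with two continuous
# parents, "rainfall" and "river_level". The fitted logistic model
# plays the role of P(flooded | rainfall, river_level).
cases = [  # (rainfall, river_level, flooded)
    (0.1, 0.2, 0), (0.3, 0.1, 0), (0.5, 0.4, 0), (0.6, 0.9, 1),
    (0.8, 0.7, 1), (0.9, 0.8, 1), (0.2, 0.6, 0), (0.7, 0.95, 1),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Parameters: intercept plus one weight per parent variable.
w = [0.0, 0.0, 0.0]
lr = 0.5
for _ in range(2000):                     # batch gradient descent
    grad = [0.0, 0.0, 0.0]
    for x1, x2, y in cases:
        p = sigmoid(w[0] + w[1] * x1 + w[2] * x2)
        err = p - y                       # gradient of the log loss
        grad[0] += err
        grad[1] += err * x1
        grad[2] += err * x2
    w = [wi - lr * g / len(cases) for wi, g in zip(w, grad)]

def p_flooded(rainfall, river_level):
    """Conditional probability of the child node, evaluated on demand."""
    return sigmoid(w[0] + w[1] * rainfall + w[2] * river_level)

print(round(p_flooded(0.9, 0.9), 3))  # high parent values
print(round(p_flooded(0.1, 0.1), 3))  # low parent values
```

Because this substructure involves only the child and its parents, the regression can be fit in isolation, exactly as the decomposability argument above suggests.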
There may, however, be situations in which it is desirable to learn the conditional probability distributions of many nodes in a network simultaneously from a set of case data. Cases are examples, events, or situations for which the values or discrete states of some or all of the variables in a network are known. Learning can occur either by starting in a state of ignorance for all nodes or by starting with Bayesian prior distributions that are based on preexisting knowledge.
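The difference between the two starting points can be sketched for a single discrete node, with the prior encoded as Dirichlet pseudo-counts: a count of 1 per state represents ignorance, while larger counts encode preexisting knowledge. The states, data, and prior strengths below are illustrative.

```python
from collections import Counter

# Learning P(state) for one discrete node from complete cases.
# A Dirichlet prior is represented as pseudo-counts added to the
# observed counts; updating then reduces to counting.
states = ["low", "medium", "high"]

def learn(cases, prior_counts):
    counts = Counter(prior_counts)
    counts.update(cases)                 # Bayesian updating by counting
    total = sum(counts.values())
    return {s: counts[s] / total for s in states}

cases = ["low", "low", "medium", "low", "high", "medium", "low"]

# Start from ignorance: a uniform pseudo-count of 1 per state.
uninformed = learn(cases, {s: 1 for s in states})

# Start from an informative prior worth 10 pseudo-cases that "high"
# is common; the observed data pulls the estimate back toward "low".
informed = learn(cases, {"low": 1, "medium": 1, "high": 10})

print(uninformed)
print(informed)
```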
If every case provides a value or discrete state for every variable, then learning the conditional probabilities of the network occurs through straightforward algorithms for Bayesian updating. If there are variables for which none of the cases have any data (latent variables), for which data are available for some cases but not for others (missing data), or for which data are expressed as likelihoods rather than certain values, then other, more complex, learning algorithms are required. These are usually based on optimization methods that attempt to find the set of network probabilities with maximum likelihood given the observed data. Expectation-maximization (EM) and gradient descent are the two most common such algorithms employed in BN learning.
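A minimal EM sketch for the missing-data situation, using a two-node network A → B with binary variables: the E-step computes the posterior probability of the missing parent value given the observed child, and the M-step re-estimates the parameters from the resulting expected counts. The case data and starting values are invented for illustration.

```python
# EM for a two-node network A -> B where some cases are missing A.
cases = [  # (A, B); None marks a missing value for A
    (1, 1), (1, 1), (0, 0), (0, 1), (1, 0),
    (None, 1), (None, 1), (None, 0),
]

pA = 0.5                    # P(A=1), initial guess
pB = {0: 0.3, 1: 0.7}       # P(B=1 | A=a), initial guesses

for _ in range(100):
    # E-step: expected count of A=1 in each case.
    wA, nB1 = [], {0: 0.0, 1: 0.0}
    denomA = {0: 0.0, 1: 0.0}
    for a, b in cases:
        if a is None:
            lik1 = pA * (pB[1] if b else 1 - pB[1])
            lik0 = (1 - pA) * (pB[0] if b else 1 - pB[0])
            w = lik1 / (lik1 + lik0)     # posterior P(A=1 | B=b)
        else:
            w = float(a)
        wA.append(w)
        denomA[1] += w
        denomA[0] += 1 - w
        if b:
            nB1[1] += w
            nB1[0] += 1 - w
    # M-step: maximum-likelihood re-estimates from expected counts.
    pA = sum(wA) / len(cases)
    pB = {a: nB1[a] / denomA[a] for a in (0, 1)}

print(round(pA, 3), {a: round(p, 3) for a, p in pB.items()})
```

Each iteration cannot decrease the likelihood of the observed data, so the loop climbs toward a (local) maximum-likelihood parameter set, which is exactly the optimization goal described above.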
Artificial neural networks (ANNs) employ many of the same learning algorithms as BNs, with latent variables in a BN corresponding to hidden neurons. This invites a comparison between the two: in general, BNs employ fewer hidden nodes, and the learned relationships between the nodes are more complex. The result of BN learning usually has a direct physical interpretation (as a causal process), rather than simply yielding a set of empirical weights. This need for a causal interpretation may help avoid the problem of overfitting. Finally, as mentioned above, BNs can be treated as modular, so that parts of one network can be extracted and connected to other structures. This is not usually the case for ANNs.