where each observation has been generated from an underlying but unknown probability distribution P(x, y).

Let A be the set of all possible parameter vectors for the function f_a. Our goal is to find the function f_a*, called the decision rule, that minimizes the risk function

where a* is the parameter vector whose function f_a* attains the minimum over all functions f_a with a ∈ A. However, since P(x, y) is unknown, we cannot calculate R(a). An estimate of the risk function is the empirical risk function
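
In the usual formulation of this setup, the risk, its minimizer, and the empirical risk can be written as follows. This is a sketch under assumptions: the loss function L, the sample size n, the sample points (x_i, y_i), and the symbol R_emp are not given in this passage and are introduced here only for illustration.

```latex
R(a) = \int L\bigl(y, f_a(x)\bigr)\, dP(x, y),
\qquad
a^{*} = \arg\min_{a \in A} R(a),
\qquad
R_{\mathrm{emp}}(a) = \frac{1}{n} \sum_{i=1}^{n} L\bigl(y_i, f_a(x_i)\bigr).
```

The empirical risk replaces the integral over the unknown P(x, y) with an average over the observed sample, which is why it can be computed when R(a) cannot.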


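The idea of replacing the risk with an average over the observations can be illustrated with a small Python sketch. Everything specific in it is an assumption made for illustration and does not come from the text: the squared-error loss, the one-parameter family f_a(x) = a * x, the finite candidate set standing in for A, and the helper names `empirical_risk`, `make_f`, and `minimize_empirical_risk`.

```python
from typing import Callable, Sequence


def squared_loss(y: float, y_hat: float) -> float:
    """Assumed loss function; the text does not specify one."""
    return (y - y_hat) ** 2


def make_f(a: float) -> Callable[[float], float]:
    """Hypothetical parametric family f_a(x) = a * x."""
    return lambda x: a * x


def empirical_risk(
    f: Callable[[float], float],
    xs: Sequence[float],
    ys: Sequence[float],
    loss: Callable[[float, float], float],
) -> float:
    """Average loss of f over the observed sample (x_i, y_i)."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    return sum(loss(y, f(x)) for x, y in zip(xs, ys)) / len(xs)


def minimize_empirical_risk(
    candidates: Sequence[float],
    xs: Sequence[float],
    ys: Sequence[float],
    loss: Callable[[float, float], float],
) -> float:
    """Return the candidate parameter with the smallest empirical risk."""
    return min(candidates, key=lambda a: empirical_risk(make_f(a), xs, ys, loss))


# Toy sample generated by y = 2x, used only to exercise the functions.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
candidates = [0.0, 1.0, 2.0, 3.0]  # finite stand-in for the parameter set A

a_hat = minimize_empirical_risk(candidates, xs, ys, squared_loss)
```

Here `a_hat` minimizes the empirical risk over the candidates, not the true risk R(a); the two minimizers need not coincide on a finite sample.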