The Hopfield network was conceived as a model of associative, or content-addressable, memory. Its role was to learn a set of specific binary data vectors out of the space of all possible vectors of a given length, and to reproduce any one of them when presented with a sufficiently large partial vector (i.e., a partial memory). However, the Hopfield model has a number of limitations that make it incapable of learning certain distributions of binary data. Its capacity to reproduce observed data is limited: it can accurately store only about 0.15N N-bit data vectors (out of a space of 2^N possible vectors). It is also easily confounded by spurious, unobserved data vectors introduced as an artifact of the training procedure, although Hopfield proposed a modified training algorithm employing a form of 'unlearning' to reduce the effects of such vectors. Furthermore, it can learn only pairwise correlations within the input data; it cannot capture patterns of higher order.
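The storage-and-recall behavior described above can be illustrated with a minimal sketch of a Hopfield network, using the standard Hebbian outer-product rule and sign-threshold updates. The sizes and number of patterns here are illustrative choices, well below the ~0.15N capacity limit:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64  # number of binary (+1/-1) units

# Store a few random patterns with the Hebbian outer-product rule.
patterns = rng.choice([-1, 1], size=(3, N))
W = np.zeros((N, N))
for p in patterns:
    W += np.outer(p, p)
W /= N
np.fill_diagonal(W, 0)  # no self-connections

def recall(state, steps=50):
    """Iterate sign-threshold updates until a fixed point (or step limit)."""
    s = state.copy()
    for _ in range(steps):
        new = np.sign(W @ s)
        new[new == 0] = 1  # break ties deterministically
        if np.array_equal(new, s):
            break
        s = new
    return s

# Corrupt a stored pattern (a "partial memory") and recover it.
probe = patterns[0].copy()
flip = rng.choice(N, size=8, replace=False)
probe[flip] *= -1
recovered = recall(probe)
print(np.array_equal(recovered, patterns[0]))
```

With only 3 patterns stored in 64 units, the corrupted probe lies well inside the attractor basin of the original pattern, so recall converges back to it. Pushing the number of stored patterns past roughly 0.15N causes crosstalk between patterns and spurious fixed points, which is the capacity limitation discussed above.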
The Boltzmann machine, by contrast, can learn any distribution over binary data vectors of a given length to an arbitrary degree of accuracy, including data containing fundamentally higher-order relationships. It also exhibits a much greater degree of generalization over unobserved data vectors. One can think of the Boltzmann machine as a system for modeling the underlying statistical structure of a body of data. As it learns the relationships in the data, the observed data vectors become more and more probable under the model. Moreover, other potential observations drawn from the same source should also become more probable under the model, assuming that the training set is representative of the underlying structure of the distribution as a whole. This means the Boltzmann machine can 'rank' new observations as being more or less probable given the model - that is, how likely they are to have come from the same source.
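This notion of ranking vectors by probability can be made concrete with a tiny, fully visible Boltzmann machine whose distribution is computed exactly. The weights here are arbitrary (random, for illustration) rather than learned, and exact enumeration of all 2^N states is only feasible because N is small; a trained model would assign high probability to vectors resembling its training data and low probability to others:

```python
import itertools
import numpy as np

# Toy fully visible Boltzmann machine over N binary (0/1) units.
# Energy: E(v) = -0.5 v^T W v - b^T v ; probability: p(v) = exp(-E(v)) / Z.
N = 4
rng = np.random.default_rng(1)
W = rng.normal(size=(N, N))
W = (W + W.T) / 2          # symmetric weights
np.fill_diagonal(W, 0)     # no self-connections
b = rng.normal(size=N)     # unit biases

def energy(v):
    return -0.5 * v @ W @ v - b @ v

# Enumerate all 2^N states to compute the partition function Z exactly.
states = [np.array(s) for s in itertools.product([0, 1], repeat=N)]
Z = sum(np.exp(-energy(s)) for s in states)

def prob(v):
    return np.exp(-energy(np.asarray(v))) / Z

# Rank all states from most to least probable under the model.
ranked = sorted(states, key=prob, reverse=True)
for s in ranked[:3]:
    print(s, prob(s))
```

Lower-energy states receive higher probability, so ranking by `prob` is equivalent to ranking by energy. In practice N is far too large for exact enumeration of Z, which is why Boltzmann machine training relies on stochastic sampling rather than the exhaustive sum used in this sketch.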
For example, suppose a Boltzmann machine is trained to recognize handwritten digits. It will subsequently rate new, never-before-seen digits as having a high probability of being digits, whereas it will rate other data, such as photographs of faces or random noise, as having a very low probability of being digits. The Boltzmann machine's training algorithm also has, implicit in its operation, an unlearning procedure similar to the one employed in Hopfield networks, which minimizes the possibility that this 'spontaneous generalization' will incorrectly rank non-data vectors as highly likely.