## Issues, Variants, and Limitations

To store memories in a Hopfield network, the storage algorithm implicitly defines an energy landscape in which the stored data vectors form the bottoms of valleys. From a given starting state, the network descends this landscape into the nearest valley and eventually settles on a stored memory. Storing vectors that are very similar uses up a large share of the weights' capacity just to keep each vector's valley distinct. Even when the vectors are sampled at random from the space of all possibilities, only a limited number can be stored before the valleys begin to intersect, producing retrieval errors and spurious (false) memories, as in the examples above. Experiments have shown that a network of N units (corresponding to N-dimensional data vectors) can store only about 0.15N memories before errors become significant. A network of 1000 units (that is, a space of 1000-bit vectors) would therefore be able to reliably retrieve only about 150 stored memories.
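The capacity limit can be explored with a small experiment. The sketch below is illustrative only: it assumes the storage algorithm is the usual outer-product (Hebbian) rule with ±1 units and a zero diagonal, that recall uses asynchronous sign updates with ties broken toward +1, and that a pattern counts as retrieved only if it is recovered exactly from a cue with 5% of its bits flipped. The function names, network size, and seed are hypothetical choices, not taken from the text.

```python
import numpy as np

def store(patterns):
    """Assumed Hebbian storage: W = (1/N) * sum_k x_k x_k^T, zero diagonal."""
    n_units = patterns.shape[1]
    weights = patterns.T @ patterns / n_units
    np.fill_diagonal(weights, 0.0)
    return weights

def recall(weights, state, max_sweeps=50):
    """Asynchronous sign updates until no unit changes in a full sweep."""
    state = state.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            new_value = 1 if weights[i] @ state >= 0 else -1  # tie -> +1 (assumption)
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state

def fraction_recalled(n_units, n_patterns, rng, flip_fraction=0.05):
    """Fraction of stored patterns recovered exactly from a corrupted cue."""
    patterns = rng.choice([-1, 1], size=(n_patterns, n_units))
    weights = store(patterns)
    n_flips = int(flip_fraction * n_units)
    hits = 0
    for pattern in patterns:
        cue = pattern.copy()
        flipped = rng.choice(n_units, size=n_flips, replace=False)
        cue[flipped] *= -1
        hits += int(np.array_equal(recall(weights, cue), pattern))
    return hits / n_patterns

rng = np.random.default_rng(0)  # arbitrary seed
n_units = 200                   # illustrative network size
for load in (0.05, 0.15, 0.30):
    n_patterns = int(load * n_units)
    print(f"load {load:.2f}N: recalled {fraction_recalled(n_units, n_patterns, rng):.2f}")
```

The exact fractions depend on the seed, the network size, and the corruption level, so the printed numbers should be read as a qualitative picture of recall degrading as the load grows rather than as a measurement of the 0.15N figure.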

The energy landscape produced by the storage algorithm does not necessarily have basins of equal size for every vector. Some memories are therefore 'more' content-addressable than others, because they are the attractor for a larger portion of the state space: a higher proportion of randomly drawn starting vectors will tend toward those minima than toward others. In addition, symmetric weights create spurious minima (false memories) that are the inverted (bit-flipped) versions of each data vector. That is, if (01101011) is a stable state, then so is (10010100). To deal with issues such as these, Hopfield in 1983 proposed a modified algorithm that employs an iterative 'unlearning' procedure to reduce the effect of spurious minima and balance the attraction of each stored vector.
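The bit-flip symmetry can be checked directly on the example vector. The sketch below assumes ±1 units (0 mapped to -1), no biases, a single pattern stored with the outer-product rule, and the standard energy function E(s) = -½ sᵀWs; none of these details are stated in this section, and the function names are hypothetical. Because the energy is quadratic in s, negating every unit leaves it unchanged.

```python
import numpy as np

def energy(weights, state):
    """Assumed Hopfield energy without biases: -1/2 * s^T W s."""
    return -0.5 * state @ weights @ state

def is_stable(weights, state):
    """True if every unit already agrees with the sign of its input."""
    fields = weights @ state
    return bool(np.all(fields * state > 0))

bits = np.array([0, 1, 1, 0, 1, 0, 1, 1])  # (01101011) from the text
pattern = 2 * bits - 1                      # map {0, 1} to {-1, +1}

# Store the single pattern with symmetric weights and a zero diagonal.
weights = np.outer(pattern, pattern).astype(float)
np.fill_diagonal(weights, 0.0)

inverse = -pattern                          # bit-flipped version, (10010100)
print(is_stable(weights, pattern), is_stable(weights, inverse))
print(energy(weights, pattern) == energy(weights, inverse))
```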

The storage algorithm can be recast as an iterative one, in which each weight is incremented by the product of the corresponding coefficients of data vector k:

$$\Delta w_{ij} = s_i^k s_j^k$$

This assumes that units take on values of +1 and -1. From this, an 'unlearning', or negative learning, step can be defined as

$$\Delta w_{ij} = -\epsilon \, s_i^f s_j^f$$

where $s_i^f$ is the final state of unit i after the network has been allowed to settle to an equilibrium from a random starting state, and $\epsilon$ is a small positive constant. This raises the energy of the equilibrium state slightly. The effect grows the more often the same state is sampled, so the cumulative effect is proportional to the size of that state's basin of attraction. Experiments have shown that unlearning of this kind can reduce the overall accessibility of spurious memories and can increase the total memory capacity of the network to as high as 0.25N. Hopfield proposed that this kind of unlearning could be a computational equivalent to the hypothesis put forward by Crick and Mitchison that unlearning (forgetting) during rapid eye movement (REM) sleep plays a critical role in the consolidation of long-term memories.
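The learning and unlearning steps can be sketched as a pair of weight updates followed by a sampling loop. This is a minimal illustration, not Hopfield's published procedure: it assumes ±1 units, no biases, a zero diagonal, asynchronous sign updates, the standard energy E(s) = -½ sᵀWs, and an unlearning rate `epsilon` of 0.01. The function names, network size, number of trials, and seed are all hypothetical.

```python
import numpy as np

def energy(weights, state):
    """Assumed Hopfield energy without biases: -1/2 * s^T W s."""
    return -0.5 * state @ weights @ state

def learn(weights, pattern):
    """Hebbian increment: add s_i * s_j for every pair of distinct units."""
    updated = weights + np.outer(pattern, pattern)
    np.fill_diagonal(updated, 0.0)
    return updated

def unlearn(weights, final_state, epsilon):
    """Negative learning: subtract epsilon * s_i^f * s_j^f."""
    updated = weights - epsilon * np.outer(final_state, final_state)
    np.fill_diagonal(updated, 0.0)
    return updated

def settle(weights, state, max_sweeps=100):
    """Asynchronous sign updates until a full sweep changes nothing."""
    state = state.copy()
    for _ in range(max_sweeps):
        previous = state.copy()
        for i in range(len(state)):
            state[i] = 1 if weights[i] @ state >= 0 else -1
        if np.array_equal(state, previous):
            break
    return state

rng = np.random.default_rng(0)      # arbitrary seed
n_units, n_patterns = 64, 12        # illustrative sizes
patterns = rng.choice([-1, 1], size=(n_patterns, n_units))

# Iterative storage: one Hebbian increment per data vector.
weights = np.zeros((n_units, n_units))
for pattern in patterns:
    weights = learn(weights, pattern)

# Unlearning: settle from random starts and weaken whatever state is reached.
epsilon = 0.01                      # illustrative value, not from the text
for _ in range(200):
    start = rng.choice([-1, 1], size=n_units)
    final_state = settle(weights, start)
    weights = unlearn(weights, final_state, epsilon)
```

States with large basins are reached from more random starts and so accumulate more unlearning steps, which is the cumulative effect described above. Whether this improves capacity for a given network depends on `epsilon` and the number of trials, which this sketch does not tune.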

The weights and biases of the network can be thought of as representing the relative frequencies of statistical properties of the input data. The bias of an individual unit can represent how likely that unit is to be on in the input data, and the weights convey how likely two units are to take on the same value, thus representing pairwise relationships in the data. Since there is always exactly one unit per coefficient in the data vectors, Hopfield networks can effectively represent only first- and second-order relationships. These networks have no means of representing other, higher-order patterns. For example, the exclusive-or (XOR) function is notoriously difficult for many primitive neural network models to learn, because the relevant relations are entirely third-order in nature (Table 2).

Table 2: The XOR function.

| x | y | x XOR y |
|---|---|---------|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

Given only the frequencies with which the individual units are on or off and the frequencies of the pairwise relations in XOR, no information is available that would allow a network to favor the four data vectors (x, y, x XOR y) over the other four possible 3-bit vectors.
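This claim can be verified by enumeration. The sketch below compares the four XOR vectors with the four remaining 3-bit vectors: the per-unit frequencies and the pairwise agreement frequencies come out identical, and only a statistic involving all three units at once separates the two sets. The function names are hypothetical, and parity is used here as one convenient third-order statistic.

```python
from itertools import combinations, product

# The four rows (x, y, x XOR y) and the four 3-bit vectors that are not rows.
xor_rows = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]
other_rows = [row for row in product((0, 1), repeat=3) if row not in xor_rows]

def first_order(rows):
    """Fraction of rows in which each unit is on."""
    return [sum(row[i] for row in rows) / len(rows) for i in range(3)]

def second_order(rows):
    """Fraction of rows in which each pair of units takes the same value."""
    return [sum(row[i] == row[j] for row in rows) / len(rows)
            for i, j in combinations(range(3), 2)]

def third_order(rows):
    """Fraction of rows in which an odd number of the three units is on."""
    return sum(sum(row) % 2 for row in rows) / len(rows)

print(first_order(xor_rows) == first_order(other_rows))    # first-order match
print(second_order(xor_rows) == second_order(other_rows))  # second-order match
print(third_order(xor_rows), third_order(other_rows))      # third-order differs
```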

Other ANN models are able to capture such higher-order information through the use of 'hidden units'. These are units in the network that do not correspond to any part of the input or output of the system. Instead, their weights are available to encode relevant higher-order relations in the data. When such a unit is on, it can indicate the presence of a higher-order feature of the data. Hidden units are therefore often referred to as 'feature detectors', although the specific features they respond to are rarely known before training; they are learned automatically by the network over the course of training. A hidden unit can effectively condense a high-order relationship among the activation patterns of a group of many units into a single activation, which other units can in turn use as an indicator of the 'feature'. Hidden units constitute one of the improvements that the Boltzmann machine model has over Hopfield networks.
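To make the idea concrete, the sketch below adds one hidden unit h to the three XOR units and uses a hand-built energy function, E = (x + y - z - 2h)². This construction is an assumption made for illustration: it is not a trained Boltzmann machine and does not come from the text. With 0/1 units, v·v = v, so expanding the square leaves only bias and pairwise terms, which is the kind of energy these networks can represent. Once the hidden unit is free to take its best value, the four XOR vectors reach energy 0 and the other four do not.

```python
from itertools import product

def energy(x, y, z, h):
    """Hand-built quadratic energy over three visible units and one hidden unit."""
    return (x + y - z - 2 * h) ** 2

def best_energy(x, y, z):
    """Lowest energy reachable for a visible state when h is free to vary."""
    return min(energy(x, y, z, h) for h in (0, 1))

def best_hidden(x, y, z):
    """Hidden-unit value that minimizes the energy for a visible state."""
    return min((0, 1), key=lambda h: energy(x, y, z, h))

for x, y, z in product((0, 1), repeat=3):
    label = "XOR row" if z == x ^ y else "other"
    print((x, y, z), label, "min energy:", best_energy(x, y, z))
```

In this construction the hidden unit ends up on exactly when both x and y are on, so it acts as a detector for that conjunction, which is the feature that pairwise statistics over x, y, and z alone cannot express.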