The Hippocampus

Many of the cognitive processes involved in foraging, including spatial memory, working memory, episodic and declarative memory, the formation of complex associations, and the integration of experience over time, to name

BOX 3.3 Neural Mechanisms of Reward

Peter Shizgal

Neuroscientists are striving to identify the neural circuitry that processes rewards and to determine its role in learning, prediction of future consequences, choice between competing options, and control of ongoing actions. The following examples illustrate neuroscientific research on reward mechanisms and its relation to foraging.

Reward Prediction in Monkeys

Wolfram Schultz and his co-workers carried out an influential set of studies on the activity of single dopamine-containing neurons during conditioning experiments in macaque monkeys (Schultz 1998, 2000). Midbrain dopamine neurons in monkeys and other mammals make highly divergent connections with widely distributed targets in the brain. These neurons have been linked to many processes important to foraging behavior, including learning about rewards and the control of goal-directed actions.

One of the experimental tasks often employed by Schultz's group is delay conditioning. A typical conditioned stimulus (CS) is a distinctive visual pattern displayed on a computer monitor. After a fixed delay, the CS is turned off, and an unconditioned stimulus (US), such as a drop of flavored syrup, is presented (fig. 3.3.1). An intertrial interval of unpredictable duration (dashed line) then ensues before the CS is presented again.

As shown in figure 3.3.1, dopamine neurons typically respond with a brief increase in their firing rate when the US is first presented (left column, bottom trace). However, after the monkey has learned that the CS predicts the occurrence of the US, the dopamine neurons no longer respond to delivery of the reward (the US). Instead, they produce a burst of firing at the onset of the CS (central column). If a second CS is presented prior to the original one (not shown), the burst of firing transfers to the new CS, which has become the earliest reliable predictor of reward. Omission of the US, after the CS-US relationship has been learned, leads to a brief decrease in the firing rate of the dopamine neurons (right column).

The activity of the dopamine neurons at the time of reward delivery appears to reflect some sort of comparison between the reward that the monkey receives and the reward it had expected. When the monkey encounters the US for the first time, it is not yet expecting a reward; the outcome is thus better than anticipated, and the dopamine neurons increase their firing rate. After training, delivery of the reward merely confirms the monkey's expectation, and thus the dopamine neurons are quiescent when the anticipated reward is delivered. Omission of the reward constitutes a worse-than-expected outcome, and the firing of the dopamine neurons slows.

Figure 3.3.2 provides a simplified depiction of a model that compares expectations to experience (Montague et al. 1996; Schultz et al. 1997). The moment-to-moment change in the reward prediction is computed by taking the difference between the reward predicted at a given instant in time and the reward predicted during the previous instant.

Figure 3.3.1. Responses of midbrain dopamine neurons in monkeys during delay conditioning. Presentations of the conditioned stimulus (CS) are separated by intervals of unpredictable duration (dashed lines). The unconditioned stimulus (US), a drop of juice, is delivered immediately following the offset of the CS. The gray traces represent elements of a model (see Figure 3.3.2) that attributes the changes in dopamine firing to temporal difference (TD) errors. The computation of the temporal difference and the temporal difference error is depicted in Figure 3.3.2. The internal signal that tracks the value of an ongoing reward (the US) is labeled "r."

Recall that the duration of the intertrial interval is unpredictable. Thus, during the instant prior to the onset of the CS, the monkey does not know exactly when it will receive the next reward. This lack of predictability is resolved in the next instant by the appearance of the CS. The positive "temporal difference" in the reward prediction indicates that the monkey's prospects have just improved.

It has been proposed (Montague et al. 1996; Schultz et al. 1997) that the dopamine neurons encode a "temporal difference error." As shown in figure 3.3.2, this error signal is produced when the temporal difference in reward prediction is combined with a signal indicating the value of the delivered reward. Consider the situation of a well-trained subject at CS offset (see fig. 3.3.1, central column). The instant before the CS is turned off, the reward prediction is strong. However, as soon as the CS disappears from the screen, an intertrial interval of unpredictable duration begins. Thus, the occurrence of the next reward has become less predictable, and the sign of the temporal difference is negative (trace labeled "TD").

Figure 3.3.2. A simplified depiction of a model that uses temporal difference errors to shape predictions about reward and to control reward-seeking actions.

However, this negative temporal difference coincides with the delivery of the reward. The positive value of the reward ("r") cancels the negative temporal difference. Thus, there is no error signal at the time of reward delivery, and no change in dopamine firing. Omission of the reward (right column) yields a negative temporal difference error and a decrease in dopamine firing. At CS onset in a well-trained subject (central and right columns), the reward prediction has improved. This yields a positive temporal difference error, which is reflected in increased dopamine firing.
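This bookkeeping can be made concrete in a few lines of code. The sketch below is a minimal illustration rather than the published model: the function name and the particular prediction values are our own, and any discounting of future predictions is omitted for simplicity. It computes the error as the reward signal plus the change in the reward prediction, reproducing the three cases in figure 3.3.1.

```python
def td_error(r, v_now, v_prev):
    """Reward signal plus the temporal difference in reward prediction."""
    return r + (v_now - v_prev)

# Well-trained subject: the prediction is 0 before the CS, 1.0 while
# the CS is on, and 0 during the intertrial interval.
print(td_error(r=0.0, v_now=1.0, v_prev=0.0))  # CS onset: +1.0 (burst)
print(td_error(r=1.0, v_now=0.0, v_prev=1.0))  # reward delivered: 0.0 (no change)
print(td_error(r=0.0, v_now=0.0, v_prev=1.0))  # reward omitted: -1.0 (dip)
```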

In a class of models developed by computer scientists (Sutton and Barto 1998), temporal difference errors are used to form and modify predictions about future rewards by altering the weights of connections in a neural network. A positive error increases (and a negative error decreases) the influence on reward prediction exerted by stimuli that were present during the previous instant. Thus, the temporal difference error produced in the initial conditioning trial (see fig. 3.3.1, left column) boosts the influence of the final instant of the CS on reward prediction. Over the course of repeated conditioning trials, these weight changes propagate backward through the CS-US interval to the earliest reliable predictor of reward, the onset of the CS.
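A minimal tabular version of such a model makes the backward propagation visible. The sketch below is illustrative rather than a reproduction of Sutton and Barto's formulation: the number of time steps, the learning rate, and the trial count are arbitrary assumptions, and the discount factor is set to 1.

```python
import numpy as np

# One prediction weight per time step of the CS-US interval;
# the reward (US) arrives at the final step.
n_steps, alpha, n_trials = 10, 0.3, 200
w = np.zeros(n_steps)  # reward prediction at each step, initially zero

for _ in range(n_trials):
    for t in range(n_steps):
        r = 1.0 if t == n_steps - 1 else 0.0           # US at the last step
        v_next = w[t + 1] if t + 1 < n_steps else 0.0  # prediction an instant later
        delta = r + v_next - w[t]                      # temporal difference error
        w[t] += alpha * delta                          # adjust the weight

# Early in training only the weights nearest the US have grown; after many
# trials the prediction is strong from the first instant of the CS onward.
print(np.round(w, 2))
```

After enough trials, every weight approaches 1.0, so the earliest instant of the CS already predicts the reward, mirroring the transfer of the dopamine burst to CS onset.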

Independent experiments have demonstrated that brief increases in the release of dopamine can change the sizes of cortical regions that respond to specific sensory inputs (Bao et al. 2001). This finding provides indirect support for the hypothesis that the brief changes in dopamine firing observed by Schultz's group are sufficient to change the strength of connections between neurons that form predictions of future rewards.

The activity of dopamine neurons can be described over multiple time scales (Schultz 2000). Prolonged, slow changes in the average extracellular concentration of dopamine have been observed during events such as the consumption of a tasty meal (Richardson and Gratton 1996). Thus, brief fluctuations in firing rate, such as those observed during conditioning experiments, may be superimposed on a background of slow changes in neurotransmitter release. Given these multiple time scales and the very widespread connections of the midbrain dopamine neurons, it is perhaps not surprising that these neurons have been implicated in many functions in addition to reward prediction, including the exertion of effort and the switching of attention and motor output. Thus, dopamine neurons may make multiple contributions to foraging behavior through several different psychological processes.

Foraging by Model Bees

Forming accurate predictions about future rewards is clearly advantageous to a forager. To reap the benefits of such predictions, the forager must use them to guide its actions. Note that in figure 3.3.2, the temporal difference error not only shapes reward predictions, but also influences reward-seeking actions. A simulation study (Montague et al. 1995) illustrates how temporal difference errors can guide a forager to promising patches.

The core element of the simulation is modeled on the properties of the VUMmx1 neuron of the honeybee, which is described in section 3.4.

This neuron shows some interesting homologies to the midbrain dopamine neurons of mammals. Like the projections of the midbrain dopamine neurons, the projections of the VUMmx1 neuron are highly divergent (see fig. 3.1). The VUMmx1 neuron releases octopamine, a neurotransmitter closely related to dopamine. The VUMmx1 neuron fires in response to certain rewards, and does so more vigorously when the rewarding stimulus is unexpected.

Real VUMmx1 neurons respond to chemosensory inputs (e.g., nectar). The model neuron, which we will call "VUMmxx," responds to visual cues as well and computes a temporal difference error. During encounters with flowers, the model VUMmxx neuron alters weights in a neural network that generates reward predictions. As a result, the model can learn which of several differently colored flower types contains nectar.

The output of the VUMmxx neuron steers the flight of the model bee; weight changes in the model are dependent on contact with flowers, so reward predictions do not change while the bee is flying. The decision rule governing flight is very simple. The stronger the output of the simulated neuron, the larger the likelihood that the bee will continue on its present heading; the weaker the output of the simulated neuron, the larger the likelihood that the bee will reorient randomly.
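This action rule is easy to state in code. The following sketch is hypothetical: Montague et al. (1995) used a particular probabilistic rule, and the logistic form and slope value below are stand-ins for it, chosen only to convey the qualitative behavior.

```python
import math
import random

def keep_heading(neuron_output, slope=4.0):
    """Return True to stay on the present heading, False to reorient.
    The probability of staying rises with the simulated neuron's output
    (logistic rule; the slope value is an illustrative assumption)."""
    p_continue = 1.0 / (1.0 + math.exp(-slope * neuron_output))
    return random.random() < p_continue

# Rising output (e.g., blue flowers filling the view) -> usually continue;
# falling output (neutral flowers ahead) -> usually reorient at random.
print(keep_heading(+0.8))  # True on most calls
print(keep_heading(-0.8))  # False on most calls
```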

The distribution of flowers in the artificial field is nonuniform; although the field includes equal numbers of blue and neutral-colored flowers, the random scattering of flower types generates small "clumps" in which one of the colors predominates. Due to the learning that occurred during the model bee's prior contacts with the flowers, the strength of the influence exerted by each flower color on the firing of the simulated VUMmxx neuron varies according to the weights in the network. Let's assume that blue flowers recently yielded nectar and neutral-colored flowers did not.

When the model bee is flying at low altitudes, only a small number of flowers fall within its field of view, and a clump of one color is likely to predominate. If that color is neutral, and the predominance of neutral-colored flowers extends to the center of the field of view, then the firing of the simulated VUMmxx neuron will decrease as the bee descends. The action rule will then cause the bee to reorient, breaking off its approach to the unpromising patch. However, if blue flowers predominate, their prevalence will increase as the bee descends, and the rate of firing of the simulated neuron will tend to increase. This generates a positive temporal difference error, which strengthens the bee's tendency to approach the blue flowers. Thus, temporal difference errors can guide a forager toward promising patches.

Foraging for Brain Stimulation

Electrical stimulation of the VUMmx1 neuron in the honeybee can serve as the US in a classical conditioning experiment. In the vertebrate brain, there are widely distributed sites where electrical stimulation serves as a highly effective reward. Rats will work vigorously to obtain such stimulation by pressing a lever or even leaping over hurdles as they run up a steep incline.

Dopamine neurons play an important role in the rewarding effect of electrical stimulation, but the exact nature of that role has yet to be determined. Altering the synaptic availability of dopamine or blocking the receptors at which it acts changes the strength of the rewarding effect (Wise 1996). What is not yet clear is whether the reward signal is encoded directly by brief pulses of dopamine release or whether the dopamine neurons play a less direct role, for example, by amplifying or suppressing reward signals carried by other neurons.

Under the usual experimental conditions, the activation of dopamine neurons by the rewarding stimulation is mostly indirect, through synaptic input from the neurons that are fired directly by the electrode (Shizgal and Murray 1989). In principle, such an arrangement makes it possible for other inputs (e.g., signals representing reward predictions) to oppose the excitatory drive from the directly activated cells, which could explain why the brief stimulation-induced pulses of dopamine release decline over time (Garris et al. 1999). The input from the directly activated neurons may play the role of a "primary reward signal" ("r" in figures 3.3.1 and 3.3.2), which normally reflects the current value of a goal object, such as a piece of food. Indeed, the rewarding effect of electrical stimulation has been shown to compete with, sum with, and substitute for the rewarding effects of gustatory stimuli (Conover and Shizgal 1994; Green and Rachlin 1991).

It is very difficult to hold the value of a natural reward constant over time because of sensory adaptation and satiety. In contrast, rats and other animals will work for hours on end to obtain rewarding brain stimulation. This property makes brain stimulation a handy tool for studying neural and psychological processes involved in foraging. The strength, duration, and rate of availability of the stimulation are easily controlled, and the experimenter can set up multiple "patches" with different payoffs by offering the subject multiple levers or a maze with multiple goal boxes.

In research modeled on foraging, C. R. Gallistel and his co-workers have studied how the magnitude and rate of reward are combined by self-stimulating rats (Gallistel and Leon 1991; Leon and Gallistel 1998). Two levers are provided, and the rat cannot predict exactly when the stimulation will become available. However, the rat is able, over multiple encounters, to estimate the mean rate of reward at each lever. Faced with two levers that are armed at different rates and that deliver rewarding stimulation of different strengths, the rat tends to shuttle between them. Its allocation of time between these two "patches" matches a simple ratio of the respective "incomes," the products of the perceived rates of reward delivery and the subjective magnitudes of the rewarding effects (Gallistel 1994; Gallistel et al. 2001).
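A worked example (with invented numbers, not Gallistel's data) shows how the matching computation goes: each lever's income is the product of its perceived reward rate and subjective reward magnitude, and the predicted share of time follows the ratio of the incomes.

```python
# Hypothetical payoffs at two levers ("patches").
rate_a, magnitude_a = 6.0, 2.0   # rewards per minute, subjective units
rate_b, magnitude_b = 3.0, 1.0

income_a = rate_a * magnitude_a  # 12.0
income_b = rate_b * magnitude_b  # 3.0

# Matching: time is allocated in proportion to income.
share_a = income_a / (income_a + income_b)
print(f"predicted share of time at lever A: {share_a:.0%}")  # 80%
```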

The rats in Gallistel's experiments not only learn about the rates and magnitudes of rewards, but also learn about the stability of the payoffs over time (Gallistel et al. 2001). When the experimenter makes frequent, unsignaled changes in the relative rates of reward, the rats adjust their behavior very quickly so as to invest more heavily in the option that has started to yield the higher payoffs. However, when the experimental conditions have long been constant, the rats' behavior shows much more inertia following a sudden change in the relative rates of reward. Such tendencies would help a forager make use of its past experience in deciding whether a recent decline in returns reflects a bona fide trend toward patch depletion or merely a noisy, but stable, distribution of prey.

Gallistel has interpreted these results within a theoretical framework (Gallistel 1990; Gallistel and Gibbon 2000) very different from the associationist view that changes in connection weights are the basis of learning. In the rate estimation theory proposed by Gallistel and Gibbon (2000), the animal acts like a statistician making decisions on the basis of data on reward rates, time intervals, and reward magnitudes. They argue that these data are stored in representations that cannot be constructed solely from the building blocks posited by associationist theories. In contrast to the division of time into discrete steps in the models in figures 3.3.1 and 3.3.2, time is treated as a continuous variable in rate estimation theory. Decisions such as patch leaving are under the control of internal stochastic processes and need not be driven by transitions in external sensory input.

The debate between proponents of associationist and rate estimation theories concerns the neural and psychological bases of evaluation, decision making, and learning. These processes are fundamental to the ability of foragers to allocate their behavior profitably. It will be interesting indeed for students of foraging to see how this debate plays out.

Suggested Readings

Dayan and Abbott's (2001) textbook presents temporal difference learning within an overview of computational approaches to many different topics in neuroscience and psychology. Gallistel and Gibbon (2000) challenge associationist accounts of learning and recast both classical and operant conditioning phenomena in terms of rate estimation theory, a decision-theoretic viewpoint based on the learning of time intervals, rates, and reward magnitudes. Schultz and Dickinson (2000) review the role of prediction errors in behavioral selection and learning.

just a few, have at one time or another been attributed to the vertebrate hippocampus. In fact, a number of authors have pointed out the functional similarities between the vertebrate hippocampus and the insect mushroom bodies (Capaldi et al. 1999; Waddell and Quinn 2001). As Waddell and Quinn put it, "Both systems show elegantly regular, only slightly scrutable anatomical organization and appear suited to deal with complex, multimodal assemblies of information" (2001, 1298). In most mammals, the hippocampus, which is part of the limbic system, is an arch-shaped structure deep within the brain. In humans and other primates, however, the arch is straightened into an elongated structure that lies entirely within the temporal lobe. In birds, the hippocampus lies at the dorsal surface of the brain along the midline between the hemispheres, the position also occupied by the evolutionarily homologous structure in reptiles, the dorsomedial forebrain. The hippocampus receives input from most sensory modalities via the entorhinal cortex after this sensory information has been processed in other brain areas. The hippocampus sends efferent output to many areas, both within the limbic system and elsewhere in the brain.

Discovering exactly what takes place in the hippocampus, in cognitive terms, has proved elusive. As in the fable of the blind men and the elephant, different research groups concerned with different aspects of behavior have come to very different conclusions about what the hippocampus does. There is good evidence that the hippocampus plays an important role in spatial orientation in birds, mammals, and reptiles. There is also evidence that learning of complex relations among stimuli—spatial or nonspatial—depends on the hippocampus. In humans, damage to the hippocampus disrupts the ability to form new episodic memories—memories of everyday events or episodes— but does not seem to impair procedural memory—the ability to learn new skills and procedures. People with damage to the hippocampus can, for example, learn a new computer skill without any awareness of where or when they learned it or even any recollection that they now possess this skill. Conflicting conclusions about the function of the hippocampus are probably the result of various researchers grasping different parts of what is clearly a complex beast. In this section, we will discuss two proposed cognitive functions of the hippocampus, spatial orientation and declarative memory, and describe the evidence that supports each of these ideas.
