Differential contribution of dopaminergic transmission at D1- and D2-like receptors to cost/benefit evaluation for motivation in monkeys

It has been widely accepted that dopamine (DA) plays a major role in motivation, yet the specific contribution of DA signaling at D1-like receptor (D1R) and D2-like receptor (D2R) to cost-benefit trade-off remains unclear. Here, by combining pharmacological manipulation of DA receptors (DARs) and positron emission tomography imaging, we assessed the relationship between the degree of D1R/D2R blockade and changes in benefit- and cost-based motivation for goal-directed behavior of macaque monkeys. We found that the degree of blockade of either D1R or D2R was associated with a reduction of the relative incentive effect of reward amount, where D2R blockade had a stronger effect. Workload-discounting was selectively increased by D2R antagonism, whereas delay-discounting was similarly increased after D1R and D2R blockades. These results provide fundamental insight into the specific actions of DARs in the regulation of the cost/benefit trade-off and have important implications for motivational alterations in both neurological and psychiatric disorders.


Introduction
In our daily lives, we routinely determine whether to engage or disengage in an action according to its benefits and costs, i.e., its motivational value. For motivational value computation, the expected value of benefits (i.e., rewards) has a positive influence, while the cost necessary to earn the expected reward has a negative impact and discounts the net value of reward [1][2][3]. Arguably, the dopamine (DA) system plays a central role in the benefit- and cost-based computation of motivational value. Phasic firing of midbrain DA neurons correlates with the magnitude of future rewards, while it decreases according to the expected cost to be expended for the rewards, such as physical effort, time delay to reward, and reward probability [4][5][6][7][8][9]. DA neurons are also implicated in conveying information about vigor in a sustained manner ("tonic firing"; [10,11]). Several studies demonstrated that DA neurotransmission was causally involved in incentive motivation, i.e., in the enhancement of actions by the amount of expected reward [12][13][14][15][16]. In humans, the alteration of DA transmission is frequently associated with various pathological impairments of motivation such as anergia, fatigue, psychomotor retardation, and apathy, which are frequently observed in people with depression, schizophrenia or Parkinson's disease [14,17,18,19]. But even if DA signaling is clearly involved in the regulation of behavior based on the cost/benefits trade-off, the underlying mechanisms remain debated.
DA signaling is mediated at post-synaptic sites by two classes of DA receptors (DARs), the D1-like receptor (D1R) and the D2-like receptor (D2R), and both classes are thought to be involved in motivation and decision-making based on the cost/benefits trade-off. For instance, blocking either D1R or D2R reduced the likelihood and speed of engagement of cued action to obtain a future reward [20,21]. Blockade of either D1R or D2R biases animals' choices in tasks manipulating the cost/benefits trade-off, where the cost involved physical effort (effort-discounting, [22][23][24]) or time (delay-discounting, [23,25]). But since cost and reward were not manipulated independently in most of these studies, the relative impact of DA treatment on the cost vs. benefit components of evaluation remains hard to identify (see [9] for further discussion). To clarify the relation between DA, reward, and cost, it is thus critical to use a behavioral task where reward and cost are manipulated independently.
Another challenge of pharmacological manipulations is how to compare the role of two receptor subtypes quantitatively. Previous studies described the effect of DAR blockade according to the antagonist dose-response relationship for each DAR subtype. However, because each antagonist has different characteristics (e.g., target affinity, brain permeability, biostability), the dose-response relationships cannot be compared directly across antagonists on the basis of dose. On the other hand, receptor occupancy appears to provide an objective reference for receptor blockade. For example, positron emission tomography (PET) studies of patients have shown that in vivo D2R occupancy is a reliable predictor of clinical and side effects of antipsychotic drugs [26,27]. Similarly, receptor occupancy has been measured in rats and monkeys and related to the behavioral effects of D2R antagonists [28][29][30]. Thus, to better understand the role of DARs in motivation, it would be critical to monitor occupancy following antagonists and compare the effects on distinct components of decision-making as a function of occupancy.
In the present study, we aimed to quantify and directly compare the roles of DA signaling via D1R and D2R in decision-making based on the trade-off between reward and two types of costs (time vs workload) in macaque monkeys. For this purpose, we manipulated DA transmission by systemic application of DAR-specific antagonists and examined the relationship between the occupancy of D1R vs D2R and the changes in sensitivity to reward magnitude, workload and delay. We established the dynamic action of DAR antagonists by measuring the degree of DA receptor occupancy using in vivo PET imaging with selective radioligands. To quantify the effects of DAR blockades on incentive motivation, we used a behavioral task in which we manipulated the predicted reward size. To quantify the effects of DA manipulation on cost-based decision-making (i.e., effort or delay discounting), we used a similar behavioral task in which workload or delay to obtain a reward was manipulated. Based on our data, D1R and D2R have similar roles in incentive-based motivation, whereas D2R is exclusively related to effort-based motivation.

Results
PET measurement of D1R/D2R occupancy following systemic antagonist administration. To establish appropriate antagonist dose setting and experimental timing, we measured the degree of receptor blockade (i.e., receptor occupancy) following systemic administration of DAR antagonists. We performed PET imaging with selective radioligands for D1R ([11C]SCH23390) and D2R ([11C]raclopride) to measure specific radioligand binding in the brain both at baseline (without drug administration) and following antagonist administration. Receptor occupancy was estimated as the degree of reduction of specific binding (S1 Fig, see Methods) [31].
For D1R measurement, high radiotracer binding was seen in the striatum at baseline (Fig 1A, Baseline). We used non-radiolabeled SCH23390 as the D1R antagonist at different doses (10, 30, 50, and 100 µg/kg). The striatal binding was diminished by pre-treatment with systemic administration of SCH23390 in a dose-dependent manner (Fig 1A). In 3 monkeys, we measured the relationship between D1R occupancy and the dose of SCH23390, which was approximated by a Hill function (Fig 1C; Eq. 4). We found that treatment with SCH23390 at doses of 100 and 30 µg/kg corresponded to 81% and 57% of D1R occupancy, respectively.
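As an illustration of how a Hill function links dose to occupancy, the following minimal sketch solves for the two parameters of a Hill curve that passes exactly through the two reported points (30 µg/kg, 57%) and (100 µg/kg, 81%). It assumes a maximal occupancy of 100% and is not the authors' fit of Eq. 4, which used all measured doses; the function names and the parameters `ed50` and `n` are introduced here for illustration only.

```python
import math

def hill_occupancy(dose, ed50, n, occ_max=1.0):
    """Fractional occupancy predicted by a Hill function of dose."""
    return occ_max * dose**n / (dose**n + ed50**n)

def hill_from_two_points(d1, occ1, d2, occ2):
    """Solve for (ed50, n) so that the curve passes through two points.

    Assumes occ_max = 1 (an assumption of this sketch, not of the paper).
    """
    # Logit-linearised Hill equation: ln(occ / (1 - occ)) = n * (ln(dose) - ln(ed50))
    logit1 = math.log(occ1 / (1.0 - occ1))
    logit2 = math.log(occ2 / (1.0 - occ2))
    n = (logit2 - logit1) / (math.log(d2) - math.log(d1))
    ed50 = d1 * math.exp(-logit1 / n)
    return ed50, n

# Reported D1R occupancies: 57% at 30 ug/kg and 81% at 100 ug/kg SCH23390.
ed50, n = hill_from_two_points(30.0, 0.57, 100.0, 0.81)
print(f"ED50 ~ {ed50:.1f} ug/kg, Hill coefficient ~ {n:.2f}")
```

Under these assumptions the Hill coefficient comes out close to 1, i.e., the two reported points are compatible with simple saturable binding.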
Haloperidol was used for D2R antagonism. Unlike SCH23390, which was rapidly washed from the brain within a few hours, a single dose of haloperidol treatment was expected to show persistent D2R occupancy for the following several days as described in humans and mice [32,33], providing the opportunity for testing different occupancy conditions. The baseline [11C]raclopride PET image showed the highest radiotracer binding in the striatum (Fig 1B, Baseline). As expected, striatal binding was diminished not only just after pre-treatment with haloperidol (10 µg/kg, i.m.), but also until post-haloperidol day 2 (Fig 1B, Day 2). Binding had recovered to the baseline level by day 7 (Fig 1B, Day 7). We measured D2R occupancy on days 0, 1, 2, 3, and 7 after a single haloperidol injection in 3 monkeys. An exponential decay function approximated the relationship between D2R occupancy and post-haloperidol days (Eq. 5); a single injection of haloperidol yielded 78% and 48% of D2R occupancy on days 0 and 1, respectively (Fig 1D).
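The time course of D2R occupancy can be sketched with a single-exponential decay anchored on the two reported values (78% on day 0, 48% on day 1). This is a back-of-the-envelope illustration, not the authors' fit of Eq. 5: it assumes decay toward zero occupancy with a constant rate, and the function names are hypothetical.

```python
import math

def decay_rate(occ_day0, occ_day1):
    """Per-day decay rate implied by occupancies on two consecutive days."""
    return math.log(occ_day0 / occ_day1)

def d2r_occupancy(day, occ_day0, rate):
    """Single-exponential decay of occupancy (assumed to decay toward zero)."""
    return occ_day0 * math.exp(-rate * day)

# Reported D2R occupancies after haloperidol: 78% on day 0, 48% on day 1.
rate = decay_rate(0.78, 0.48)
half_life_days = math.log(2) / rate
predicted = {day: d2r_occupancy(day, 0.78, rate) for day in (0, 1, 2, 3, 7)}

for day, occ in predicted.items():
    print(f"day {day}: {100 * occ:.0f}% occupancy")
print(f"implied half-life: {half_life_days:.1f} days")
```

With these two anchor points the predicted occupancy on day 7 is only a few percent, which is in line with the reported recovery of binding to baseline by day 7.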

Effects of D1R- and D2R-blockade on incentive motivation.
To assess the effect of blockade of D1R and D2R on incentive motivation, we tested 2 monkeys with a reward-size task (Fig 2A). In every trial of this task, the monkeys were required to release a bar when a visual target changed from red to green to get a liquid reward. A visual cue indicated the amount of reward (1, 2, 4, or 8 drops) at the beginning of each trial (Fig 2A). After a few months of training, the monkeys were able to release the bar in response to the Go signal. However, they never performed perfectly, and failures consisted of either releasing the bar too early or too late. These failures were usually observed in small reward trials or close to the end of daily sessions. As in previous experiments using a single option presentation which monkeys can perform correctly or not, failures were regarded as trials in which the monkeys were not sufficiently motivated to correctly release the bar (i.e., refusal) [2]. Hence, the frequency of refusal trials can be used as a behavioral measure of motivation [8,34,35,36,37]. Moreover, we have shown that the refusal rate (E) is inversely related to reward size (R), which has been formulated with a single free parameter a [2] (Fig 2B): E = 1/(aR) (Eq. 1). In agreement with these previous studies, both monkeys exhibited the inverse relationship in the non-treatment condition (Fig 2D and 2E, Control).
We found that DAR blockade decreases incentive motivation, leading to an increase in refusal rate of the task. For example, D1R blockade (systemic injection of SCH23390) increased the refusal rates, particularly in smaller reward size trials (Fig 2D). We considered whether this increase was due to a reduction in the incentive impact of reward, or a decrease in motivation irrespective of reward size. These factors can be captured by a decrease in parameter a of the inverse function and by adding an intercept e, respectively (Fig 2C). To quantify the increases in refusal rate, we compared 4 models considering these two factors as random effects. For both monkeys, the increases in refusal rate were explained by a decrease in the parameter a due to the treatment, while the inverse relation with reward size was maintained (model #3 for monkey KN and model #1 for ST; S1 Table). We then assessed changes in parameter a, which indicates the incentive impact of reward size. As shown in Fig 3A, normalized a became smaller as the dose of SCH23390 was increased to 30 or 50 µg/kg, but then it increased at the highest dose (100 µg/kg) (Fig 3A, left).
Thus, incentive impact did not decrease monotonically with the dose, but changed in a U-shaped manner in both monkeys.
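The two candidate explanations for an increased refusal rate can be contrasted in a short sketch. It assumes the inverse function takes the form E = 1/(aR) + e, with e = 0 in the control condition; this form is inferred from the description of a single free parameter a (in units of drop^-1) and an added intercept e, and the numerical values of a and e below are hypothetical, chosen only for illustration.

```python
def refusal_rate(reward_drops, a, e=0.0):
    """Refusal rate under the assumed inverse model E = 1/(a*R) + e."""
    return 1.0 / (a * reward_drops) + e

rewards = (1, 2, 4, 8)  # reward sizes used in the task (drops)

# Hypothetical parameter values, for illustration only.
control = [refusal_rate(r, a=10.0) for r in rewards]
reduced_incentive = [refusal_rate(r, a=5.0) for r in rewards]      # smaller a
added_offset = [refusal_rate(r, a=10.0, e=0.05) for r in rewards]  # intercept e

for r, c, ri, ao in zip(rewards, control, reduced_incentive, added_offset):
    print(f"R={r}: control={c:.3f}  lower a={ri:.3f}  offset e={ao:.3f}")
```

Lowering a multiplies the refusal rate by the same factor at every reward size, so the absolute increase is largest for small rewards, whereas an intercept e shifts all reward sizes by the same amount. This is why the two factors can be separated by model comparison.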
For the D2R blockade, the monkeys were tested with the task 15 min after a single injection of haloperidol (10 µg/kg, i.m., day 0) and were then successively tested on the following days 1, 2, 3, 4 and 7. We also found a significant increase in refusal rates for D2R blockades in both monkeys (Fig 2E). The refusal rates were highest on the day of haloperidol injection, after which they decreased as the days went by (2-way ANOVA, treatment, F(6, 27) = 9.6, p < 0.001). The increases in refusal rate were reward size-dependent (reward size, F(3, 27) = 186, p < 0.001; treatment × reward size, F(18, 27) = 3.7, p < 0.01). Similar to the D1R blockade, the increases in refusal rate due to D2R blockade were explained solely by a decrease of parameter a according to the days following the treatment for both monkeys (model #1 for both monkeys KN and ST; S1 Table). Our model-based analysis revealed that a decreased by about 40% on the day of haloperidol injection and the following 3 days as compared to control, and then recovered to almost the control level on day 7 (Fig 3B).
To compare the effects between D1R and D2R blockades directly, we plotted changes in incentive impact along with the degree of blockade (Fig 3C). In both D1R and D2R blockades, a declined according to the increase in occupancy; it gradually declined as D1R occupancy increased, but then increased at the highest occupancy, whereas it steeply declined until 20% D2R occupancy, and then continued to decrease slightly until 80% occupancy (Fig 3C). At 20-80% occupancy, the incentive impacts for D2R blockade stayed lower than those for D1R, suggesting a stronger sensitivity of incentive impact to D2R blockade.
We sought to verify that the effect of D2R antagonism was not specific for haloperidol and to validate the comparison between D1R and D2R in terms of receptor occupancy. We examined the behavioral effect of another D2R antagonist, raclopride, at a dose yielding about 50% receptor occupancy (10 µg/kg, i.m.; S2 Fig). Following this dose of raclopride administration, a monkey again exhibited increased refusal rates, which were explained by the inverse function with a = 5.2 (drop^-1), a value comparable to the incentive impact observed at 50% D2R occupancy with haloperidol [a = 5.4 (drop^-1), day 1; S2B Fig]. Thus, our data suggest that the D2R antagonism-induced reduction of the incentive effect reflects the degree of receptor blockade regardless of the antagonist used.

Effects of D1R- and D2R-blockade on response speed.
Previous studies have reported that systemic administration of D1R or D2R antagonists increases reaction times (RTs) in monkeys (e.g., [38]). Consistent with those studies, DAR blockade in our study prolonged RTs in a treatment-dependent manner. For D1R blockade, RTs were increased according to the antagonist dose (S3A and S3B Fig). These results suggest that D1R blockade also influences response speed, probably due to slowing cognitive processing as implicated in previous studies [39,40]. Thus, the effect of D1 manipulation on value-based decision was to some extent related to its effects on the action itself.

Little influence of D1R- or D2R-blockade on internal drive or relative reward value.
The behavioral data shown above suggest that blockade of DAR attenuates the incentive effect of reward on behavior. However, two important questions remain: (1) Is the relative reward value unaffected in general? (2) Is the internal drive unaffected? Previous studies in rodents showed that DA antagonism does not alter water consumption or preference for sucrose over water in rats [15,41]. We confirmed this in primates by examining the effect of DAR blockade on water intake and sucrose preference in 2 monkeys. As expected, treatment with D1R or D2R antagonist did not affect overall intake (one-way ANOVA, treatment, F(2, 25) = 0.06, p = 0.93) or sucrose preference (treatment, F(2, 18) = 0.70, p = 0.51; S4A Fig). We also assessed blood osmolality, a physiological index of dehydration and thirst drive [42], before and after the preference test. Again, DAR treatment had no significant influence on overall osmolality or recovery of osmolality (rehydration) (2-way ANOVA, main effect of treatment, F(2, 35) = 0.08, p = 0.92; treatment × pre-post, F(2, 35) = 0.15, p = 0.87; S4B Fig). These results suggest that DAR blockade has no influence on physiological needs or relative reward value.
These results also support the notion that the increased refusal rate was not directly due to a reduction of thirst drive.

Differential effects of D1R and D2R blockades on workload and delay discounting.
Next, we assessed the effect of selective DAR blockade on cost-based motivation. For this purpose, we used a work/delay task (Fig 4A), where the basic features were the same as those in the reward-size task. There were two trial types. In the work trials, the monkeys had to perform 0, 1, or 2 additional instrumental trials to obtain a reward. In the delay trials, after the monkeys correctly performed one instrumental trial, a reward was delivered 0-7 seconds later. The number of trials or length of delay was indicated by a visual cue presented throughout the trial. In the first trial after the reward, the visual cue informed how much would need to be paid in order to get the next reward. Therefore, we assessed the performance of the monkeys on the first trials to evaluate the impact of expected cost on motivation and decision-making. We have previously shown that the monkeys exhibited linear relationships between refusal rate (E) and remaining costs (CU) for both work and delay trials, as follows: E = k × CU + E0, where k is a coefficient and E0 is an intercept [43] (Fig 4B). By extending the inference and formulation of the reward-size task (Eq. 1), this linear effect suggests that the reward value is hyperbolically discounted by cost, where the coefficient k corresponds to the discounting factor. Consistently, refusal rates in the control condition increased as the remaining cost increased (e.g., Fig 4C, control; 2-way ANOVA, cost type × remaining cost, main effect of remaining cost, F(2, 46) = 109, p < 10^-15). Figure 4B illustrates our hypothesis that DAR blockade increases cost sensitivity (i.e., discounting factor, k), leading to an increase in refusal rate of the task.
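One way to see why a linear relationship between refusal rate and remaining cost implies hyperbolic discounting is sketched below. The derivation assumes that the reward-size relation has the form E = 1/(aV), with V the motivational value (V = R when no cost remains), and that cost discounts value hyperbolically with a rate written here as k'; the symbols V and k' are introduced for this illustration and do not appear in the text.

```latex
\begin{align}
E &= \frac{1}{aV}, \qquad V = \frac{R}{1 + k'\,CU} \\
E &= \frac{1 + k'\,CU}{aR}
   = \underbrace{\frac{1}{aR}}_{E_0} + \underbrace{\frac{k'}{aR}}_{k}\,CU
\end{align}
```

Under these assumptions the refusal rate is linear in CU, the intercept E0 is the refusal rate at zero remaining cost, and the slope k is proportional to the hyperbolic discounting rate k' for a fixed reward size.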
To compare the behavioral effect of D1R vs D2R antagonisms at the same degree of receptor blockade, we assessed the performance of the monkeys under two comparable levels of DAR occupancy for D1R and D2R, moderate occupancy (MO, ~50%) and high occupancy (HO, ~80%), and under baseline condition (non-treatment) as a control. D1R blockade selectively increased the refusal rates in delay trials in an occupancy-dependent manner (3-way ANOVA, occupancy × cost type, F(2, 142) = 5.2, p < 0.01; Fig 4C). By contrast, D2R blockade preferentially increased the refusal rates in work trials (occupancy × cost type, F(2, 142) = 25, p < 10^-9; Fig 4D). As in our previous study [43], two linear regression models (Eqs. 6 and 7, see Methods) simultaneously fitted the data well in all cases (average R^2 > 0.9), allowing us to measure the effects of DAR blockade as the increased steepness of cost-discounting of motivational value. We found that workload-discounting (kw) was specifically increased by D2R blockade in an occupancy-dependent manner (2-way ANOVA, receptor subtype × occupancy, F(2, 10) = 14.1, p < 0.01; Fig 4E). Delay-discounting, on the other hand, tended to increase according to the degree of DAR blockade irrespective of receptor subtype (main effect of occupancy, F(2, 10) = 4.0, p = 0.054; receptor subtype × occupancy, F(2, 10) = 0.3, p = 0.74; Fig 4F).
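A simultaneous fit of two linear models can be sketched as a single least-squares problem. The sketch below assumes that the work and delay regressions share one intercept, E = kw × CU + E0 and E = kd × CU + E0; this shared-intercept form is an assumption based on the single intercept (E0) reported alongside kw and kd, since Eqs. 6 and 7 are not reproduced here. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def fit_cost_discounting(cu_work, e_work, cu_delay, e_delay):
    """Jointly fit E = kw*CU + E0 (work) and E = kd*CU + E0 (delay).

    Assumes a shared intercept E0 across the two trial types.
    Returns (kw, kd, e0).
    """
    cu_work = np.asarray(cu_work, dtype=float)
    cu_delay = np.asarray(cu_delay, dtype=float)
    # Design matrix columns: [CU in work trials, CU in delay trials, intercept]
    x_work = np.column_stack([cu_work, np.zeros_like(cu_work), np.ones_like(cu_work)])
    x_delay = np.column_stack([np.zeros_like(cu_delay), cu_delay, np.ones_like(cu_delay)])
    x = np.vstack([x_work, x_delay])
    y = np.concatenate([np.asarray(e_work, dtype=float),
                        np.asarray(e_delay, dtype=float)])
    (kw, kd, e0), *_ = np.linalg.lstsq(x, y, rcond=None)
    return kw, kd, e0

# Synthetic, noise-free refusal rates generated from known parameters
# (illustrative values only, not data from the study).
cu = np.array([1.0, 2.0, 3.0])
e_work = 0.05 + 0.04 * cu
e_delay = 0.05 + 0.02 * cu
kw, kd, e0 = fit_cost_discounting(cu, e_work, cu, e_delay)
print(f"kw={kw:.3f}, kd={kd:.3f}, E0={e0:.3f}, kw/kd={kw / kd:.2f}")
```

Fitting both trial types in one design matrix lets the slopes differ while the minimum-cost refusal rate is constrained to be common, which is what makes kw and kd directly comparable.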
Considering the direct and indirect striatal output pathways where neurons exclusively express D1R and D2R, respectively, and the opposition believed to exist between the pathways in general, it should be possible to counterbalance the effects of those antagonists with each other [44]. To test this possibility, we examined the behavioral effects of both D1R and D2R blockades at the same occupancy level. After treatment with both SCH23390 (100 µg/kg) and haloperidol (10 µg/kg), seemingly achieving ~80% of occupancy for both subtypes (cf. Fig 1C and 1D), all monkeys virtually stopped performing the task, completing only a small number of correct trials (1-13% of control). When we treated the monkeys with SCH23390 (30 µg/kg) on the day following that of haloperidol injection (i.e., both D1R and D2R assumed to be occupied at ~50%), the monkeys had higher refusal rates in delay trials than control (Fig 5A, D1R+D2R block) and displayed a higher discounting factor (Fig 5B, delay). By contrast, this simultaneous D1R and D2R blockade appeared to attenuate the effect of D2R antagonism on workload in 2 of 3 monkeys; the refusal rates in work trials were not as high as in D2R blockade alone (Fig 5A), and the workload-discounting factor (kw) took a value between those for D1 and D2 antagonisms (Fig 5B, workload). A similar counterbalance was also seen in the relative strength of discounting (ratio of kw/kd) as well as the motivation for the minimum cost trials (E0) (Fig 5B). These results suggest that blocking both receptor subtypes tends to induce a synergistic effect on delay-discounting, while it counteracts the effects on workload-discounting.

Fig. 5. Effect of both D1R and D2R blockades on cost evaluation for motivation. (A) Representative relationship between refusal rates (in monkey KN; mean ± SEM) and remaining costs for workload (green) and delay trials (black). (B) Best-fit parameters, workload-discounting (kw), delay-discounting (kd), workload/delay ratio (kw/kd), and intercept (E0), are plotted for each treatment condition. Bars and symbols indicate mean and individual data, respectively. D1R+D2R block indicates the data obtained under both D1R and D2R blockades at moderate occupancy, while D1R and D2R blockades at high occupancy resulted in almost no correct performance (see text). All parameters are derived from the best fit for Eqs. 6 and 7, respectively.

Discussion
Combining the PET occupancy study and pharmacological manipulation of D1- and D2-like receptors with quantitative measurement of motivation in monkeys, the current study demonstrated dissociable roles of the DA transmissions via D1R and D2R in the computation of the cost/benefits trade-off to guide action. To the best of our knowledge, this is the first study to directly compare the contribution of dopamine D1R and D2R along with the degree of receptor blockade. Using model-based analysis, we showed that DAR blockade had a clear quantitative effect on the sensitivity of animals to information about potential costs and benefits, without any qualitative effect on the way monkeys integrated costs and benefits and adjusted their behavior. We showed that blockade of D1R or D2R reduced the incentive impact of reward as the degree of DAR blockade increased, and the incentive impact was more sensitive to the D2R blockade than the D1R blockade at lower occupancy. In cost-discounting experiments, we could dissociate the relation between each DAR type and workload vs delay-discounting: workload-discounting was increased exclusively by D2R antagonism, whereas delay-discounting was increased by DAR blockade irrespective of receptor subtype. When both D1R and D2R were blocked simultaneously, the effects were synergistic and strengthened for delay-discounting, while the effects were antagonistic and diminished for workload-discounting.

DA controls the incentive effect of expected reward amount.
Previous pharmacological studies have shown that DAR blockade decreased the speed of action and/or probability of engagement behavior [20,21]. However, the previous studies did not address the quantitative effect of DAR blockade on incentive motivation; more specifically, there was a lack of experimental data to model the causal relationship among DAR stimulation, reward, and motivation. In the present study, we used a behavioral paradigm that enabled us to formulate and quantify the relationship between reward and motivation [2] (Fig 2). Our finding, a reduction of incentive impact due to DAR antagonism (cf. Fig 3), is in line with the incentive salience theory, that is, DA transmission attributes salience to incentive cues to promote goal-directed action [12]. The lack of effect of DA manipulation on satiety and spontaneous water consumption is compatible with the idea that DA manipulation has a stronger effect on incentive processes (influence of reward on action) than on hedonic processes (evaluation itself, pleasure associated with consuming reward), but further experiments would be necessary to address that point directly [12].
Our model-based analysis indicates that DAR blockade only had a quantitative influence (a reduction of incentive impact of reward) without changing the qualitative relationship between reward size and behavior. This is in marked contrast with the reported effects of inactivation of brain areas receiving massive DA inputs, including the orbitofrontal cortex, rostromedial caudate nucleus, and ventral pallidum. Indeed, in experiments using nearly identical tasks and analysis, inactivation or ablation of these regions produced a qualitative change in the relationship between reward size and behavior (more specifically, a violation of the inverse relationship between reward size and refusal rates) [36,37,45]. Thus, the influence of DAR cannot be understood as a simple permissive or activating effect on target regions. The specificity of the DAR functional role is further supported by the subtle, but significant difference between the behavioral consequences of blocking D1R vs D2R. By combining a direct measure of DAR occupancy and quantitative behavioral assessment, the present study demonstrates that the incentive impact of reward is more sensitive to D2R blockade than D1R blockade, especially at a lower degree of occupancy (cf. Fig 3C). Moreover, the relation between occupancy and behavior was monotonic for D2R, but U-shaped for D1R. Although this might be surprising, such non-monotonic effects have been repeatedly reported. For example, working memory performance and related neural activity in the prefrontal cortex take the form of an "inverted-U" shaped curve, where too little or too much D1R activation impairs cognitive performance [3,46,47]. As for the mechanisms underlying the distinct functional relation between the behavioral effects of D1R vs D2R blockade, it is tempting to speculate that this is related to a difference in their distribution, their affinity and the resulting relation with phasic vs tonic DA action.
Indeed, DA affinity for D2R is ~100 times higher than that for D1R [48]. This is directly in line with the higher behavioral sensitivity of D2R manipulation, compared to that of D1R. Moreover, in the striatum, a basal DA concentration of ~5-10 nM is sufficient to constantly stimulate D2R. Using available biological data, a recent simulation study showed that the striatal DA concentration produced by the tonic activity of DA neurons (~40 nM) would occupy 75% of D2R but only 3.5% of D1R [49]. Thus, blockade of D2R at low occupancy may interfere with tonic DA signaling, whereas D1R occupancy would only be related to phasic DA action, i.e., when transient but massive DA release occurs (e.g., in response to critical information about reward). We acknowledge that this remains very hypothetical, but irrespective of the underlying mechanisms, our data clearly support the idea that DA action on D1R vs D2R exerts distinct actions on their multiple targets to enhance incentive motivation.
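The quoted occupancy figures can be checked for internal consistency with a simple equilibrium calculation. The sketch below assumes one-site binding, occupancy = C / (C + Kd), which may differ from the model used in the cited simulation study; the function names are introduced here for illustration, and the dissociation constants are back-calculated from the quoted numbers rather than taken from the literature.

```python
def occupancy(conc_nm, kd_nm):
    """Equilibrium fractional occupancy for one-site binding."""
    return conc_nm / (conc_nm + kd_nm)

def implied_kd(conc_nm, occ):
    """Dissociation constant implied by an occupancy at a given concentration."""
    return conc_nm * (1.0 - occ) / occ

# Quoted values: ~40 nM tonic DA occupies 75% of D2R and 3.5% of D1R.
kd_d2 = implied_kd(40.0, 0.75)
kd_d1 = implied_kd(40.0, 0.035)
affinity_ratio = kd_d1 / kd_d2

print(f"implied Kd: D2R ~ {kd_d2:.1f} nM, D1R ~ {kd_d1:.0f} nM")
print(f"implied D2R/D1R affinity ratio ~ {affinity_ratio:.0f}-fold")
print(f"D2R occupancy at 5-10 nM basal DA: "
      f"{occupancy(5.0, kd_d2):.0%} to {occupancy(10.0, kd_d2):.0%}")
```

Under this assumption the implied affinity difference is on the order of 80-fold, the same order of magnitude as the ~100-fold difference cited above, and a basal concentration of 5-10 nM would already occupy a substantial fraction of D2R.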

DA transmission via D1R and D2R distinctively controls cost-based motivational process.
Although many rodent studies have demonstrated that attenuation of DA transmission alters not only benefit- but also cost-related decision-making, the exact contribution of D1R and D2R remains elusive. For example, reduced willingness to exert physical effort to receive higher reward was similarly found following D1 and D2 antagonisms in some studies, while it was observed exclusively by D2 antagonism in other studies [22,50,51]. This inconsistency may arise because previous studies usually investigated the effect of antagonism on D1R and D2R along with a relative pharmacological concentration (e.g., low and high doses). In the present study, PET-assessed DAR manipulation allowed us to directly compare the behavioral effect between D1R and D2R with an objective reference, namely occupancy (i.e., ~50% and ~80% occupancy). Besides, the exact nature of the cost (effort vs delay) has sometimes been difficult to identify, and effort manipulation is often strongly correlated with reward manipulation (typically when the amount of reward earned is instrumentally related to the amount of effort exerted, see [9]). Here, using a task manipulating forthcoming workload independently from reward value, we demonstrated that blockade of D2R, but not D1R, increased workload-discounting in an occupancy-dependent manner while maintaining linearity (cf. Fig 4E). In addition, D1R and D2R had synergistic effects in the delay-discounting task but antagonistic effects in the workload-discounting task, which also indicates that the DAR contribution to delay- vs workload-discounting is qualitatively different. Thus, even if workload trials also include a delay component in our task, the distinct effects of DAR manipulations confirm that the nature of the cost in the workload and delay trials differs, at least from a neurobiological point of view [43].
Thus, these results extend previous studies demonstrating increased effort-discounting by D2R blockade [23,52] and support the notion that DA activation allows overcoming effort costs through a mechanism that can be distinguished from that of incentive motivation, which involves both D1R and D2R.
Delay-discounting and impulsivity (the tendency associated with excessive delay-discounting) are also thought to be linked to the DA system [53,54]. Systemic administration of D1R or D2R antagonist increases preference for immediate small rewards, rather than larger and delayed rewards [23,25,55,56]. However, some of these studies also showed negative effects of D1R [25] or D2R blockade [56] on impulsivity. These inconsistencies may be attributed to the differences in behavioral paradigms or drugs (and doses) used. Our PET-assessed DAR manipulation demonstrated that blockade of D1R and D2R at the same occupancy level (~50% and ~80%) similarly increased delay-discounting (Fig 4F), suggesting that DA transmission continuously adjusts delay-discounting at the post-synaptic site. This observation is in good accord with the previous finding that increasing DA transmission decreases temporal discounting; e.g., amphetamine or methylphenidate increased the tendency to choose long-delay options for larger rewards [25,55,56,57,58].
In contrast with workload-discounting, however, the relation with DAR in delay-discounting and incentive-motivation could not be distinguished, in that both D1R and D2R might be involved. This is reminiscent of neurophysiological data, revealing that DA neurons show a strong sensitivity to both reward and delay, but a weaker sensitivity to effort [8,59,60]. Altogether, this is in line with the notion that the DA system does not process upcoming benefits (information about potential benefits, including their distribution in space and time) in the same way it processes upcoming costs (here defined as energy expenditure) [9]. This differential relation between DA and delay vs workload might be related to the differential expression of these receptors in the direct vs indirect striatopallidal pathway, where the striatal neurons exclusively express D1R and D2R, respectively [61]. Opposing functions between these pathways have been proposed: activity of the direct pathway (D1R) neurons reflects positive rewarding events promoting movement, whereas activity of the indirect pathway (D2R) neurons is related to negative values mediating aversion or inhibiting movements [44,62] (but see [63]). DA increases the excitability of direct-pathway neurons, and this effect was reduced by D1R antagonism, decreasing motor output. DA reduces the responsiveness of indirect pathway neurons via D2R [61], and blockade of D2R would increase the activity, reducing motor output via decreased thalamocortical drive [64]. This scenario may explain our finding of a synergistic effect of simultaneous D1R and D2R blockade on delay-discounting (cf. Fig 5). 
Further work would be necessary to clarify this hypothesis, including the dynamic relation with tonic vs phasic DA release, but altogether, these data strongly support the idea that the distinct contribution of the DA system to benefits (reward availability) and costs (energy expenditure) involves a complementary action of the direct and indirect pathways.

Limitations of the current study. Finally, we discuss the limitations of the current study and areas for further research. First, because antagonists were administered systemically, the current study could not determine which brain area(s) is responsible for antagonist-induced alterations of benefit- and cost-based motivation. While our data support the idea that distinct neural networks are involved in workload- and delay-discounting, further study (e.g., local infusion of DA antagonist) is needed to identify the locus of the effects, generalizing our findings to unravel the circuit and molecular mechanism of motivation. We should also note that the current study does not address dynamic learning paradigms, and therefore our findings cannot be generalized directly to the function of the DA system in learning. Despite these limitations, the current study provides unique insights into the role of the DA system in the motivational process.

Conclusions
In summary, the present study demonstrates an apparent dissociation of the functional roles of DA transmission via D1- and D2-like receptors in benefit- and cost-based motivational processing. DA transmission via D1R and D2R modulates both the incentive impact of reward size and the negative influence of delay. By contrast, workload-discounting is regulated exclusively via D2R. In addition, D1R and D2R had synergistic effects on delay-discounting but opposite effects on workload-discounting. These dissociations can be attributed to differential involvement of the direct and indirect striatofugal pathways in workload- and delay-discounting. Together, our findings add an important aspect to current knowledge concerning the role of DA signaling in motivation based on the trade-off between costs and benefits, thus providing an advanced framework for understanding the pathophysiology of psychiatric disorders.

Materials and Methods
Ethics statement. All surgical and experimental procedures were approved by the Animal Care and Use Committee of the National Institutes for Quantum and Radiological Science and Technology (#09-1035), and were in accordance with the Institute of Laboratory Animal Research Guide for the Care and Use of Laboratory Animals.

Subjects.
A total of nine male adult macaque monkeys (8 Rhesus and 1 Japanese; 4.6-7.7 kg) were used in this study. Food was available ad libitum, and motivation was controlled by restricting access to fluid to experimental sessions, when water was delivered as a reward for performing the task. Animals received water supplementation whenever necessary (e.g., if they could not obtain enough water through experiments), and they had free access to water whenever testing was interrupted for more than a week.

Drug treatment.
All experiments in this study were carried out with intramuscular (i.m.) injections of SCH23390 (Sigma-Aldrich), haloperidol (Dainippon Sumitomo Pharma, Japan), or raclopride (Sigma-Aldrich) dissolved or diluted in 0.9% saline solution. Animals were pretreated with an injection of SCH23390 (10, 30, 50, or 100 µg/kg), haloperidol (10 µg/kg), or raclopride (10 or 30 µg/kg) 15 min before the beginning of the behavioral testing or PET scan. In behavioral testing, saline was injected as a vehicle control by the same procedure as drug treatment. The administered volume was 1 mL across all experiments with each monkey.
PET procedure and occupancy measurement. Four monkeys were used in the measurement. PET measurements were performed with two PET ligands: [11C]SCH23390 (for studying D1R binding) and [11C]raclopride (for studying D2R binding). The injected radioactivities of [11C]SCH23390 and [11C]raclopride were 91.7 ± 6.0 MBq (mean ± SD) and 87.0 ± 4.9 MBq, respectively. Specific radioactivities of [11C]SCH23390 and [11C]raclopride at the time of injection were 86.2 ± 40.6 GBq/µmol and 138.2 ± 70.1 GBq/µmol, respectively. All PET scans were performed using an SHR-7700 PET scanner (Hamamatsu Photonics Inc., Japan) with the monkeys conscious and seated in a chair. Prior to the PET study, the monkeys underwent surgery to implant a head-hold device using aseptic techniques [65]. After transmission scans for attenuation correction using a 68Ge-68Ga source, a dynamic scan in three-dimensional (3D) acquisition mode was performed for 60 min ([11C]SCH23390) or 90 min ([11C]raclopride). The ligands were injected via the crural vein as a single bolus at the start of the scan. All emission data were reconstructed with a 4.0-mm Colsher filter. Tissue radioactivity concentrations were obtained from volumes of interest (VOIs) placed on several brain regions where DARs are relatively abundant: caudate nucleus, putamen, nucleus accumbens (NAcc), thalamus, hippocampus, amygdala, parietal cortex, principal sulcus (PS), dorsolateral prefrontal cortex (dlPFC), and ventrolateral prefrontal cortex (vlPFC), as well as the cerebellum (as reference region). Each VOI was defined on individual T1-weighted axial magnetic resonance (MR) images (EXCELART/VG Pianissimo at 1.0 tesla, Toshiba, Japan) that were co-registered with PET images using PMOD® image analysis software (PMOD Technologies Ltd, Switzerland). Regional radioactivity of each VOI was calculated for each frame and plotted against time.
Regional binding potentials relative to non-displaceable radioligands (BP_ND) of D1R and D2R were estimated with a simplified reference tissue model on VOI and voxel-by-voxel bases [66][67][68]. The monkeys were scanned with and without drug treatment on different days. Occupancy levels were determined from the degree of reduction (%) of BP_ND by antagonists [69]. DA receptor occupancy was estimated as follows:

$$\mathrm{Occupancy} = \frac{BP_{ND}^{Baseline} - BP_{ND}^{Treatment}}{BP_{ND}^{Baseline}} \times 100$$

where $BP_{ND}^{Baseline}$ and $BP_{ND}^{Treatment}$ are BP_ND measured without (baseline) and with an antagonist, respectively. Relationship between D1R occupancy (D1Occ) and dose of SCH23390 (Dose) was estimated with 50% effective dose (ED50) as follows:

$$D1Occ = 100 \times \frac{Dose}{Dose + ED_{50}}$$

Relationship between D2R occupancy (D2Occ) and days after haloperidol injection (Day) was estimated using the level at day 0 ($D2Occ_{0}$) with a decay constant (λ) as follows:

$$D2Occ = D2Occ_{0} \times e^{-\lambda \cdot Day}$$

Behavioral tasks and testing procedures. Three monkeys (ST, 6.4 kg; KN, 6.3 kg; M7, 7.3 kg) were used for the behavioral study. For all behavioral training and testing, each monkey sat in a primate chair inside a sound-attenuated dark room. Visual stimuli were presented on a computer video monitor in front of the monkey. Behavioral control and data acquisition were performed using the REX program. Presentation software (Neurobehavioral Systems) was used to display visual stimuli. We used two types of behavioral tasks, the reward-size task and the work/delay task, as described previously [2,43]. Both tasks consisted of color discrimination trials (see Figs 2A and 4A). Each trial began when the monkey touched a bar mounted at the front of the chair. The monkey was required to release the bar between 200 and 1,000 ms after a red spot (wait signal) turned green (go signal). On correctly performed trials, the spot then turned blue (correct signal). A visual cue was presented at the beginning of each color discrimination trial (500 ms before the red spot appeared).
In the reward-size task, a reward of 1, 2, 4, or 8 drops of water (1 drop = ~0.1 mL) was delivered immediately after the blue signal. Each reward size was selected randomly with equal probability. The visual cue presented at the beginning of the trial indicated the number of drops for the reward (Fig 2A). In the work/delay task, a water reward (~0.25 mL) was delivered immediately after a correct signal, after an additional 1 or 2 instrumental trials (work trials), or after a delay period (delay trials). The visual cue indicated the combination of the trial type and requirement to obtain a reward (Fig 4A). Pattern cues indicated the delay trials with the timing of reward delivery after a correct performance: either immediately (0.3 s, 0.2-0.4 s; mean, range), after a short delay (3.6 s, 3.0-4.2 s), or after a long delay (7.2 s, 6.0-8.4 s). Grayscale cues indicated work trials with the number of trials the monkey would have to perform to obtain a reward. We set the delay durations to be equivalent to the duration of 1 or 2 color discrimination trials, so that we could directly compare the cost of 1 or 2 arbitrary units (cost unit; CU).
If the monkey released the bar before the green target appeared or within 200 ms after it appeared, or failed to respond within 1 s after it appeared, we regarded the trial as a "refusal trial"; all visual stimuli disappeared, the trial was terminated immediately, and after the 1-s inter-trial interval, the trial was repeated. Our behavioral measure of the motivational value of the outcome was the proportion of refusal trials. Before each testing session, the monkeys were subjected to ~22 hours of water restriction in their home cage. Each session continued until the monkey would no longer initiate a new trial (usually less than 100 min).
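The trial-outcome rule described above can be sketched in a few lines of Python. This is only an illustration: the function names, the encoding of bar-release time relative to go-signal onset (negative values for releases before the go signal, `None` for no release), and the treatment of releases at exactly 200 or 1,000 ms as correct are assumptions made here, not features of the task-control software used in the study.

```python
from typing import Iterable, Optional

EARLY_LIMIT_MS = 200.0   # releases earlier than this after the go signal count as early
LATE_LIMIT_MS = 1000.0   # releases later than this (or no release) count as late


def classify_trial(release_ms: Optional[float]) -> str:
    """Classify one trial from the bar-release time relative to go-signal onset.

    release_ms < 0 means the bar was released before the go signal;
    None means the bar was never released (hypothetical encoding).
    """
    if release_ms is None or release_ms > LATE_LIMIT_MS:
        return "late"
    if release_ms < EARLY_LIMIT_MS:
        return "early"
    return "correct"


def refusal_rate(release_times: Iterable[Optional[float]]) -> float:
    """Proportion of trials that were refused (early or late release)."""
    outcomes = [classify_trial(t) for t in release_times]
    refusals = sum(outcome != "correct" for outcome in outcomes)
    return refusals / len(outcomes)
```

Keeping early and late releases as separate labels allows the same classification to serve both the pooled refusal rate and an analysis of the error pattern.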
Before this experiment, all monkeys had been trained to perform color discrimination trials in the cued multi-trial reward schedule task for more than 3 months. The monkeys were tested with the work/delay task for 1-2 daily sessions as training to become familiar with the cueing condition. Each monkey was tested from Monday to Friday. Treatment with SCH23390 was performed every four or five days. On other days without SCH23390, sessions with saline (1 mL) treatment were analyzed as control sessions. Haloperidol was given every two or three weeks on Monday or Tuesday, because D2R occupancy persisted for several days after a single dose of haloperidol (Fig 1D). The days before haloperidol treatment were analyzed as control sessions. Each dose of SCH23390 or a single dose of haloperidol was tested 4 or 5 times in each animal.
Sucrose preference test. Two monkeys (RO, 5.8 kg; KY, 5.6 kg) were used for the sucrose preference test. The test was performed in their home cages once a week. In advance of the test, water access was prevented for 22 h. The monkeys were injected with SCH23390 (30 µg/kg), haloperidol (10 µg/kg), or saline 15 min before the sucrose preference test. Two bottles containing either 1.5% sucrose solution or tap water were set into bottle holders in the home cage, and the monkeys were allowed to freely consume fluids for 2 h. The total intake of sucrose solution (SW) and tap water (TW) was measured, and the sucrose preference index (SP) was calculated as follows: SP = (SW - TW) / (SW + TW). The position of the sucrose and tap water bottles (right or left toward the front panel of the home cage) was counterbalanced across sessions and monkeys. Drugs and saline were injected alternately once a week. We also measured the osmolality level in blood samples (1 mL) obtained immediately before and after each testing session.
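As a worked illustration of the sucrose preference index, the following minimal Python sketch evaluates SP from the two intake volumes. The function name and the decision to raise an error when nothing was consumed are assumptions made for this example; the index ranges from -1 (only tap water consumed) to +1 (only sucrose solution consumed).

```python
def sucrose_preference(sucrose_ml: float, water_ml: float) -> float:
    """Return SP = (SW - TW) / (SW + TW) for one test session."""
    total = sucrose_ml + water_ml
    if total == 0:
        # The index is undefined when no fluid was consumed (assumed handling).
        raise ValueError("no fluid consumed; sucrose preference is undefined")
    return (sucrose_ml - water_ml) / total
```

For example, a hypothetical session with 300 mL of sucrose solution and 100 mL of tap water gives SP = (300 - 100) / (300 + 100) = 0.5.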
Behavioral data analysis. All data and statistical analyses were performed using the R statistical computing environment (R Development Core Team, 2004). The average error rate for each trial type was calculated for each daily session, with the error rates in each trial type being defined as the number of error trials divided by the total number of trials of that given type. The monkeys sometimes made many errors at the beginning of the daily session, probably due to high motivation/impatience; we excluded the data until the 1st successful trial in these cases. A trial was considered an error trial if the monkey released the bar either before or within 200 ms after the appearance of the green target (early release) or failed to respond within 1 s after the green target (late release). We did not distinguish between the two types of errors and used their sum except for the error pattern analysis. We performed repeated-measures ANOVAs to test the effect of treatment × reward size (for the data in reward-size task) or treatment × cost type × remaining cost (for the data in work/delay task) on error rate, on late release rate (i.e., error pattern), on reaction time, and on movements during the delay.
We used the refusal rates to estimate the level of motivation because the refusal rates of these tasks (E) are inversely related to the value for action [2]. In the reward-size task, we used the inverse function (Eq. 1). We fitted the data to linear mixed models [70], in which the random effects across DAR blockade conditions on parameter a and/or intercept e (Fig 2C) were nested. Model selection was based on Akaike's information criterion (AIC), an estimator of in-sample prediction error for the nested models (S1 Table). Using the selected model, the parameter a was estimated individually, and then normalized by the value in the non-treated condition (CON) (Fig 3A and 3B). In the work/delay task, we used linear models to estimate the effect of remaining cost, i.e., workloads and delay, as described previously [43]:

$$E_w = k_w \, CU + E_0$$

$$E_d = k_d \, CU + E_0$$

where $E_w$ and $E_d$ are the error rates, and $k_w$ and $k_d$ are cost factors for work and delay trials, respectively. CU is the number of remaining cost units, and $E_0$ is the intercept. We simultaneously fitted a pair of these linear models to the data by sum-of-squares minimization without weighting. The coefficient of determination ($R^2$) was reported as a measure of goodness of fit.
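The two fitting steps can be illustrated with a minimal, dependency-free Python sketch. Several points are assumptions made for illustration only: Eq. 1 is taken to have the form E = 1/(aR) + e (it is not restated in this section), the fit is plain ordinary least squares rather than the mixed-model procedure used in the study, the two cost models are taken to share a single intercept E_0, and all function names are hypothetical. The actual analyses were performed in R.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares(design, y):
    """Unweighted least squares via the normal equations (X'X) beta = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def fit_inverse(reward_sizes, error_rates):
    """Fit E = 1/(a*R) + e (assumed form); linear in 1/R with slope 1/a."""
    design = [[1.0 / r, 1.0] for r in reward_sizes]
    slope, e = least_squares(design, error_rates)
    return 1.0 / slope, e


def fit_cost_models(cu_work, e_work, cu_delay, e_delay):
    """Jointly fit E_w = k_w*CU + E_0 and E_d = k_d*CU + E_0 (shared E_0).

    Returns (k_w, k_d, E_0, R^2).
    """
    design = ([[cu, 0.0, 1.0] for cu in cu_work]
              + [[0.0, cu, 1.0] for cu in cu_delay])
    y = list(e_work) + list(e_delay)
    k_w, k_d, e0 = least_squares(design, y)
    predicted = [k_w * row[0] + k_d * row[1] + e0 for row in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return k_w, k_d, e0, 1.0 - ss_res / ss_tot
```

Because the assumed inverse function is linear in 1/R, ordinary least squares on the transformed predictor gives the exact least-squares estimates of 1/a and e. Stacking work and delay trials in one design matrix is one simple way to fit the pair of linear models simultaneously with a common intercept.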

Supplementary Information
S1 Table. Model comparison. a(cond) and e(cond) indicate the random effects of DAR blocking treatment conditions on parameters a and e, respectively. AIC (Akaike's information criterion) is a relative measure of quality for the models (#1-4). ΔAIC denotes difference from minimum AIC.

Occupancy was determined as a proportion of reduced specific binding to baseline, which corresponds to the slope of linear regression. In this case, D1 occupancy was 80%, 78%, 67%, and 26% for 100, 50, 30 and 10 µg/kg doses, respectively.
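For readers who wish to relate such occupancy values to dose, the following minimal Python sketch computes occupancy from baseline and treatment BP_ND and estimates ED50 by least squares. It assumes the common single-site saturation form D1Occ = 100 × Dose / (Dose + ED50); the function names, the search bracket of 0.1 to 1,000 µg/kg, and the golden-section search are illustrative choices and are not taken from the study's analysis.

```python
import math


def occupancy_percent(bp_baseline: float, bp_treatment: float) -> float:
    """Percent reduction of BP_ND from baseline under antagonist treatment."""
    return 100.0 * (bp_baseline - bp_treatment) / bp_baseline


def predicted_occupancy(dose: float, ed50: float) -> float:
    """Single-site saturation model (assumed form): 100 * Dose / (Dose + ED50)."""
    return 100.0 * dose / (dose + ed50)


def fit_ed50(doses, occupancies, lo=0.1, hi=1000.0, tol=1e-6):
    """Least-squares ED50 by golden-section search on [lo, hi].

    Assumes the sum of squared errors has a single minimum in the bracket.
    """
    def sse(ed50):
        return sum((occ - predicted_occupancy(d, ed50)) ** 2
                   for d, occ in zip(doses, occupancies))

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(c) < sse(d):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
    return (a + b) / 2.0
```

A call such as `fit_ed50([10, 30, 50, 100], [26, 67, 78, 80])` would apply this sketch to the occupancy values listed above; the resulting estimate depends on the assumed model form and is not a value reported by the study.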

S2 Fig.
Comparable effects of D2R antagonism between raclopride and haloperidol at similar occupancy. (A) Occupancy of D2R measured at the striatal ROI is plotted against the dose of raclopride. (B) Error rates as a function of reward size for control (black) and after injection of raclopride (10 µg/kg, i.m., left side) and haloperidol (10 µg/kg, i.m., right side) in monkey KN are plotted. Dotted curves are the best-fit inverse functions (model #1 in S1 Table).