Schultz, W., Stauffer, W. R. & Lak, A. The phasic dopamine signal maturing: from reward via behavioural activation to formal economic utility. Curr. Opin. Neurobiol. 43, 139–148 (2017).
Glimcher, P. W. Understanding dopamine and reinforcement learning: the dopamine reward prediction error hypothesis. Proc. Natl Acad. Sci. USA 108, 15647–15654 (2011).
Watabe-Uchida, M., Eshel, N. & Uchida, N. Neural circuitry of reward prediction error. Annu. Rev. Neurosci. 40, 373–394 (2017).
Morimura, T., Sugiyama, M., Kashima, H., Hachiya, H. & Tanaka, T. Parametric return density estimation for reinforcement learning. In Proc. 26th Conference on Uncertainty in Artificial Intelligence (eds Grünwald, P. & Spirtes, P.) http://dl.acm.org/citation.cfm?id=3023549.3023592 (2010).
Bellemare, M. G., Dabney, W. & Munos, R. A distributional perspective on reinforcement learning. In International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 449–458 (2017).
Dabney, W., Rowland, M., Bellemare, M. G. & Munos, R. Distributional reinforcement learning with quantile regression. In AAAI Conference on Artificial Intelligence (2018).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction Vol. 1 (MIT Press, 1998).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
Hessel, M. et al. Rainbow: combining improvements in deep reinforcement learning. In 32nd AAAI Conference on Artificial Intelligence (2018).
Botvinick, M. M., Niv, Y. & Barto, A. G. Hierarchically organized behavior and its neural foundations: a reinforcement learning perspective. Cognition 113, 262–280 (2009).
Wang, J. X. et al. Prefrontal cortex as a meta-reinforcement learning system. Nat. Neurosci. 21, 860–868 (2018).
Song, H. F., Yang, G. R. & Wang, X. J. Reward-based training of recurrent neural networks for cognitive and value-based tasks. eLife 6, e21492 (2017).
Barth-Maron, G. et al. Distributed distributional deterministic policy gradients. In International Conference on Learning Representations https://openreview.net/forum?id=SyZipzbCb (2018).
Dabney, W., Ostrovski, G., Silver, D. & Munos, R. Implicit quantile networks for distributional reinforcement learning. In International Conference on Machine Learning (2018).
Pouget, A., Beck, J. M., Ma, W. J. & Latham, P. E. Probabilistic brains: knowns and unknowns. Nat. Neurosci. 16, 1170–1178 (2013).
Lammel, S., Lim, B. K. & Malenka, R. C. Reward and aversion in a heterogeneous midbrain dopamine system. Neuropharmacology 76, 351–359 (2014).
Fiorillo, C. D., Tobler, P. N. & Schultz, W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299, 1898–1902 (2003).
Eshel, N. et al. Arithmetic and local circuitry underlying dopamine prediction errors. Nature 525, 243–246 (2015).
Rowland, M. et al. Statistics and samples in distributional reinforcement learning. In International Conference on Machine Learning (2019).
Frank, M. J., Seeberger, L. C. & O’Reilly, R. C. By carrot or by stick: cognitive reinforcement learning in parkinsonism. Science 306, 1940–1943 (2004).
Hirvonen, J. et al. Striatal dopamine D1 and D2 receptor balance in twins at increased genetic risk for schizophrenia. Psychiatry Res. Neuroimaging 146, 13–20 (2006).
Piggott, M. A. et al. Dopaminergic activities in the human striatum: rostrocaudal gradients of uptake sites and of D1 and D2 but not of D3 receptor binding or dopamine. Neuroscience 90, 433–445 (1999).
Rosa-Neto, P., Doudet, D. J. & Cumming, P. Gradients of dopamine D1- and D2/3-binding sites in the basal ganglia of pig and monkey measured by PET. Neuroimage 22, 1076–1083 (2004).
Mikhael, J. G. & Bogacz, R. Learning reward uncertainty in the basal ganglia. PLOS Comput. Biol. 12, e1005062 (2016).
Rutledge, R. B. et al. A computational and neural model of momentary subjective well-being. Proc. Natl Acad. Sci. USA 111, 12252–12257 (2014).
Huys, Q. J., Daw, N. D. & Dayan, P. Depression: a decision-theoretic analysis. Annu. Rev. Neurosci. 38, 1–23 (2015).
Bennett, D. & Niv, Y. Opening Burton’s clock: psychiatric insights from computational cognitive models. Preprint at https://doi.org/10.31234/osf.io/y2vzu (2018).
Tian, J. & Uchida, N. Habenula lesions reveal that multiple mechanisms underlie dopamine prediction errors. Neuron 87, 1304–1316 (2015).
Eshel, N., Tian, J., Bukwich, M. & Uchida, N. Dopamine neurons share common response function for reward prediction error. Nat. Neurosci. 19, 479–486 (2016).
Newey, W. K. & Powell, J. L. Asymmetric least squares estimation and testing. Econometrica 55, 819–847 (1987).
Jones, M. C. Expectiles and M-quantiles are quantiles. Stat. Probab. Lett. 20, 149–153 (1994).
Ziegel, J. F. Coherence and elicitability. Math. Finance 26, 901–918 (2016).
Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013).
Heess, N. et al. Emergence of locomotion behaviours in rich environments. Preprint at https://arxiv.org/abs/1707.02286 (2017).
Bäckman, C. M. et al. Characterization of a mouse strain expressing Cre recombinase from the 3′ untranslated region of the dopamine transporter locus. Genesis 44, 383–390 (2006).
Cohen, J. Y. et al. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482, 85–88 (2012).
Stauffer, W. R., Lak, A. & Schultz, W. Dopamine reward prediction error responses reflect marginal utility. Curr. Biol. 24, 2491–2500 (2014).
Fiorillo, C. D., Song, M. R. & Yun, S. R. Multiphasic temporal dynamics in responses of midbrain dopamine neurons to appetitive and aversive stimuli. J. Neurosci. 33, 4710–4725 (2013).
Schaul, T., Quan, J., Antonoglou, I. & Silver, D. Prioritized experience replay. In International Conference on Learning Representations (2016).
Van Hasselt, H., Guez, A. & Silver, D. Deep reinforcement learning with double Q-learning. In AAAI Conference on Artificial Intelligence (2016).
Krizhevsky, A. & Hinton, G. Learning Multiple Layers of Features from Tiny Images (Univ. of Toronto, 2009).