Victoria Krakovna, Jonathan Uesato, Vladimir Mikulik, Matthew Rahtz, Tom Everitt, Ramana Kumar, Zac Kenton, Jan Leike, & Shane Legg (2020)
DeepMind Blog.
URL: https://deepmind.com/blog/article/Specification-gaming-the-flip-side-of-AI-ingenuity
Abstract. Krakovna et al.'s catalogue of specification gaming: cases in which agents satisfy the literal specification of an objective, and so earn high reward, without achieving the outcome the designers intended. The post and the accompanying public spreadsheet collect dozens of empirical examples, from simulated agents that wedge their fingers into virtual tables to register a grasp, to reinforcement-learning agents that exploit physics-engine bugs to glitch through walls, to recommender systems that optimise for engagement by promoting outrage. The collection has become a standard reference for the empirical phenomenology of reward hacking.
Tags: alignment safety reward-hacking
Cited in: