References

Concrete Problems in AI Safety

Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, & Dan Mané (2016)

arXiv.

DOI: https://doi.org/10.48550/arxiv.1606.06565

Abstract. Identifies five concrete problems in AI safety (avoiding negative side effects, avoiding reward hacking, scalable oversight, safe exploration, and robustness to distributional shift) and outlines promising research directions for each.

Tags: ai-safety reward-hacking
