241 references

References

A

Anish Athalye, Nicholas Carlini, & David Wagner (2018). Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. International Conference on Machine Learning. https://arxiv.org/abs/1802.00420

Anthropic (2025). Building Effective AI Agents. Anthropic. https://www.anthropic.com/research/building-effective-agents

Cem Anil, Esin Durmus, Mrinank Sharma, Joe Benton, Sandipan Kundu, Joshua Batson, Nina Panickssery, Meg Tong, Jesse Mu, Daniel Ford, Francesco Mosconi, Rajashree Agrawal, Rylan Schaeffer, Naomi Bashkansky, Samuel Svenningsen, Mike Lambert, Ansh Radhakrishnan, Carson Denison, Evan J. Hubinger, Yuntao Bai, Trenton Bricken, Timothy Maxwell, Nicholas Schiefer, James Sully, Alex Tamkin, Tamera Lanham, Karina Nguyen, Tomasz Korbak, Jared Kaplan, Deep Ganguli, Samuel R. Bowman, Ethan Perez, Roger Grosse, & David Duvenaud (2024). Many-shot Jailbreaking. Anthropic. https://www.anthropic.com/research/many-shot-jailbreaking

Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, & Dan Mané (2016). Concrete Problems in AI Safety. arXiv. https://doi.org/10.48550/arxiv.1606.06565

David Arthur & Sergei Vassilvitskii (2007). k-means++: The Advantages of Careful Seeding. Proceedings of SODA 2007, 1027-1035. https://theory.stanford.edu/~sergei/papers/kMeansPP-soda.pdf

Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, & Denny Zhou (2023). What learning algorithm is in-context learning? Investigations with linear models. International Conference on Learning Representations. https://arxiv.org/abs/2211.15661

Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, & Karen Simonyan (2022). Flamingo: a Visual Language Model for Few-Shot Learning. Advances in Neural Information Processing Systems 35. https://arxiv.org/abs/2204.14198

Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, & Li Zhang (2016). Deep Learning with Differential Privacy. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 308-318. https://doi.org/10.1145/2976749.2978318

Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, & Rémi Munos (2023). A General Theoretical Paradigm to Understand Learning from Human Preferences. arXiv:2310.12036. https://arxiv.org/abs/2310.12036

B

Christopher M. Bishop (2006). Pattern Recognition and Machine Learning. Springer. https://www.microsoft.com/en-us/research/publication/pattern-recognition-machine-learning/

Collin Burns, Pavel Izmailov, Jan Hendrik Kirchner, Bowen Baker, Leo Gao, Leopold Aschenbrenner, Yining Chen, Adrien Ecoffet, Manas Joglekar, Jan Leike, Ilya Sutskever, & Jeffrey Wu (2023). Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision. arXiv:2312.09390. https://arxiv.org/abs/2312.09390

David M. Blei, Andrew Y. Ng, & Michael I. Jordan (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research. https://www.jmlr.org/papers/v3/blei03a.html

Dzmitry Bahdanau, Kyunghyun Cho, & Yoshua Bengio (2014). Neural Machine Translation by Jointly Learning to Align and Translate. arXiv. https://doi.org/10.48550/arxiv.1409.0473

Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, & Shmargaret Shmitchell (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610-623. https://doi.org/10.1145/3442188.3445922

Iz Beltagy, Matthew E. Peters, & Arman Cohan (2020). Longformer: The Long-Document Transformer. arXiv. https://doi.org/10.48550/arxiv.2004.05150

James Bergstra & Yoshua Bengio (2012). Random Search for Hyper-Parameter Optimization. Journal of Machine Learning Research, 13, 281-305. https://jmlr.org/papers/v13/bergstra12a.html

Jimmy Lei Ba, Jamie Ryan Kiros, & Geoffrey E. Hinton (2016). Layer Normalization. arXiv. https://doi.org/10.48550/arxiv.1607.06450

Leo Breiman (1996). Bagging predictors. Machine Learning, 24(2), 123-140. https://doi.org/10.1007/bf00058655

Leo Breiman (2001). Random Forests. Machine Learning, 45(1), 5-32. https://doi.org/10.1023/a:1010933404324

Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, & Jörg Sander (2000). LOF: Identifying Density-Based Local Outliers. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, 93-104. https://doi.org/10.1145/342009.335388

Mikhail Belkin, Daniel Hsu, Siyuan Ma, & Soumik Mandal (2019). Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.1903070116

Nick Bostrom (2014). Superintelligence: Paths, Dangers, Strategies. Oxford University Press. https://archive.org/details/superintelligenc00unse

Piotr Bojanowski, Edouard Grave, Armand Joulin, & Tomas Mikolov (2017). Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics, 5, 135-146. https://doi.org/10.1162/tacl_a_00051

Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, & Percy Liang (2021). On the Opportunities and Risks of Foundation Models. arXiv. https://doi.org/10.48550/arxiv.2108.07258

Samy Bengio, Oriol Vinyals, Navdeep Jaitly, & Noam Shazeer (2015). Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks. Advances in Neural Information Processing Systems 28. https://arxiv.org/abs/1506.03099

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, & Dario Amodei (2020). Language Models are Few-Shot Learners. arXiv. https://doi.org/10.48550/arxiv.2005.14165

Yoshua Bengio, Réjean Ducharme, Pascal Vincent, & Christian Jauvin (2003). A Neural Probabilistic Language Model. Journal of Machine Learning Research, 3, 1137-1155. https://jmlr.org/papers/v3/bengio03a.html

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, Kamile Lukosuite, Liane Lovitt, Michael Sellitto, Nelson Elhage, Nicholas Schiefer, Noemi Mercado, Nova DasSarma, Robert Lasenby, Robin Larson, Sam Ringer, Scott Johnston, Shauna Kravec, Sheer El Showk, Stanislav Fort, Tamera Lanham, Timothy Telleen-Lawton, Tom Conerly, Tom Henighan, Tristan Hume, Samuel R. Bowman, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, & Jared Kaplan (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv. https://doi.org/10.48550/arxiv.2212.08073

C

Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, & Noah Fiedel (2022). PaLM: Scaling Language Modeling with Pathways. arXiv. https://doi.org/10.48550/arxiv.2204.02311

Alexandra Chouldechova (2017). Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments. Big Data, 5(2), 153-163. https://doi.org/10.1089/big.2016.0047

Corinna Cortes & Vladimir Vapnik (1995). Support-vector networks. Machine Learning, 20(3), 273-297. https://doi.org/10.1007/bf00994018

Francesco Croce & Matthias Hein (2020). Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. International Conference on Machine Learning. https://arxiv.org/abs/2003.01690

G. Cybenko (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(4), 303-314. https://doi.org/10.1007/bf02551274

Jeremy Cohen, Elan Rosenfeld, & J. Zico Kolter (2019). Certified Adversarial Robustness via Randomized Smoothing. International Conference on Machine Learning. https://arxiv.org/abs/1902.02918

Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, & Adrian Weller (2021). Rethinking Attention with Performers. International Conference on Learning Representations. https://arxiv.org/abs/2009.14794

Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, & Yoshua Bengio (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv. https://doi.org/10.48550/arxiv.1406.1078

Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, & Alan L. Yuille (2018). DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4), 834-848. https://doi.org/10.1109/tpami.2017.2699184

Nicholas Carlini & David Wagner (2017). Towards Evaluating the Robustness of Neural Networks. IEEE Symposium on Security and Privacy. https://doi.org/10.1109/SP.2017.49

Nicholas Carlini & Andreas Terzis (2022). Poisoning and Backdooring Contrastive Learning. International Conference on Learning Representations. https://arxiv.org/abs/2106.09667

Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, & Sergey Zagoruyko (2020). End-to-End Object Detection with Transformers. Lecture Notes in Computer Science, 213-229. https://doi.org/10.1007/978-3-030-58452-8_13

Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, & W. Philip Kegelmeyer (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research. https://doi.org/10.1613/jair.953

Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, & Dario Amodei (2017). Deep reinforcement learning from human preferences. arXiv. https://doi.org/10.48550/arxiv.1706.03741

Paul Christiano, Ajeya Cotra, & Mark Xu (2021). Eliciting Latent Knowledge: How to Tell if Your Eyes Deceive You. Alignment Research Center. https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/

Paul Christiano (2023). My views on "doom". AI Alignment Forum. https://www.alignmentforum.org/posts/xWMqsvHapP3nwdSW8/my-views-on-doom

Ricardo J. G. B. Campello, Davoud Moulavi, & Jörg Sander (2013). Density-Based Clustering Based on Hierarchical Density Estimates. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). https://doi.org/10.1007/978-3-642-37456-2_14

Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, & David Duvenaud (2018). Neural Ordinary Differential Equations. arXiv. https://doi.org/10.48550/arxiv.1806.07366

T. Cover & P. Hart (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21-27. https://doi.org/10.1109/tit.1967.1053964

Tianqi Chen & Carlos Guestrin (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785-794. https://doi.org/10.1145/2939672.2939785

Ting Chen, Simon Kornblith, Mohammad Norouzi, & Geoffrey Hinton (2020). A Simple Framework for Contrastive Learning of Visual Representations. arXiv. https://doi.org/10.48550/arxiv.2002.05709

D

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, & Neil Houlsby (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv. https://doi.org/10.48550/arxiv.2010.11929

Cynthia Dwork, Frank McSherry, Kobbi Nissim, & Adam Smith (2006). Calibrating Noise to Sensitivity in Private Data Analysis. Lecture Notes in Computer Science, 265-284. https://doi.org/10.1007/11681878_14

DeepSeek-AI (2025). DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv:2501.12948. https://arxiv.org/abs/2501.12948

Hubert L. Dreyfus (1992). What Computers Still Can't Do. MIT Press. https://mitpress.mit.edu/9780262540674/what-computers-still-cant-do/

Jacob Devlin, Ming-Wei Chang, Kenton Lee, & Kristina Toutanova (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of NAACL-HLT 2019, Volume 1 (Long and Short Papers), 4171-4186. https://doi.org/10.18653/v1/n19-1423

Laurent Dinh, Jascha Sohl-Dickstein, & Samy Bengio (2016). Density estimation using Real NVP. arXiv. https://doi.org/10.48550/arxiv.1605.08803

Prafulla Dhariwal & Alex Nichol (2021). Diffusion Models Beat GANs on Image Synthesis. Advances in Neural Information Processing Systems 34. https://arxiv.org/abs/2105.05233

Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, & Luke Zettlemoyer (2023). QLoRA: Efficient Finetuning of Quantized LLMs. arXiv. https://doi.org/10.48550/arxiv.2305.14314

Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, & Christopher Ré (2022). FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. arXiv. https://doi.org/10.48550/arxiv.2205.14135

E

Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, & Douwe Kiela (2024). KTO: Model Alignment as Prospect Theoretic Optimization. arXiv:2402.01306. https://arxiv.org/abs/2402.01306

Martin Ester, Hans-Peter Kriegel, Jörg Sander, & Xiaowei Xu (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proceedings of KDD 1996, 226-231. https://cdn.aaai.org/KDD/1996/KDD96-037.pdf

F

Jerome H. Friedman (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5). https://doi.org/10.1214/aos/1013203451

Jonathan Frankle & Michael Carbin (2018). The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. arXiv. https://doi.org/10.48550/arxiv.1803.03635

William Fedus, Barret Zoph, & Noam Shazeer (2021). Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. arXiv. https://doi.org/10.48550/arxiv.2101.03961

Yoav Freund & Robert E. Schapire (1997). A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences, 55(1), 119-139. https://doi.org/10.1006/jcss.1997.1504

G

Albert Gu & Tri Dao (2024). Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv:2312.00752. https://arxiv.org/abs/2312.00752

Alex Graves, Santiago Fernández, Faustino Gomez, & Jürgen Schmidhuber (2006). Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. International Conference on Machine Learning. https://doi.org/10.1145/1143844.1143891

Deep Ganguli, Danny Hernandez, Liane Lovitt, Nova DasSarma, Tom Henighan, Andy Jones, Nicholas Joseph, Jackson Kernion, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Nelson Elhage, Sheer El Showk, Stanislav Fort, Zac Hatfield-Dodds, Scott Johnston, Shauna Kravec, Neel Nanda, Kamal Ndousse, Catherine Olsson, Daniela Amodei, Dario Amodei, Tom Brown, Jared Kaplan, Sam McCandlish, Chris Olah, & Jack Clark (2022). Predictability and Surprise in Large Generative Models. ACM Conference on Fairness, Accountability, and Transparency. https://arxiv.org/abs/2202.07785

Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, Andy Jones, Sam Bowman, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Nelson Elhage, Sheer El-Showk, Stanislav Fort, Zac Hatfield-Dodds, Tom Henighan, Danny Hernandez, Tristan Hume, Josh Jacobson, Scott Johnston, Shauna Kravec, Catherine Olsson, Sam Ringer, Eli Tran-Johnson, Dario Amodei, Tom Brown, Nicholas Joseph, Sam McCandlish, Chris Olah, Jared Kaplan, & Jack Clark (2022). Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned. arXiv:2209.07858. https://arxiv.org/abs/2209.07858

Elliot Glazer, Ege Erdil, Tamay Besiroglu, Diego Chicharro, Evan Chen, Alex Gunning, Caroline Falkman Olsson, Jean-Stanislas Denain, Anson Ho, Emily de Oliveira Santos, Olli Järviniemi, Matthew Barnett, Robert Sandler, Jaime Sevilla, Tom Tseng, Yuki Hayashi, Maxim Kapur, Pieter Garrelfs, Carolyn Ashbaugh, & others (2024). FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI. arXiv:2411.04872. https://arxiv.org/abs/2411.04872

Ian Goodfellow, Yoshua Bengio, & Aaron Courville (2016). Deep Learning. MIT Press. https://www.deeplearningbook.org/

Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, & Yoshua Bengio (2014). Generative Adversarial Nets. Proceedings of NeurIPS 2014. https://arxiv.org/abs/1406.2661

Ian J. Goodfellow, Jonathon Shlens, & Christian Szegedy (2015). Explaining and Harnessing Adversarial Examples. International Conference on Learning Representations. https://arxiv.org/abs/1412.6572

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, & Mario Fritz (2023). Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. ACM Workshop on Artificial Intelligence and Security. https://arxiv.org/abs/2302.12173

Leo Gao, John Schulman, & Jacob Hilton (2022). Scaling Laws for Reward Model Overoptimization. arXiv:2210.10760. https://arxiv.org/abs/2210.10760

Mor Geva, Roei Schuster, Jonathan Berant, & Omer Levy (2021). Transformer Feed-Forward Layers Are Key-Value Memories. Conference on Empirical Methods in Natural Language Processing. https://arxiv.org/abs/2012.14913

Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, & Kaiming He (2017). Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. arXiv:1706.02677. https://arxiv.org/abs/1706.02677

Ross Girshick, Jeff Donahue, Trevor Darrell, & Jitendra Malik (2014). Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. 2014 IEEE Conference on Computer Vision and Pattern Recognition, 580-587. https://doi.org/10.1109/cvpr.2014.81

Shivam Garg, Dimitris Tsipras, Percy Liang, & Gregory Valiant (2022). What Can Transformers Learn In-Context? A Case Study of Simple Function Classes. Advances in Neural Information Processing Systems 35. https://arxiv.org/abs/2208.01066

Tianyu Gu, Brendan Dolan-Gavitt, & Siddharth Garg (2017). BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain. arXiv:1708.06733. https://arxiv.org/abs/1708.06733

Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, & Kate Crawford (2021). Datasheets for datasets. Communications of the ACM, 64(12), 86-92. https://doi.org/10.1145/3458723

Xavier Glorot & Yoshua Bengio (2010). Understanding the Difficulty of Training Deep Feedforward Neural Networks. Proceedings of AISTATS 2010, 249-256. https://proceedings.mlr.press/v9/glorot10a.html

H

Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, & Hartwig Adam (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv. https://doi.org/10.48550/arxiv.1704.04861

Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, & Yejin Choi (2019). The Curious Case of Neural Text Degeneration. arXiv. https://doi.org/10.48550/arxiv.1904.09751

Arthur E. Hoerl & Robert W. Kennard (1970). Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics. https://doi.org/10.1080/00401706.1970.10488634

Dan Hendrycks & Kevin Gimpel (2016). Gaussian Error Linear Units (GELUs). arXiv. https://doi.org/10.48550/arxiv.1606.08415

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, & Weizhu Chen (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv. https://doi.org/10.48550/arxiv.2106.09685

Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, & Scott Garrabrant (2019). Risks from Learned Optimization in Advanced Machine Learning Systems. arXiv:1906.01820. https://arxiv.org/abs/1906.01820

Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, Yuntao Bai, Zachary Witten, Marina Favaro, Jan Brauner, Holden Karnofsky, Paul Christiano, Samuel R. Bowman, Logan Graham, Jared Kaplan, Sören Mindermann, Ryan Greenblatt, Buck Shlegeris, Nicholas Schiefer, & Ethan Perez (2024). Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training. arXiv:2401.05566. https://arxiv.org/abs/2401.05566

Gao Huang, Zhuang Liu, Laurens van der Maaten, & Kilian Q. Weinberger (2017). Densely Connected Convolutional Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2261-2269. https://doi.org/10.1109/cvpr.2017.243

Geoffrey Hinton, Oriol Vinyals, & Jeff Dean (2015). Distilling the Knowledge in a Neural Network. arXiv. https://doi.org/10.48550/arxiv.1503.02531

Jiwoo Hong, Noah Lee, & James Thorne (2024). ORPO: Monolithic Preference Optimization without Reference Model. arXiv:2403.07691. https://arxiv.org/abs/2403.07691

Jonathan Ho, Ajay Jain, & Pieter Abbeel (2020). Denoising Diffusion Probabilistic Models. arXiv. https://doi.org/10.48550/arxiv.2006.11239

Jonathan Ho & Tim Salimans (2022). Classifier-Free Diffusion Guidance. arXiv. https://doi.org/10.48550/arxiv.2207.12598

Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, & Laurent Sifre (2022). Training Compute-Optimal Large Language Models. arXiv. https://doi.org/10.48550/arxiv.2203.15556

Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun (2015). Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. arXiv. https://doi.org/10.48550/arxiv.1502.01852

Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778. https://doi.org/10.1109/cvpr.2016.90

Kurt Hornik, Maxwell Stinchcombe, & Halbert White (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359-366. https://doi.org/10.1016/0893-6080(89)90020-8

Moritz Hardt, Benjamin Recht, & Yoram Singer (2016). Train faster, generalize better: Stability of stochastic gradient descent. International Conference on Machine Learning. https://arxiv.org/abs/1509.01240

Sepp Hochreiter & Jürgen Schmidhuber (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735-1780. https://doi.org/10.1162/neco.1997.9.8.1735

Trevor Hastie, Robert Tibshirani, & Jerome Friedman (2009). The Elements of Statistical Learning. Springer Series in Statistics. https://doi.org/10.1007/978-0-387-84858-7

I

Geoffrey Irving, Paul Christiano, & Dario Amodei (2018). AI safety via debate. arXiv:1805.00899. https://arxiv.org/abs/1805.00899

Sergey Ioffe & Christian Szegedy (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv. https://doi.org/10.48550/arxiv.1502.03167

J

Arthur Jacot, Franck Gabriel, & Clément Hongler (2018). Neural Tangent Kernel: Convergence and Generalization in Neural Networks. Advances in Neural Information Processing Systems 31. https://arxiv.org/abs/1806.07572

John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A. A. Kohl, Andrew J. Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, Ellen Clancy, Michal Zielinski, Martin Steinegger, Michalina Pacholska, Tamas Berghammer, Sebastian Bodenstein, David Silver, Oriol Vinyals, Andrew W. Senior, Koray Kavukcuoglu, Pushmeet Kohli, & Demis Hassabis (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589. https://doi.org/10.1038/s41586-021-03819-2

K

Alex Krizhevsky, Ilya Sutskever, & Geoffrey E. Hinton (2012). ImageNet Classification with Deep Convolutional Neural Networks. Proceedings of NeurIPS 2012. https://papers.nips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html

Andrey N. Kolmogorov (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung. Springer. https://link.springer.com/book/10.1007/978-3-642-49888-6

Diederik P. Kingma & Max Welling (2013). Auto-Encoding Variational Bayes. arXiv. https://doi.org/10.48550/arxiv.1312.6114

Diederik P. Kingma & Jimmy Ba (2014). Adam: A Method for Stochastic Optimization. arXiv. https://doi.org/10.48550/arxiv.1412.6980

Günter Klambauer, Thomas Unterthiner, Andreas Mayr, & Sepp Hochreiter (2017). Self-Normalizing Neural Networks. arXiv. https://doi.org/10.48550/arxiv.1706.02515

Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, & Dario Amodei (2020). Scaling Laws for Neural Language Models. arXiv. https://doi.org/10.48550/arxiv.2001.08361

John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, & Tom Goldstein (2023). A Watermark for Large Language Models. International Conference on Machine Learning. https://arxiv.org/abs/2301.10226

Jon Kleinberg, Sendhil Mullainathan, & Manish Raghavan (2016). Inherent Trade-Offs in the Fair Determination of Risk Scores. arXiv. https://doi.org/10.48550/arxiv.1609.05807

Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, & Yusuke Iwasawa (2022). Large Language Models are Zero-Shot Reasoners. Advances in Neural Information Processing Systems 35. https://arxiv.org/abs/2205.11916

Tero Karras, Samuli Laine, & Timo Aila (2019). A Style-Based Generator Architecture for Generative Adversarial Networks. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 4396-4405. https://doi.org/10.1109/cvpr.2019.00453

Victoria Krakovna, Laurent Orseau, Richard Ngo, Miljan Martic, & Shane Legg (2020). Avoiding Side Effects By Considering Future Tasks. Advances in Neural Information Processing Systems 33. https://arxiv.org/abs/2010.07877

Victoria Krakovna, Jonathan Uesato, Vladimir Mikulik, Matthew Rahtz, Tom Everitt, Ramana Kumar, Zac Kenton, Jan Leike, & Shane Legg (2020). Specification gaming: the flip side of AI ingenuity. DeepMind Blog. https://deepmind.com/blog/article/Specification-gaming-the-flip-side-of-AI-ingenuity

Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, & Ion Stoica (2023). Efficient Memory Management for Large Language Model Serving with PagedAttention. ACM Symposium on Operating Systems Principles. https://arxiv.org/abs/2309.06180

L

Fei Tony Liu, Kai Ming Ting, & Zhi-Hua Zhou (2008). Isolation Forest. 2008 Eighth IEEE International Conference on Data Mining, 413-422. https://doi.org/10.1109/icdm.2008.17

Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, & Karl Cobbe (2023). Let's Verify Step by Step. arXiv:2305.20050. https://arxiv.org/abs/2305.20050

Ilya Loshchilov & Frank Hutter (2016). SGDR: Stochastic Gradient Descent with Warm Restarts. arXiv. https://doi.org/10.48550/arxiv.1608.03983

Ilya Loshchilov & Frank Hutter (2017). Decoupled Weight Decay Regularization. arXiv. https://doi.org/10.48550/arxiv.1711.05101

James Lighthill (1973). Artificial Intelligence: A General Survey. Science Research Council of Great Britain. https://www.chilton-computing.org.uk/inf/literature/reports/lighthill_report/contents.htm

Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, & Shane Legg (2018). Scalable agent alignment via reward modeling: a research direction. arXiv:1811.07871. https://arxiv.org/abs/1811.07871

Jonathan Long, Evan Shelhamer, & Trevor Darrell (2015). Fully convolutional networks for semantic segmentation. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 3431-3440. https://doi.org/10.1109/cvpr.2015.7298965

Lauro Langosco di Langosco, Jack Koch, Lee D. Sharkey, Jacob Pfau, Laurent Orseau, & David Krueger (2022). Goal Misgeneralization in Deep Reinforcement Learning. International Conference on Machine Learning. https://arxiv.org/abs/2105.14111

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, & Luke Zettlemoyer (2020). BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. Annual Meeting of the Association for Computational Linguistics. https://arxiv.org/abs/1910.13461

Moshe Leshno, Vladimir Ya. Lin, Allan Pinkus, & Shimon Schocken (1993). Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 6(6), 861-867. https://doi.org/10.1016/s0893-6080(05)80131-5

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, & Douwe Kiela (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv. https://doi.org/10.48550/arxiv.2005.11401

Remi Lam, Alvaro Sanchez-Gonzalez, Matthew Willson, Peter Wirnsberger, Meire Fortunato, Ferran Alet, Suman Ravuri, Timo Ewalds, Zach Eaton-Rosen, Weihua Hu, Alexander Merose, Stephan Hoyer, George Holland, Oriol Vinyals, Jacklynn Stott, Alexander Pritzel, Shakir Mohamed, & Peter Battaglia (2023). Learning skillful medium-range global weather forecasting. Science, 382(6677), 1416-1421. https://doi.org/10.1126/science.adi2336

Scott Lundberg & Su-In Lee (2017). A Unified Approach to Interpreting Model Predictions. arXiv. https://doi.org/10.48550/arxiv.1705.07874

Thang Luong, Hieu Pham, & Christopher D. Manning (2015). Effective Approaches to Attention-based Neural Machine Translation. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 1412-1421. https://doi.org/10.18653/v1/d15-1166

Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, & Piotr Dollár (2017). Focal Loss for Dense Object Detection. IEEE International Conference on Computer Vision. https://doi.org/10.1109/ICCV.2017.324

Y. LeCun, L. Bottou, Y. Bengio, & P. Haffner (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324. https://doi.org/10.1109/5.726791

Yaniv Leviathan, Matan Kalman, & Yossi Matias (2023). Fast Inference from Transformers via Speculative Decoding. International Conference on Machine Learning. https://arxiv.org/abs/2211.17192

Yann LeCun (2022). A Path Towards Autonomous Machine Intelligence. Open Review. https://openreview.net/forum?id=BZ5a1r-kVsf

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, & Veselin Stoyanov (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv:1907.11692. https://arxiv.org/abs/1907.11692

M

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, & Adrian Vladu (2017). Towards Deep Learning Models Resistant to Adversarial Attacks. International Conference on Learning Representations. https://arxiv.org/abs/1706.06083

Cade Metz (2023). "The Godfather of A.I." Leaves Google and Warns of Danger Ahead. The New York Times. https://www.nytimes.com/2023/05/01/technology/ai-google-chatbot-engineer-quits-hinton.html

David J. C. MacKay (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press. https://www.inference.org.uk/itila/

H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, & Blaise Agüera y Arcas (2016). Communication-Efficient Learning of Deep Networks from Decentralized Data. arXiv. https://doi.org/10.48550/arxiv.1602.05629

John McCarthy, Marvin L. Minsky, Nathaniel Rochester, & Claude E. Shannon (1955). A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence. Dartmouth College. https://raysolomonoff.com/dartmouth/boxa/dart564props.pdf

Julian Michael, Salsabila Mahdi, David Rein, Jackson Petty, Julien Dirani, Vishakh Padmakumar, & Samuel R. Bowman (2023). Debate Helps Supervise Unreliable Experts. arXiv:2311.08702. https://arxiv.org/abs/2311.08702

Kevin P. Murphy (2022). Probabilistic Machine Learning: An Introduction. MIT Press. https://probml.github.io/pml-book/book1.html

Laurens van der Maaten & Geoffrey E. Hinton (2008). Visualizing Data using t-SNE. Journal of Machine Learning Research, 9, 2579-2605. https://jmlr.org/papers/v9/vandermaaten08a.html

Leland McInnes, John Healy, & James Melville (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv. https://doi.org/10.48550/arxiv.1802.03426

Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, & Timnit Gebru (2019). Model Cards for Model Reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency, 220-229. https://doi.org/10.1145/3287560.3287596

Marvin Minsky & Seymour Papert (1969). Perceptrons: An Introduction to Computational Geometry. MIT Press. https://archive.org/details/perceptronsintro00mins

METR (2025). Common Elements of Frontier AI Safety Policies. METR (Model Evaluation and Threat Research). https://metr.org/

Tomas Mikolov, Kai Chen, Greg Corrado, & Jeffrey Dean (2013). Efficient Estimation of Word Representations in Vector Space. arXiv. https://doi.org/10.48550/arxiv.1301.3781

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, & Demis Hassabis (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533. https://doi.org/10.1038/nature14236

Yu Meng, Mengzhou Xia, & Danqi Chen (2024). SimPO: Simple Preference Optimization with a Reference-Free Reward. arXiv:2405.14734. https://arxiv.org/abs/2405.14734

N

Vinod Nair & Geoffrey E. Hinton (2010). Rectified Linear Units Improve Restricted Boltzmann Machines. Proceedings of ICML 2010, 807-814. https://icml.cc/Conferences/2010/papers/432.pdf

O

Aäron van den Oord, Oriol Vinyals, & Koray Kavukcuoglu (2017). Neural Discrete Representation Learning. arXiv. https://doi.org/10.48550/arxiv.1711.00937

Aäron van den Oord, Nal Kalchbrenner, & Koray Kavukcuoglu (2016). Pixel Recurrent Neural Networks. International Conference on Machine Learning. https://arxiv.org/abs/1601.06759

Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, & Chris Olah (2022). In-context Learning and Induction Heads. Anthropic. https://arxiv.org/abs/2209.11895

Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, & Ryan Lowe (2022). Training language models to follow instructions with human feedback. arXiv. https://doi.org/10.48550/arxiv.2203.02155

OpenAI (2023). GPT-4 Technical Report. arXiv:2303.08774. https://arxiv.org/abs/2303.08774

OpenAI (2024). OpenAI o1 System Card. OpenAI. https://openai.com/index/openai-o1-system-card/

OpenAI (2024). Announcing o3 and o3-mini. OpenAI. https://openai.com/index/openai-o3/

P

Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Stella Biderman, Huanqi Cao, Xin Cheng, Michael Chung, Matteo Grella, Kranthi Kiran GV, Xuzheng He, Haowen Hou, Przemyslaw Kazienko, Jan Kocon, Jiaming Kong, Bartlomiej Koptyra, Hayden Lau, Krishna Sri Ipsit Mantri, Ferdinand Mom, Atsushi Saito, Xiangru Tang, Bolun Wang, Johan S. Wind, Stanislaw Wozniak, Ruichong Zhang, Zhenyuan Zhang, Qihang Zhao, Peng Zhou, Jian Zhu, & Rui-Jie Zhu (2023). RWKV: Reinventing RNNs for the Transformer Era. Findings of the Association for Computational Linguistics. https://arxiv.org/abs/2305.13048

Boris T. Polyak (1964). Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics. https://doi.org/10.1016/0041-5553(64)90137-5

Ethan Perez, Sam Ringer, Kamile Lukosiute, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, Andy Jones, Anna Chen, Benjamin Mann, Brian Israel, Bryan Seethor, Cameron McKinnon, Christopher Olah, Da Yan, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Guro Khundadze, Jackson Kernion, James Landis, Jamie Kerr, Jared Mueller, Jeeyoon Hyun, Joshua Landau, Kamal Ndousse, Landon Goldberg, Liane Lovitt, Martin Lucas, Michael Sellitto, Miranda Zhang, Neerav Kingsland, Nelson Elhage, Nicholas Joseph, Noemi Mercado, Nova DasSarma, Oliver Rausch, Robin Larson, Sam McCandlish, Scott Johnston, Shauna Kravec, Sheer El Showk, Tamera Lanham, Timothy Telleen-Lawton, Tom Brown, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Jack Clark, Samuel R. Bowman, Amanda Askell, Roger Grosse, Danny Hernandez, Deep Ganguli, Evan Hubinger, Nicholas Schiefer, & Jared Kaplan (2022). Discovering Language Model Behaviors with Model-Written Evaluations. arXiv:2212.09251. https://arxiv.org/abs/2212.09251

Jeffrey Pennington, Richard Socher, & Christopher Manning (2014). GloVe: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532-1543. https://doi.org/10.3115/v1/d14-1162

John C. Platt (1998). Fast Training of Support Vector Machines Using Sequential Minimal Optimization. Advances in Kernel Methods. https://doi.org/10.7551/mitpress/1130.003.0016

Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, & Luke Zettlemoyer (2018). Deep Contextualized Word Representations. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2227-2237. https://doi.org/10.18653/v1/n18-1202

Ofir Press, Noah A. Smith, & Mike Lewis (2021). Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation. arXiv. https://doi.org/10.48550/arxiv.2108.12409

Razvan Pascanu, Tomas Mikolov, & Yoshua Bengio (2013). On the difficulty of training recurrent neural networks. International Conference on Machine Learning. https://arxiv.org/abs/1211.5063

R

Alec Radford, Luke Metz, & Soumith Chintala (2015). Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv. https://doi.org/10.48550/arxiv.1511.06434

Alec Radford, Karthik Narasimhan, Tim Salimans, & Ilya Sutskever (2018). Improving Language Understanding by Generative Pre-Training. OpenAI Technical Report. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, & Ilya Sutskever (2019). Language Models are Unsupervised Multitask Learners. OpenAI Technical Report. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, & Ilya Sutskever (2021). Learning Transferable Visual Models From Natural Language Supervision. arXiv. https://doi.org/10.48550/arxiv.2103.00020

Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, & Ilya Sutskever (2023). Robust Speech Recognition via Large-Scale Weak Supervision. International Conference on Machine Learning. https://arxiv.org/abs/2212.04356

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, & Peter J. Liu (2019). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. arXiv. https://doi.org/10.48550/arxiv.1910.10683

Cynthia Rudin (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206-215. https://doi.org/10.1038/s42256-019-0048-x

David E. Rumelhart, Geoffrey E. Hinton, & Ronald J. Williams (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536. https://doi.org/10.1038/323533a0

David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, & Samuel R. Bowman (2023). GPQA: A Graduate-Level Google-Proof Q&A Benchmark. arXiv:2311.12022. https://arxiv.org/abs/2311.12022

F. Rosenblatt (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386-408. https://doi.org/10.1037/h0042519

Herbert Robbins & Sutton Monro (1951). A Stochastic Approximation Method. The Annals of Mathematical Statistics, 22(3), 400-407. https://doi.org/10.1214/aoms/1177729586

Joseph Redmon, Santosh Divvala, Ross Girshick, & Ali Farhadi (2016). You Only Look Once: Unified, Real-Time Object Detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779-788. https://doi.org/10.1109/cvpr.2016.91

Marco Tulio Ribeiro, Sameer Singh, & Carlos Guestrin (2016). "Why Should I Trust You?": Explaining the Predictions of Any Classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135-1144. https://doi.org/10.1145/2939672.2939778

Mark Russinovich, Ahmed Salem, & Ronen Eldan (2024). Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack. arXiv:2404.01833. https://arxiv.org/abs/2404.01833

Nils Reimers & Iryna Gurevych (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. Conference on Empirical Methods in Natural Language Processing. https://arxiv.org/abs/1908.10084

Olaf Ronneberger, Philipp Fischer, & Thomas Brox (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. Lecture Notes in Computer Science, 234-241. https://doi.org/10.1007/978-3-319-24574-4_28

Prajit Ramachandran, Barret Zoph, & Quoc V. Le (2017). Searching for Activation Functions. arXiv. https://doi.org/10.48550/arxiv.1710.05941

Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, & Chelsea Finn (2023). Direct Preference Optimization: Your Language Model is Secretly a Reward Model. arXiv. https://doi.org/10.48550/arxiv.2305.18290

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, & Bjorn Ommer (2022). High-Resolution Image Synthesis with Latent Diffusion Models. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 10674-10685. https://doi.org/10.1109/cvpr52688.2022.01042

Shaoqing Ren, Kaiming He, Ross Girshick, & Jian Sun (2015). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv. https://doi.org/10.48550/arxiv.1506.01497

Stuart Russell (2019). Human Compatible: Artificial Intelligence and the Problem of Control. Viking. https://www.penguinrandomhouse.com/books/566677/human-compatible-by-stuart-russell/

Stuart Russell & Peter Norvig (2020). Artificial Intelligence: A Modern Approach. Pearson, 4th edition. https://aima.cs.berkeley.edu/

S

C. E. Shannon (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27(3), 379-423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x

Charlie Snell, Jaehoon Lee, Kelvin Xu, & Aviral Kumar (2024). Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters. arXiv:2408.03314. https://arxiv.org/abs/2408.03314

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, & Rob Fergus (2013). Intriguing properties of neural networks. arXiv. https://doi.org/10.48550/arxiv.1312.6199

Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, & Andrew Rabinovich (2015). Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1-9. https://doi.org/10.1109/cvpr.2015.7298594

Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, & Zbigniew Wojna (2016). Rethinking the Inception Architecture for Computer Vision. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2818-2826. https://doi.org/10.1109/cvpr.2016.308

David Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, & Dan Dennison (2015). Hidden Technical Debt in Machine Learning Systems. Proceedings of NeurIPS 2015. https://papers.nips.cc/paper/2015/hash/86df7dcfd896fcaf2674f757a2463eba-Abstract.html

David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, & Demis Hassabis (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484-489. https://doi.org/10.1038/nature16961

Ilya Sutskever, Oriol Vinyals, & Quoc V. Le (2014). Sequence to Sequence Learning with Neural Networks. arXiv. https://doi.org/10.48550/arxiv.1409.3215

Jasper Snoek, Hugo Larochelle, & Ryan P. Adams (2012). Practical Bayesian Optimization of Machine Learning Algorithms. Advances in Neural Information Processing Systems 25. https://arxiv.org/abs/1206.2944

Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, & Yunfeng Liu (2021). RoFormer: Enhanced Transformer with Rotary Position Embedding. arXiv. https://doi.org/10.48550/arxiv.2104.09864

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, & Oleg Klimov (2017). Proximal Policy Optimization Algorithms. arXiv. https://doi.org/10.48550/arxiv.1707.06347

Karen Simonyan & Andrew Zisserman (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv. https://doi.org/10.48550/arxiv.1409.1556

Leslie N. Smith (2015). Cyclical Learning Rates for Training Neural Networks. arXiv. https://doi.org/10.48550/arxiv.1506.01186

Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R. Bowman, Newton Cheng, Esin Durmus, Zac Hatfield-Dodds, Scott R. Johnston, Shauna Kravec, Timothy Maxwell, Sam McCandlish, Kamal Ndousse, Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, & Ethan Perez (2023). Towards Understanding Sycophancy in Language Models. arXiv:2310.13548. https://arxiv.org/abs/2310.13548

Mukund Sundararajan, Ankur Taly, & Qiqi Yan (2017). Axiomatic Attribution for Deep Networks. arXiv. https://doi.org/10.48550/arxiv.1703.01365

Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, & Ruslan Salakhutdinov (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, 15(1), 1929-1958. https://jmlr.org/papers/v15/srivastava14a.html

Noam Shazeer (2019). Fast Transformer Decoding: One Write-Head is All You Need. arXiv. https://doi.org/10.48550/arxiv.1911.02150

Noam Shazeer (2020). GLU Variants Improve Transformer. arXiv:2002.05202. https://arxiv.org/abs/2002.05202

Peter Shaw, Jakob Uszkoreit, & Ashish Vaswani (2018). Self-Attention with Relative Position Representations. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), 464-468. https://doi.org/10.18653/v1/n18-2074

Richard S. Sutton & Andrew G. Barto (2018). Reinforcement Learning: An Introduction (2nd edition). MIT Press. http://incompleteideas.net/book/the-book-2nd.html

Rico Sennrich, Barry Haddow, & Alexandra Birch (2016). Neural Machine Translation of Rare Words with Subword Units. Annual Meeting of the Association for Computational Linguistics. https://arxiv.org/abs/1508.07909

Roei Schuster, Congzheng Song, Eran Tromer, & Vitaly Shmatikov (2021). You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion. USENIX Security Symposium. https://arxiv.org/abs/2007.02220

Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, Jonathan Uesato, & Zac Kenton (2022). Goal Misgeneralization - Why Correct Specifications Aren't Enough For Correct Goals. arXiv:2210.01790. https://arxiv.org/abs/2210.01790

Rylan Schaeffer, Brando Miranda, & Sanmi Koyejo (2023). Are Emergent Abilities of Large Language Models a Mirage? arXiv. https://doi.org/10.48550/arxiv.2304.15004

Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, & Aleksander Madry (2018). How Does Batch Normalization Help Optimization? arXiv. https://doi.org/10.48550/arxiv.1805.11604

Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, & Ben Poole (2020). Score-Based Generative Modeling through Stochastic Differential Equations. arXiv. https://doi.org/10.48550/arxiv.2011.13456

Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, & Furu Wei (2023). Retentive Network: A Successor to Transformer for Large Language Models. arXiv:2307.08621. https://arxiv.org/abs/2307.08621

T

A. M. Turing (1950). Computing Machinery and Intelligence. Mind, LIX(236), 433-460. https://doi.org/10.1093/mind/lix.236.433

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, & Guillaume Lample (2023). LLaMA: Open and Efficient Foundation Language Models. arXiv. https://doi.org/10.48550/arxiv.2302.13971

Matus Telgarsky (2016). Benefits of depth in neural networks. arXiv. https://doi.org/10.48550/arxiv.1602.04485

Mingxing Tan & Quoc V. Le (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv. https://doi.org/10.48550/arxiv.1905.11946

Robert Tibshirani (1996). Regression Shrinkage and Selection Via the Lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58(1), 267-288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x

V

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, & Illia Polosukhin (2017). Attention Is All You Need. arXiv. https://doi.org/10.48550/arxiv.1706.03762

W

Alexander Wei, Nika Haghtalab, & Jacob Steinhardt (2023). Jailbroken: How Does LLM Safety Training Fail? Advances in Neural Information Processing Systems 36. https://arxiv.org/abs/2307.02483

Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, & Benjamin Recht (2017). The Marginal Value of Adaptive Gradient Methods in Machine Learning. arXiv. https://doi.org/10.48550/arxiv.1705.08292

Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, & William Fedus (2022). Emergent Abilities of Large Language Models. arXiv. https://doi.org/10.48550/arxiv.2206.07682

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, & Denny Zhou (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv. https://doi.org/10.48550/arxiv.2201.11903

Jeff Wu, Long Ouyang, Daniel M. Ziegler, Nisan Stiennon, Ryan Lowe, Jan Leike, & Paul Christiano (2021). Recursively Summarizing Books with Human Feedback. arXiv:2109.10862. https://arxiv.org/abs/2109.10862

Joseph Weizenbaum (1976). Computer Power and Human Reason: From Judgment to Calculation. W. H. Freeman. https://archive.org/details/computerpowerhum0000weiz

Laura Weidinger, Jonathan Uesato, Maribeth Rauh, Conor Griffin, Po-Sen Huang, John Mellor, Amelia Glaese, Myra Cheng, Borja Balle, Atoosa Kasirzadeh, Courtney Biles, Sasha Brown, Zac Kenton, Will Hawkins, Tom Stepleton, Abeba Birhane, Lisa Anne Hendricks, Laura Rimell, William Isaac, Julia Haas, Sean Legassick, Geoffrey Irving, & Iason Gabriel (2022). Taxonomy of Risks posed by Language Models. 2022 ACM Conference on Fairness Accountability and Transparency, 214-229. https://doi.org/10.1145/3531146.3533088

Norbert Wiener (1960). Some Moral and Technical Consequences of Automation. Science. https://doi.org/10.1126/science.131.3410.1355

Paul J. Werbos (1974). Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard University. https://gwern.net/doc/ai/nn/1974-werbos.pdf

Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, & Hao Ma (2020). Linformer: Self-Attention with Linear Complexity. arXiv:2006.04768. https://arxiv.org/abs/2006.04768

Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V. Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, & Denny Zhou (2022). Self-Consistency Improves Chain of Thought Reasoning in Language Models. International Conference on Learning Representations. https://arxiv.org/abs/2203.11171

Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Lukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, & Jeffrey Dean (2016). Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv:1609.08144. https://arxiv.org/abs/1609.08144

X

Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, & Tieyan Liu (2020). On Layer Normalization in the Transformer Architecture. International Conference on Machine Learning. https://arxiv.org/abs/2002.04745

Y

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, & Yuan Cao (2022). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv. https://doi.org/10.48550/arxiv.2210.03629

Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, & Karthik Narasimhan (2024). Tree of Thoughts: Deliberate Problem Solving with Large Language Models. Advances in Neural Information Processing Systems 36. https://arxiv.org/abs/2305.10601

Z

Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, & Matt Fredrikson (2023). Universal and Transferable Adversarial Attacks on Aligned Language Models. arXiv:2307.15043. https://arxiv.org/abs/2307.15043

Biao Zhang & Rico Sennrich (2019). Root Mean Square Layer Normalization. Advances in Neural Information Processing Systems 32. https://arxiv.org/abs/1910.07467

Hanlin Zhang, Benjamin L. Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, & Boaz Barak (2023). Watermarks in the Sand - Impossibility of Strong Watermarking for Generative Models. arXiv:2311.04378. https://arxiv.org/abs/2311.04378

Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, & David Lopez-Paz (2017). mixup: Beyond Empirical Risk Minimization. arXiv. https://doi.org/10.48550/arxiv.1710.09412
