241 references
References
A
Anish Athalye, Nicholas Carlini, & David Wagner (2018). Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. International Conference on Machine Learning. https://arxiv.org/abs/1802.00420
Anthropic (2025). Building Effective AI Agents. Anthropic. https://www.anthropic.com/research/building-effective-agents
Cem Anil, Esin Durmus, Mrinank Sharma, Joe Benton, Sandipan Kundu, Joshua Batson, Nina Panickssery, Meg Tong, Jesse Mu, Daniel Ford, Francesco Mosconi, Rajashree Agrawal, Rylan Schaeffer, Naomi Bashkansky, Samuel Svenningsen, Mike Lambert, Ansh Radhakrishnan, Carson Denison, Evan J. Hubinger, Yuntao Bai, Trenton Bricken, Timothy Maxwell, Nicholas Schiefer, James Sully, Alex Tamkin, Tamera Lanham, Karina Nguyen, Tomasz Korbak, Jared Kaplan, Deep Ganguli, Samuel R. Bowman, Ethan Perez, Roger Grosse, & David Duvenaud (2024). Many-shot Jailbreaking. Anthropic. https://www.anthropic.com/research/many-shot-jailbreaking
Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, & Dan Mané (2016). Concrete Problems in AI Safety. arXiv. https://doi.org/10.48550/arxiv.1606.06565
David Arthur & Sergei Vassilvitskii (2007). k-means++: The Advantages of Careful Seeding. Proceedings of SODA 2007, 1027-1035. https://theory.stanford.edu/~sergei/papers/kMeansPP-soda.pdf
Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, & Denny Zhou (2023). What learning algorithm is in-context learning? Investigations with linear models. International Conference on Learning Representations. https://arxiv.org/abs/2211.15661
Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, & Karen Simonyan (2022). Flamingo: a Visual Language Model for Few-Shot Learning. Advances in Neural Information Processing Systems 35. https://arxiv.org/abs/2204.14198
Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, & Li Zhang (2016). Deep Learning with Differential Privacy. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 308-318. https://doi.org/10.1145/2976749.2978318
Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, & Rémi Munos (2023). A General Theoretical Paradigm to Understand Learning from Human Preferences. arXiv:2310.12036. https://arxiv.org/abs/2310.12036
B
Christopher M. Bishop (2006). Pattern Recognition and Machine Learning. Springer. https://www.microsoft.com/en-us/research/publication/pattern-recognition-machine-learning/
Collin Burns, Pavel Izmailov, Jan Hendrik Kirchner, Bowen Baker, Leo Gao, Leopold Aschenbrenner, Yining Chen, Adrien Ecoffet, Manas Joglekar, Jan Leike, Ilya Sutskever, & Jeffrey Wu (2023). Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision. arXiv:2312.09390. https://arxiv.org/abs/2312.09390
David M. Blei, Andrew Y. Ng, & Michael I. Jordan (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research. https://www.jmlr.org/papers/v3/blei03a.html
Dzmitry Bahdanau, Kyunghyun Cho, & Yoshua Bengio (2014). Neural Machine Translation by Jointly Learning to Align and Translate. arXiv. https://doi.org/10.48550/arxiv.1409.0473
Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, & Shmargaret Shmitchell (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610-623. https://doi.org/10.1145/3442188.3445922
Iz Beltagy, Matthew E. Peters, & Arman Cohan (2020). Longformer: The Long-Document Transformer. arXiv. https://doi.org/10.48550/arxiv.2004.05150
James Bergstra & Yoshua Bengio (2012). Random Search for Hyper-Parameter Optimization. Journal of Machine Learning Research, 13, 281-305. https://jmlr.org/papers/v13/bergstra12a.html
Jimmy Lei Ba, Jamie Ryan Kiros, & Geoffrey E. Hinton (2016). Layer Normalization. arXiv. https://doi.org/10.48550/arxiv.1607.06450
Leo Breiman (1996). Bagging predictors. Machine Learning, 24(2), 123-140. https://doi.org/10.1007/bf00058655
Leo Breiman (2001). Random Forests. Machine Learning, 45(1), 5-32. https://doi.org/10.1023/a:1010933404324
Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, & Jörg Sander (2000). LOF: Identifying Density-Based Local Outliers. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, 93-104. https://doi.org/10.1145/342009.335388
Mikhail Belkin, Daniel Hsu, Siyuan Ma, & Soumik Mandal (2019). Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.1903070116
Nick Bostrom (2014). Superintelligence: Paths, Dangers, Strategies. Oxford University Press. https://archive.org/details/superintelligenc00unse
Piotr Bojanowski, Edouard Grave, Armand Joulin, & Tomas Mikolov (2017). Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics, 5, 135-146. https://doi.org/10.1162/tacl_a_00051
Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, & Percy Liang (2021). On the Opportunities and Risks of Foundation Models. arXiv. https://doi.org/10.48550/arxiv.2108.07258
Samy Bengio, Oriol Vinyals, Navdeep Jaitly, & Noam Shazeer (2015). Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks. Advances in Neural Information Processing Systems 28. https://arxiv.org/abs/1506.03099
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, & Dario Amodei (2020). Language Models are Few-Shot Learners. arXiv. https://doi.org/10.48550/arxiv.2005.14165
Yoshua Bengio, Réjean Ducharme, Pascal Vincent, & Christian Jauvin (2003). A Neural Probabilistic Language Model. Journal of Machine Learning Research, 3, 1137-1155. https://jmlr.org/papers/v3/bengio03a.html
Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, Kamile Lukosuite, Liane Lovitt, Michael Sellitto, Nelson Elhage, Nicholas Schiefer, Noemi Mercado, Nova DasSarma, Robert Lasenby, Robin Larson, Sam Ringer, Scott Johnston, Shauna Kravec, Sheer El Showk, Stanislav Fort, Tamera Lanham, Timothy Telleen-Lawton, Tom Conerly, Tom Henighan, Tristan Hume, Samuel R. Bowman, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, & Jared Kaplan (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv. https://doi.org/10.48550/arxiv.2212.08073
C
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, & Noah Fiedel (2022). PaLM: Scaling Language Modeling with Pathways. arXiv. https://doi.org/10.48550/arxiv.2204.02311
Alexandra Chouldechova (2017). Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments. Big Data, 5(2), 153-163. https://doi.org/10.1089/big.2016.0047
Corinna Cortes & Vladimir Vapnik (1995). Support-vector networks. Machine Learning, 20(3), 273-297. https://doi.org/10.1007/bf00994018
Francesco Croce & Matthias Hein (2020). Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. International Conference on Machine Learning. https://arxiv.org/abs/2003.01690
G. Cybenko (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(4), 303-314. https://doi.org/10.1007/bf02551274
Jeremy Cohen, Elan Rosenfeld, & J. Zico Kolter (2019). Certified Adversarial Robustness via Randomized Smoothing. International Conference on Machine Learning. https://arxiv.org/abs/1902.02918
Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, & Adrian Weller (2021). Rethinking Attention with Performers. International Conference on Learning Representations. https://arxiv.org/abs/2009.14794
Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, & Yoshua Bengio (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv. https://doi.org/10.48550/arxiv.1406.1078
Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, & Alan L. Yuille (2018). DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4), 834-848. https://doi.org/10.1109/tpami.2017.2699184
Nicholas Carlini & David Wagner (2017). Towards Evaluating the Robustness of Neural Networks. IEEE Symposium on Security and Privacy. https://doi.org/10.1109/SP.2017.49
Nicholas Carlini & Andreas Terzis (2022). Poisoning and Backdooring Contrastive Learning. International Conference on Learning Representations. https://arxiv.org/abs/2106.09667
Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, & Sergey Zagoruyko (2020). End-to-End Object Detection with Transformers. Lecture Notes in Computer Science, 213-229. https://doi.org/10.1007/978-3-030-58452-8_13
Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, & W. Philip Kegelmeyer (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research. https://doi.org/10.1613/jair.953
Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, & Dario Amodei (2017). Deep reinforcement learning from human preferences. arXiv. https://doi.org/10.48550/arxiv.1706.03741
Paul Christiano, Ajeya Cotra, & Mark Xu (2021). Eliciting Latent Knowledge: How to Tell if Your Eyes Deceive You. Alignment Research Center. https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/
Paul Christiano (2023). My views on "doom". AI Alignment Forum. https://www.alignmentforum.org/posts/xWMqsvHapP3nwdSW8/my-views-on-doom
Ricardo J. G. B. Campello, Davoud Moulavi, & Jörg Sander (2013). Density-Based Clustering Based on Hierarchical Density Estimates. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). https://doi.org/10.1007/978-3-642-37456-2_14
Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, & David Duvenaud (2018). Neural Ordinary Differential Equations. arXiv. https://doi.org/10.48550/arxiv.1806.07366
T. Cover & P. Hart (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21-27. https://doi.org/10.1109/tit.1967.1053964
Tianqi Chen & Carlos Guestrin (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785-794. https://doi.org/10.1145/2939672.2939785
Ting Chen, Simon Kornblith, Mohammad Norouzi, & Geoffrey Hinton (2020). A Simple Framework for Contrastive Learning of Visual Representations. arXiv. https://doi.org/10.48550/arxiv.2002.05709
D
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, & Neil Houlsby (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv. https://doi.org/10.48550/arxiv.2010.11929
Cynthia Dwork, Frank McSherry, Kobbi Nissim, & Adam Smith (2006). Calibrating Noise to Sensitivity in Private Data Analysis. Lecture Notes in Computer Science, 265-284. https://doi.org/10.1007/11681878_14
DeepSeek-AI (2025). DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv:2501.12948. https://arxiv.org/abs/2501.12948
Hubert L. Dreyfus (1992). What Computers Still Can't Do. MIT Press. https://mitpress.mit.edu/9780262540674/what-computers-still-cant-do/
Jacob Devlin, Ming-Wei Chang, Kenton Lee, & Kristina Toutanova (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of NAACL-HLT 2019, Volume 1 (Long and Short Papers), 4171-4186. https://doi.org/10.18653/v1/n19-1423
Laurent Dinh, Jascha Sohl-Dickstein, & Samy Bengio (2016). Density estimation using Real NVP. arXiv. https://doi.org/10.48550/arxiv.1605.08803
Prafulla Dhariwal & Alex Nichol (2021). Diffusion Models Beat GANs on Image Synthesis. Advances in Neural Information Processing Systems 34. https://arxiv.org/abs/2105.05233
Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, & Luke Zettlemoyer (2023). QLoRA: Efficient Finetuning of Quantized LLMs. arXiv. https://doi.org/10.48550/arxiv.2305.14314
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, & Christopher Ré (2022). FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. arXiv. https://doi.org/10.48550/arxiv.2205.14135
E
Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, & Douwe Kiela (2024). KTO: Model Alignment as Prospect Theoretic Optimization. arXiv:2402.01306. https://arxiv.org/abs/2402.01306
Martin Ester, Hans-Peter Kriegel, Jörg Sander, & Xiaowei Xu (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proceedings of KDD 1996, 226-231. https://cdn.aaai.org/KDD/1996/KDD96-037.pdf
F
Jerome H. Friedman (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5). https://doi.org/10.1214/aos/1013203451
Jonathan Frankle & Michael Carbin (2018). The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. arXiv. https://doi.org/10.48550/arxiv.1803.03635
William Fedus, Barret Zoph, & Noam Shazeer (2021). Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. arXiv. https://doi.org/10.48550/arxiv.2101.03961
Yoav Freund & Robert E. Schapire (1997). A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences, 55(1), 119-139. https://doi.org/10.1006/jcss.1997.1504
G
Albert Gu & Tri Dao (2024). Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv:2312.00752. https://arxiv.org/abs/2312.00752
Alex Graves, Santiago Fernández, Faustino Gomez, & Jürgen Schmidhuber (2006). Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. International Conference on Machine Learning. https://doi.org/10.1145/1143844.1143891
Deep Ganguli, Danny Hernandez, Liane Lovitt, Nova DasSarma, Tom Henighan, Andy Jones, Nicholas Joseph, Jackson Kernion, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Nelson Elhage, Sheer El Showk, Stanislav Fort, Zac Hatfield-Dodds, Scott Johnston, Shauna Kravec, Neel Nanda, Kamal Ndousse, Catherine Olsson, Daniela Amodei, Dario Amodei, Tom Brown, Jared Kaplan, Sam McCandlish, Chris Olah, & Jack Clark (2022). Predictability and Surprise in Large Generative Models. ACM Conference on Fairness, Accountability, and Transparency. https://arxiv.org/abs/2202.07785
Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, Andy Jones, Sam Bowman, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Nelson Elhage, Sheer El-Showk, Stanislav Fort, Zac Hatfield-Dodds, Tom Henighan, Danny Hernandez, Tristan Hume, Josh Jacobson, Scott Johnston, Shauna Kravec, Catherine Olsson, Sam Ringer, Eli Tran-Johnson, Dario Amodei, Tom Brown, Nicholas Joseph, Sam McCandlish, Chris Olah, Jared Kaplan, & Jack Clark (2022). Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned. arXiv:2209.07858. https://arxiv.org/abs/2209.07858
Elliot Glazer, Ege Erdil, Tamay Besiroglu, Diego Chicharro, Evan Chen, Alex Gunning, Caroline Falkman Olsson, Jean-Stanislas Denain, Anson Ho, Emily de Oliveira Santos, Olli Järviniemi, Matthew Barnett, Robert Sandler, Jaime Sevilla, Tom Tseng, Yuki Hayashi, Maxim Kapur, Pieter Garrelfs, Carolyn Ashbaugh, & others (2024). FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI. arXiv:2411.04872. https://arxiv.org/abs/2411.04872
Ian Goodfellow, Yoshua Bengio, & Aaron Courville (2016). Deep Learning. MIT Press. https://www.deeplearningbook.org/
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, & Yoshua Bengio (2014). Generative Adversarial Nets. Proceedings of NeurIPS 2014. https://arxiv.org/abs/1406.2661
Ian J. Goodfellow, Jonathon Shlens, & Christian Szegedy (2015). Explaining and Harnessing Adversarial Examples. International Conference on Learning Representations. https://arxiv.org/abs/1412.6572
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, & Mario Fritz (2023). Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. ACM Workshop on Artificial Intelligence and Security. https://arxiv.org/abs/2302.12173
Leo Gao, John Schulman, & Jacob Hilton (2022). Scaling Laws for Reward Model Overoptimization. arXiv:2210.10760. https://arxiv.org/abs/2210.10760
Mor Geva, Roei Schuster, Jonathan Berant, & Omer Levy (2021). Transformer Feed-Forward Layers Are Key-Value Memories. Conference on Empirical Methods in Natural Language Processing. https://arxiv.org/abs/2012.14913
Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, & Kaiming He (2017). Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. arXiv:1706.02677. https://arxiv.org/abs/1706.02677
Ross Girshick, Jeff Donahue, Trevor Darrell, & Jitendra Malik (2014). Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. 2014 IEEE Conference on Computer Vision and Pattern Recognition, 580-587. https://doi.org/10.1109/cvpr.2014.81
Shivam Garg, Dimitris Tsipras, Percy Liang, & Gregory Valiant (2022). What Can Transformers Learn In-Context? A Case Study of Simple Function Classes. Advances in Neural Information Processing Systems 35. https://arxiv.org/abs/2208.01066
Tianyu Gu, Brendan Dolan-Gavitt, & Siddharth Garg (2017). BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain. arXiv:1708.06733. https://arxiv.org/abs/1708.06733
Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, & Kate Crawford (2021). Datasheets for datasets. Communications of the ACM, 64(12), 86-92. https://doi.org/10.1145/3458723
Xavier Glorot & Yoshua Bengio (2010). Understanding the Difficulty of Training Deep Feedforward Neural Networks. Proceedings of AISTATS 2010, 249-256. https://proceedings.mlr.press/v9/glorot10a.html
H
Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, & Hartwig Adam (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv. https://doi.org/10.48550/arxiv.1704.04861
Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, & Yejin Choi (2019). The Curious Case of Neural Text Degeneration. arXiv. https://doi.org/10.48550/arxiv.1904.09751
Arthur E. Hoerl & Robert W. Kennard (1970). Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics. https://doi.org/10.1080/00401706.1970.10488634
Dan Hendrycks & Kevin Gimpel (2016). Gaussian Error Linear Units (GELUs). arXiv. https://doi.org/10.48550/arxiv.1606.08415
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, & Weizhu Chen (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv. https://doi.org/10.48550/arxiv.2106.09685
Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, & Scott Garrabrant (2019). Risks from Learned Optimization in Advanced Machine Learning Systems. arXiv:1906.01820. https://arxiv.org/abs/1906.01820
Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, Yuntao Bai, Zachary Witten, Marina Favaro, Jan Brauner, Holden Karnofsky, Paul Christiano, Samuel R. Bowman, Logan Graham, Jared Kaplan, Sören Mindermann, Ryan Greenblatt, Buck Shlegeris, Nicholas Schiefer, & Ethan Perez (2024). Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training. arXiv:2401.05566. https://arxiv.org/abs/2401.05566
Gao Huang, Zhuang Liu, Laurens Van Der Maaten, & Kilian Q. Weinberger (2017). Densely Connected Convolutional Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2261-2269. https://doi.org/10.1109/cvpr.2017.243
Geoffrey Hinton, Oriol Vinyals, & Jeff Dean (2015). Distilling the Knowledge in a Neural Network. arXiv. https://doi.org/10.48550/arxiv.1503.02531
Jiwoo Hong, Noah Lee, & James Thorne (2024). ORPO: Monolithic Preference Optimization without Reference Model. arXiv:2403.07691. https://arxiv.org/abs/2403.07691
Jonathan Ho, Ajay Jain, & Pieter Abbeel (2020). Denoising Diffusion Probabilistic Models. arXiv. https://doi.org/10.48550/arxiv.2006.11239
Jonathan Ho & Tim Salimans (2022). Classifier-Free Diffusion Guidance. arXiv. https://doi.org/10.48550/arxiv.2207.12598
Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, & Laurent Sifre (2022). Training Compute-Optimal Large Language Models. arXiv. https://doi.org/10.48550/arxiv.2203.15556
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun (2015). Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. arXiv. https://doi.org/10.48550/arxiv.1502.01852
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778. https://doi.org/10.1109/cvpr.2016.90
Kurt Hornik, Maxwell Stinchcombe, & Halbert White (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359-366. https://doi.org/10.1016/0893-6080(89)90020-8
Moritz Hardt, Benjamin Recht, & Yoram Singer (2016). Train faster, generalize better: Stability of stochastic gradient descent. International Conference on Machine Learning. https://arxiv.org/abs/1509.01240
Sepp Hochreiter & Jürgen Schmidhuber (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735-1780. https://doi.org/10.1162/neco.1997.9.8.1735
Trevor Hastie, Robert Tibshirani, & Jerome Friedman (2009). The Elements of Statistical Learning. Springer Series in Statistics. https://doi.org/10.1007/978-0-387-84858-7
I
Geoffrey Irving, Paul Christiano, & Dario Amodei (2018). AI safety via debate. arXiv:1805.00899. https://arxiv.org/abs/1805.00899
Sergey Ioffe & Christian Szegedy (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv. https://doi.org/10.48550/arxiv.1502.03167
J
Arthur Jacot, Franck Gabriel, & Clément Hongler (2018). Neural Tangent Kernel: Convergence and Generalization in Neural Networks. Advances in Neural Information Processing Systems 31. https://arxiv.org/abs/1806.07572
John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A. A. Kohl, Andrew J. Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, Ellen Clancy, Michal Zielinski, Martin Steinegger, Michalina Pacholska, Tamas Berghammer, Sebastian Bodenstein, David Silver, Oriol Vinyals, Andrew W. Senior, Koray Kavukcuoglu, Pushmeet Kohli, & Demis Hassabis (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589. https://doi.org/10.1038/s41586-021-03819-2
K
Alex Krizhevsky, Ilya Sutskever, & Geoffrey E. Hinton (2012). ImageNet Classification with Deep Convolutional Neural Networks. Proceedings of NeurIPS 2012. https://papers.nips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html
Andrey N. Kolmogorov (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung. Springer. https://link.springer.com/book/10.1007/978-3-642-49888-6
Diederik P. Kingma & Max Welling (2013). Auto-Encoding Variational Bayes. arXiv. https://doi.org/10.48550/arxiv.1312.6114
Diederik P. Kingma & Jimmy Ba (2014). Adam: A Method for Stochastic Optimization. arXiv. https://doi.org/10.48550/arxiv.1412.6980
Günter Klambauer, Thomas Unterthiner, Andreas Mayr, & Sepp Hochreiter (2017). Self-Normalizing Neural Networks. arXiv. https://doi.org/10.48550/arxiv.1706.02515
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, & Dario Amodei (2020). Scaling Laws for Neural Language Models. arXiv. https://doi.org/10.48550/arxiv.2001.08361
John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, & Tom Goldstein (2023). A Watermark for Large Language Models. International Conference on Machine Learning. https://arxiv.org/abs/2301.10226
Jon Kleinberg, Sendhil Mullainathan, & Manish Raghavan (2016). Inherent Trade-Offs in the Fair Determination of Risk Scores. arXiv. https://doi.org/10.48550/arxiv.1609.05807
Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, & Yusuke Iwasawa (2022). Large Language Models are Zero-Shot Reasoners. Advances in Neural Information Processing Systems 35. https://arxiv.org/abs/2205.11916
Tero Karras, Samuli Laine, & Timo Aila (2019). A Style-Based Generator Architecture for Generative Adversarial Networks. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 4396-4405. https://doi.org/10.1109/cvpr.2019.00453
Victoria Krakovna, Laurent Orseau, Richard Ngo, Miljan Martic, & Shane Legg (2020). Avoiding Side Effects By Considering Future Tasks. Advances in Neural Information Processing Systems 33. https://arxiv.org/abs/2010.07877
Victoria Krakovna, Jonathan Uesato, Vladimir Mikulik, Matthew Rahtz, Tom Everitt, Ramana Kumar, Zac Kenton, Jan Leike, & Shane Legg (2020). Specification gaming: the flip side of AI ingenuity. DeepMind Blog. https://deepmind.com/blog/article/Specification-gaming-the-flip-side-of-AI-ingenuity
Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, & Ion Stoica (2023). Efficient Memory Management for Large Language Model Serving with PagedAttention. ACM Symposium on Operating Systems Principles. https://arxiv.org/abs/2309.06180
L
Fei Tony Liu, Kai Ming Ting, & Zhi-Hua Zhou (2008). Isolation Forest. 2008 Eighth IEEE International Conference on Data Mining, 413-422. https://doi.org/10.1109/icdm.2008.17
Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, & Karl Cobbe (2023). Let's Verify Step by Step. arXiv:2305.20050. https://arxiv.org/abs/2305.20050
Ilya Loshchilov & Frank Hutter (2016). SGDR: Stochastic Gradient Descent with Warm Restarts. arXiv. https://doi.org/10.48550/arxiv.1608.03983
Ilya Loshchilov & Frank Hutter (2017). Decoupled Weight Decay Regularization. arXiv. https://doi.org/10.48550/arxiv.1711.05101
James Lighthill (1973). Artificial Intelligence: A General Survey. Science Research Council of Great Britain. https://www.chilton-computing.org.uk/inf/literature/reports/lighthill_report/contents.htm
Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, & Shane Legg (2018). Scalable agent alignment via reward modeling: a research direction. arXiv:1811.07871. https://arxiv.org/abs/1811.07871
Jonathan Long, Evan Shelhamer, & Trevor Darrell (2015). Fully convolutional networks for semantic segmentation. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 3431-3440. https://doi.org/10.1109/cvpr.2015.7298965
Lauro Langosco di Langosco, Jack Koch, Lee D. Sharkey, Jacob Pfau, Laurent Orseau, & David Krueger (2022). Goal Misgeneralization in Deep Reinforcement Learning. International Conference on Machine Learning. https://arxiv.org/abs/2105.14111
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, & Luke Zettlemoyer (2020). BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. Annual Meeting of the Association for Computational Linguistics. https://arxiv.org/abs/1910.13461
Moshe Leshno, Vladimir Ya. Lin, Allan Pinkus, & Shimon Schocken (1993). Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 6(6), 861-867. https://doi.org/10.1016/s0893-6080(05)80131-5
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, & Douwe Kiela (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv. https://doi.org/10.48550/arxiv.2005.11401
Remi Lam, Alvaro Sanchez-Gonzalez, Matthew Willson, Peter Wirnsberger, Meire Fortunato, Ferran Alet, Suman Ravuri, Timo Ewalds, Zach Eaton-Rosen, Weihua Hu, Alexander Merose, Stephan Hoyer, George Holland, Oriol Vinyals, Jacklynn Stott, Alexander Pritzel, Shakir Mohamed, & Peter Battaglia (2023). Learning skillful medium-range global weather forecasting. Science, 382(6677), 1416-1421. https://doi.org/10.1126/science.adi2336
Scott Lundberg & Su-In Lee (2017). A Unified Approach to Interpreting Model Predictions. arXiv. https://doi.org/10.48550/arxiv.1705.07874
Thang Luong, Hieu Pham, & Christopher D. Manning (2015). Effective Approaches to Attention-based Neural Machine Translation. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 1412-1421. https://doi.org/10.18653/v1/d15-1166
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, & Piotr Dollár (2017). Focal Loss for Dense Object Detection. IEEE International Conference on Computer Vision. https://doi.org/10.1109/ICCV.2017.324
Y. LeCun, L. Bottou, Y. Bengio, & P. Haffner (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324. https://doi.org/10.1109/5.726791
Yaniv Leviathan, Matan Kalman, & Yossi Matias (2023). Fast Inference from Transformers via Speculative Decoding. International Conference on Machine Learning. https://arxiv.org/abs/2211.17192
Yann LeCun (2022). A Path Towards Autonomous Machine Intelligence. Open Review. https://openreview.net/forum?id=BZ5a1r-kVsf
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, & Veselin Stoyanov (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv:1907.11692. https://arxiv.org/abs/1907.11692
M
Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, & Adrian Vladu (2017). Towards Deep Learning Models Resistant to Adversarial Attacks. International Conference on Learning Representations. https://arxiv.org/abs/1706.06083
Cade Metz (2023). "The Godfather of A.I." Leaves Google and Warns of Danger Ahead. The New York Times. https://www.nytimes.com/2023/05/01/technology/ai-google-chatbot-engineer-quits-hinton.html
David J. C. MacKay (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press. https://www.inference.org.uk/itila/
H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, & Blaise Agüera y Arcas (2016). Communication-Efficient Learning of Deep Networks from Decentralized Data. arXiv. https://doi.org/10.48550/arxiv.1602.05629
John McCarthy, Marvin L. Minsky, Nathaniel Rochester, & Claude E. Shannon (1955). A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence. Dartmouth College. https://raysolomonoff.com/dartmouth/boxa/dart564props.pdf
Julian Michael, Salsabila Mahdi, David Rein, Jackson Petty, Julien Dirani, Vishakh Padmakumar, & Samuel R. Bowman (2023). Debate Helps Supervise Unreliable Experts. arXiv:2311.08702. https://arxiv.org/abs/2311.08702
Kevin P. Murphy (2022). Probabilistic Machine Learning: An Introduction. MIT Press. https://probml.github.io/pml-book/book1.html
Laurens van der Maaten & Geoffrey E. Hinton (2008). Visualizing Data using t-SNE. Journal of Machine Learning Research, 9, 2579-2605. https://jmlr.org/papers/v9/vandermaaten08a.html
Leland McInnes, John Healy, & James Melville (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv. https://doi.org/10.48550/arxiv.1802.03426
Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, & Timnit Gebru (2019). Model Cards for Model Reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency, 220-229. https://doi.org/10.1145/3287560.3287596
Marvin Minsky & Seymour Papert (1969). Perceptrons: An Introduction to Computational Geometry. MIT Press. https://archive.org/details/perceptronsintro00mins
METR (2025). Common Elements of Frontier AI Safety Policies. METR (Model Evaluation and Threat Research). https://metr.org/
Tomas Mikolov, Kai Chen, Greg Corrado, & Jeffrey Dean (2013). Efficient Estimation of Word Representations in Vector Space. arXiv. https://doi.org/10.48550/arxiv.1301.3781
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, & Demis Hassabis (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533. https://doi.org/10.1038/nature14236
Yu Meng, Mengzhou Xia, & Danqi Chen (2024). SimPO: Simple Preference Optimization with a Reference-Free Reward. arXiv:2405.14734. https://arxiv.org/abs/2405.14734
N
Vinod Nair & Geoffrey E. Hinton (2010). Rectified Linear Units Improve Restricted Boltzmann Machines. Proceedings of ICML 2010, 807-814. https://icml.cc/Conferences/2010/papers/432.pdf
O
Aäron van den Oord, Oriol Vinyals, & Koray Kavukcuoglu (2017). Neural Discrete Representation Learning. arXiv. https://doi.org/10.48550/arxiv.1711.00937
Aäron van den Oord, Nal Kalchbrenner, & Koray Kavukcuoglu (2016). Pixel Recurrent Neural Networks. International Conference on Machine Learning. https://arxiv.org/abs/1601.06759
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, & Chris Olah (2022). In-context Learning and Induction Heads. Anthropic. https://arxiv.org/abs/2209.11895
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, & Ryan Lowe (2022). Training language models to follow instructions with human feedback. arXiv. https://doi.org/10.48550/arxiv.2203.02155
OpenAI (2023). GPT-4 Technical Report. arXiv:2303.08774. https://arxiv.org/abs/2303.08774
OpenAI (2024). OpenAI o1 System Card. OpenAI. https://openai.com/index/openai-o1-system-card/
OpenAI (2024). Announcing o3 and o3-mini. OpenAI. https://openai.com/index/openai-o3/
P
Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Stella Biderman, Huanqi Cao, Xin Cheng, Michael Chung, Matteo Grella, Kranthi Kiran GV, Xuzheng He, Haowen Hou, Przemyslaw Kazienko, Jan Kocon, Jiaming Kong, Bartlomiej Koptyra, Hayden Lau, Krishna Sri Ipsit Mantri, Ferdinand Mom, Atsushi Saito, Xiangru Tang, Bolun Wang, Johan S. Wind, Stanislaw Wozniak, Ruichong Zhang, Zhenyuan Zhang, Qihang Zhao, Peng Zhou, Jian Zhu, & Rui-Jie Zhu (2023). RWKV: Reinventing RNNs for the Transformer Era. Findings of the Association for Computational Linguistics: EMNLP 2023. https://arxiv.org/abs/2305.13048
Boris T. Polyak (1964). Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics. https://doi.org/10.1016/0041-5553(64)90137-5
Ethan Perez, Sam Ringer, Kamile Lukosiute, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, Andy Jones, Anna Chen, Benjamin Mann, Brian Israel, Bryan Seethor, Cameron McKinnon, Christopher Olah, Da Yan, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Guro Khundadze, Jackson Kernion, James Landis, Jamie Kerr, Jared Mueller, Jeeyoon Hyun, Joshua Landau, Kamal Ndousse, Landon Goldberg, Liane Lovitt, Martin Lucas, Michael Sellitto, Miranda Zhang, Neerav Kingsland, Nelson Elhage, Nicholas Joseph, Noemi Mercado, Nova DasSarma, Oliver Rausch, Robin Larson, Sam McCandlish, Scott Johnston, Shauna Kravec, Sheer El Showk, Tamera Lanham, Timothy Telleen-Lawton, Tom Brown, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Jack Clark, Samuel R. Bowman, Amanda Askell, Roger Grosse, Danny Hernandez, Deep Ganguli, Evan Hubinger, Nicholas Schiefer, & Jared Kaplan (2022). Discovering Language Model Behaviors with Model-Written Evaluations. arXiv:2212.09251. https://arxiv.org/abs/2212.09251
Jeffrey Pennington, Richard Socher, & Christopher Manning (2014). GloVe: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532-1543. https://doi.org/10.3115/v1/d14-1162
John C. Platt (1998). Fast Training of Support Vector Machines Using Sequential Minimal Optimization. Advances in Kernel Methods. https://doi.org/10.7551/mitpress/1130.003.0016
Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, & Luke Zettlemoyer (2018). Deep Contextualized Word Representations. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2227-2237. https://doi.org/10.18653/v1/n18-1202
Ofir Press, Noah A. Smith, & Mike Lewis (2021). Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation. arXiv. https://doi.org/10.48550/arxiv.2108.12409
Razvan Pascanu, Tomas Mikolov, & Yoshua Bengio (2013). On the difficulty of training recurrent neural networks. International Conference on Machine Learning. https://arxiv.org/abs/1211.5063
R
Alec Radford, Luke Metz, & Soumith Chintala (2015). Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv. https://doi.org/10.48550/arxiv.1511.06434
Alec Radford, Karthik Narasimhan, Tim Salimans, & Ilya Sutskever (2018). Improving Language Understanding by Generative Pre-Training. OpenAI Technical Report. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, & Ilya Sutskever (2019). Language Models are Unsupervised Multitask Learners. OpenAI Technical Report. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, & Ilya Sutskever (2021). Learning Transferable Visual Models From Natural Language Supervision. arXiv. https://doi.org/10.48550/arxiv.2103.00020
Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, & Ilya Sutskever (2023). Robust Speech Recognition via Large-Scale Weak Supervision. International Conference on Machine Learning. https://arxiv.org/abs/2212.04356
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, & Peter J. Liu (2019). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. arXiv. https://doi.org/10.48550/arxiv.1910.10683
Cynthia Rudin (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206-215. https://doi.org/10.1038/s42256-019-0048-x
David E. Rumelhart, Geoffrey E. Hinton, & Ronald J. Williams (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536. https://doi.org/10.1038/323533a0
David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, & Samuel R. Bowman (2023). GPQA: A Graduate-Level Google-Proof Q&A Benchmark. arXiv:2311.12022. https://arxiv.org/abs/2311.12022
F. Rosenblatt (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386-408. https://doi.org/10.1037/h0042519
Herbert Robbins & Sutton Monro (1951). A Stochastic Approximation Method. The Annals of Mathematical Statistics, 22(3), 400-407. https://doi.org/10.1214/aoms/1177729586
Joseph Redmon, Santosh Divvala, Ross Girshick, & Ali Farhadi (2016). You Only Look Once: Unified, Real-Time Object Detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779-788. https://doi.org/10.1109/cvpr.2016.91
Marco Tulio Ribeiro, Sameer Singh, & Carlos Guestrin (2016). "Why Should I Trust You?": Explaining the Predictions of Any Classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135-1144. https://doi.org/10.1145/2939672.2939778
Mark Russinovich, Ahmed Salem, & Ronen Eldan (2024). Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack. arXiv:2404.01833. https://arxiv.org/abs/2404.01833
Nils Reimers & Iryna Gurevych (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. Conference on Empirical Methods in Natural Language Processing. https://arxiv.org/abs/1908.10084
Olaf Ronneberger, Philipp Fischer, & Thomas Brox (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. Lecture Notes in Computer Science, 234-241. https://doi.org/10.1007/978-3-319-24574-4_28
Prajit Ramachandran, Barret Zoph, & Quoc V. Le (2017). Searching for Activation Functions. arXiv. https://doi.org/10.48550/arxiv.1710.05941
Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, & Chelsea Finn (2023). Direct Preference Optimization: Your Language Model is Secretly a Reward Model. arXiv. https://doi.org/10.48550/arxiv.2305.18290
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, & Björn Ommer (2022). High-Resolution Image Synthesis with Latent Diffusion Models. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 10674-10685. https://doi.org/10.1109/cvpr52688.2022.01042
Shaoqing Ren, Kaiming He, Ross Girshick, & Jian Sun (2015). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv. https://doi.org/10.48550/arxiv.1506.01497
Stuart Russell (2019). Human Compatible: Artificial Intelligence and the Problem of Control. Viking. https://www.penguinrandomhouse.com/books/566677/human-compatible-by-stuart-russell/
Stuart Russell & Peter Norvig (2020). Artificial Intelligence: A Modern Approach. Pearson, 4th edition. https://aima.cs.berkeley.edu/
S
C. E. Shannon (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27(3), 379-423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
Charlie Snell, Jaehoon Lee, Kelvin Xu, & Aviral Kumar (2024). Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters. arXiv:2408.03314. https://arxiv.org/abs/2408.03314
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, & Rob Fergus (2013). Intriguing properties of neural networks. arXiv. https://doi.org/10.48550/arxiv.1312.6199
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, & Andrew Rabinovich (2015). Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1-9. https://doi.org/10.1109/cvpr.2015.7298594
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, & Zbigniew Wojna (2016). Rethinking the Inception Architecture for Computer Vision. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2818-2826. https://doi.org/10.1109/cvpr.2016.308
David Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, & Dan Dennison (2015). Hidden Technical Debt in Machine Learning Systems. Advances in Neural Information Processing Systems 28. https://papers.nips.cc/paper/2015/hash/86df7dcfd896fcaf2674f757a2463eba-Abstract.html
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, & Demis Hassabis (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484-489. https://doi.org/10.1038/nature16961
Ilya Sutskever, Oriol Vinyals, & Quoc V. Le (2014). Sequence to Sequence Learning with Neural Networks. arXiv. https://doi.org/10.48550/arxiv.1409.3215
Jasper Snoek, Hugo Larochelle, & Ryan P. Adams (2012). Practical Bayesian Optimization of Machine Learning Algorithms. Advances in Neural Information Processing Systems 25. https://arxiv.org/abs/1206.2944
Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, & Yunfeng Liu (2021). RoFormer: Enhanced Transformer with Rotary Position Embedding. arXiv. https://doi.org/10.48550/arxiv.2104.09864
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, & Oleg Klimov (2017). Proximal Policy Optimization Algorithms. arXiv. https://doi.org/10.48550/arxiv.1707.06347
Karen Simonyan & Andrew Zisserman (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv. https://doi.org/10.48550/arxiv.1409.1556
Leslie N. Smith (2015). Cyclical Learning Rates for Training Neural Networks. arXiv. https://doi.org/10.48550/arxiv.1506.01186
Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R. Bowman, Newton Cheng, Esin Durmus, Zac Hatfield-Dodds, Scott R. Johnston, Shauna Kravec, Timothy Maxwell, Sam McCandlish, Kamal Ndousse, Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, & Ethan Perez (2023). Towards Understanding Sycophancy in Language Models. arXiv:2310.13548. https://arxiv.org/abs/2310.13548
Mukund Sundararajan, Ankur Taly, & Qiqi Yan (2017). Axiomatic Attribution for Deep Networks. arXiv. https://doi.org/10.48550/arxiv.1703.01365
Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, & Ruslan Salakhutdinov (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, 15(1), 1929-1958. https://jmlr.org/papers/v15/srivastava14a.html
Noam Shazeer (2019). Fast Transformer Decoding: One Write-Head is All You Need. arXiv. https://doi.org/10.48550/arxiv.1911.02150
Noam Shazeer (2020). GLU Variants Improve Transformer. arXiv:2002.05202. https://arxiv.org/abs/2002.05202
Peter Shaw, Jakob Uszkoreit, & Ashish Vaswani (2018). Self-Attention with Relative Position Representations. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), 464-468. https://doi.org/10.18653/v1/n18-2074
Richard S. Sutton & Andrew G. Barto (2018). Reinforcement Learning: An Introduction (2nd edition). MIT Press. http://incompleteideas.net/book/the-book-2nd.html
Rico Sennrich, Barry Haddow, & Alexandra Birch (2016). Neural Machine Translation of Rare Words with Subword Units. Annual Meeting of the Association for Computational Linguistics. https://arxiv.org/abs/1508.07909
Roei Schuster, Congzheng Song, Eran Tromer, & Vitaly Shmatikov (2021). You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion. USENIX Security Symposium. https://arxiv.org/abs/2007.02220
Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, Jonathan Uesato, & Zac Kenton (2022). Goal Misgeneralization - Why Correct Specifications Aren't Enough For Correct Goals. arXiv:2210.01790. https://arxiv.org/abs/2210.01790
Rylan Schaeffer, Brando Miranda, & Sanmi Koyejo (2023). Are Emergent Abilities of Large Language Models a Mirage? arXiv. https://doi.org/10.48550/arxiv.2304.15004
Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, & Aleksander Madry (2018). How Does Batch Normalization Help Optimization? arXiv. https://doi.org/10.48550/arxiv.1805.11604
Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, & Ben Poole (2020). Score-Based Generative Modeling through Stochastic Differential Equations. arXiv. https://doi.org/10.48550/arxiv.2011.13456
Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, & Furu Wei (2023). Retentive Network: A Successor to Transformer for Large Language Models. arXiv:2307.08621. https://arxiv.org/abs/2307.08621
T
A. M. Turing (1950). Computing Machinery and Intelligence. Mind, LIX(236), 433-460. https://doi.org/10.1093/mind/lix.236.433
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, & Guillaume Lample (2023). LLaMA: Open and Efficient Foundation Language Models. arXiv. https://doi.org/10.48550/arxiv.2302.13971
Matus Telgarsky (2016). Benefits of depth in neural networks. arXiv. https://doi.org/10.48550/arxiv.1602.04485
Mingxing Tan & Quoc V. Le (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv. https://doi.org/10.48550/arxiv.1905.11946
Robert Tibshirani (1996). Regression Shrinkage and Selection Via the Lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58(1), 267-288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
V
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, & Illia Polosukhin (2017). Attention Is All You Need. arXiv. https://doi.org/10.48550/arxiv.1706.03762
W
Alexander Wei, Nika Haghtalab, & Jacob Steinhardt (2023). Jailbroken: How Does LLM Safety Training Fail? Advances in Neural Information Processing Systems 36. https://arxiv.org/abs/2307.02483
Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, & Benjamin Recht (2017). The Marginal Value of Adaptive Gradient Methods in Machine Learning. arXiv. https://doi.org/10.48550/arxiv.1705.08292
Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, & William Fedus (2022). Emergent Abilities of Large Language Models. arXiv. https://doi.org/10.48550/arxiv.2206.07682
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, & Denny Zhou (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv. https://doi.org/10.48550/arxiv.2201.11903
Jeff Wu, Long Ouyang, Daniel M. Ziegler, Nisan Stiennon, Ryan Lowe, Jan Leike, & Paul Christiano (2021). Recursively Summarizing Books with Human Feedback. arXiv:2109.10862. https://arxiv.org/abs/2109.10862
Joseph Weizenbaum (1976). Computer Power and Human Reason: From Judgment to Calculation. W. H. Freeman. https://archive.org/details/computerpowerhum0000weiz
Laura Weidinger, Jonathan Uesato, Maribeth Rauh, Conor Griffin, Po-Sen Huang, John Mellor, Amelia Glaese, Myra Cheng, Borja Balle, Atoosa Kasirzadeh, Courtney Biles, Sasha Brown, Zac Kenton, Will Hawkins, Tom Stepleton, Abeba Birhane, Lisa Anne Hendricks, Laura Rimell, William Isaac, Julia Haas, Sean Legassick, Geoffrey Irving, & Iason Gabriel (2022). Taxonomy of Risks posed by Language Models. 2022 ACM Conference on Fairness Accountability and Transparency, 214-229. https://doi.org/10.1145/3531146.3533088
Norbert Wiener (1960). Some Moral and Technical Consequences of Automation. Science. https://doi.org/10.1126/science.131.3410.1355
Paul J. Werbos (1974). Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard University. https://gwern.net/doc/ai/nn/1974-werbos.pdf
Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, & Hao Ma (2020). Linformer: Self-Attention with Linear Complexity. arXiv:2006.04768. https://arxiv.org/abs/2006.04768
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V. Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, & Denny Zhou (2022). Self-Consistency Improves Chain of Thought Reasoning in Language Models. International Conference on Learning Representations. https://arxiv.org/abs/2203.11171
Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Lukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, & Jeffrey Dean (2016). Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv:1609.08144. https://arxiv.org/abs/1609.08144
X
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, & Tieyan Liu (2020). On Layer Normalization in the Transformer Architecture. International Conference on Machine Learning. https://arxiv.org/abs/2002.04745
Y
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, & Yuan Cao (2022). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv. https://doi.org/10.48550/arxiv.2210.03629
Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, & Karthik Narasimhan (2024). Tree of Thoughts: Deliberate Problem Solving with Large Language Models. Advances in Neural Information Processing Systems 36. https://arxiv.org/abs/2305.10601
Z
Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, & Matt Fredrikson (2023). Universal and Transferable Adversarial Attacks on Aligned Language Models. arXiv:2307.15043. https://arxiv.org/abs/2307.15043
Biao Zhang & Rico Sennrich (2019). Root Mean Square Layer Normalization. Advances in Neural Information Processing Systems 32. https://arxiv.org/abs/1910.07467
Hanlin Zhang, Benjamin L. Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, & Boaz Barak (2023). Watermarks in the Sand - Impossibility of Strong Watermarking for Generative Models. arXiv:2311.04378. https://arxiv.org/abs/2311.04378
Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, & David Lopez-Paz (2017). mixup: Beyond Empirical Risk Minimization. arXiv. https://doi.org/10.48550/arxiv.1710.09412