References
A
Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, & Dan Mané (2016). Concrete Problems in AI Safety. arXiv. https://doi.org/10.48550/arxiv.1606.06565
David Arthur & Sergei Vassilvitskii (2007). k-means++: The Advantages of Careful Seeding. Proceedings of SODA 2007, 1027-1035. https://theory.stanford.edu/~sergei/papers/kMeansPP-soda.pdf
Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, & Li Zhang (2016). Deep Learning with Differential Privacy. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 308-318. https://doi.org/10.1145/2976749.2978318
B
Christopher M. Bishop (2006). Pattern Recognition and Machine Learning. Springer. https://www.microsoft.com/en-us/research/publication/pattern-recognition-machine-learning/
Dzmitry Bahdanau, Kyunghyun Cho, & Yoshua Bengio (2014). Neural Machine Translation by Jointly Learning to Align and Translate. arXiv. https://doi.org/10.48550/arxiv.1409.0473
Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, & Shmargaret Shmitchell (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610-623. https://doi.org/10.1145/3442188.3445922
Iz Beltagy, Matthew E. Peters, & Arman Cohan (2020). Longformer: The Long-Document Transformer. arXiv. https://doi.org/10.48550/arxiv.2004.05150
James Bergstra & Yoshua Bengio (2012). Random Search for Hyper-Parameter Optimization. Journal of Machine Learning Research, 13, 281-305. https://jmlr.org/papers/v13/bergstra12a.html
Jimmy Lei Ba, Jamie Ryan Kiros, & Geoffrey E. Hinton (2016). Layer Normalization. arXiv. https://doi.org/10.48550/arxiv.1607.06450
Leo Breiman (1996). Bagging predictors. Machine Learning, 24(2), 123-140. https://doi.org/10.1007/bf00058655
Leo Breiman (2001). Random Forests. Machine Learning, 45(1), 5-32. https://doi.org/10.1023/a:1010933404324
Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, & Jörg Sander (2000). LOF: Identifying Density-Based Local Outliers. Proceedings of the 2000 ACM SIGMOD international conference on Management of data, 93-104. https://doi.org/10.1145/342009.335388
Nick Bostrom (2014). Superintelligence: Paths, Dangers, Strategies. Oxford University Press. https://archive.org/details/superintelligenc00unse
Piotr Bojanowski, Edouard Grave, Armand Joulin, & Tomas Mikolov (2017). Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics, 5, 135-146. https://doi.org/10.1162/tacl_a_00051
Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, & Percy Liang (2021). On the Opportunities and Risks of Foundation Models. arXiv. https://doi.org/10.48550/arxiv.2108.07258
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, & Dario Amodei (2020). Language Models are Few-Shot Learners. arXiv. https://doi.org/10.48550/arxiv.2005.14165
Yoshua Bengio, Réjean Ducharme, Pascal Vincent, & Christian Jauvin (2003). A Neural Probabilistic Language Model. Journal of Machine Learning Research, 3, 1137-1155. https://jmlr.org/papers/v3/bengio03a.html
Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, Kamile Lukosuite, Liane Lovitt, Michael Sellitto, Nelson Elhage, Nicholas Schiefer, Noemi Mercado, Nova DasSarma, Robert Lasenby, Robin Larson, Sam Ringer, Scott Johnston, Shauna Kravec, Sheer El Showk, Stanislav Fort, Tamera Lanham, Timothy Telleen-Lawton, Tom Conerly, Tom Henighan, Tristan Hume, Samuel R. Bowman, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, & Jared Kaplan (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv. https://doi.org/10.48550/arxiv.2212.08073
C
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, & Noah Fiedel (2022). PaLM: Scaling Language Modeling with Pathways. arXiv. https://doi.org/10.48550/arxiv.2204.02311
Alexandra Chouldechova (2017). Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments. Big Data, 5(2), 153-163. https://doi.org/10.1089/big.2016.0047
Corinna Cortes & Vladimir Vapnik (1995). Support-vector networks. Machine Learning, 20(3), 273-297. https://doi.org/10.1007/bf00994018
G. Cybenko (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(4), 303-314. https://doi.org/10.1007/bf02551274
Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, & Yoshua Bengio (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv. https://doi.org/10.48550/arxiv.1406.1078
Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, & Alan L. Yuille (2018). DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4), 834-848. https://doi.org/10.1109/tpami.2017.2699184
Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, & Sergey Zagoruyko (2020). End-to-End Object Detection with Transformers. Lecture Notes in Computer Science, 213-229. https://doi.org/10.1007/978-3-030-58452-8_13
Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, & Dario Amodei (2017). Deep reinforcement learning from human preferences. arXiv. https://doi.org/10.48550/arxiv.1706.03741
Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, & David Duvenaud (2018). Neural Ordinary Differential Equations. arXiv. https://doi.org/10.48550/arxiv.1806.07366
T. Cover & P. Hart (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21-27. https://doi.org/10.1109/tit.1967.1053964
Tianqi Chen & Carlos Guestrin (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785-794. https://doi.org/10.1145/2939672.2939785
Ting Chen, Simon Kornblith, Mohammad Norouzi, & Geoffrey Hinton (2020). A Simple Framework for Contrastive Learning of Visual Representations. arXiv. https://doi.org/10.48550/arxiv.2002.05709
D
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, & Neil Houlsby (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv. https://doi.org/10.48550/arxiv.2010.11929
Cynthia Dwork, Frank McSherry, Kobbi Nissim, & Adam Smith (2006). Calibrating Noise to Sensitivity in Private Data Analysis. Lecture Notes in Computer Science, 265-284. https://doi.org/10.1007/11681878_14
Jacob Devlin, Ming-Wei Chang, Kenton Lee, & Kristina Toutanova (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171-4186. https://doi.org/10.18653/v1/n19-1423
Laurent Dinh, Jascha Sohl-Dickstein, & Samy Bengio (2016). Density estimation using Real NVP. arXiv. https://doi.org/10.48550/arxiv.1605.08803
Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, & Luke Zettlemoyer (2023). QLoRA: Efficient Finetuning of Quantized LLMs. arXiv. https://doi.org/10.48550/arxiv.2305.14314
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, & Christopher Ré (2022). FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. arXiv. https://doi.org/10.48550/arxiv.2205.14135
E
Martin Ester, Hans-Peter Kriegel, Jörg Sander, & Xiaowei Xu (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proceedings of KDD 1996, 226-231. https://cdn.aaai.org/KDD/1996/KDD96-037.pdf
F
Jerome H. Friedman (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5), 1189-1232. https://doi.org/10.1214/aos/1013203451
Jonathan Frankle & Michael Carbin (2018). The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. arXiv. https://doi.org/10.48550/arxiv.1803.03635
William Fedus, Barret Zoph, & Noam Shazeer (2021). Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. arXiv. https://doi.org/10.48550/arxiv.2101.03961
Yoav Freund & Robert E Schapire (1997). A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences, 55(1), 119-139. https://doi.org/10.1006/jcss.1997.1504
G
Ian Goodfellow, Yoshua Bengio, & Aaron Courville (2016). Deep Learning. MIT Press. https://www.deeplearningbook.org/
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, & Yoshua Bengio (2014). Generative Adversarial Nets. Proceedings of NeurIPS 2014. https://arxiv.org/abs/1406.2661
Ross Girshick, Jeff Donahue, Trevor Darrell, & Jitendra Malik (2014). Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. 2014 IEEE Conference on Computer Vision and Pattern Recognition, 580-587. https://doi.org/10.1109/cvpr.2014.81
Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, & Kate Crawford (2021). Datasheets for datasets. Communications of the ACM, 64(12), 86-92. https://doi.org/10.1145/3458723
Xavier Glorot & Yoshua Bengio (2010). Understanding the Difficulty of Training Deep Feedforward Neural Networks. Proceedings of AISTATS 2010, 249-256. https://proceedings.mlr.press/v9/glorot10a.html
H
Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, & Hartwig Adam (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv. https://doi.org/10.48550/arxiv.1704.04861
Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, & Yejin Choi (2019). The Curious Case of Neural Text Degeneration. arXiv. https://doi.org/10.48550/arxiv.1904.09751
Dan Hendrycks & Kevin Gimpel (2016). Gaussian Error Linear Units (GELUs). arXiv. https://doi.org/10.48550/arxiv.1606.08415
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, & Weizhu Chen (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv. https://doi.org/10.48550/arxiv.2106.09685
Gao Huang, Zhuang Liu, Laurens Van Der Maaten, & Kilian Q. Weinberger (2017). Densely Connected Convolutional Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2261-2269. https://doi.org/10.1109/cvpr.2017.243
Geoffrey Hinton, Oriol Vinyals, & Jeff Dean (2015). Distilling the Knowledge in a Neural Network. arXiv. https://doi.org/10.48550/arxiv.1503.02531
Jonathan Ho, Ajay Jain, & Pieter Abbeel (2020). Denoising Diffusion Probabilistic Models. arXiv. https://doi.org/10.48550/arxiv.2006.11239
Jonathan Ho & Tim Salimans (2022). Classifier-Free Diffusion Guidance. arXiv. https://doi.org/10.48550/arxiv.2207.12598
Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, & Laurent Sifre (2022). Training Compute-Optimal Large Language Models. arXiv. https://doi.org/10.48550/arxiv.2203.15556
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun (2015). Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. arXiv. https://doi.org/10.48550/arxiv.1502.01852
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778. https://doi.org/10.1109/cvpr.2016.90
Kurt Hornik, Maxwell Stinchcombe, & Halbert White (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359-366. https://doi.org/10.1016/0893-6080(89)90020-8
Sepp Hochreiter & Jürgen Schmidhuber (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735-1780. https://doi.org/10.1162/neco.1997.9.8.1735
Trevor Hastie, Robert Tibshirani, & Jerome Friedman (2009). The Elements of Statistical Learning. Springer Series in Statistics. https://doi.org/10.1007/978-0-387-84858-7
I
Sergey Ioffe & Christian Szegedy (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv. https://doi.org/10.48550/arxiv.1502.03167
J
John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A. A. Kohl, Andrew J. Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, Ellen Clancy, Michal Zielinski, Martin Steinegger, Michalina Pacholska, Tamas Berghammer, Sebastian Bodenstein, David Silver, Oriol Vinyals, Andrew W. Senior, Koray Kavukcuoglu, Pushmeet Kohli, & Demis Hassabis (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589. https://doi.org/10.1038/s41586-021-03819-2
K
Alex Krizhevsky, Ilya Sutskever, & Geoffrey E. Hinton (2012). ImageNet Classification with Deep Convolutional Neural Networks. Proceedings of NeurIPS 2012. https://papers.nips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html
Andrey N. Kolmogorov (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung [Foundations of the Theory of Probability]. Springer. https://link.springer.com/book/10.1007/978-3-642-49888-6
Diederik P Kingma & Max Welling (2013). Auto-Encoding Variational Bayes. arXiv. https://doi.org/10.48550/arxiv.1312.6114
Diederik P. Kingma & Jimmy Ba (2014). Adam: A Method for Stochastic Optimization. arXiv. https://doi.org/10.48550/arxiv.1412.6980
Günter Klambauer, Thomas Unterthiner, Andreas Mayr, & Sepp Hochreiter (2017). Self-Normalizing Neural Networks. arXiv. https://doi.org/10.48550/arxiv.1706.02515
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, & Dario Amodei (2020). Scaling Laws for Neural Language Models. arXiv. https://doi.org/10.48550/arxiv.2001.08361
Jon Kleinberg, Sendhil Mullainathan, & Manish Raghavan (2016). Inherent Trade-Offs in the Fair Determination of Risk Scores. arXiv. https://doi.org/10.48550/arxiv.1609.05807
Tero Karras, Samuli Laine, & Timo Aila (2019). A Style-Based Generator Architecture for Generative Adversarial Networks. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 4396-4405. https://doi.org/10.1109/cvpr.2019.00453
L
Fei Tony Liu, Kai Ming Ting, & Zhi-Hua Zhou (2008). Isolation Forest. 2008 Eighth IEEE International Conference on Data Mining, 413-422. https://doi.org/10.1109/icdm.2008.17
Ilya Loshchilov & Frank Hutter (2016). SGDR: Stochastic Gradient Descent with Warm Restarts. arXiv. https://doi.org/10.48550/arxiv.1608.03983
Ilya Loshchilov & Frank Hutter (2017). Decoupled Weight Decay Regularization. arXiv. https://doi.org/10.48550/arxiv.1711.05101
James Lighthill (1973). Artificial Intelligence: A General Survey. Science Research Council of Great Britain. https://www.chilton-computing.org.uk/inf/literature/reports/lighthill_report/contents.htm
Jonathan Long, Evan Shelhamer, & Trevor Darrell (2015). Fully convolutional networks for semantic segmentation. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 3431-3440. https://doi.org/10.1109/cvpr.2015.7298965
Moshe Leshno, Vladimir Ya. Lin, Allan Pinkus, & Shimon Schocken (1993). Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 6(6), 861-867. https://doi.org/10.1016/s0893-6080(05)80131-5
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, & Douwe Kiela (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv. https://doi.org/10.48550/arxiv.2005.11401
Remi Lam, Alvaro Sanchez-Gonzalez, Matthew Willson, Peter Wirnsberger, Meire Fortunato, Ferran Alet, Suman Ravuri, Timo Ewalds, Zach Eaton-Rosen, Weihua Hu, Alexander Merose, Stephan Hoyer, George Holland, Oriol Vinyals, Jacklynn Stott, Alexander Pritzel, Shakir Mohamed, & Peter Battaglia (2023). Learning skillful medium-range global weather forecasting. Science, 382(6677), 1416-1421. https://doi.org/10.1126/science.adi2336
Scott Lundberg & Su-In Lee (2017). A Unified Approach to Interpreting Model Predictions. arXiv. https://doi.org/10.48550/arxiv.1705.07874
Thang Luong, Hieu Pham, & Christopher D. Manning (2015). Effective Approaches to Attention-based Neural Machine Translation. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 1412-1421. https://doi.org/10.18653/v1/d15-1166
Y. Lecun, L. Bottou, Y. Bengio, & P. Haffner (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324. https://doi.org/10.1109/5.726791
M
David J. C. MacKay (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press. https://www.inference.org.uk/itila/
H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, & Blaise Agüera y Arcas (2016). Communication-Efficient Learning of Deep Networks from Decentralized Data. arXiv. https://doi.org/10.48550/arxiv.1602.05629
John McCarthy, Marvin L. Minsky, Nathaniel Rochester, & Claude E. Shannon (1955). A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence. Dartmouth College. https://raysolomonoff.com/dartmouth/boxa/dart564props.pdf
Kevin P. Murphy (2022). Probabilistic Machine Learning: An Introduction. MIT Press. https://probml.github.io/pml-book/book1.html
Laurens van der Maaten & Geoffrey E. Hinton (2008). Visualizing Data using t-SNE. Journal of Machine Learning Research, 9, 2579-2605. https://jmlr.org/papers/v9/vandermaaten08a.html
Leland McInnes, John Healy, & James Melville (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv. https://doi.org/10.48550/arxiv.1802.03426
Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, & Timnit Gebru (2019). Model Cards for Model Reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency, 220-229. https://doi.org/10.1145/3287560.3287596
Marvin Minsky & Seymour Papert (1969). Perceptrons: An Introduction to Computational Geometry. MIT Press. https://archive.org/details/perceptronsintro00mins
Tomas Mikolov, Kai Chen, Greg Corrado, & Jeffrey Dean (2013). Efficient Estimation of Word Representations in Vector Space. arXiv. https://doi.org/10.48550/arxiv.1301.3781
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, & Demis Hassabis (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533. https://doi.org/10.1038/nature14236
N
Vinod Nair & Geoffrey E. Hinton (2010). Rectified Linear Units Improve Restricted Boltzmann Machines. Proceedings of ICML 2010, 807-814. https://icml.cc/Conferences/2010/papers/432.pdf
O
Aaron van den Oord, Oriol Vinyals, & Koray Kavukcuoglu (2017). Neural Discrete Representation Learning. arXiv. https://doi.org/10.48550/arxiv.1711.00937
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, & Ryan Lowe (2022). Training language models to follow instructions with human feedback. arXiv. https://doi.org/10.48550/arxiv.2203.02155
P
Jeffrey Pennington, Richard Socher, & Christopher Manning (2014). Glove: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532-1543. https://doi.org/10.3115/v1/d14-1162
John C. Platt (1998). Fast Training of Support Vector Machines Using Sequential Minimal Optimization. Advances in Kernel Methods. https://doi.org/10.7551/mitpress/1130.003.0016
Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, & Luke Zettlemoyer (2018). Deep Contextualized Word Representations. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2227-2237. https://doi.org/10.18653/v1/n18-1202
Ofir Press, Noah A. Smith, & Mike Lewis (2021). Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation. arXiv. https://doi.org/10.48550/arxiv.2108.12409
R
Alec Radford, Luke Metz, & Soumith Chintala (2015). Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv. https://doi.org/10.48550/arxiv.1511.06434
Alec Radford, Karthik Narasimhan, Tim Salimans, & Ilya Sutskever (2018). Improving Language Understanding by Generative Pre-Training. OpenAI Technical Report. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, & Ilya Sutskever (2019). Language Models are Unsupervised Multitask Learners. OpenAI Technical Report. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, & Ilya Sutskever (2021). Learning Transferable Visual Models From Natural Language Supervision. arXiv. https://doi.org/10.48550/arxiv.2103.00020
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, & Peter J. Liu (2019). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. arXiv. https://doi.org/10.48550/arxiv.1910.10683
Cynthia Rudin (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206-215. https://doi.org/10.1038/s42256-019-0048-x
David E. Rumelhart, Geoffrey E. Hinton, & Ronald J. Williams (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536. https://doi.org/10.1038/323533a0
F. Rosenblatt (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386-408. https://doi.org/10.1037/h0042519
Herbert Robbins & Sutton Monro (1951). A Stochastic Approximation Method. The Annals of Mathematical Statistics, 22(3), 400-407. https://doi.org/10.1214/aoms/1177729586
Joseph Redmon, Santosh Divvala, Ross Girshick, & Ali Farhadi (2016). You Only Look Once: Unified, Real-Time Object Detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779-788. https://doi.org/10.1109/cvpr.2016.91
Marco Tulio Ribeiro, Sameer Singh, & Carlos Guestrin (2016). "Why Should I Trust You?": Explaining the Predictions of Any Classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135-1144. https://doi.org/10.1145/2939672.2939778
Olaf Ronneberger, Philipp Fischer, & Thomas Brox (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. Lecture Notes in Computer Science, 234-241. https://doi.org/10.1007/978-3-319-24574-4_28
Prajit Ramachandran, Barret Zoph, & Quoc V. Le (2017). Searching for Activation Functions. arXiv. https://doi.org/10.48550/arxiv.1710.05941
Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, & Chelsea Finn (2023). Direct Preference Optimization: Your Language Model is Secretly a Reward Model. arXiv. https://doi.org/10.48550/arxiv.2305.18290
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, & Bjorn Ommer (2022). High-Resolution Image Synthesis with Latent Diffusion Models. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 10674-10685. https://doi.org/10.1109/cvpr52688.2022.01042
Shaoqing Ren, Kaiming He, Ross Girshick, & Jian Sun (2015). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv. https://doi.org/10.48550/arxiv.1506.01497
Stuart Russell & Peter Norvig (2020). Artificial Intelligence: A Modern Approach. Pearson, 4th edition. https://aima.cs.berkeley.edu/
S
C. E. Shannon (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27(3), 379-423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, & Rob Fergus (2013). Intriguing properties of neural networks. arXiv. https://doi.org/10.48550/arxiv.1312.6199
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, & Andrew Rabinovich (2015). Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1-9. https://doi.org/10.1109/cvpr.2015.7298594
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, & Zbigniew Wojna (2016). Rethinking the Inception Architecture for Computer Vision. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2818-2826. https://doi.org/10.1109/cvpr.2016.308
David Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, & Dan Dennison (2015). Hidden Technical Debt in Machine Learning Systems. Proceedings of NeurIPS 2015. https://papers.nips.cc/paper/2015/hash/86df7dcfd896fcaf2674f757a2463eba-Abstract.html
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, & Demis Hassabis (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484-489. https://doi.org/10.1038/nature16961
Ilya Sutskever, Oriol Vinyals, & Quoc V. Le (2014). Sequence to Sequence Learning with Neural Networks. arXiv. https://doi.org/10.48550/arxiv.1409.3215
Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, & Yunfeng Liu (2021). RoFormer: Enhanced Transformer with Rotary Position Embedding. arXiv. https://doi.org/10.48550/arxiv.2104.09864
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, & Oleg Klimov (2017). Proximal Policy Optimization Algorithms. arXiv. https://doi.org/10.48550/arxiv.1707.06347
Karen Simonyan & Andrew Zisserman (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv. https://doi.org/10.48550/arxiv.1409.1556
Leslie N. Smith (2015). Cyclical Learning Rates for Training Neural Networks. arXiv. https://doi.org/10.48550/arxiv.1506.01186
Mukund Sundararajan, Ankur Taly, & Qiqi Yan (2017). Axiomatic Attribution for Deep Networks. arXiv. https://doi.org/10.48550/arxiv.1703.01365
Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, & Ruslan Salakhutdinov (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, 15(1), 1929-1958. https://jmlr.org/papers/v15/srivastava14a.html
Noam Shazeer (2019). Fast Transformer Decoding: One Write-Head is All You Need. arXiv. https://doi.org/10.48550/arxiv.1911.02150
Peter Shaw, Jakob Uszkoreit, & Ashish Vaswani (2018). Self-Attention with Relative Position Representations. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), 464-468. https://doi.org/10.18653/v1/n18-2074
Richard S. Sutton & Andrew G. Barto (2018). Reinforcement Learning: An Introduction (2nd edition). MIT Press. http://incompleteideas.net/book/the-book-2nd.html
Rylan Schaeffer, Brando Miranda, & Sanmi Koyejo (2023). Are Emergent Abilities of Large Language Models a Mirage? arXiv. https://doi.org/10.48550/arxiv.2304.15004
Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, & Aleksander Madry (2018). How Does Batch Normalization Help Optimization? arXiv. https://doi.org/10.48550/arxiv.1805.11604
Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, & Ben Poole (2020). Score-Based Generative Modeling through Stochastic Differential Equations. arXiv. https://doi.org/10.48550/arxiv.2011.13456
T
A. M. Turing (1950). Computing Machinery and Intelligence. Mind, 59(236), 433-460. https://doi.org/10.1093/mind/lix.236.433
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, & Guillaume Lample (2023). LLaMA: Open and Efficient Foundation Language Models. arXiv. https://doi.org/10.48550/arxiv.2302.13971
Matus Telgarsky (2016). Benefits of depth in neural networks. arXiv. https://doi.org/10.48550/arxiv.1602.04485
Mingxing Tan & Quoc V. Le (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv. https://doi.org/10.48550/arxiv.1905.11946
Robert Tibshirani (1996). Regression Shrinkage and Selection Via the Lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58(1), 267-288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
V
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, & Illia Polosukhin (2017). Attention Is All You Need. arXiv. https://doi.org/10.48550/arxiv.1706.03762
W
Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, & Benjamin Recht (2017). The Marginal Value of Adaptive Gradient Methods in Machine Learning. arXiv. https://doi.org/10.48550/arxiv.1705.08292
Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, & William Fedus (2022). Emergent Abilities of Large Language Models. arXiv. https://doi.org/10.48550/arxiv.2206.07682
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, & Denny Zhou (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv. https://doi.org/10.48550/arxiv.2201.11903
Laura Weidinger, Jonathan Uesato, Maribeth Rauh, Conor Griffin, Po-Sen Huang, John Mellor, Amelia Glaese, Myra Cheng, Borja Balle, Atoosa Kasirzadeh, Courtney Biles, Sasha Brown, Zac Kenton, Will Hawkins, Tom Stepleton, Abeba Birhane, Lisa Anne Hendricks, Laura Rimell, William Isaac, Julia Haas, Sean Legassick, Geoffrey Irving, & Iason Gabriel (2022). Taxonomy of Risks posed by Language Models. 2022 ACM Conference on Fairness Accountability and Transparency, 214-229. https://doi.org/10.1145/3531146.3533088
Paul J. Werbos (1974). Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard University. https://gwern.net/doc/ai/nn/1974-werbos.pdf
Y
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, & Yuan Cao (2022). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv. https://doi.org/10.48550/arxiv.2210.03629
Z
Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, & David Lopez-Paz (2017). mixup: Beyond Empirical Risk Minimization. arXiv. https://doi.org/10.48550/arxiv.1710.09412
© 2026 Chris Paton. All rights reserved.