OpenAssistant Conversations (OASST1, Köpf, Kilcher, von Rütte et al., NeurIPS 2023, arXiv:2304.07327) is the largest community-built conversational instruction-tuning dataset, released by LAION in April 2023. The follow-up OASST2 appeared in March 2024.
Construction
OpenAssistant recruited over 13,500 volunteer contributors worldwide through the openassistant.io web interface. Contributors worked through five role-defined tasks:
- Prompt: write a new conversation starter.
- Reply as assistant: write what an ideal assistant should say.
- Reply as user: continue an existing conversation in the user role.
- Label: rate a message along dimensions such as quality, hate speech, PII, and creativity.
- Rank: order multiple parallel replies by preference.
Each contribution was peer-reviewed by other contributors, and conversations are organised as trees: a single prompt may have many continuations and many parallel replies, with quality and ranking signals attached to each branch.
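The tree organisation above can be sketched in a few lines of Python. The field names (message_id, parent_id, role, rank) follow the released flat-record schema, but the records below are invented toy data for illustration, not actual dataset rows.

```python
# Sketch: reconstruct OASST-style conversation trees from flat message
# records, grouping parallel replies under their parent prompt.
from collections import defaultdict

# Toy records in the assumed schema; real rows carry many more fields.
messages = [
    {"message_id": "m1", "parent_id": None, "role": "prompter",
     "text": "Explain beam search.", "rank": None},
    {"message_id": "m2", "parent_id": "m1", "role": "assistant",
     "text": "Beam search keeps the k best partial hypotheses at each step.",
     "rank": 0},
    {"message_id": "m3", "parent_id": "m1", "role": "assistant",
     "text": "It is a heuristic search over a pruned frontier.", "rank": 1},
]

def build_tree(messages):
    """Group messages by parent_id; roots are the conversation starters."""
    children = defaultdict(list)
    roots = []
    for m in messages:
        if m["parent_id"] is None:
            roots.append(m)
        else:
            children[m["parent_id"]].append(m)
    # Order parallel siblings by the crowd-sourced preference rank (0 = best);
    # unranked messages sort last.
    for sibs in children.values():
        sibs.sort(key=lambda m: float("inf") if m["rank"] is None else m["rank"])
    return roots, children

roots, children = build_tree(messages)
# One root prompt (m1) with two parallel assistant replies, m2 ranked above m3.
```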
OASST1
Released April 2023 with 161,443 messages in 35 languages (English and Spanish dominate, followed by Russian, German, French and a long tail of others), forming 66,497 conversation trees with 461,292 quality ratings. The cleaned subset for fine-tuning contains roughly 9,800 high-quality conversation trees.
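The ranking signals are what make the cleaned fine-tuning subsets possible: keeping only the top-ranked assistant reply for each prompt yields one high-quality (prompt, reply) pair per branch. The sketch below shows that filtering step on invented toy records; the message_id / parent_id / role / rank fields are assumed from the released schema, and real pipelines apply additional quality and language filters.

```python
# Sketch: derive (prompt, best-reply) fine-tuning pairs by keeping only
# the rank-0 assistant reply under each prompter message.
def top_ranked_pairs(messages):
    prompts = {m["message_id"]: m for m in messages if m["role"] == "prompter"}
    pairs = []
    for m in messages:
        if (m["role"] == "assistant" and m["rank"] == 0
                and m["parent_id"] in prompts):
            pairs.append((prompts[m["parent_id"]]["text"], m["text"]))
    return pairs

# Toy data: one prompt with two ranked parallel replies.
messages = [
    {"message_id": "m1", "parent_id": None, "role": "prompter",
     "text": "What is RLHF?", "rank": None},
    {"message_id": "m2", "parent_id": "m1", "role": "assistant",
     "text": "RLHF fine-tunes a model from human preference signals.",
     "rank": 0},
    {"message_id": "m3", "parent_id": "m1", "role": "assistant",
     "text": "A training method.", "rank": 1},
]

pairs = top_ranked_pairs(messages)  # only the rank-0 reply survives
```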
OASST2
Released March 2024 with 135,000 additional messages focused on multilingual coverage (over 50 languages, with substantial expansion of low-resource European languages and a Mandarin contribution).
Licensing
Released under Apache-2.0 with contributor rights documented in detail. OASST is one of the few large instruction-tuning datasets with a clear, permissively licensed audit trail of contributor consent.
Models trained on OpenAssistant
OASST was used to train the OpenAssistant model family (LLaMA, Falcon, and Pythia variants), several MPT instruction-tuned variants, OpenChat, Zephyr-7B-α (in part), and many academic instruction-following models. It is also a standard component of community fine-tuning mixtures alongside ShareGPT and UltraChat.
Limitations
The crowdsourced annotation introduces high variance: average reply quality is acceptable, but the long tail includes low-effort contributions. Self-reference issues are common: contributors writing assistant replies sometimes parrot back the prompt or copy from contemporary chatbots. Evaluation contamination has also been observed, with contributors copying benchmark questions verbatim. Despite this, OASST is the largest fully transparent instruction-tuning resource and a standard component of any open RLHF or DPO recipe.
Related terms: Anthropic HH-RLHF, ShareGPT and Vicuna, UltraChat and UltraFeedback, RLHF
Discussed in:
- Chapter 14: Generative Models, Alignment and RLHF