Abstract. Combines 4-bit quantisation of a frozen base model with trainable LoRA adapters, enabling fine-tuning of 65-billion-parameter language models on a single 48 GB GPU while preserving full 16-bit fine-tuning performance.
Tags: efficiency, fine-tuning, quantisation, lora
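A minimal sketch of the recipe the abstract describes, assuming the Hugging Face transformers, peft, and bitsandbytes stack; the model name and LoRA hyperparameters below are illustrative placeholders, not values from the source.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the frozen base model in 4-bit NF4 precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantisation
    bnb_4bit_use_double_quant=True,        # also quantise the quantisation constants
    bnb_4bit_compute_dtype=torch.bfloat16, # precision used for de-quantised matmuls
)
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-65b",  # placeholder; any causal LM checkpoint works
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach small trainable LoRA adapters; only these receive gradients,
# while the 4-bit base weights stay frozen.
lora_config = LoraConfig(
    r=16,                                  # adapter rank (hypothetical choice)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```

Because gradients and optimiser state exist only for the small adapter matrices, the memory footprint is dominated by the 4-bit base weights, which is what makes a 65B model trainable on one GPU.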