Papers
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (arXiv:2402.17764)
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (arXiv:2403.03507)
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models (arXiv:2402.19427)
ResLoRA: Identity Residual Mapping in Low-Rank Adaption (arXiv:2402.18039)
Beyond Language Models: Byte Models are Digital World Simulators (arXiv:2402.19155)
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect (arXiv:2403.03853)
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers (arXiv:2402.19479)
DoRA: Weight-Decomposed Low-Rank Adaptation (arXiv:2402.09353)
Training Neural Networks from Scratch with Parallel Low-Rank Adapters (arXiv:2402.16828)
Mamba: Linear-Time Sequence Modeling with Selective State Spaces (arXiv:2312.00752)
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training (arXiv:2403.09611)
Simple and Scalable Strategies to Continually Pre-train Large Language Models (arXiv:2403.08763)
InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory (arXiv:2402.04617)
Rho-1: Not All Tokens Are What You Need (arXiv:2404.07965)
Learn Your Reference Model for Real Good Alignment (arXiv:2404.09656)
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length (arXiv:2404.08801)
Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models (arXiv:2403.03432)
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (arXiv:2404.14219)