Architectures
Associative Recurrent Memory Transformer
• arXiv:2407.04841 • 35 upvotes
Mixture-of-Agents Enhances Large Language Model Capabilities
• arXiv:2406.04692 • 59 upvotes
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
• arXiv:2405.21060 • 68 upvotes
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
• arXiv:2404.14219 • 259 upvotes
Rho-1: Not All Tokens Are What You Need
• arXiv:2404.07965 • 94 upvotes
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
• arXiv:2406.16860 • 63 upvotes
Kolmogorov-Arnold Transformer
• arXiv:2409.10594 • 45 upvotes
Differential Transformer
• arXiv:2410.05258 • 181 upvotes
Selective Attention Improves Transformer
• arXiv:2410.02703 • 25 upvotes