The Smol Training Playbook 📚 • The secrets to building world-class LLMs • 2.76k
The Ultra-Scale Playbook 🌌 • The ultimate guide to training LLMs on large GPU clusters • 3.62k
deepseek-ai/DeepSeek-R1-0528 Text Generation • 685B • Updated May 29, 2025 • 340k • 2.39k
Alibaba-NLP/gte-Qwen1.5-7B-instruct Sentence Similarity • 8B • Updated Jan 11, 2025 • 765 • 108
Salesforce/SFR-Embedding-Code-400M_R Feature Extraction • 0.4B • Updated Jan 24, 2025 • 12.7k • 34
Alibaba-NLP/gte-modernbert-base Sentence Similarity • 0.1B • Updated Jul 4, 2025 • 38.1k • 186
R3GAN Collection R3GAN: A Modern Baseline GAN https://github.com/brownvc/R3GAN/ https://arxiv.org/abs/2501.05441 • 7 items • Updated Jan 10, 2025 • 10
nomic-ai/modernbert-embed-base-unsupervised Sentence Similarity • 0.1B • Updated Dec 30, 2024 • 503 • 10
nomic-ai/modernbert-embed-base Sentence Similarity • 0.1B • Updated Jan 24, 2025 • 82.6k • 223
Scaling Test-Time Compute with Open Models Collection Models and datasets used in our blog post: https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute • 10 items • Updated Jan 6, 2025 • 27
Long Context RAG Performance of Large Language Models Paper • 2411.03538 • Published Nov 5, 2024 • 1