paper/metaAI
  Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12, 2024 • 44
models/misc
  Snowflake/snowflake-arctic-embed-m-long Sentence Similarity • 0.1B • Updated Dec 13, 2024 • 80.3k • 38
  si-pbc/hertz-dev Audio-to-Audio • Updated Nov 14, 2024 • 215
  microsoft/orca-agentinstruct-1M-v1 Viewer • Updated Nov 1, 2024 • 1.05M • 873 • 453
paper/misc
  Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU Paper • 2403.06504 • Published Mar 11, 2024 • 56
  Stealing Part of a Production Language Model Paper • 2403.06634 • Published Mar 11, 2024 • 91
  MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression Paper • 2406.14909 • Published Jun 21, 2024 • 16
dataset/text
  HuggingFaceFW/fineweb Viewer • Updated Jul 11 • 52.5B • 174k • 2.57k
  microsoft/orca-agentinstruct-1M-v1 Viewer • Updated Nov 1, 2024 • 1.05M • 873 • 453
  ai4bharat/SeamlessAlign Viewer • Updated Nov 15, 2024 • 3.01M • 1.75k • 5