EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection Paper • 2506.09827 • Published Jun 11, 2025 • 21
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 133
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model Paper • 2506.08967 • Published Jun 10, 2025 • 2
Article From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub +2 Feb 12, 2025 • 80
Cosmos Collection ⚠️ This collection is archived. https://huggingface.co/collections/nvidia/nvidia-cosmos-2 • 14 items • Updated 3 days ago • 300
Step-Audio Collection Step-Audio model family, including Audio-Tokenizer, Audio-Chat and TTS • 4 items • Updated Jul 31, 2025 • 32