Audio Spaces
- 71
- Seamless M4T (949)
- MusicGen (5.07k): Generate music from text descriptions and optional melodies
- Audioldm Text To Audio Generation (815): Generate audio from text descriptions
- AudioLDM2 Text2Audio Text2Music Generation (307): Generate audio and waveform video from text
- AudioSep (222)
- Lp Music Caps (172): Generate captions for music audio
- Tortoise Tts (315): Expressive Text-to-Speech
- All In One (22)
- XTTS (2.77k): Generate speech from text using a reference voice
- Coqui Bark Voice Cloning (189)
- VALL E X (365): Generate audio from text using voice prompts
- WavJourney (193)
- Music To Image (263)
- MMS (277): Transform and identify speech with MMS
- ElevenLabs TTS (620): Generate spoken audio from text using selectable voices
- AudioGPT (288)
- Bark (2.37k): Generate realistic audio from text
- SpeechT5 Speech Recognition Demo (37)
- CoquiTTS (Official) (172)
- Whisper (2.76k): Transcribe audio files into text
- Moe TTS (667): Generate and convert voice using text and audio inputs
- YourTTS (17)
- Talking Face Generation with Multilingual TTS (560): Generate multilingual talking-face videos from your text
- OpenAI TTS New (560)
- Mustango (166)
- OWSM Demo (55)
- StyleTTS 2 (725): Efficient, fast, and natural text to speech with StyleTTS 2!
- HierSpeech++ (Zero-shot TTS) (396): Generate high-quality speech from text using a prompt audio
- Video2music (21): Generate music for a video based on its content and key
- Whisper Large V2 (187)
- Musicgen Prompt Upsampling (64): Generate music from text prompts
- Seamless M4T v2 (516): Translate speech and text between languages
- Seamless Streaming (326): Translate text between languages
- Matcha TTS (53): Generate speech from text with speaker selection
- MusicGen Streaming (293): Generate music from text descriptions in real-time
- Resemble Enhance (459): Enhance and denoise your audio files
- Singing Voice Conversion (262): Transform your voice into a singer's
- NaturalSpeech2 (52): Generate speech with cloned timbre
- Create Your Own TTS Dataset (22)
- Podcast Transcription
- OpenVoice (1.13k): Generate speech in a cloned voice from a short audio sample
- M2UGen Demo (94)
- Pheme (68)
- ESPnet2 TTS (6): Convert text to speech in English, Chinese, or Japanese
- Whisper-WebUI (41): Generate subtitles and translate audio files
- Image2SFX Comparison (177): Generates audio environment from an image
- WhisperSpeech (379)
- MetaVoice 1B (144): A demo of MetaVoice 1B, a new TTS model by MetaVoice.
- TTS Arena V2 (943): Vote on the latest TTS models!
- Whisper Speech X DreamTalk (179): Combine voice cloning and portrait lipsync animation
- Canary 1b (197): Transcribe and translate audio into text
- SALMONN Audio Questioning (83): Deeply interrogate audio file content
- MeloTTS (473): Fast, efficient, & multilingual text-to-speech
- Audio Editing (327): Edit audios with text prompts
- ChatMusician (18)
- xVASynth TTS (73): CPU powered, low RTF, emotional, multilingual TTS
- NaturalSpeech3 FACodec (178): Convert and reconstruct speech files
- Hey Gemma (25)
- Ratchet + Whisper (70): Convert audio to text
- AutoSubs (3): Automatically add on-screen subs to your videos
- VoiceCraft (161)
- TangoFlux (325): Text to Audio (Sound SFX) Generator
- Parler-TTS (846): High-fidelity Text-To-Speech
- Sing an idea ➡️ Music (184): Bring song ideas to life
- Musicgen Songstarter Demo (75): Generate music using descriptions and optional melody audio
- Whisper JAX (145): Transcribe or translate audio from microphone, file, or YouTube
- AudioLCM (23): Generate audio from text
- Stable Audio Live Multiplayer (163): Generate custom audio from text prompts
- Stable Audio Open Zero (464): Generate immersive audio from text prompts
- Make An Audio 3 (14): Generate audio from text prompts
- Mars5 Space (60)
- Tango Music AF (5): Text to Music Generator
- Jam (16): Generate a song from lyrics and style reference
- BigVGAN (114): Generate high-quality audio from your input file with BigVGAN
- SenseVoice (90): Transcribe audio with emotions and events
- PicoAudio (27): Generate audio from text descriptions with timestamps
- Audio Flamingo Demo (7)
- MusiConGen (29)
- Mms Zeroshot (20): Transcribe audio in any language using text data
- GPT SoVITS V2 Pro Plus (230): Generate speech from text using a reference voice
- EzAudio (275): Generate or edit realistic audio from text prompts
- OpenMusic (214): Generate music from text descriptions
- Midi Music Generator (572): Generate MIDI music from prompts
- Whisper Turbo (1.01k): Transcribe or translate audio and YouTube videos to text
- Realtime Whisper Turbo (346): Realtime implementation of Whisper large turbo
- Whisper Large V3 Turbo WebGPU (170): ML-powered speech recognition directly in your browser
- Fish Audio S1 (696): Convert text to natural-sounding speech audio
- TTS Spaces Arena (479): Blind vote on HF TTS models!
- Diva Realtime Chat (19): Generate text responses from audio input
- F5-TTS (2.84k): F5-TTS & E2-TTS: Zero-Shot Voice Cloning (Unofficial Demo)
- MaskGCT TTS Demo (259): MaskGCT TTS Demo
- MelodyFlow (159): Generate or edit music from text and optional audio
- Fish Agent (148): An end-to-end (e2e) Voice Language Model by Fish Audio.
- Nexa Omni Demo (66): Generate text from uploaded or recorded audio
- Kokoro TTS (3.26k): Upgraded to v1.0!
- Make Custom Voices With KokoroTTS (133): Make Custom Voices With KokoroTTS
- Llasa 3b Tts (313): Zero Shot voice cloning with llasa 3b (Unofficial Demo)
- Llasa 1b Multilingual TTS (12): Generate speech from text with or without cloning a voice
- Kokoro Text-to-Speech (WebGPU) (354): High-quality speech synthesis powered by Kokoro TTS
- Hibiki Simple (42): High-Fidelity Simultaneous Speech-To-Speech Translation
- Zonos (413): Generate expressive speech audio from text with custom voice
- Kokoro Web (82): ML-powered speech synthesis directly in your browser
- Di♪♪Rhythm (684): Blazingly Fast and Embarrassingly Simple Song Generation
- Audiobox Aesthetics (23): Demo for audiobox-aesthetics
- Spark TTS (229): A text-to-speech model powered by SparkAudio and Mobvoi.
- Sesame CSM (862): Conversational speech generation
- Orpheus TTS (246): Try Orpheus TTS here
- Canary 1B Flash (43): Canary 1B Flash demo
- IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System (216): Generate speech from text using a reference audio
- AudioMorphix (6): Prepare environment and run Gradio app
- MegaTTS3 Demo (93)
- AudioX (166): Generate audio from text, video, or audio prompts
- Vevo for Zero-shot VC, TTS, and More (100): Controllable Zero-Shot Voice Imitation
- Dia 1.6B (1.76k): Generate realistic dialogue from a script, using Dia!
- Aero 1 Audio Demo (43): Demo for Aero-1-Audio
- Voila Demo (44): Chat with a voice-clone AI
- ACE Step (654): A Step Towards Music Generation Foundation Model
- Audio Difficulty Estimator (2): Estimate piano difficulty from audio
- TIGER Audio Extractor (113): Extraction & Reconstruction for Efficient Speech Separation
- Music2emo (18): Towards Unified Music Emotion Recognition across Dimensional
- SonicVerse (13): Generate detailed music descriptions from audio clips
- Auffusion (44): Audio Gen, Audio Style Transfer and Audio InPainting
- Chatterbox TTS (1.73k): Expressive Zeroshot TTS
- PlayDiffusion (119): Generate modified audio from text and voice
- Voice Clone Arena (2): Vote on the latest Voice Clone TTS models!
- Conversational WebGPU (233)
- Song Generation (697): Generate a song from your lyrics and description
- NotaGen (72): Generate classical sheet music in ABC notation
- Audio Flamingo 3 Demo (100): Audio Flamingo 3 Demo
- Audio Flamingo 3 Chat (33): Audio Flamingo 3 demo for multi-turn multi-audio chat
- MSR UTMOS (6): Multiple sampling rate MOS prediction with SFI conv
- Higgs Audio Demo (398): Higgs Audio Demo
- sidon_demo_beta (27): Speech restoration demo of Sidon.
- Canary 1b V2 (68): Transcribe and Translate in 25 European Languages
- SonicMaster – Text-Guided Music Restoration & Mastering (28): Enhance audio quality using text prompts
- OLMoASR (6): Open Models and Data for Training Robust Speech Recognition
- VibeVoice-Large (85): Generate a podcast audio from a script and voice samples
- TaDiCodec TTS AR Qwen2.5 0.5B (10): Generate speech from text with voice cloning
- EchoX (8): An end-to-end speech large language model.
- VoxCPM 0.5B (44): Generate expressive speech from text with optional voice cloning
- FireRedTTS2 (34): Long-form multi-speaker dialogue generation
- FireRedASR (12): FireRedASR Demo
- IndexTTS 2 Demo (782): Generate expressive speech audio from text with emotion control
- SongFormer (20): State-of-the-art music analysis with multi-scale datasets
- Voice Acting TTS (26): TTS for any emotion, now with non-verbal sounds!
- Omnilingual ASR Media Transcription (238): Transcribe audio/video files into text instantly
- Music Flamingo (161): Ask questions about any song and get detailed answers
- Maya1 (118): Demo of our new open source model maya1
- Supertonic (TTS) (217): Lightning-Fast, On-Device TTS
- Dia2 2B (74): Streaming conversational audio in realtime
- VibeVoice-Realtime-0.5B (181): Generate natural speech from text with selectable voices
- Count The Notes (1): Convert audio to MIDI
- SpeechJudge GRM (1): Evaluate naturalness of two audio files
- Chatterbox Turbo Demo (489): Chatterbox Turbo Demo
- Soprano TTS (147): Now with upgraded v1.1 model!
- Qwen3-TTS Demo (1.81k): Generate speech from text with custom voice, cloning, or presets
- Qwen3-ASR Demo (121): Transcribe audio to text with multi-language timestamps
- Voxtral Mini Realtime (155): Transcribe speech to text instantly in real time
- ACE-Step v1.5 (466): Music Generation Foundation Model v1.5
- Parakeet STT Progressive Transcription (88): Transcribe speech to text instantly with WebGPU acceleration
- faster-qwen3-tts (205): Generate speech audio from text with custom or cloned voices
- Fish Audio S2 Pro (133): Zero GPU Text-to-Speech using Fish Audio S2 Pro
- TADA (85): Generate speech that mimics a given voice from text
- Voxtral Realtime WebGPU (100): Real-time speech transcription, entirely in your browser.
- TADA – Text-Acoustic Dual Alignment for Speech (22): Speech generation from text and acoustic reference
- Foundation 1 (60): Generate custom music clips from text prompts
- LongCat AudioDiT 3.5B (6): Generate speech from text or clone a voice using a sample