AndreasXi/Resonate

AndreasXi committed on Mar 11

Commit 7da0d6b (verified) · 1 Parent(s): 1a901fe

Upload folder using huggingface_hub

Files changed (2)

README.md +16 -13
music_speech_audioset_epoch_15_esc_89.98.pt +3 -0

README.md CHANGED

@@ -6,29 +6,33 @@ license: mit
 <p align="center">
   <h2>Resonate: Reinforcing Text-to-Audio Generation with Online Feedbacks from Large Audio Language Models</h2>
   <!-- <a href=>Paper</a> | <a href="https://meanaudio.github.io/">Webpage</a>  -->
-  <!-- [![Paper](https://img.shields.io/badge/Paper-arXiv-b31b1b?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2508.06098)
   [![Code](https://img.shields.io/badge/Code-Repo-black?style=flat&logo=github&logoColor=white)](https://github.com/xiquan-li/MeanAudio?tab=readme-ov-file)
   [![Hugging Face Model](https://img.shields.io/badge/Model-HuggingFace-yellow?logo=huggingface)](https://huggingface.co/AndreasXi/MeanAudio)
   [![Hugging Face Space](https://img.shields.io/badge/Space-HuggingFace-blueviolet?logo=huggingface)](https://huggingface.co/spaces/chenxie95/MeanAudio)
   [![Webpage](https://img.shields.io/badge/Website-Visit-orange?logo=googlechrome&logoColor=white)](https://meanaudio.github.io/) -->
 </p>
 </div>
 ## Overview
 Resonate is a SOTA text-to-audio generator reinforced with an online GRPO algorithm.
-This repo provides a comprehensive pipeline for audio synthesis, covering Pre-training, SFT, DPO, and GRPO.
 ## Environmental Setup
-**1. Create a new conda environment:**
 ```bash
-conda create -n meanaudio python=3.11 -y
-conda activate meanaudio
 pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 --upgrade
 ```
 <!-- ```
@@ -36,12 +40,12 @@ conda install -c conda-forge 'ffmpeg<7
 ```
 (Optional, if you use miniforge and don't already have the appropriate ffmpeg) -->
-**2. Install with pip:**
 ```bash
-git clone https://github.com/xiquan-li/MeanAudio.git
-cd MeanAudio
 pip install -e .
 ```
@@ -53,9 +57,8 @@ pip install -e .
 <!-- **1. Download pre-trained models:** -->
 To generate audio with our pre-trained model, simply run:
 ```bash
-python demo.py --prompt 'your prompt' --num_steps 1
 ```
 This will automatically download the pre-trained checkpoints from huggingface, and generate audio according to your prompt.
-The output audio will be at `MeanAudio/output/`, and the checkpoints will be at `MeanAudio/weights/`.
-Have fun with MeanAudio 😊 !!!

 <p align="center">
   <h2>Resonate: Reinforcing Text-to-Audio Generation with Online Feedbacks from Large Audio Language Models</h2>
   <!-- <a href=>Paper</a> | <a href="https://meanaudio.github.io/">Webpage</a>  -->
+<!--
+  [![Paper](https://img.shields.io/badge/Paper-arXiv-b31b1b?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2508.06098)
   [![Code](https://img.shields.io/badge/Code-Repo-black?style=flat&logo=github&logoColor=white)](https://github.com/xiquan-li/MeanAudio?tab=readme-ov-file)
   [![Hugging Face Model](https://img.shields.io/badge/Model-HuggingFace-yellow?logo=huggingface)](https://huggingface.co/AndreasXi/MeanAudio)
   [![Hugging Face Space](https://img.shields.io/badge/Space-HuggingFace-blueviolet?logo=huggingface)](https://huggingface.co/spaces/chenxie95/MeanAudio)
   [![Webpage](https://img.shields.io/badge/Website-Visit-orange?logo=googlechrome&logoColor=white)](https://meanaudio.github.io/) -->
+[![Code](https://img.shields.io/badge/Code-Repo-black?style=flat&logo=github&logoColor=white)](https://github.com/xiquan-li/Resonate)
+[![Hugging Face Model](https://img.shields.io/badge/Model-HuggingFace-yellow?logo=huggingface)](https://huggingface.co/AndreasXi/Resonate)
+[![Webpage](https://img.shields.io/badge/Website-Visit-orange?logo=googlechrome&logoColor=white)](https://resonatedemo.github.io/)
 </p>
 </div>
 ## Overview
 Resonate is a SOTA text-to-audio generator reinforced with an online GRPO algorithm.
+It leverages the sophisticated reasoning capabilities of modern Large Audio Language Models as reward models.
+This repo provides a comprehensive pipeline for audio generation, covering Pre-training, SFT, DPO, and GRPO.
 ## Environmental Setup
+1. Create a new conda environment:
 ```bash
+conda create -n resonate python=3.11 -y
+conda activate resonate
 pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 --upgrade
 ```
 <!-- ```
 ```
 (Optional, if you use miniforge and don't already have the appropriate ffmpeg) -->
+2. Install with pip:
 ```bash
+git clone https://github.com/xiquan-li/Resonate.git
+cd Resonate
 pip install -e .
 ```
 <!-- **1. Download pre-trained models:** -->
 To generate audio with our pre-trained model, simply run:
 ```bash
+python demo.py --prompt 'your prompt'
 ```
 This will automatically download the pre-trained checkpoints from huggingface, and generate audio according to your prompt.
+By default, this will use [Resonate-GRPO](https://huggingface.co/AndreasXi/Resonate/blob/main/Resonate_GRPO.pth).
+The output audio will be at `Resonate/output/`, and the checkpoints will be at `Resonate/weights/`.
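
The updated Overview says the generator is reinforced with online GRPO, using a Large Audio Language Model as the reward model. As a rough illustration only, the sketch below shows the group-relative advantage step as GRPO is commonly described: several samples are scored for the same prompt, and each reward is normalized against its group's mean and standard deviation. The function name `group_relative_advantages`, the `eps` constant, and the reward values are assumptions made for this example; nothing here is taken from the Resonate code, which may compute advantages differently.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group (hypothetical helper).

    rewards: scores that a reward model assigned to several samples
             generated for the SAME prompt.
    eps:     small constant that avoids division by zero when all
             rewards in the group are identical.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Made-up scores for four audio samples generated from one prompt.
example_rewards = [0.2, 0.8, 0.5, 0.9]
for reward, advantage in zip(example_rewards, group_relative_advantages(example_rewards)):
    print(f"reward={reward:.2f} advantage={advantage:+.3f}")
```

Samples scored above the group mean receive a positive advantage and those below receive a negative one, so no separate value network is needed to form a baseline.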

music_speech_audioset_epoch_15_esc_89.98.pt ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:51c68f12f9d7ea25fdaaccf741ec7f81e93ee594455410f3bca4f47f88d8e006
+size 2352471003
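
The three added lines are a Git LFS pointer: the repository stores only the spec version, the SHA-256 object ID, and the byte size, while the checkpoint itself lives in LFS storage. A minimal sketch, assuming the checkpoint has already been downloaded to a local path, of how a copy could be checked against this pointer. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and are not part of Git LFS or `huggingface_hub`.

```python
import hashlib
import os

# Pointer text copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:51c68f12f9d7ea25fdaaccf741ec7f81e93ee594455410f3bca4f47f88d8e006
size 2352471003
"""


def parse_lfs_pointer(text):
    """Split 'key value' lines of a pointer file into a dict (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and hash."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap size check before hashing ~2.35 GB
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Read in chunks so the whole checkpoint never sits in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

With a local copy in place, `matches_pointer("music_speech_audioset_epoch_15_esc_89.98.pt", pointer)` would report whether the download is complete and uncorrupted; the path in that call is an assumption about where the file was saved.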