AndreasXi committed on
Commit
7da0d6b
·
verified ·
1 Parent(s): 1a901fe

Upload folder using huggingface_hub

README.md CHANGED
@@ -6,29 +6,33 @@ license: mit
6
  <p align="center">
7
  <h2>Resonate: Reinforcing Text-to-Audio Generation with Online Feedbacks from Large Audio Language Models</h2>
8
  <!-- <a href=>Paper</a> | <a href="https://meanaudio.github.io/">Webpage</a> -->
9
-
10
- <!-- [![Paper](https://img.shields.io/badge/Paper-arXiv-b31b1b?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2508.06098)
11
  [![Code](https://img.shields.io/badge/Code-Repo-black?style=flat&logo=github&logoColor=white)](https://github.com/xiquan-li/MeanAudio?tab=readme-ov-file)
12
  [![Hugging Face Model](https://img.shields.io/badge/Model-HuggingFace-yellow?logo=huggingface)](https://huggingface.co/AndreasXi/MeanAudio)
13
  [![Hugging Face Space](https://img.shields.io/badge/Space-HuggingFace-blueviolet?logo=huggingface)](https://huggingface.co/spaces/chenxie95/MeanAudio)
14
  [![Webpage](https://img.shields.io/badge/Website-Visit-orange?logo=googlechrome&logoColor=white)](https://meanaudio.github.io/) -->
15
 
16
 
 
 
 
17
  </p>
18
  </div>
19
 
20
 
21
  ## Overview
22
  Resonate is a SOTA text-to-audio generator reinforced with the online GRPO algorithm.
23
- This repo provides a comprehensive pipeline for audio synthesis, covering Pre-training, SFT, DPO, and GRPO.
 
24
 
25
  ## Environmental Setup
26
 
27
- **1. Create a new conda environment:**
28
 
29
  ```bash
30
- conda create -n meanaudio python=3.11 -y
31
- conda activate meanaudio
32
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 --upgrade
33
  ```
34
  <!-- ```
@@ -36,12 +40,12 @@ conda install -c conda-forge 'ffmpeg<7
36
  ```
37
  (Optional, if you use miniforge and don't already have the appropriate ffmpeg) -->
38
 
39
- **2. Install with pip:**
40
 
41
  ```bash
42
- git clone https://github.com/xiquan-li/MeanAudio.git
43
 
44
- cd MeanAudio
45
  pip install -e .
46
  ```
47
 
@@ -53,9 +57,8 @@ pip install -e .
53
  <!-- **1. Download pre-trained models:** -->
54
  To generate audio with our pre-trained model, simply run:
55
  ```bash
56
- python demo.py --prompt 'your prompt' --num_steps 1
57
  ```
58
  This will automatically download the pre-trained checkpoints from Hugging Face and generate audio according to your prompt.
59
- The output audio will be at `MeanAudio/output/`, and the checkpoints will be at `MeanAudio/weights/`.
60
-
61
- Have fun with MeanAudio 😊 !!!
 
6
  <p align="center">
7
  <h2>Resonate: Reinforcing Text-to-Audio Generation with Online Feedbacks from Large Audio Language Models</h2>
8
  <!-- <a href=>Paper</a> | <a href="https://meanaudio.github.io/">Webpage</a> -->
9
+ <!--
10
+ [![Paper](https://img.shields.io/badge/Paper-arXiv-b31b1b?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2508.06098)
11
  [![Code](https://img.shields.io/badge/Code-Repo-black?style=flat&logo=github&logoColor=white)](https://github.com/xiquan-li/MeanAudio?tab=readme-ov-file)
12
  [![Hugging Face Model](https://img.shields.io/badge/Model-HuggingFace-yellow?logo=huggingface)](https://huggingface.co/AndreasXi/MeanAudio)
13
  [![Hugging Face Space](https://img.shields.io/badge/Space-HuggingFace-blueviolet?logo=huggingface)](https://huggingface.co/spaces/chenxie95/MeanAudio)
14
  [![Webpage](https://img.shields.io/badge/Website-Visit-orange?logo=googlechrome&logoColor=white)](https://meanaudio.github.io/) -->
15
 
16
 
17
+ [![Code](https://img.shields.io/badge/Code-Repo-black?style=flat&logo=github&logoColor=white)](https://github.com/xiquan-li/Resonate)
18
+ [![Hugging Face Model](https://img.shields.io/badge/Model-HuggingFace-yellow?logo=huggingface)](https://huggingface.co/AndreasXi/Resonate)
19
+ [![Webpage](https://img.shields.io/badge/Website-Visit-orange?logo=googlechrome&logoColor=white)](https://resonatedemo.github.io/)
20
  </p>
21
  </div>
22
 
23
 
24
  ## Overview
25
  Resonate is a SOTA text-to-audio generator reinforced with the online GRPO algorithm.
26
+ It leverages the sophisticated reasoning capabilities of modern Large Audio Language Models as reward models.
27
+ This repo provides a comprehensive pipeline for audio generation, covering Pre-training, SFT, DPO, and GRPO.
28
 
29
  ## Environmental Setup
30
 
31
+ 1. Create a new conda environment:
32
 
33
  ```bash
34
+ conda create -n resonate python=3.11 -y
35
+ conda activate resonate
36
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 --upgrade
37
  ```
38
  <!-- ```
 
40
  ```
41
  (Optional, if you use miniforge and don't already have the appropriate ffmpeg) -->
42
 
43
+ 2. Install with pip:
44
 
45
  ```bash
46
+ git clone https://github.com/xiquan-li/Resonate.git
47
 
48
+ cd Resonate
49
  pip install -e .
50
  ```
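
Once both setup steps are done, it can help to confirm that the active environment actually sees the CUDA build of PyTorch before running anything heavy. The snippet below is a minimal sketch, not part of the Resonate repo: `environment_report` is a hypothetical helper name, and it only relies on standard `torch` attributes. With the `cu118` wheels above, `cuda_build` would typically read `11.8`.

```python
import importlib.util
import sys


def environment_report():
    """Collect basic facts about the active environment.

    Hypothetical helper for illustration; it does not fail if torch is absent.
    """
    report = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "torch": None,
        "cuda_build": None,
        "cuda_available": False,
    }
    # Only import torch if it is installed in this environment.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        # CUDA version the wheel was built against; None for CPU-only builds.
        report["cuda_build"] = torch.version.cuda
        # True only if a usable GPU and driver are visible at runtime.
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If `torch` shows as `None`, the `resonate` environment is probably not activated.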
51
 
 
57
  <!-- **1. Download pre-trained models:** -->
58
  To generate audio with our pre-trained model, simply run:
59
  ```bash
60
+ python demo.py --prompt 'your prompt'
61
  ```
62
  This will automatically download the pre-trained checkpoints from Hugging Face and generate audio according to your prompt.
63
+ By default, this will use [Resonate-GRPO](https://huggingface.co/AndreasXi/Resonate/blob/main/Resonate_GRPO.pth).
64
+ The output audio will be at `Resonate/output/`, and the checkpoints will be at `Resonate/weights/`.
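
`demo.py` already handles the download, so the following is only a sketch for cases where you want to fetch the default checkpoint ahead of time (for example on a machine with restricted network access). It uses `hf_hub_download` from `huggingface_hub`; the repo id and filename come from the links above, while `fetch_checkpoint`, `resolve_url`, and the `weights_dir` default are hypothetical names chosen for illustration. Whether `demo.py` picks up a file placed this way is an assumption that has not been checked against the repo.

```python
from pathlib import Path

REPO_ID = "AndreasXi/Resonate"   # model repo linked in this README
FILENAME = "Resonate_GRPO.pth"   # default checkpoint named in this README


def resolve_url(repo_id=REPO_ID, filename=FILENAME, revision="main"):
    """Build the direct-download URL the Hub serves for a file in a model repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def fetch_checkpoint(weights_dir="weights"):
    """Download the checkpoint into weights_dir and return its local path.

    Assumption: run from the repo root so the file lands in Resonate/weights/.
    """
    # Imported lazily so resolve_url() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(
        repo_id=REPO_ID,
        filename=FILENAME,
        local_dir=weights_dir,
    )
    return Path(local_path)


if __name__ == "__main__":
    print("Direct URL:", resolve_url())
    print("Saved to:", fetch_checkpoint())
```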
 
music_speech_audioset_epoch_15_esc_89.98.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:51c68f12f9d7ea25fdaaccf741ec7f81e93ee594455410f3bca4f47f88d8e006
3
+ size 2352471003
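
The three added lines are a Git LFS pointer: the repository stores this small text stub, and the actual `.pt` file lives in LFS storage. As a rough illustration, the sketch below parses the pointer shown above and checks a downloaded file against it. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers written for this example, not part of any library.

```python
import hashlib
from pathlib import Path

# Pointer contents copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:51c68f12f9d7ea25fdaaccf741ec7f81e93ee594455410f3bca4f47f88d8e006
size 2352471003
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its version, hash algorithm, digest and size."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at path has the size and SHA-256 the pointer records."""
    path = Path(path)
    if path.stat().st_size != pointer["size_bytes"]:
        return False
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Stream in chunks so a multi-gigabyte checkpoint is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


info = parse_lfs_pointer(POINTER)
print(f"{info['size_bytes'] / 1e9:.2f} GB")  # prints 2.35 GB
```

After downloading, `matches_pointer("music_speech_audioset_epoch_15_esc_89.98.pt", info)` would confirm the file is complete and uncorrupted.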