nielsr HF Staff committed on
Commit 0fc43bb · verified · 1 Parent(s): c7cdc8d

Enhance model card with metadata, links, overview, and usage examples


This PR significantly improves the model card for **Steer3D: Feedforward 3D Editing via Text-Steerable Image-to-3D** by:

- **Adding relevant metadata**:
  - Setting `pipeline_tag: image-to-3d` for better discoverability on the Hugging Face Hub.
  - Retaining the existing `license: mit`.
- **Enriching the content**:
  - Adding direct links to the paper ([Feedforward 3D Editing via Text-Steerable Image-to-3D](https://huggingface.co/papers/2512.13678)), the project page ([https://glab-caltech.github.io/steer3d/](https://glab-caltech.github.io/steer3d/)), and the GitHub repository ([https://github.com/ziqi-ma/Steer3D](https://github.com/ziqi-ma/Steer3D)).
  - Including a concise overview of the model's capabilities and architecture.
  - Incorporating the teaser image from the project page to visually represent the model.
  - Providing detailed environment setup instructions and comprehensive sample usage commands for "Inference in the Wild", taken directly from the GitHub README and demonstrating removal, texture, and addition edits.
  - Adding the official BibTeX citation.

These updates aim to provide users with a more informative and actionable model card.

Files changed (1)
README.md +90 -1
README.md CHANGED
@@ -1,4 +1,93 @@
  ---
  license: mit
+ pipeline_tag: image-to-3d
  ---
- Checkpoints of "Feedforward 3D Editing via Text-Steerable Image-to-3D" (https://glab-caltech.github.io/steer3d/).
+
+ # Steer3D: Feedforward 3D Editing via Text-Steerable Image-to-3D
+
+ [Paper](https://huggingface.co/papers/2512.13678) | [Project Page](https://glab-caltech.github.io/steer3d/) | [Code](https://github.com/ziqi-ma/Steer3D)
+
+ Steer3D is a feedforward method that adds text steerability to image-to-3D models, so generated 3D assets can be edited with natural-language instructions. Inspired by ControlNet, it adapts that architecture to image-to-3D generation, enabling direct text steering in a single forward pass. Steer3D follows language instructions faithfully and stays more consistent with the original 3D asset, while being significantly faster than competing methods.
+
+ ![teaser](https://github.com/ziqi-ma/Steer3D/raw/main/media/teaser.png)
+
+ ## Overview
+ Steer3D adapts the ControlNet architecture to add text steerability to image-to-3D models. It is trained on a 100k-scale synthetic dataset generated by a custom data engine. The project releases code for both the data engine and the model; the scripts for each step are documented in `dataengine/README.md` in the GitHub repository.
+
+ ## Environment Setup
+ To set up the environment for the model, follow the instructions below. Note that the data engine requires a separate environment setup, detailed in `dataengine/README.md`.
+
+ ```bash
+ conda env create -f environment.yml
+ conda activate steer3d
+ ```
+
+ Libraries such as `kaolin`, `nvdiffrast`, `diffoctreerast`, `mip-splatting`, and `vox2seq` may require manual installation. Refer to the [setup script from TRELLIS](https://github.com/microsoft/TRELLIS/blob/main/setup.sh) for guidance on installing these dependencies.
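+
+ As a rough sketch only (exact wheel URLs and versions depend on your PyTorch/CUDA combination; the TRELLIS setup script remains the authoritative reference), the manual installs typically look like this:
+
+ ```bash
+ # Illustrative commands: adjust the torch/CUDA tags to your environment.
+ pip install kaolin -f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-2.4.0_cu121.html
+ pip install git+https://github.com/NVlabs/nvdiffrast.git
+ ```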
+
+ ## Usage: Inference in the Wild
+ This section demonstrates text-steerable 3D editing on user-provided images and text prompts. The flags are similar to those used for benchmark evaluation. Pass an image path via `--image_path` and the editing text via `--text` (which can also be a `.txt` file with multiple editing texts separated by line breaks; see the sketch below). Set `--texture_only` for better geometry consistency during texture-only edits. A visualization PNG is written to the output directory, and if `--export_glb` is set, GLB files of the 3D objects are exported as well.
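+
+ A minimal sketch of batching several edits through one prompt file (`edits.txt` is a hypothetical name):
+
+ ```bash
+ # One editing instruction per line.
+ printf '%s\n' "Remove the entire bottom base" \
+     "Turn the entire cone into a metallic silver texture" > edits.txt
+ # Then pass the file in place of a single string: --text edits.txt
+ ```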
+
+ **Important**: First, set `PYTHONPATH=[path to Steer3D]` to include the project directory in your Python path. Model checkpoints must be downloaded from the [Hugging Face repository](https://huggingface.co/ziqima/Steer3D/tree/main), and `[path-to-checkpoints]` should be replaced with their actual location.
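+
+ A minimal sketch of both steps (paths are placeholders; assumes the `huggingface_hub` CLI is installed):
+
+ ```bash
+ export PYTHONPATH=/path/to/Steer3D
+ # Fetch all checkpoints into a local directory.
+ huggingface-cli download ziqima/Steer3D --local-dir /path/to/checkpoints
+ ```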
+
+ Here are three example edits demonstrating removal, texture, and addition changes for a traffic cone, based on a natural photo.
+
+ ### Removal Example
+ ```bash
+ python evaluation/eval_wild.py \
+     --image_path media/cone.jpg \
+     --text "Remove the entire bottom base" \
+     --stage1_controlnet_checkpoint [path-to-checkpoints]/stage1/controlnet_remove.pt \
+     --stage1_base_checkpoint [path-to-checkpoints]/stage1/base.pt \
+     --stage1_config configs/stage1_controlnet.json \
+     --stage2_controlnet_checkpoint [path-to-checkpoints]/stage2/controlnet.pt \
+     --stage2_base_checkpoint [path-to-checkpoints]/stage2/base.pt \
+     --stage2_config configs/stage2_controlnet.json \
+     --output_dir visualizations/single_image \
+     --num_seeds 1
+ ```
+
+ ### Texture Example
+ ```bash
+ python evaluation/eval_wild.py \
+     --image_path media/cone.jpg \
+     --text "Turn the entire cone into a metallic silver texture" \
+     --stage1_controlnet_checkpoint [path-to-checkpoints]/stage1/controlnet_add.pt \
+     --stage1_base_checkpoint [path-to-checkpoints]/stage1/base.pt \
+     --stage1_config configs/stage1_controlnet.json \
+     --stage2_controlnet_checkpoint [path-to-checkpoints]/stage2/controlnet.pt \
+     --stage2_base_checkpoint [path-to-checkpoints]/stage2/base.pt \
+     --stage2_config configs/stage2_controlnet.json \
+     --output_dir visualizations/single_image \
+     --texture_only \
+     --num_seeds 1
+ ```
+
+ ### Addition Example
+ ```bash
+ python evaluation/eval_wild.py \
+     --image_path media/cone.jpg \
+     --text "Add a cap shaped light on top of the cone" \
+     --stage1_controlnet_checkpoint [path-to-checkpoints]/stage1/controlnet_add.pt \
+     --stage1_base_checkpoint [path-to-checkpoints]/stage1/base.pt \
+     --stage1_config configs/stage1_controlnet.json \
+     --stage2_controlnet_checkpoint [path-to-checkpoints]/stage2/controlnet.pt \
+     --stage2_base_checkpoint [path-to-checkpoints]/stage2/base.pt \
+     --stage2_config configs/stage2_controlnet.json \
+     --output_dir visualizations/single_image \
+     --num_seeds 1
+ ```
+
+ ## Citation
+ If you find our work helpful, please use the following BibTeX entry to cite it:
+
+ ```bibtex
+ @misc{ma2025feedforward3deditingtextsteerable,
+   title={Feedforward 3D Editing via Text-Steerable Image-to-3D},
+   author={Ziqi Ma and Hongqiao Chen and Yisong Yue and Georgia Gkioxari},
+   year={2025},
+   eprint={2512.13678},
+   archivePrefix={arXiv},
+   primaryClass={cs.CV},
+   url={https://arxiv.org/abs/2512.13678},
+ }
+ ```