Enhance model card with metadata, links, overview, and usage examples
This PR significantly improves the model card for **Steer3D: Feedforward 3D Editing via Text-Steerable Image-to-3D** by:
- **Adding relevant metadata**:
- Setting `pipeline_tag: image-to-3d` for better discoverability on the Hugging Face Hub.
- Retaining the existing `license: mit`.
- **Enriching the content**:
- Adding direct links to the paper ([Feedforward 3D Editing via Text-Steerable Image-to-3D](https://huggingface.co/papers/2512.13678)), project page ([https://glab-caltech.github.io/steer3d/](https://glab-caltech.github.io/steer3d/)), and GitHub repository ([https://github.com/ziqi-ma/Steer3D](https://github.com/ziqi-ma/Steer3D)).
- Including a concise overview of the model's capabilities and architecture.
- Incorporating the teaser image from the project to visually represent the model.
- Providing detailed environment setup instructions and comprehensive sample usage commands for "Inference in the Wild" directly from the GitHub README, demonstrating removal, texture, and addition edits.
- Adding the official BibTeX citation.
These updates aim to provide users with a more informative and actionable model card.
---
license: mit
pipeline_tag: image-to-3d
---

# Steer3D: Feedforward 3D Editing via Text-Steerable Image-to-3D

[Paper](https://huggingface.co/papers/2512.13678) | [Project Page](https://glab-caltech.github.io/steer3d/) | [Code](https://github.com/ziqi-ma/Steer3D)

Steer3D is a feedforward method that introduces text steerability to image-to-3D models, enabling generated 3D assets to be edited with natural-language instructions. Inspired by ControlNet, it adapts that conditioning approach to image-to-3D generation, so edits are applied by direct text steering in a single forward pass. Steer3D adheres faithfully to language instructions, stays more consistent with the original 3D asset, and is significantly faster than competing methods.

![Teaser](https://glab-caltech.github.io/steer3d/resources/teaser_video_condensed.gif)

## Overview

Steer3D adapts the ControlNet architecture to add text steerability to image-to-3D models. It is trained on a 100k-scale synthetic dataset generated by a custom data engine. This project shares code for both the data engine and the model, with scripts for the individual steps documented in `dataengine/README.md` in the GitHub repository.
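The ControlNet-inspired design can be sketched in the abstract: a trainable branch processes the extra (text) condition, and its output is injected into the frozen base model through a zero-initialized projection, so at initialization the steered model reproduces the base model exactly. A minimal numeric sketch of that principle (illustrative names only, not Steer3D's actual code):

```python
# Minimal sketch of ControlNet-style zero-initialized conditioning.
# All names are illustrative; this shows the principle, not Steer3D's code.

def base_block(x):
    """Stand-in for a frozen block of the base image-to-3D model."""
    return [2.0 * v + 1.0 for v in x]

class ZeroProj:
    """Projection initialized to zero: contributes nothing until trained."""
    def __init__(self, n):
        self.w = [0.0] * n

    def __call__(self, h):
        return [w * v for w, v in zip(self.w, h)]

def steered_block(x, cond, zero_proj):
    """Base output plus a residual computed from the condition branch."""
    residual = zero_proj([v + c for v, c in zip(x, cond)])  # condition branch (stub)
    return [b + r for b, r in zip(base_block(x), residual)]

x, cond = [1.0, -2.0, 0.5], [0.3, 0.3, 0.3]
proj = ZeroProj(3)
# At initialization the steered model reproduces the base model exactly:
print(steered_block(x, cond, proj) == base_block(x))  # True
proj.w = [0.1, 0.1, 0.1]  # once trained, the condition steers the output
print(steered_block(x, cond, proj) == base_block(x))  # False
```

The zero initialization is what lets training start from the unmodified base model and gradually learn the text-conditioned residual.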

## Environment Setup

To set up the environment for the model, follow the instructions below. Note that the data engine requires a separate environment setup, detailed in `dataengine/README.md`.

```bash
conda env create -f environment.yml
conda activate steer3d
```

Libraries such as `kaolin`, `nvdiffrast`, `diffoctreerast`, `mip-splatting`, and `vox2seq` may require manual installation. Refer to the [setup script from TRELLIS](https://github.com/microsoft/TRELLIS/blob/main/setup.sh) for guidance on installing these dependencies.
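Before running inference, it can help to confirm that the manually installed extensions actually import. A small illustrative check (the module list is an assumption; adjust it to what you installed):

```python
# Illustrative sanity check: report which optional extension libraries
# are importable in the active environment.
import importlib

def check(mods):
    status = {}
    for m in mods:
        try:
            importlib.import_module(m)
            status[m] = "ok"
        except ImportError:
            status[m] = "missing - see the TRELLIS setup script"
    return status

for name, state in check(["kaolin", "nvdiffrast", "diffoctreerast"]).items():
    print(f"{name}: {state}")
```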

## Usage: Inference in the Wild

This section shows how to perform text-steerable 3D editing on your own images and text prompts. The flags mirror those used for benchmark evaluation. Pass an image path via `--image_path` and the editing text via `--text` (which can also be a `.txt` file containing multiple editing texts, one per line). Set `--texture_only` for better geometry consistency on texture-only edits. A visualization PNG is written to the output directory; if `--export_glb` is set, GLB files of the 3D objects are exported as well.
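To batch several edits in one run, `--text` can point to a plain-text file with one instruction per line. For example (the file name and instructions here are illustrative):

```shell
# Write three editing instructions, one per line, into a text file.
cat > edits.txt <<'EOF'
Remove the entire bottom base
Turn the entire cone into a metallic silver texture
Add a cap shaped light on top of the cone
EOF
wc -l < edits.txt  # 3 edits, one per line
```

Then pass `--text edits.txt` in place of a quoted instruction.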

**Important**: First, set `PYTHONPATH=[path to Steer3D]` to include the project directory in your Python path. Model checkpoints must be downloaded from the [Hugging Face repository](https://huggingface.co/ziqima/Steer3D/tree/main), and `[path-to-checkpoints]` should be replaced with their actual location.

Here are three example edits demonstrating removal, texture, and addition changes on a traffic cone, starting from a natural photo.

### Removal Example

```bash
python evaluation/eval_wild.py \
    --image_path media/cone.jpg \
    --text "Remove the entire bottom base" \
    --stage1_controlnet_checkpoint [path-to-checkpoints]/stage1/controlnet_remove.pt \
    --stage1_base_checkpoint [path-to-checkpoints]/stage1/base.pt \
    --stage1_config configs/stage1_controlnet.json \
    --stage2_controlnet_checkpoint [path-to-checkpoints]/stage2/controlnet.pt \
    --stage2_base_checkpoint [path-to-checkpoints]/stage2/base.pt \
    --stage2_config configs/stage2_controlnet.json \
    --output_dir visualizations/single_image \
    --num_seeds 1
```

### Texture Example

```bash
python evaluation/eval_wild.py \
    --image_path media/cone.jpg \
    --text "Turn the entire cone into a metallic silver texture" \
    --stage1_controlnet_checkpoint [path-to-checkpoints]/stage1/controlnet_add.pt \
    --stage1_base_checkpoint [path-to-checkpoints]/stage1/base.pt \
    --stage1_config configs/stage1_controlnet.json \
    --stage2_controlnet_checkpoint [path-to-checkpoints]/stage2/controlnet.pt \
    --stage2_base_checkpoint [path-to-checkpoints]/stage2/base.pt \
    --stage2_config configs/stage2_controlnet.json \
    --output_dir visualizations/single_image \
    --texture_only \
    --num_seeds 1
```

### Addition Example

```bash
python evaluation/eval_wild.py \
    --image_path media/cone.jpg \
    --text "Add a cap shaped light on top of the cone" \
    --stage1_controlnet_checkpoint [path-to-checkpoints]/stage1/controlnet_add.pt \
    --stage1_base_checkpoint [path-to-checkpoints]/stage1/base.pt \
    --stage1_config configs/stage1_controlnet.json \
    --stage2_controlnet_checkpoint [path-to-checkpoints]/stage2/controlnet.pt \
    --stage2_base_checkpoint [path-to-checkpoints]/stage2/base.pt \
    --stage2_config configs/stage2_controlnet.json \
    --output_dir visualizations/single_image \
    --num_seeds 1
```

## Citation

If you find our work helpful, please cite it with the following BibTeX entry:

```bibtex
@misc{ma2025feedforward3deditingtextsteerable,
      title={Feedforward 3D Editing via Text-Steerable Image-to-3D},
      author={Ziqi Ma and Hongqiao Chen and Yisong Yue and Georgia Gkioxari},
      year={2025},
      eprint={2512.13678},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.13678},
}
```