Enhance model card with metadata, links, overview, and usage examples
This PR significantly improves the model card for **Steer3D: Feedforward 3D Editing via Text-Steerable Image-to-3D** by:
- **Adding relevant metadata**:
- Setting `pipeline_tag: image-to-3d` for better discoverability on the Hugging Face Hub.
- Retaining the existing `license: mit`.
- **Enriching the content**:
- Adding direct links to the paper ([Feedforward 3D Editing via Text-Steerable Image-to-3D](https://huggingface.co/papers/2512.13678)), project page ([https://glab-caltech.github.io/steer3d/](https://glab-caltech.github.io/steer3d/)), and GitHub repository ([https://github.com/ziqi-ma/Steer3D](https://github.com/ziqi-ma/Steer3D)).
- Including a concise overview of the model's capabilities and architecture.
- Incorporating the teaser image from the project to visually represent the model.
- Providing detailed environment setup instructions and comprehensive sample usage commands for "Inference in the Wild" directly from the GitHub README, demonstrating removal, texture, and addition edits.
- Adding the official BibTeX citation.
These updates aim to provide users with a more informative and actionable model card.
---
license: mit
pipeline_tag: image-to-3d
---

# Steer3D: Feedforward 3D Editing via Text-Steerable Image-to-3D

[Paper](https://huggingface.co/papers/2512.13678) | [Project Page](https://glab-caltech.github.io/steer3d/) | [Code](https://github.com/ziqi-ma/Steer3D)

Steer3D is a feedforward method that introduces text steerability to image-to-3D models, enabling generated 3D assets to be edited with natural-language instructions. Inspired by ControlNet, it adapts that conditioning approach to image-to-3D generation, so edits are applied by direct text steering in a single forward pass. Steer3D adheres faithfully to language instructions, stays more consistent with the original 3D asset, and is significantly faster than competing methods.

![Teaser](https://glab-caltech.github.io/steer3d/resources/teaser_video_condensed.gif)

## Overview

Steer3D adapts the ControlNet architecture to add text steerability to image-to-3D models. It is trained on a 100k-scale synthetic dataset generated by a custom data engine. This project shares code for both the data engine and the model, with scripts for the individual steps documented in `dataengine/README.md` in the GitHub repository.
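The ControlNet-inspired design can be sketched in the abstract: a trainable branch processes the extra (text) condition, and its output is injected into the frozen base model through a zero-initialized projection, so at initialization the steered model reproduces the base model exactly. A minimal numeric sketch of that principle (illustrative names only, not Steer3D's actual code):

```python
# Minimal sketch of ControlNet-style zero-initialized conditioning.
# All names are illustrative; this shows the principle, not Steer3D's code.

def base_block(x):
    """Stand-in for a frozen block of the base image-to-3D model."""
    return [2.0 * v + 1.0 for v in x]

class ZeroProj:
    """Projection initialized to zero: contributes nothing until trained."""
    def __init__(self, n):
        self.w = [0.0] * n

    def __call__(self, h):
        return [w * v for w, v in zip(self.w, h)]

def steered_block(x, cond, zero_proj):
    """Base output plus a residual computed from the condition branch."""
    residual = zero_proj([v + c for v, c in zip(x, cond)])  # condition branch (stub)
    return [b + r for b, r in zip(base_block(x), residual)]

x, cond = [1.0, -2.0, 0.5], [0.3, 0.3, 0.3]
proj = ZeroProj(3)
# At initialization the steered model reproduces the base model exactly:
print(steered_block(x, cond, proj) == base_block(x))  # True
proj.w = [0.1, 0.1, 0.1]  # once trained, the condition steers the output
print(steered_block(x, cond, proj) == base_block(x))  # False
```

The zero initialization is what lets training start from the unmodified base model and gradually learn the text-conditioned residual.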

## Environment Setup

To set up the environment for the model, follow the instructions below. Note that the data engine requires a separate environment setup, detailed in `dataengine/README.md`.

```bash
conda env create -f environment.yml
conda activate steer3d
```

Libraries such as `kaolin`, `nvdiffrast`, `diffoctreerast`, `mip-splatting`, and `vox2seq` may require manual installation. Refer to the [setup script from TRELLIS](https://github.com/microsoft/TRELLIS/blob/main/setup.sh) for guidance on installing these dependencies.
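Before running inference, it can help to confirm that the manually installed extensions actually import. A small illustrative check (the module list is an assumption; adjust it to what you installed):

```python
# Illustrative sanity check: report which optional extension libraries
# are importable in the active environment.
import importlib

def check(mods):
    status = {}
    for m in mods:
        try:
            importlib.import_module(m)
            status[m] = "ok"
        except ImportError:
            status[m] = "missing - see the TRELLIS setup script"
    return status

for name, state in check(["kaolin", "nvdiffrast", "diffoctreerast"]).items():
    print(f"{name}: {state}")
```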

## Usage: Inference in the Wild

This section shows how to perform text-steerable 3D editing on your own images and text prompts. The flags mirror those used for benchmark evaluation. Pass an image path via `--image_path` and the editing text via `--text` (which can also be a `.txt` file containing multiple editing texts, one per line). Set `--texture_only` for better geometry consistency on texture-only edits. A visualization PNG is written to the output directory; if `--export_glb` is set, GLB files of the 3D objects are exported as well.
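To batch several edits in one run, `--text` can point to a plain-text file with one instruction per line. For example (the file name and instructions here are illustrative):

```shell
# Write three editing instructions, one per line, into a text file.
cat > edits.txt <<'EOF'
Remove the entire bottom base
Turn the entire cone into a metallic silver texture
Add a cap shaped light on top of the cone
EOF
wc -l < edits.txt  # 3 edits, one per line
```

Then pass `--text edits.txt` in place of a quoted instruction.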

**Important**: First, set `PYTHONPATH=[path to Steer3D]` to include the project directory in your Python path. Model checkpoints must be downloaded from the [Hugging Face repository](https://huggingface.co/ziqima/Steer3D/tree/main), and `[path-to-checkpoints]` should be replaced with their actual location.

Here are three example edits demonstrating removal, texture, and addition changes on a traffic cone, starting from a natural photo.

### Removal Example

```bash
python evaluation/eval_wild.py \
    --image_path media/cone.jpg \
    --text "Remove the entire bottom base" \
    --stage1_controlnet_checkpoint [path-to-checkpoints]/stage1/controlnet_remove.pt \
    --stage1_base_checkpoint [path-to-checkpoints]/stage1/base.pt \
    --stage1_config configs/stage1_controlnet.json \
    --stage2_controlnet_checkpoint [path-to-checkpoints]/stage2/controlnet.pt \
    --stage2_base_checkpoint [path-to-checkpoints]/stage2/base.pt \
    --stage2_config configs/stage2_controlnet.json \
    --output_dir visualizations/single_image \
    --num_seeds 1
```

### Texture Example

```bash
python evaluation/eval_wild.py \
    --image_path media/cone.jpg \
    --text "Turn the entire cone into a metallic silver texture" \
    --stage1_controlnet_checkpoint [path-to-checkpoints]/stage1/controlnet_add.pt \
    --stage1_base_checkpoint [path-to-checkpoints]/stage1/base.pt \
    --stage1_config configs/stage1_controlnet.json \
    --stage2_controlnet_checkpoint [path-to-checkpoints]/stage2/controlnet.pt \
    --stage2_base_checkpoint [path-to-checkpoints]/stage2/base.pt \
    --stage2_config configs/stage2_controlnet.json \
    --output_dir visualizations/single_image \
    --texture_only \
    --num_seeds 1
```

### Addition Example

```bash
python evaluation/eval_wild.py \
    --image_path media/cone.jpg \
    --text "Add a cap shaped light on top of the cone" \
    --stage1_controlnet_checkpoint [path-to-checkpoints]/stage1/controlnet_add.pt \
    --stage1_base_checkpoint [path-to-checkpoints]/stage1/base.pt \
    --stage1_config configs/stage1_controlnet.json \
    --stage2_controlnet_checkpoint [path-to-checkpoints]/stage2/controlnet.pt \
    --stage2_base_checkpoint [path-to-checkpoints]/stage2/base.pt \
    --stage2_config configs/stage2_controlnet.json \
    --output_dir visualizations/single_image \
    --num_seeds 1
```

## Citation

If you find our work helpful, please cite it with the following BibTeX entry:

```bibtex
@misc{ma2025feedforward3deditingtextsteerable,
      title={Feedforward 3D Editing via Text-Steerable Image-to-3D},
      author={Ziqi Ma and Hongqiao Chen and Yisong Yue and Georgia Gkioxari},
      year={2025},
      eprint={2512.13678},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.13678},
}
```