---
title: 3D Room Layout Estimation
emoji: 🏠
colorFrom: blue
colorTo: green
sdk: docker
app_port: 8000
pinned: false
license: mit
---

# 3D Room Layout Estimation (RGB-only)

### Phase 1: Core Perception & Geometry Pipeline

This repository contains my ongoing work on **RGB-only indoor scene understanding**, starting with monocular depth estimation and room layout perception.

The current focus (Phase 1) is to establish a **clean, modular pipeline** that takes RGB input (image or video) and produces stable geometric signals that can later support 3D object reasoning and navigation.

This is **active development**, not a final or polished system.

---

## What Phase 1 Covers

Phase 1 is intentionally scoped and foundational. It includes:

- **Monocular depth estimation** from RGB
- **Room layout estimation** (walls, floor, ceiling)
- **Basic geometric reasoning** from depth + layout
- **Temporal handling** for video input
- **Visualization utilities** for debugging and inspection

It does **not** yet attempt full 3D reconstruction, SLAM, or navigation.

---

## Repository Structure (Current)

This reflects the actual structure in the `reiterating-phase1-modules` branch.

```
.
├── config/
│   └── phase1_config.py        # Central configuration for Phase 1
│
├── core/
│   └── phase1_pipeline.py      # Orchestrates depth + layout + fusion
│
├── models/
│   └── depth/
│       └── depth_estimator.py  # Monocular depth wrapper
│
├── trainer/                    # Existing layout model code (legacy / reused)
│
├── utils/
│   ├── depth.py                # Depth utilities
│   ├── fusion.py               # Depth + layout fusion logic
│   ├── geometry.py             # Basic geometric helpers
│   ├── temporal.py             # Temporal smoothing / state
│   └── visualize.py            # Debug visualizations
│
├── scripts/                    # Small helper scripts (non-core)
├── notebooks/                  # Experiments and exploration
├── tests/                      # Minimal testing utilities
│
├── run_phase1_demo.py          # Runs Phase 1 on video input
├── test_single_image.py        # Quick single-image sanity test
├── main.py                     # Entry point (experimental)
│
├── requirements.txt
├── environment.yml
└── README.md
```

---

## How to Run (Phase 1)

### Environment

```bash
conda env create -f environment.yml
conda activate rgb_perception
```

or

```bash
pip install -r requirements.txt
```

---

### Video Demo

```bash
python run_phase1_demo.py \
  --video path/to/video.mp4 \
  --output output_name
```

---

### Single Image Test

```bash
python test_single_image.py \
  --image path/to/image.jpg
```

---

## What the Pipeline Produces

At this stage, outputs are **primarily for inspection and debugging**, not final consumption.

Typical outputs include:

- Depth maps (monocular, approximate)
- Layout segmentation maps
- Overlaid visualizations
- Intermediate geometric representations

Exact formats may change as the pipeline stabilizes.

---

## Design Notes

- RGB-only by design: no depth sensors.
- Geometry is approximate, not metrically perfect.
- Temporal logic is lightweight and experimental.
- Structured to evolve into object reasoning and navigation.
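
To make "lightweight temporal logic" concrete, here is a minimal sketch of one common approach: an exponential moving average over per-frame depth maps. It is an illustration only and does not reflect the actual contents of `utils/temporal.py`. The `DepthSmoother` class, its `alpha` parameter, and the `DepthMap` alias are hypothetical names, and plain nested lists stand in for image arrays so the example has no dependencies.

```python
from typing import List, Optional

# Hypothetical stand-in for an H x W depth array (row-major nested lists).
DepthMap = List[List[float]]


class DepthSmoother:
    """Exponential moving average over depth frames (illustrative sketch)."""

    def __init__(self, alpha: float = 0.3) -> None:
        # alpha is the weight of the newest frame; 1.0 disables smoothing.
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.state: Optional[DepthMap] = None

    def update(self, depth: DepthMap) -> DepthMap:
        """Blend a new frame into the running estimate and return a copy."""
        if self.state is None:
            # First frame: nothing to blend with, so adopt it as the state.
            self.state = [row[:] for row in depth]
        else:
            a = self.alpha
            self.state = [
                [a * new + (1.0 - a) * old for new, old in zip(new_row, old_row)]
                for new_row, old_row in zip(depth, self.state)
            ]
        return [row[:] for row in self.state]

    def reset(self) -> None:
        """Drop the running estimate, e.g. at a scene cut."""
        self.state = None


if __name__ == "__main__":
    smoother = DepthSmoother(alpha=0.5)
    smoother.update([[2.0, 4.0]])
    print(smoother.update([[4.0, 4.0]]))  # prints [[3.0, 4.0]]
```

A smaller `alpha` gives steadier depth at the cost of lag when the camera moves quickly, which is one reason this kind of smoothing is best treated as experimental.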

---

## What This Branch Is Not

- Not a finished product
- Not benchmarked
- Not client-ready
- Not a navigation system

---

## Roadmap (High Level)

- Phase 2: RGB-only 3D object detection
- Phase 3: Scene-level spatial reasoning
- Phase 4: Waypoint generation
- Phase 5: Agent-level interfaces

---

## Status

Phase 1 is under active iteration. Expect refactors and breaking changes.