title: 3D Room Layout Estimation
emoji: π
colorFrom: blue
colorTo: green
sdk: docker
app_port: 8000
pinned: false
license: mit
3D Room Layout Estimation (RGB-only)
Phase 1: Core Perception & Geometry Pipeline
This repository contains my ongoing work on RGB-only indoor scene understanding, starting with monocular depth estimation and room layout perception.
The current focus (Phase 1) is to establish a clean, modular pipeline that takes RGB input (image or video) and produces stable geometric signals that can later support 3D object reasoning and navigation.
This is active development, not a final or polished system.
What Phase 1 Covers
Phase 1 is intentionally scoped and foundational.
It includes:
- Monocular depth estimation from RGB
- Room layout estimation (walls, floor, ceiling)
- Basic geometric reasoning from depth + layout
- Temporal handling for video input
- Visualization utilities for debugging and inspection
It does not yet attempt full 3D reconstruction, SLAM, or navigation.
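As a small illustration of the "basic geometric reasoning from depth" item above, the sketch below back-projects one pixel and its depth value into a 3D point in the camera frame using the standard pinhole model. It is not taken from utils/geometry.py: the function name backproject_pixel and the intrinsics fx, fy, cx, cy are assumptions made for the example, and with monocular depth the result is only as metrically accurate as the depth estimate itself.

```python
from typing import Tuple


def backproject_pixel(
    u: float, v: float, depth: float,
    fx: float, fy: float, cx: float, cy: float,
) -> Tuple[float, float, float]:
    """Map pixel (u, v) with depth along the optical axis to camera-frame (X, Y, Z).

    Hypothetical helper for illustration. fx, fy are focal lengths in pixels
    and (cx, cy) is the principal point; all four are assumed to be known.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    x = (u - cx) * depth / fx  # horizontal offset scales linearly with depth
    y = (v - cy) * depth / fy  # vertical offset scales linearly with depth
    return (x, y, depth)


# Example with made-up intrinsics for a 640x480 image.
point = backproject_pixel(u=570, v=240, depth=2.0, fx=500, fy=500, cx=320, cy=240)
```

A pixel at the principal point maps to a point straight ahead of the camera; pixels further from it map to points further off-axis in proportion to their depth.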
Repository Structure (Current)
This reflects the actual structure in the reiterating-phase1-modules branch.
.
├── config/
│   └── phase1_config.py        # Central configuration for Phase 1
│
├── core/
│   └── phase1_pipeline.py      # Orchestrates depth + layout + fusion
│
├── models/
│   └── depth/
│       └── depth_estimator.py  # Monocular depth wrapper
│
├── trainer/                    # Existing layout model code (legacy / reused)
│
├── utils/
│   ├── depth.py                # Depth utilities
│   ├── fusion.py               # Depth + layout fusion logic
│   ├── geometry.py             # Basic geometric helpers
│   ├── temporal.py             # Temporal smoothing / state
│   └── visualize.py            # Debug visualizations
│
├── scripts/                    # Small helper scripts (non-core)
├── notebooks/                  # Experiments and exploration
├── tests/                      # Minimal testing utilities
│
├── run_phase1_demo.py          # Runs Phase 1 on video input
├── test_single_image.py        # Quick single-image sanity test
├── main.py                     # Entry point (experimental)
│
├── requirements.txt
├── environment.yml
└── README.md
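The division of labour in this layout (an orchestrator in core/ that calls a depth model, a layout model, and a fusion helper) can be sketched roughly as follows. This is only an illustration of the shape of such a pipeline, not the code in core/phase1_pipeline.py: FrameResult, run_pipeline, and the three callables are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator


@dataclass
class FrameResult:
    """Hypothetical per-frame output bundle; not the repository's actual type."""
    depth: Any
    layout: Any
    fused: Any


def run_pipeline(
    frames: Iterable[Any],
    estimate_depth: Callable[[Any], Any],
    estimate_layout: Callable[[Any], Any],
    fuse: Callable[[Any, Any], Any],
) -> Iterator[FrameResult]:
    """Run depth and layout estimation on each frame, then fuse the two signals."""
    for frame in frames:
        depth = estimate_depth(frame)
        layout = estimate_layout(frame)
        yield FrameResult(depth=depth, layout=layout, fused=fuse(depth, layout))


# Stand-in stages so the sketch runs without any model weights.
results = list(
    run_pipeline(
        frames=[1, 2],
        estimate_depth=lambda f: f * 10,
        estimate_layout=lambda f: f + 1,
        fuse=lambda d, l: (d, l),
    )
)
```

Passing the stages in as callables keeps the orchestrator independent of any particular depth or layout model, which suits a pipeline whose components are still being iterated on.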
How to Run (Phase 1)
Environment
conda env create -f environment.yml
conda activate rgb_perception
or
pip install -r requirements.txt
Video Demo
python run_phase1_demo.py \
--video path/to/video.mp4 \
--output output_name
Single Image Test
python test_single_image.py \
--image path/to/image.jpg
What the Pipeline Produces
At this stage, outputs are primarily for inspection and debugging, not final consumption.
Typical outputs include:
- Depth maps (monocular, approximate)
- Layout segmentation maps
- Overlaid visualizations
- Intermediate geometric representations
Exact formats may change as the pipeline stabilizes.
Design Notes
- RGB-only by design: no depth sensors.
- Geometry is approximate, not metrically perfect.
- Temporal logic is lightweight and experimental.
- Structured to evolve into object reasoning and navigation.
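To give a concrete sense of what "lightweight" temporal logic can look like, here is a minimal sketch of an exponential moving average over per-frame values. It is an assumption about one reasonable approach, not a description of utils/temporal.py: the class name ExponentialSmoother and the alpha parameter are hypothetical.

```python
class ExponentialSmoother:
    """Exponential moving average over a stream of per-frame values.

    Hypothetical helper for illustration. It works on floats and on any
    value that supports multiplication by a float and addition, such as
    a NumPy array of fixed shape.
    """

    def __init__(self, alpha: float = 0.3) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # weight given to the newest observation
        self.state = None   # no history until the first update

    def update(self, value):
        """Blend a new observation into the running estimate and return it."""
        if self.state is None:
            self.state = value  # first frame: nothing to smooth against
        else:
            self.state = self.alpha * value + (1.0 - self.alpha) * self.state
        return self.state

    def reset(self) -> None:
        """Forget the history, e.g. at a scene cut."""
        self.state = None


smoother = ExponentialSmoother(alpha=0.5)
smoothed = [smoother.update(v) for v in (2.0, 4.0, 3.0)]
```

A smaller alpha gives steadier output at the cost of lagging behind real scene changes; a larger alpha tracks changes faster but passes through more per-frame jitter.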
What This Branch Is Not
- Not a finished product
- Not benchmarked
- Not client-ready
- Not a navigation system
Roadmap (High Level)
- Phase 2: RGB-only 3D object detection
- Phase 3: Scene-level spatial reasoning
- Phase 4: Waypoint generation
- Phase 5: Agent-level interfaces
Status
Phase 1 is under active iteration. Expect refactors and breaking changes.