---
title: 3D Room Layout Estimation
emoji: 🏠
colorFrom: blue
colorTo: green
sdk: docker
app_port: 8000
pinned: false
license: mit
---

# 3D Room Layout Estimation (RGB-only)
### Phase 1 — Core Perception & Geometry Pipeline

This repository contains my ongoing work on **RGB-only indoor scene understanding**, starting with monocular depth estimation and room layout perception.

The current focus (Phase 1) is to establish a **clean, modular pipeline** that takes RGB input (image or video) and produces stable geometric signals that can later support 3D object reasoning and navigation.

This is **active development**, not a final or polished system.

---

## What Phase 1 Covers

Phase 1 is deliberately narrow in scope: it lays the foundation that later phases build on.

It includes:

- **Monocular depth estimation** from RGB  
- **Room layout estimation** (walls, floor, ceiling)  
- **Basic geometric reasoning** from depth + layout  
- **Temporal handling** for video input  
- **Visualization utilities** for debugging and inspection  

It does **not** yet attempt full 3D reconstruction, SLAM, or navigation.
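
To make "basic geometric reasoning from depth" concrete, here is a minimal sketch of back-projecting a depth map into 3D camera-frame points with a pinhole model. The function name `backproject_depth` and the intrinsics (`fx`, `fy`, `cx`, `cy`) are illustrative assumptions, not the contents of `utils/geometry.py`. Because monocular depth is approximate, the resulting points should be treated as approximate too.

```python
import numpy as np


def backproject_depth(depth, fx, fy, cx, cy):
    """Turn an (H, W) depth map into an (H, W, 3) array of camera-frame points.

    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth.
    """
    h, w = depth.shape
    # u indexes columns (x direction), v indexes rows (y direction).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)


# Hypothetical intrinsics for a 640x480 frame; real values come from calibration.
depth = np.full((480, 640), 2.0)  # a flat surface 2 units from the camera
points = backproject_depth(depth, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

The pixel at the principal point maps to a point straight ahead of the camera, and pixels further from it map to points further off-axis at the same depth.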

---

## Repository Structure (Current)

The tree below reflects the structure of the `reiterating-phase1-modules` branch.

```
.
├── config/
│   └── phase1_config.py        # Central configuration for Phase 1
│
├── core/
│   └── phase1_pipeline.py      # Orchestrates depth + layout + fusion
│
├── models/
│   └── depth/
│       └── depth_estimator.py  # Monocular depth wrapper
│
├── trainer/                    # Existing layout model code (legacy / reused)
│
├── utils/
│   ├── depth.py                # Depth utilities
│   ├── fusion.py               # Depth + layout fusion logic
│   ├── geometry.py             # Basic geometric helpers
│   ├── temporal.py             # Temporal smoothing / state
│   └── visualize.py            # Debug visualizations
│
├── scripts/                    # Small helper scripts (non-core)
├── notebooks/                  # Experiments and exploration
├── tests/                      # Minimal testing utilities
│
├── run_phase1_demo.py          # Runs Phase 1 on video input
├── test_single_image.py        # Quick single-image sanity test
├── main.py                     # Entry point (experimental)
│
├── requirements.txt
├── environment.yml
└── README.md
```
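
The way an orchestrator can tie depth, layout, and fusion together is sketched below. This is a hypothetical illustration only: `Phase1PipelineSketch`, `Phase1Result`, and the stage names are assumptions and do not describe the actual API in `core/phase1_pipeline.py`.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Phase1Result:
    depth: Any   # per-pixel depth estimate
    layout: Any  # per-pixel wall / floor / ceiling labels
    fused: Any   # whatever the fusion step derives from both


@dataclass
class Phase1PipelineSketch:
    # Each stage is a plain callable so it can be swapped or stubbed in tests.
    estimate_depth: Callable[[Any], Any]
    estimate_layout: Callable[[Any], Any]
    fuse: Callable[[Any, Any], Any]

    def run(self, frame: Any) -> Phase1Result:
        depth = self.estimate_depth(frame)
        layout = self.estimate_layout(frame)
        return Phase1Result(depth, layout, self.fuse(depth, layout))


# Stub stages stand in for the real models (hypothetical placeholders).
pipeline = Phase1PipelineSketch(
    estimate_depth=lambda frame: [1.0 for _ in frame],
    estimate_layout=lambda frame: ["floor" for _ in frame],
    fuse=lambda depth, layout: list(zip(layout, depth)),
)
result = pipeline.run(frame=[0, 0, 0])
```

Keeping the stages as injected callables is one possible design; it lets each module be tested in isolation before real models are wired in.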

---

## How to Run (Phase 1)

### Environment

```bash
conda env create -f environment.yml
conda activate rgb_perception
```

Alternatively, install the dependencies with pip:

```bash
pip install -r requirements.txt
```

---

### Video Demo

```bash
python run_phase1_demo.py \
    --video path/to/video.mp4 \
    --output output_name
```

---

### Single Image Test

```bash
python test_single_image.py \
    --image path/to/image.jpg
```

---

## What the Pipeline Produces

At this stage, outputs are **primarily for inspection and debugging**, not final consumption.

Typical outputs include:

- Depth maps (monocular, approximate)
- Layout segmentation maps
- Overlaid visualizations
- Intermediate geometric representations

Exact formats may change as the pipeline stabilizes.
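
For quick inspection, a raw depth map usually has to be rescaled before it can be viewed as an image. The sketch below shows one common approach (min-max scaling to 8-bit grayscale). The helper name `depth_to_uint8` is an assumption and is not taken from `utils/visualize.py`.

```python
import numpy as np


def depth_to_uint8(depth, eps=1e-8):
    """Min-max scale a depth map to 0..255 for viewing, ignoring non-finite pixels."""
    depth = np.asarray(depth, dtype=np.float64)
    valid = np.isfinite(depth)
    if not valid.any():
        return np.zeros(depth.shape, dtype=np.uint8)
    lo, hi = depth[valid].min(), depth[valid].max()
    scaled = (depth - lo) / max(hi - lo, eps)
    # Non-finite pixels are mapped to 0 so they show up as black.
    scaled = np.where(valid, scaled, 0.0)
    return np.round(scaled * 255).astype(np.uint8)


gray = depth_to_uint8([[1.0, 2.0], [3.0, np.nan]])
```

Per-frame min-max scaling makes brightness incomparable across frames, so a fixed depth range may be preferable when inspecting video.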

---

## Design Notes

- RGB-only by design: no depth sensors are used.
- Geometry is approximate, not metrically accurate.
- Temporal logic is lightweight and experimental.
- Structured to evolve into object reasoning and navigation.
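
As an illustration of what lightweight temporal logic can look like, here is a minimal exponential-moving-average smoother. It is a sketch under assumptions: the class name `ExponentialSmoother` and the default `alpha` are hypothetical, and the technique is not confirmed to be what `utils/temporal.py` implements.

```python
class ExponentialSmoother:
    """Blend each new value with the running estimate: s = a * x + (1 - a) * s."""

    def __init__(self, alpha=0.3):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.state = None

    def update(self, value):
        # The first frame initializes the state; later frames are blended in.
        if self.state is None:
            self.state = value
        else:
            self.state = self.alpha * value + (1.0 - self.alpha) * self.state
        return self.state


smoother = ExponentialSmoother(alpha=0.5)
smoothed = [smoother.update(x) for x in [2.0, 4.0, 4.0]]
```

The same class works on NumPy arrays, since the update uses only elementwise arithmetic. A smaller `alpha` gives steadier output at the cost of more lag behind scene changes.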

---

## What This Branch Is Not

- Not a finished product
- Not benchmarked
- Not client-ready
- Not a navigation system

---

## Roadmap (High Level)

- Phase 2: RGB-only 3D object detection
- Phase 3: Scene-level spatial reasoning
- Phase 4: Waypoint generation
- Phase 5: Agent-level interfaces

---

## Status

Phase 1 is under active iteration.
Expect refactors and breaking changes.