adnan
Initial Hugging Face deployment - clean history (no binary test files)
2b5423c
metadata
title: 3D Room Layout Estimation
emoji: 🏠
colorFrom: blue
colorTo: green
sdk: docker
app_port: 8000
pinned: false
license: mit

3D Room Layout Estimation (RGB-only)

Phase 1: Core Perception & Geometry Pipeline

This repository contains my ongoing work on RGB-only indoor scene understanding, starting with monocular depth estimation and room layout perception.

The current focus (Phase 1) is to establish a clean, modular pipeline that takes RGB input (image or video) and produces stable geometric signals that can later support 3D object reasoning and navigation.

This is active development, not a final or polished system.


What Phase 1 Covers

Phase 1 is intentionally scoped and foundational.

It includes:

  • Monocular depth estimation from RGB
  • Room layout estimation (walls, floor, ceiling)
  • Basic geometric reasoning from depth + layout
  • Temporal handling for video input
  • Visualization utilities for debugging and inspection

It does not yet attempt full 3D reconstruction, SLAM, or navigation.
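
As an illustration of the kind of geometric reasoning listed above, here is a minimal sketch of back-projecting a depth map into camera-frame 3D points with a pinhole model. The function names and the intrinsics (fx, fy, cx, cy) are assumptions made for this example; they are not taken from utils/geometry.py. Monocular depth is typically only known up to scale, so the resulting points should be treated as approximate.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject_pixel(u: float, v: float, depth: float,
                      fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Map pixel (u, v) with depth along the optical axis to camera-frame XYZ.

    fx, fy are focal lengths in pixels and (cx, cy) is the principal point.
    All four are hypothetical inputs for this sketch.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def backproject_depth_map(depth_map: List[List[float]],
                          fx: float, fy: float,
                          cx: float, cy: float) -> List[Point3D]:
    """Back-project every pixel with positive depth.

    Rows of depth_map index v (image y), columns index u (image x).
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth_map):
        for u, d in enumerate(row):
            if d > 0:  # skip invalid or missing depth values
                points.append(backproject_pixel(u, v, d, fx, fy, cx, cy))
    return points
```

A pixel at the principal point maps onto the optical axis, so only its Z coordinate is non-zero. Pixels further from the center spread out in X and Y in proportion to their depth.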


Repository Structure (Current)

This reflects the actual structure in the reiterating-phase1-modules branch.

.
├── config/
│   └── phase1_config.py        # Central configuration for Phase 1
│
├── core/
│   └── phase1_pipeline.py      # Orchestrates depth + layout + fusion
│
├── models/
│   └── depth/
│       └── depth_estimator.py  # Monocular depth wrapper
│
├── trainer/                    # Existing layout model code (legacy / reused)
│
├── utils/
│   ├── depth.py                # Depth utilities
│   ├── fusion.py               # Depth + layout fusion logic
│   ├── geometry.py             # Basic geometric helpers
│   ├── temporal.py             # Temporal smoothing / state
│   └── visualize.py            # Debug visualizations
│
├── scripts/                    # Small helper scripts (non-core)
├── notebooks/                  # Experiments and exploration
├── tests/                      # Minimal testing utilities
│
├── run_phase1_demo.py          # Runs Phase 1 on video input
├── test_single_image.py        # Quick single-image sanity test
├── main.py                     # Entry point (experimental)
│
├── requirements.txt
├── environment.yml
└── README.md
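
To make the "depth + layout + fusion" orchestration concrete, the following is a rough sketch of how a per-frame pipeline step could be wired together. It is not the code in core/phase1_pipeline.py: Phase1Result, run_phase1_frame, and the three callables are hypothetical names chosen for illustration, and the stages are passed in as plain functions so the sketch stays independent of any model.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Phase1Result:
    """Container for the per-frame outputs (hypothetical structure)."""
    depth: Any
    layout: Any
    fused: Any


def run_phase1_frame(frame: Any,
                     estimate_depth: Callable[[Any], Any],
                     estimate_layout: Callable[[Any], Any],
                     fuse: Callable[[Any, Any], Any]) -> Phase1Result:
    """Run depth and layout estimation on one frame, then fuse the two."""
    depth = estimate_depth(frame)
    layout = estimate_layout(frame)
    fused = fuse(depth, layout)
    return Phase1Result(depth=depth, layout=layout, fused=fused)


# Usage with stand-in stages; real models would replace these lambdas.
result = run_phase1_frame(
    frame="frame-0",
    estimate_depth=lambda f: f"depth({f})",
    estimate_layout=lambda f: f"layout({f})",
    fuse=lambda d, l: (d, l),
)
```

Passing the stages in as callables keeps each module swappable, which fits a pipeline that is still expected to change.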

How to Run (Phase 1)

Environment

conda env create -f environment.yml
conda activate rgb_perception

or

pip install -r requirements.txt

Video Demo

python run_phase1_demo.py \
    --video path/to/video.mp4 \
    --output output_name
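
For readers who want to script around the demo, here is a minimal sketch of an argument parser that accepts the two flags shown above. Only the flag names --video and --output come from the command. Whether each flag is required, the default value, and the help strings are assumptions, and the real run_phase1_demo.py may define its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags used in the demo command."""
    parser = argparse.ArgumentParser(
        description="Run Phase 1 on a video (illustrative sketch)")
    # Assumed to be required: the demo needs an input video.
    parser.add_argument("--video", required=True,
                        help="path to the input video file")
    # Assumed default; the actual script may require this flag.
    parser.add_argument("--output", default="output",
                        help="name used for the generated outputs")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["--video", "path/to/video.mp4", "--output", "output_name"])
```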

Single Image Test

python test_single_image.py \
    --image path/to/image.jpg

What the Pipeline Produces

At this stage, outputs are primarily for inspection and debugging, not final consumption.

Typical outputs include:

  • Depth maps (monocular, approximate)
  • Layout segmentation maps
  • Overlaid visualizations
  • Intermediate geometric representations

Exact formats may change as the pipeline stabilizes.
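
Since the depth maps are mainly meant for inspection, one common way to view them is to rescale the values to an 8-bit grayscale range. The sketch below shows that min-max normalization on a plain nested list. The function name depth_to_gray is hypothetical and this is not necessarily what utils/visualize.py does.

```python
from typing import List


def depth_to_gray(depth_map: List[List[float]]) -> List[List[int]]:
    """Linearly rescale depth values to integers in [0, 255].

    The smallest depth maps to 0 and the largest to 255. A constant
    map has no contrast to show, so it is rendered as all zeros.
    """
    values = [d for row in depth_map for d in row]
    if not values:
        return []
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [[0 for _ in row] for row in depth_map]
    return [[round(255 * (d - lo) / span) for d in row]
            for row in depth_map]
```

Because the scaling is relative to each map's own minimum and maximum, gray levels are not comparable across frames. A fixed depth range would be needed for that.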


Design Notes

  • RGB-only by design: no depth sensors.
  • Geometry is approximate, not metrically perfect.
  • Temporal logic is lightweight and experimental.
  • Structured to evolve into object reasoning and navigation.
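
As a sketch of what lightweight temporal logic can look like, the example below applies an exponential moving average to successive depth maps to damp frame-to-frame flicker. This technique is named here for illustration only. The class DepthSmoother and the alpha parameter are hypothetical, and utils/temporal.py may take a different approach.

```python
from typing import List, Optional

DepthMap = List[List[float]]


class DepthSmoother:
    """Exponential moving average over per-pixel depth values."""

    def __init__(self, alpha: float = 0.3) -> None:
        # alpha near 1 follows new frames closely; near 0 smooths heavily.
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.state: Optional[DepthMap] = None

    def update(self, depth_map: DepthMap) -> DepthMap:
        """Blend a new depth map into the running estimate and return it."""
        if self.state is None:
            # First frame: nothing to blend with, so copy it as the state.
            self.state = [list(row) for row in depth_map]
        else:
            a = self.alpha
            self.state = [
                [a * new + (1.0 - a) * old
                 for new, old in zip(new_row, old_row)]
                for new_row, old_row in zip(depth_map, self.state)
            ]
        return self.state


smoother = DepthSmoother(alpha=0.5)
first = smoother.update([[2.0, 4.0]])
second = smoother.update([[4.0, 0.0]])
```

The smoother assumes a static camera and same-sized frames. With camera motion, per-pixel averaging blurs depth across different scene points.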

What This Branch Is Not

  • Not a finished product
  • Not benchmarked
  • Not client-ready
  • Not a navigation system

Roadmap (High Level)

  • Phase 2: RGB-only 3D object detection
  • Phase 3: Scene-level spatial reasoning
  • Phase 4: Waypoint generation
  • Phase 5: Agent-level interfaces

Status

Phase 1 is under active iteration. Expect refactors and breaking changes.