metadata

title: 3D Room Layout Estimation
emoji: 🏠
colorFrom: blue
colorTo: green
sdk: docker
app_port: 8000
pinned: false
license: mit

3D Room Layout Estimation (RGB-only)

Phase 1 — Core Perception & Geometry Pipeline

This repository contains my ongoing work on RGB-only indoor scene understanding, starting with monocular depth estimation and room layout perception.

The current focus (Phase 1) is to establish a clean, modular pipeline that takes RGB input (image or video) and produces stable geometric signals that can later support 3D object reasoning and navigation.

This is active development, not a final or polished system.

What Phase 1 Covers

Phase 1 is intentionally scoped and foundational.

It includes:

Monocular depth estimation from RGB
Room layout estimation (walls, floor, ceiling)
Basic geometric reasoning from depth + layout
Temporal handling for video input
Visualization utilities for debugging and inspection

It does not yet attempt full 3D reconstruction, SLAM, or navigation.
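To make the "basic geometric reasoning from depth" item concrete, here is a minimal sketch of back-projecting a depth map into camera-frame 3D points with a pinhole camera model. The function name `backproject` and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative assumptions; the helpers in `utils/geometry.py` may be organised differently, and real code would likely use array operations rather than nested lists.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject(
    depth: List[List[float]], fx: float, fy: float, cx: float, cy: float
) -> List[Point3D]:
    """Lift each pixel (u, v) with depth Z to a camera-frame point (X, Y, Z).

    Hypothetical helper: assumes a pinhole camera with focal lengths (fx, fy)
    and principal point (cx, cy), all expressed in pixels.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # treat non-positive depth as invalid and skip it
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Tiny 2x2 depth map; the bottom-right pixel has no valid depth.
depth = [[2.0, 2.0], [2.0, 0.0]]
points = backproject(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Because the depth comes from a monocular estimate, the resulting points inherit its approximation; they are suitable for relative reasoning, not precise measurement.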

Repository Structure (Current)

This reflects the actual structure in the reiterating-phase1-modules branch.

.
├── config/
│   └── phase1_config.py        # Central configuration for Phase 1
│
├── core/
│   └── phase1_pipeline.py      # Orchestrates depth + layout + fusion
│
├── models/
│   └── depth/
│       └── depth_estimator.py  # Monocular depth wrapper
│
├── trainer/                    # Existing layout model code (legacy / reused)
│
├── utils/
│   ├── depth.py                # Depth utilities
│   ├── fusion.py               # Depth + layout fusion logic
│   ├── geometry.py             # Basic geometric helpers
│   ├── temporal.py             # Temporal smoothing / state
│   └── visualize.py            # Debug visualizations
│
├── scripts/                    # Small helper scripts (non-core)
├── notebooks/                  # Experiments and exploration
├── tests/                      # Minimal testing utilities
│
├── run_phase1_demo.py          # Runs Phase 1 on video input
├── test_single_image.py        # Quick single-image sanity test
├── main.py                     # Entry point (experimental)
│
├── requirements.txt
├── environment.yml
└── README.md
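The role of `core/phase1_pipeline.py` (orchestrating depth, layout, and fusion) can be pictured with the rough sketch below. Everything in it is an assumption made for illustration: `Phase1Result`, `run_phase1`, and `fuse` are hypothetical names, the models are passed in as plain callables, and "fusion" is reduced to a per-label median depth. The repository's actual interfaces may differ.

```python
from dataclasses import dataclass
from statistics import median
from typing import Callable, Dict, List

Frame = List[List[float]]      # stand-in for an RGB frame
DepthMap = List[List[float]]   # per-pixel depth
LayoutMap = List[List[str]]    # per-pixel label such as "wall" or "floor"


@dataclass
class Phase1Result:
    depth: DepthMap
    layout: LayoutMap
    median_depth_per_label: Dict[str, float]


def fuse(depth: DepthMap, layout: LayoutMap) -> Dict[str, float]:
    """Hypothetical fusion step: median depth of the pixels under each label."""
    buckets: Dict[str, List[float]] = {}
    for depth_row, label_row in zip(depth, layout):
        for d, label in zip(depth_row, label_row):
            buckets.setdefault(label, []).append(d)
    return {label: median(values) for label, values in buckets.items()}


def run_phase1(
    frame: Frame,
    depth_fn: Callable[[Frame], DepthMap],
    layout_fn: Callable[[Frame], LayoutMap],
) -> Phase1Result:
    """Run depth and layout on one frame, then fuse the two outputs."""
    depth = depth_fn(frame)
    layout = layout_fn(frame)
    return Phase1Result(depth, layout, fuse(depth, layout))


# Stub models standing in for the real depth and layout networks.
frame = [[0.0, 0.0], [0.0, 0.0]]
result = run_phase1(
    frame,
    depth_fn=lambda f: [[4.0, 4.0], [1.0, 2.0]],
    layout_fn=lambda f: [["wall", "wall"], ["floor", "floor"]],
)
```

Passing the models in as callables keeps the orchestration testable with stubs, which suits a pipeline whose components are still changing.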

How to Run (Phase 1)

Environment

conda env create -f environment.yml
conda activate rgb_perception

pip install -r requirements.txt

Video Demo

python run_phase1_demo.py \
    --video path/to/video.mp4 \
    --output output_name

Single Image Test

python test_single_image.py \
    --image path/to/image.jpg

What the Pipeline Produces

At this stage, outputs are intended for inspection and debugging rather than downstream use.

Typical outputs include:

Depth maps (monocular, approximate)
Layout segmentation maps
Overlaid visualizations
Intermediate geometric representations

Exact formats may change as the pipeline stabilizes.
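For a sense of how a depth map might be turned into an inspectable image and overlaid on the input, here is a small sketch. It assumes min-max normalisation to 8-bit grey and a simple per-channel alpha blend; `depth_to_gray` and `blend` are hypothetical names, not functions from `utils/visualize.py`.

```python
from typing import List


def depth_to_gray(depth: List[List[float]]) -> List[List[int]]:
    """Min-max normalise a depth map to integer grey levels in [0, 255]."""
    flat = [d for row in depth for d in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:  # constant depth: avoid division by zero
        return [[0 for _ in row] for row in depth]
    return [[round(255 * (d - lo) / span) for d in row] for row in depth]


def blend(base: int, overlay: int, alpha: float) -> int:
    """Alpha-blend one channel value: alpha=0 keeps base, alpha=1 keeps overlay."""
    return round((1 - alpha) * base + alpha * overlay)


gray = depth_to_gray([[0.0, 1.0], [2.0, 5.0]])
mixed = blend(100, 200, alpha=0.5)
```

Min-max normalisation rescales every frame independently, so grey levels are comparable within a frame but not across frames.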

Design Notes

RGB-only by design — no depth sensors.
Geometry is approximate, not metrically perfect.
Temporal logic is lightweight and experimental.
Structured to evolve into object reasoning and navigation.
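One lightweight form of temporal logic is an exponential moving average over per-frame values. The sketch below is an assumption about what such smoothing could look like, not a description of `utils/temporal.py`; the class name `EmaSmoother` and the parameter `alpha` are hypothetical.

```python
from typing import List, Optional


class EmaSmoother:
    """Exponential moving average over a fixed-length list of per-frame values."""

    def __init__(self, alpha: float) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.state: Optional[List[float]] = None

    def update(self, values: List[float]) -> List[float]:
        """Blend the new frame into the running state and return a copy of it."""
        if self.state is None:
            # First frame: nothing to smooth against yet.
            self.state = list(values)
        else:
            self.state = [
                self.alpha * v + (1 - self.alpha) * s
                for v, s in zip(values, self.state)
            ]
        return list(self.state)


smoother = EmaSmoother(alpha=0.5)
first = smoother.update([2.0, 4.0])
second = smoother.update([4.0, 0.0])
```

A larger `alpha` follows new frames more closely, while a smaller one suppresses flicker at the cost of lagging behind real scene changes.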

What This Branch Is Not

Not a finished product
Not benchmarked
Not client-ready
Not a navigation system

Roadmap (High Level)

Phase 2: RGB-only 3D object detection
Phase 3: Scene-level spatial reasoning
Phase 4: Waypoint generation
Phase 5: Agent-level interfaces

Status

Phase 1 is under active iteration. Expect refactors and breaking changes.