🌤️ NUS-40: Dense Campus Weather Embeddings

40 weather stations · 1 year hourly · 6 variables · NUS Singapore
A single 6-dimensional VAE embedding supports spatial interpolation, forecasting, clustering, anomaly detection, and future prediction.


🔬 Overview

A variational autoencoder (VAE) that compresses 6-variable weather observations from 40 campus stations into a compact 6-dimensional embedding. The embedding achieves R² > 0.99 reconstruction and supports 5 downstream tasks with no retraining:

| # | Task | Result |
|---|---|---|
| 1️⃣ | Spatial Interpolation: predict weather at unmeasured locations | AirTemp MAE = 0.39°C |
| 2️⃣ | Temporal Forecasting: predict future weather vs persistence baseline | +15.7% skill at T+6h |
| 3️⃣ | Microclimate Clustering: discover climate zones without labels | 4 zones (silhouette=0.23) |
| 4️⃣ | Anomaly Detection: flag unusual weather from reconstruction error | 5% flagged, storm-linked |
| 5️⃣ | 24h Future Prediction: rolling forecast across full diurnal cycle | +42% peak skill at T+14h |

📊 The NUS-40 Dataset

40 stations deployed across the National University of Singapore Kent Ridge campus (~2 km²), recording at hourly resolution for all of 2025.

Variables

| Variable | Unit | Mean | Std | Range |
|---|---|---|---|---|
| 🌡️ Air Temperature | °C | 28.64 | 2.56 | 21.5 – 39.5 |
| 💧 Relative Humidity | % | 80.49 | 10.11 | 37.6 – 99.5 |
| 🔵 Atmospheric Pressure | hPa | 1006.4 | 2.72 | 994.6 – 1016.4 |
| 💨 Wind Speed | m/s | 0.65 | 0.67 | 0.0 – 17.0 |
| 🧭 Wind Direction | ° | 185.4 | 108.0 | 0 – 360 |
| ☀️ Solar Radiation | W/m² | 141.1 | 233.2 | 0 – 1500 |

At a Glance

  • 40 stations, mean spacing ~100–200 m
  • 8,760 hours (Jan–Dec 2025)
  • 2.0 km × 1.4 km campus footprint
  • Tropical climate (Köppen Af): minimal seasons, strong diurnal cycle
  • 4.7% missing, with imputation flags provided
  • WS17 has no pressure sensor (filled with campus mean)
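
As an illustration of the campus-mean fill mentioned for WS17, here is a minimal sketch of filling one hour of readings and recording an imputation flag. The function name `fill_with_campus_mean` and the list-of-readings layout are assumptions made for this example; the repository's actual imputation code and flag format may differ.

```python
def fill_with_campus_mean(values):
    """Fill missing readings for one hour with the mean of the stations that reported.

    values: one reading per station, None where the value is missing.
    Returns (filled, flags); flags[i] is True where a value was imputed.
    """
    present = [v for v in values if v is not None]
    if not present:
        # No station reported, so there is nothing to average; leave the gaps.
        return list(values), [False] * len(values)
    campus_mean = sum(present) / len(present)
    filled = [campus_mean if v is None else v for v in values]
    flags = [v is None for v in values]
    return filled, flags

# Hypothetical pressure readings (hPa) for four stations; the second has no sensor.
pressure = [1006.0, None, 1007.0, 1008.0]
filled, flags = fill_with_campus_mean(pressure)
```

Keeping the flags next to the filled values lets downstream code exclude imputed readings when that matters.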

🏗️ Model

A standard VAE with MLP encoder and decoder.

Input (6 vars) → Encoder (3-layer MLP, 128 hidden) → z ~ N(μ, σ²)  [6 dims]
                                                         ↓
Output (6 vars) ← Decoder (3-layer MLP, 128 hidden) ←────┘

| Property | Value |
|---|---|
| Parameters | 70,930 |
| Latent dimensions | 6 |
| Encoder/Decoder | 3-layer MLP, LayerNorm, GELU |
| Loss | MSE + β·KL (β = 0.001) |
| Training | 100 epochs, AdamW, cosine schedule |
| Training time | ~20 min on CPU |
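
To make the loss row concrete, here is a minimal pure-Python sketch of an MSE reconstruction term plus a β-weighted KL term for a diagonal Gaussian posterior against a standard normal prior. The function name `vae_loss` and the choice of reductions (mean over variables for the MSE, sum over latent dimensions for the KL) are assumptions; the repository's `model.py` may reduce differently.

```python
import math

def vae_loss(x, x_hat, mu, logvar, beta=0.001):
    """MSE reconstruction error plus beta times KL(N(mu, sigma^2) || N(0, 1))."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    # Closed-form KL for a diagonal Gaussian, summed over latent dimensions.
    kl = -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))
    return mse + beta * kl

# One normalised observation (6 variables) and a 6-dimensional posterior (made-up values).
x      = [0.2, -1.0, 0.5, 0.0, 1.3, -0.4]
x_hat  = [0.1, -0.9, 0.5, 0.1, 1.2, -0.4]
mu     = [0.5, -0.5, 0.0, 0.0, 0.0, 0.0]
logvar = [0.0] * 6
loss = vae_loss(x, x_hat, mu, logvar)
```

With β as small as 0.001 the KL term acts as a light regulariser, so the objective is dominated by reconstruction quality.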

📈 Results

Reconstruction (R² on held-out test set)

| AirTemp | RelHum | AtmPress | WindSpeed | WindDir | GlobalRad |
|---|---|---|---|---|---|
| 0.9997 | 0.9997 | 0.9995 | 0.9429 | 0.9994 | 0.9998 |
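
For readers who want to check such scores themselves, the following sketch computes the coefficient of determination in its standard form, 1 − SS_res / SS_tot. The helper name `r_squared` and the sample values are illustrative and do not come from the repository. Note that this definition goes negative whenever predictions are worse than simply predicting the mean of the observations.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

observed = [28.0, 29.0, 30.0, 31.0]           # made-up air temperatures, °C
good = r_squared(observed, [28.1, 28.9, 30.1, 30.9])   # close to 1
bad = r_squared(observed, [31.0, 30.0, 29.0, 28.0])    # negative: worse than the mean
```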

Spatial Interpolation (5 held-out stations, reconstructed from neighbours)

| Variable | MAE | R² |
|---|---|---|
| 🌡️ Air Temperature | 0.39°C | 0.949 |
| 💧 Relative Humidity | 1.80% | 0.944 |
| 🔵 Atmospheric Pressure | 0.21 hPa | 0.987 |
| 💨 Wind Speed | 0.33 m/s | −0.52 |
| ☀️ Solar Radiation | 35.5 W/m² | 0.19 |

Temperature and humidity interpolate well: the air-temperature MAE of 0.39°C is close to sensor accuracy (±0.3°C). Wind and radiation do not, because they depend too strongly on local building geometry.
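
One common way to reconstruct a held-out station from its neighbours is inverse-distance weighting. The sketch below is an assumption for illustration, not necessarily the scheme used in `evaluate.py`: the function name `idw_interpolate`, the `power` parameter, and the use of planar coordinates in metres are all hypothetical (the shipped `coords` array holds [lat, lng], which would need projecting first).

```python
import math

def idw_interpolate(target_xy, neighbour_xy, neighbour_values, power=2.0):
    """Inverse-distance-weighted average of neighbour vectors at target_xy."""
    weights = []
    for (x, y), values in zip(neighbour_xy, neighbour_values):
        d = math.hypot(x - target_xy[0], y - target_xy[1])
        if d == 0.0:
            return list(values)  # target coincides with a station
        weights.append(1.0 / d ** power)
    total = sum(weights)
    dims = len(neighbour_values[0])
    return [
        sum(w * v[k] for w, v in zip(weights, neighbour_values)) / total
        for k in range(dims)
    ]

# Two hypothetical neighbours 200 m apart, target halfway between them.
stations = [(0.0, 0.0), (200.0, 0.0)]
vectors = [[1.0, 2.0], [3.0, 4.0]]        # e.g. two dimensions of an embedding
estimate = idw_interpolate((100.0, 0.0), stations, vectors)
```

The same function applies to raw variables or to embedding vectors, since it only averages whatever vectors the neighbours carry.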

Forecasting Skill (vs persistence baseline)

| Horizon | AirTemp | RelHum |
|---|---|---|
| T+1h | −6.0% ❌ | −8.0% ❌ |
| T+6h | +15.7% ✅ | +13.8% ✅ |
| T+12h | +37.9% ✅ | ~+25% ✅ |
| T+24h | +2.5% | +5.0% |

Persistence wins at 1h in the tropics. Embeddings outperform at 6–15h horizons.
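
Skill against persistence is usually reported as the fractional error reduction relative to the baseline. The sketch below assumes an MAE-based definition, skill = 1 − MAE_model / MAE_persistence; the helper names and the toy series are illustrative, and the evaluation script may define skill differently.

```python
def mae(pred, obs):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

def persistence_pairs(series, horizon):
    """Persistence forecast: the value at T is the prediction for T + horizon.

    Returns (predictions, matching observations).
    """
    return series[:-horizon], series[horizon:]

def skill(model_pred, baseline_pred, obs):
    """Fractional MAE reduction relative to the baseline (positive means better)."""
    return 1.0 - mae(model_pred, obs) / mae(baseline_pred, obs)

temps = [28.0, 28.5, 29.5, 31.0, 32.0, 31.5]   # made-up hourly air temperature, °C
baseline, observed = persistence_pairs(temps, horizon=2)
model_pred = [29.0, 30.5, 31.5, 32.0]          # hypothetical model output for the same hours
forecast_skill = skill(model_pred, baseline, observed)
```

Under this definition a negative value, as in the T+1h row, means the model's error is larger than that of simply carrying the last observation forward.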

Anomaly Detection

  • 438 hours flagged (5.0% of year)
  • Anomalous hours have 54% less solar radiation → storm/cloud association
  • Bimodal temporal pattern: peaks at 07:00 (sunrise) and 18:00 (sunset) transitions
  • Station WS17 flagged automatically (missing pressure sensor)
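
A flagged share of exactly 5.0% (438 of 8,760 hours) is what a fixed top-5% cutoff on reconstruction error would produce. The sketch below assumes that rule and uses synthetic errors as a stand-in for `results/anomaly_errors.npy`; the function name `flag_anomalies` and the assumption that index 0 corresponds to 00:00 are hypothetical.

```python
import random
from collections import Counter

def flag_anomalies(errors, fraction=0.05):
    """Return the indices of the `fraction` of hours with the largest error."""
    k = round(len(errors) * fraction)
    ranked = sorted(range(len(errors)), key=lambda i: errors[i])
    return sorted(ranked[-k:]) if k else []

rng = random.Random(0)
# Synthetic, skewed errors standing in for one year of hourly reconstruction error.
errors = [rng.lognormvariate(0.0, 1.0) for _ in range(8760)]

flagged = flag_anomalies(errors)
# Hour-of-day histogram of flagged hours, assuming index 0 is 00:00.
by_hour = Counter(i % 24 for i in flagged)
```

On the real error series, a histogram like `by_hour` is what would reveal the sunrise and sunset peaks described above.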

📁 Repository Structure

📦 citysyntaxlab/campus-weather
├── 📄 README.md
├── 📄 paper/paper.md               ← Full manuscript (~4,300 words, 16 references)
│
├── 💻 code/
│   ├── model.py                    ← VAE architecture (90 lines)
│   ├── train.py                    ← Data loading, training, embedding extraction
│   ├── evaluate.py                 ← All 5 downstream evaluations
│   └── figures.py                  ← Figure generation
│
├── 📊 figures/                     ← 6 figures (PDF + PNG)
│   ├── fig1_campus.{pdf,png}       ← Station map + discovered clusters
│   ├── fig2_reconstruction.{pdf,png} ← Reconstruction R² bar chart
│   ├── fig3_spatial.{pdf,png}      ← Spatial interpolation results
│   ├── fig4_forecasting.{pdf,png}  ← Forecast MAE comparison
│   ├── fig5_anomaly.{pdf,png}      ← Anomaly timeseries + hour distribution
│   └── fig6_future.{pdf,png}       ← 24h forecast skill curves
│
├── 🧪 results/
│   ├── all_results.json            ← All numerical results
│   ├── anomaly_errors.npy          ← Hourly reconstruction errors
│   └── checkpoints/
│       ├── best.pt                 ← Trained model weights
│       └── embeddings.npz          ← All embeddings: (8760, 40, 6) + data + coords
│
├── 📡 raw/                         ← 40 station CSVs (original measurements)
│   └── NUS_CAMPUS_WS{01-40}_2025_Hourly.csv
│
└── 📡 imputed/                     ← 40 station CSVs (gap-filled, with flags)
    └── NUS_CAMPUS_WS{01-40}_2025_Hourly_imputed.csv

🚀 Quick Start

Load pre-trained model and embeddings

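import sys

import numpy as np
import torch

# model.py lives in code/, so add it to the import path (run from the repository root)
sys.path.insert(0, 'code')
from model import WeatherVAE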

# Load checkpoint
ckpt = torch.load('results/checkpoints/best.pt', map_location='cpu')
model = WeatherVAE(**ckpt['config'])
model.load_state_dict(ckpt['model'])
model.set_normalisation(ckpt['mean'], ckpt['std'])
model.eval()

# Get embedding for a weather observation
# [WindSpeed, WindDir, AirTemp, RelHum, AtmPress, GlobalRad]
x = torch.tensor([[0.5, 180.0, 29.0, 80.0, 1007.0, 300.0]])
z = model.get_embedding(x)  # shape: (1, 6)

# Load pre-computed embeddings for all data
npz = np.load('results/checkpoints/embeddings.npz', allow_pickle=True)
embeddings = npz['embeddings']   # (8760, 40, 6)
data = npz['data']               # (8760, 40, 6)
coords = npz['coords']           # (40, 2), columns are [lat, lng]

Train from scratch

python code/train.py --data imputed/ --epochs 100 --config base

Run all evaluations

python code/evaluate.py

Generate figures

python code/figures.py

📝 Paper

Full manuscript at paper/paper.md.

Title: Learning Dense Weather Embeddings for Campus-Scale Microclimate Analysis
Target venue: Building and Environment
Words: ~4,300 | References: 16 (all verified)


📎 Citation

@article{nus40weather2025,
  title={Learning Dense Weather Embeddings for Campus-Scale Microclimate Analysis},
  author={{City Syntax Lab, National University of Singapore}},
  year={2025}
}

📜 License

Dataset and code released under CC-BY-4.0. Please cite the paper if you use this work.
