NUS-40: Dense Campus Weather Embeddings
40 weather stations · 1 year hourly · 6 variables · NUS Singapore
A single 6-dimensional VAE embedding supports spatial interpolation, forecasting, clustering, anomaly detection, and future prediction.
Overview
A variational autoencoder (VAE) encodes 6-variable weather observations from 40 campus stations into a 6-dimensional embedding. The embedding reconstructs five of the six variables with R² > 0.99 (wind speed: R² = 0.94) and supports 5 downstream tasks with no retraining:
| # | Task | Result |
|---|---|---|
| 1 | Spatial Interpolation: predict weather at unmeasured locations | AirTemp MAE = 0.39°C |
| 2 | Temporal Forecasting: predict future weather vs persistence baseline | +15.7% skill at T+6h |
| 3 | Microclimate Clustering: discover climate zones without labels | 4 zones (silhouette=0.23) |
| 4 | Anomaly Detection: flag unusual weather from reconstruction error | 5% flagged, storm-linked |
| 5 | 24h Future Prediction: rolling forecast across full diurnal cycle | +42% peak skill at T+14h |
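The silhouette value quoted for the clustering task measures how much closer each station is to its own cluster than to the nearest other cluster, on a scale from -1 to 1. The sketch below is a plain-Python version of the standard definition, assuming Euclidean distance between per-station embedding vectors. The function name and toy points are illustrative, and `evaluate.py` may compute the score differently.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient for points with integer cluster labels."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # Distances from point i to every other point, grouped by cluster label.
        by_cluster = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(other, []).append(math.dist(p, q))
        own = by_cluster.pop(lab, [])
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or a single cluster: 0 by convention
            continue
        a = sum(own) / len(own)                                  # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())   # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy example: two tight, well-separated groups in a 2-D embedding space.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
score = silhouette_score(points, labels)
```

Well-separated groups like the toy example score close to 1; a value around 0.23 suggests zones that overlap considerably rather than sharply distinct ones.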
The NUS-40 Dataset
40 stations deployed across the National University of Singapore Kent Ridge campus (~2 km²), recording at hourly resolution for all of 2025.
Variables
| Variable | Unit | Mean | Std | Range |
|---|---|---|---|---|
| Air Temperature | °C | 28.64 | 2.56 | 21.5 – 39.5 |
| Relative Humidity | % | 80.49 | 10.11 | 37.6 – 99.5 |
| Atmospheric Pressure | hPa | 1006.4 | 2.72 | 994.6 – 1016.4 |
| Wind Speed | m/s | 0.65 | 0.67 | 0.0 – 17.0 |
| Wind Direction | ° | 185.4 | 108.0 | 0 – 360 |
| Solar Radiation | W/m² | 141.1 | 233.2 | 0 – 1500 |
At a Glance
- 40 stations, mean spacing ~100–200 m
- 8,760 hours (Jan–Dec 2025)
- 2.0 km × 1.4 km campus footprint
- Tropical climate (Köppen Af): minimal seasons, strong diurnal cycle
- 4.7% missing, with imputation flags provided
- WS17 has no pressure sensor (filled with campus mean)
Model
A standard VAE with MLP encoder and decoder.
```
Input (6 vars)  → Encoder (3-layer MLP, 128 hidden) → z ~ N(μ, σ²)  [6 dims]
                                                          ↓
Output (6 vars) ← Decoder (3-layer MLP, 128 hidden) ←─────┘
```
| Property | Value |
|---|---|
| Parameters | 70,930 |
| Latent dimensions | 6 |
| Encoder/Decoder | 3-layer MLP, LayerNorm, GELU |
| Loss | MSE + β·KL (β = 0.001) |
| Training | 100 epochs, AdamW, cosine schedule |
| Training time | ~20 min on CPU |
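Written out, the loss row above corresponds to the usual β-weighted VAE objective. The form below assumes a diagonal Gaussian posterior and a standard normal prior, which is the conventional choice but is not stated explicitly here; $\hat{x}$ is the decoder output and $d = 6$ the latent dimension.

```latex
\mathcal{L}(x)
  = \underbrace{\frac{1}{6}\sum_{i=1}^{6}\bigl(x_i - \hat{x}_i\bigr)^2}_{\text{MSE}}
  \;+\; \beta \cdot
    \underbrace{\frac{1}{2}\sum_{j=1}^{d}\bigl(\mu_j^2 + \sigma_j^2 - \log\sigma_j^2 - 1\bigr)}_{\mathrm{KL}\left(\mathcal{N}(\mu,\,\sigma^2)\,\Vert\,\mathcal{N}(0,\,I)\right)},
  \qquad \beta = 0.001
```

With β this small, the KL term acts as a light regulariser and training is dominated by reconstruction error.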
Results
Reconstruction (R² on held-out test set)
| AirTemp | RelHum | AtmPress | WindSpeed | WindDir | GlobalRad |
|---|---|---|---|---|---|
| 0.9997 | 0.9997 | 0.9995 | 0.9429 | 0.9994 | 0.9998 |
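R² in these results is presumably the coefficient of determination, 1 − SS_res / SS_tot. A minimal sketch of it and of MAE follows, assuming that definition; the function names and numbers are illustrative only. Note that R² has no lower bound: it turns negative whenever predictions are worse than always predicting the mean of the observations.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative values, not taken from the dataset.
observed = [1.0, 2.0, 3.0, 4.0]
good = [1.1, 1.9, 3.2, 3.8]   # tracks the observations closely
poor = [4.0, 3.0, 2.0, 1.0]   # anti-correlated with the observations

good_r2 = r2_score(observed, good)   # close to 1
poor_r2 = r2_score(observed, poor)   # negative: worse than predicting the mean
good_mae = mae(observed, good)
```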
Spatial Interpolation (5 held-out stations, reconstructed from neighbours)
| Variable | MAE | R² |
|---|---|---|
| Air Temperature | 0.39°C | 0.949 |
| Relative Humidity | 1.80% | 0.944 |
| Atmospheric Pressure | 0.21 hPa | 0.987 |
| Wind Speed | 0.33 m/s | −0.52 |
| Solar Radiation | 35.5 W/m² | 0.19 |
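Temperature interpolates to near sensor accuracy (MAE 0.39°C against ±0.3°C), and humidity and pressure are also recovered well (R² ≥ 0.94). Wind and radiation depend too strongly on local building geometry to be inferred from neighbouring stations.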
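The exact scheme used to reconstruct a held-out station from its neighbours is not described here. One common choice is inverse-distance weighting (IDW), sketched below under that assumption: neighbour vectors (raw variables or embeddings) are averaged with weights proportional to 1/d^p. The function name, the planar coordinates in metres, and the power p = 2 are all illustrative.

```python
import math

def idw_interpolate(target_xy, neighbour_xy, neighbour_values, power=2.0):
    """Inverse-distance-weighted average of neighbour vectors at target_xy."""
    weights = []
    for xy, values in zip(neighbour_xy, neighbour_values):
        d = math.dist(target_xy, xy)
        if d == 0.0:
            return list(values)  # target coincides with a station: use it directly
        weights.append(1.0 / d ** power)
    total = sum(weights)
    dims = len(neighbour_values[0])
    return [
        sum(w * v[k] for w, v in zip(weights, neighbour_values)) / total
        for k in range(dims)
    ]

# Hypothetical layout: three stations (x, y in metres) with [AirTemp, RelHum] readings.
stations = [(0.0, 0.0), (200.0, 0.0), (0.0, 100.0)]
readings = [[29.0, 78.0], [30.0, 74.0], [28.0, 82.0]]
estimate = idw_interpolate((100.0, 0.0), stations, readings)
```

Smooth fields such as temperature and pressure suit this kind of averaging, whereas quantities shaped by nearby buildings, such as wind speed, generally do not.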
Forecasting Skill (vs persistence baseline)
| Horizon | AirTemp | RelHum |
|---|---|---|
| T+1h | −6.0% ✗ | −8.0% ✗ |
| T+6h | +15.7% ✓ | +13.8% ✓ |
| T+12h | +37.9% ✓ | ~+25% ✓ |
| T+24h | +2.5% | +5.0% |
Persistence wins at 1h in the tropics. Embeddings outperform at 6–15h horizons.
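Skill against persistence is commonly defined as 1 − MAE_model / MAE_persistence, where the persistence forecast for T+h is simply the value observed at T. The sketch below assumes that definition (the actual scoring lives in `evaluate.py` and may differ); the series and the model forecast are made up for illustration.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def persistence_forecast(series, horizon):
    """Forecast for time t + horizon is the observation at time t."""
    return series[:-horizon]

def skill_vs_persistence(series, model_forecast, horizon):
    """Skill = 1 - MAE_model / MAE_persistence; positive means the model wins."""
    target = series[horizon:]
    baseline = persistence_forecast(series, horizon)
    return 1.0 - mae(target, model_forecast) / mae(target, baseline)

# Made-up hourly temperatures and a made-up model forecast for T+2h.
temps = [26.0, 27.0, 29.0, 31.0, 30.0, 28.0]
horizon = 2
model = [28.5, 30.0, 30.5, 28.5]   # predictions for temps[2:], i.e. 29, 31, 30, 28
skill = skill_vs_persistence(temps, model, horizon)
```

Under this definition, −6.0% at T+1h means the model's MAE is 6% larger than that of persistence, and +15.7% at T+6h means it is 15.7% smaller.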
Anomaly Detection
- 438 hours flagged (5.0% of year)
- Anomalous hours have 54% less solar radiation, suggesting a storm/cloud association
- Bimodal temporal pattern: peaks at 07:00 (sunrise) and 18:00 (sunset) transitions
- Station WS17 flagged automatically (missing pressure sensor)
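Flagging exactly 5.0% of hours is consistent with thresholding the reconstruction error at its 95th percentile, although the thresholding rule is not spelled out here. A minimal sketch under that assumption, with synthetic errors standing in for the contents of `results/anomaly_errors.npy`:

```python
def flag_anomalies(errors, fraction=0.05):
    """Return a boolean list marking the top `fraction` of errors as anomalous."""
    k = round(len(errors) * fraction)   # number of hours to flag
    if k == 0:
        return [False] * len(errors)
    # The k-th largest error is the threshold; ties at the threshold are also flagged.
    threshold = sorted(errors, reverse=True)[k - 1]
    return [e >= threshold for e in errors]

# Synthetic stand-in: 8,760 distinct hourly error values in scrambled order.
n_hours = 8760
errors = [((i * 7919) % n_hours) / n_hours for i in range(n_hours)]
flags = flag_anomalies(errors)
n_flagged = sum(flags)
```

With 8,760 hourly values and `fraction=0.05`, this flags 438 hours, matching the count reported above.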
Repository Structure
```
citysyntaxlab/campus-weather
├── README.md
├── paper/paper.md                        # Full manuscript (~4,300 words, 16 references)
│
├── code/
│   ├── model.py                          # VAE architecture (90 lines)
│   ├── train.py                          # Data loading, training, embedding extraction
│   ├── evaluate.py                       # All 5 downstream evaluations
│   └── figures.py                        # Figure generation
│
├── figures/                              # 6 figures (PDF + PNG)
│   ├── fig1_campus.{pdf,png}             # Station map + discovered clusters
│   ├── fig2_reconstruction.{pdf,png}     # Reconstruction R² bar chart
│   ├── fig3_spatial.{pdf,png}            # Spatial interpolation results
│   ├── fig4_forecasting.{pdf,png}        # Forecast MAE comparison
│   ├── fig5_anomaly.{pdf,png}            # Anomaly timeseries + hour distribution
│   └── fig6_future.{pdf,png}             # 24h forecast skill curves
│
├── results/
│   ├── all_results.json                  # All numerical results
│   ├── anomaly_errors.npy                # Hourly reconstruction errors
│   └── checkpoints/
│       ├── best.pt                       # Trained model weights
│       └── embeddings.npz                # All embeddings: (8760, 40, 6) + data + coords
│
├── raw/                                  # 40 station CSVs (original measurements)
│   └── NUS_CAMPUS_WS{01-40}_2025_Hourly.csv
│
└── imputed/                              # 40 station CSVs (gap-filled, with flags)
    └── NUS_CAMPUS_WS{01-40}_2025_Hourly_imputed.csv
```
Quick Start
Load pre-trained model and embeddings
```python
import sys

import numpy as np
import torch

# model.py lives in code/; make it importable when running from the repo root
sys.path.insert(0, 'code')
from model import WeatherVAE

# Load checkpoint
# (on PyTorch >= 2.6, pass weights_only=False if the checkpoint stores non-tensor objects)
ckpt = torch.load('results/checkpoints/best.pt', map_location='cpu')
model = WeatherVAE(**ckpt['config'])
model.load_state_dict(ckpt['model'])
model.set_normalisation(ckpt['mean'], ckpt['std'])
model.eval()

# Get the embedding for a single weather observation
# Input order: [WindSpeed, WindDir, AirTemp, RelHum, AtmPress, GlobalRad]
x = torch.tensor([[0.5, 180.0, 29.0, 80.0, 1007.0, 300.0]])
z = model.get_embedding(x)  # shape: (1, 6)

# Load pre-computed embeddings for all data
npz = np.load('results/checkpoints/embeddings.npz', allow_pickle=True)
embeddings = npz['embeddings']  # (8760, 40, 6)
data = npz['data']              # (8760, 40, 6)
coords = npz['coords']          # (40, 2): [lat, lng]
```
Train from scratch
```bash
python code/train.py --data imputed/ --epochs 100 --config base
```
Run all evaluations
```bash
python code/evaluate.py
```
Generate figures
```bash
python code/figures.py
```
Paper
Full manuscript at paper/paper.md.
Title: Learning Dense Weather Embeddings for Campus-Scale Microclimate Analysis
Target venue: Building and Environment
Words: ~4,300 | References: 16 (all verified)
Citation
```bibtex
@article{nus40weather2025,
  title={Learning Dense Weather Embeddings for Campus-Scale Microclimate Analysis},
  author={{City Syntax Lab, National University of Singapore}},
  year={2025}
}
```
License
Dataset and code released under CC-BY-4.0. Please cite the paper if you use this work.