HRM Sudoku Extreme

A Hierarchical Reasoning Model (HRM) trained to solve extreme difficulty Sudoku puzzles using hierarchical processing and adaptive computation.

Model Details

Model Description

This is a Hierarchical Reasoning Model checkpoint fine-tuned specifically for solving extreme difficulty Sudoku puzzles. The model employs a two-level hierarchical architecture inspired by human cognition, with high-level (H) modules for abstract planning and low-level (L) modules for detailed computation. It uses Adaptive Computation Time (ACT) with Q-learning based halting to dynamically allocate computational resources.

The model processes 9×9 Sudoku grids (81 tokens) and predicts the correct digit for each cell through hierarchical reasoning cycles.
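For illustration, a puzzle can be flattened row by row into the 81-token sequence the model consumes. The sketch below assumes plain row-major ordering with digits used directly as token ids; check the checkpoint's preprocessing for the authoritative encoding.

import torch

# A 9x9 puzzle, rows listed top to bottom; 0 marks an empty cell.
puzzle = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]

# Flatten row-major into the (1, 81) token sequence the model expects.
input_ids = torch.tensor(puzzle, dtype=torch.long).reshape(1, 81)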

  • Developed by: Sapient Inc.
  • Model type: Hierarchical Reasoning Model (HRM)
  • Language(s): Symbolic reasoning (digits 0-9)
  • License: Apache 2.0
  • Original checkpoint: sapientinc/HRM-checkpoint-sudoku-extreme

Model Sources

  • Original checkpoint: sapientinc/HRM-checkpoint-sudoku-extreme (Hugging Face Hub)
  • Paper: Hierarchical Reasoning Model (arXiv:2506.21734)

Uses

Direct Use

This model is designed for solving extreme difficulty Sudoku puzzles. It can:

  • Solve complex 9×9 Sudoku grids that require advanced reasoning techniques
  • Process partial grids and predict missing digits
  • Demonstrate hierarchical reasoning strategies for constraint satisfaction problems

Downstream Use

The model can be used as:

  • A component in puzzle-solving applications
  • A baseline for research in hierarchical reasoning and adaptive computation
  • An example of applying neural networks to combinatorial optimization problems

Recommendations

Users should be aware that:

  • The model is specialized for Sudoku and should not be used for general reasoning tasks
  • Input must be properly formatted as 9×9 grids with digits 0-9 (0 for empty cells); a minimal validation sketch follows this list
  • Inference time may vary due to the adaptive computation mechanism
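A small validation helper along these lines can catch malformed inputs before inference (a hypothetical helper, not part of the model's API):

import torch

def validate_sudoku_input(input_ids: torch.Tensor) -> None:
    """Raise if a batch of grids is not in the expected 81-token format."""
    if input_ids.ndim != 2 or input_ids.shape[1] != 81:
        raise ValueError(f"Expected shape (batch, 81), got {tuple(input_ids.shape)}")
    if input_ids.min() < 0 or input_ids.max() > 9:
        raise ValueError("Cells must be digits 0-9 (0 = empty cell)")

validate_sudoku_input(torch.zeros(1, 81, dtype=torch.long))  # passes: an all-empty grid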

How to Get Started with the Model

import torch
from transformers import HrmForCausalLM

# Load the model
model = HrmForCausalLM.from_pretrained("zbloss/HRM-sudoku-extreme")
model.eval()

# Prepare a Sudoku grid flattened to 81 tokens
# (0 represents an empty cell; 1-9 are the given digits)
sudoku_grid = torch.randint(0, 10, (1, 81))  # Random placeholder; substitute a real puzzle
puzzle_ids = torch.zeros(1, dtype=torch.long)  # Puzzle identifiers expected by the forward pass

# Run inference
with torch.no_grad():
    outputs = model(input_ids=sudoku_grid, puzzle_identifiers=puzzle_ids)

# Get per-cell digit predictions, shape (batch, 81)
predictions = torch.argmax(outputs.logits, dim=-1)
print(f"Predicted solution: {predictions}")

Training Details

Training Data

The model was trained on a dataset of extreme difficulty Sudoku puzzles. These puzzles require advanced solving techniques beyond basic constraint propagation.

Training Procedure

The model uses a hierarchical architecture (a schematic code sketch follows this list) with:

  • High-level (H) module: 4 transformer layers for abstract planning
  • Low-level (L) module: 4 transformer layers for detailed computation
  • H-cycles: 2 high-level reasoning cycles
  • L-cycles: 2 low-level computation cycles per H-cycle
  • ACT mechanism: Q-learning based adaptive halting with max 16 steps
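The nested cycle structure can be sketched as follows. TinyBlock, hrm_cycles, and the shapes here are illustrative stand-ins for the real 4-layer transformer modules; see the paper for the exact recurrence:

import torch
import torch.nn as nn

class TinyBlock(nn.Module):
    """Stand-in for a 4-layer transformer module (hypothetical, for shape illustration)."""
    def __init__(self, hidden_size=512):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, state, *contexts):
        # Real HRM blocks attend over the sequence; here we simply mix the inputs.
        return torch.tanh(self.proj(state + sum(contexts)))

def hrm_cycles(z_h, z_l, x_emb, h_module, l_module, h_cycles=2, l_cycles=2):
    """Nested update loop: L refines quickly against the current H plan,
    then H updates slowly from the L result (schematic, not the real code)."""
    for _ in range(h_cycles):
        for _ in range(l_cycles):
            z_l = l_module(z_l, z_h, x_emb)   # fast, detailed computation
        z_h = h_module(z_h, z_l)              # slow, abstract planning update
    return z_h, z_l

# Example with batch=1, 81 cells, hidden size 512
x = torch.randn(1, 81, 512)
z_h, z_l = hrm_cycles(torch.zeros_like(x), torch.zeros_like(x), x, TinyBlock(), TinyBlock())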

Training Hyperparameters

  • Training regime: bfloat16 mixed precision
  • Architecture: 4 H-layers, 4 L-layers, 8 attention heads
  • Hidden size: 512
  • Intermediate size: 1536
  • Max position embeddings: 900
  • Vocabulary size: 11 (digits 0-9 plus a padding token; see the config sketch below)
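For orientation, these values correspond to a configuration roughly like the dictionary below. The field names are illustrative; the checkpoint's config.json is the authoritative source:

hrm_config = {
    "hidden_size": 512,
    "intermediate_size": 1536,
    "num_attention_heads": 8,
    "h_layers": 4,
    "l_layers": 4,
    "h_cycles": 2,
    "l_cycles": 2,
    "max_halting_steps": 16,
    "max_position_embeddings": 900,
    "vocab_size": 11,
    "torch_dtype": "bfloat16",
}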

Model Architecture

Technical Specifications

  • Total parameters: 27,275,778 (27.3M)
  • Model size: 109.11 MB
  • Vocabulary size: 11
  • Hidden size: 512
  • Intermediate size: 1536
  • H-level layers: 4
  • L-level layers: 4
  • Attention heads: 8
  • H-cycles: 2
  • L-cycles: 2
  • Max halting steps: 16
  • Position encoding: RoPE (Rotary Position Embeddings)
  • Activation: SwiGLU

Model Architecture and Objective

The Hierarchical Reasoning Model (HRM) features:

  1. Two-level Hierarchical Processing:

    • H-level (High-level): Performs slow, abstract planning and strategy formulation
    • L-level (Low-level): Executes fast, detailed computations
  2. Adaptive Computation Time (ACT):

    • Q-learning based halting mechanism
    • Dynamically determines when sufficient computation has been performed
    • Allows variable computational depth based on problem difficulty (see the halting sketch after this list)
  3. Recurrent Carry State:

    • Maintains H and L hidden states across reasoning cycles
    • Enables iterative refinement of solutions
  4. Positional Encoding:

    • RoPE (Rotary Position Embeddings) for position-aware attention
    • Supports up to 900 positions (30×30 grids)
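The ACT halting loop from point 2 can be sketched as follows. step_fn, the carry object, and the Q-value layout are hypothetical placeholders for the model's internals:

import torch

def act_halting_sketch(step_fn, carry, max_steps=16):
    """Schematic ACT outer loop: each step runs one reasoning segment and a
    Q-head scores halt vs. continue; stop when halt wins or the budget runs out."""
    logits = None
    for step in range(max_steps):
        carry, logits, q = step_fn(carry)  # q: (batch, 2) = [q_halt, q_continue]
        if (q[:, 0] > q[:, 1]).all():      # every sample in the batch chose to halt
            break
    return logits

# Dummy step function with random Q-values, for illustration only
def dummy_step(carry):
    return carry, torch.randn(1, 81, 11), torch.randn(1, 2)

out = act_halting_sketch(dummy_step, carry=None)
print(out.shape)  # torch.Size([1, 81, 11])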

Compute Infrastructure

Software

  • Framework: PyTorch with transformers library
  • Precision: bfloat16
  • Format: Safetensors

Citation

BibTeX:

@article{wang2025hierarchical,
  title={Hierarchical Reasoning Model},
  author={Wang, Guan and Li, Jin and Sun, Yuhao and Chen, Xing and Liu, Changling and Wu, Yue and Lu, Meng and Song, Sen and Yadkori, Yasin Abbasi},
  journal={arXiv preprint arXiv:2506.21734},
  year={2025}
}

APA:

Wang, G., Li, J., Sun, Y., Chen, X., Liu, C., Wu, Y., Lu, M., Song, S., & Yadkori, Y. A. (2025). Hierarchical Reasoning Model. arXiv preprint arXiv:2506.21734.

More Information

This checkpoint is a converted version of the original HRM checkpoint from sapientinc/HRM-checkpoint-sudoku-extreme, formatted for use with the HuggingFace transformers library.

For more details about the HRM architecture and training methodology, see the Hierarchical Reasoning Model paper (arXiv:2506.21734).

Model Card Contact

For questions or issues with this converted checkpoint, please open an issue in the transformers repository.
