HRM Sudoku Extreme

A Hierarchical Reasoning Model (HRM) trained to solve extreme difficulty Sudoku puzzles using hierarchical processing and adaptive computation.

Model Details

Model Description

This is a Hierarchical Reasoning Model checkpoint fine-tuned specifically for solving extreme difficulty Sudoku puzzles. The model employs a two-level hierarchical architecture inspired by human cognition, with high-level (H) modules for abstract planning and low-level (L) modules for detailed computation. It uses Adaptive Computation Time (ACT) with Q-learning based halting to dynamically allocate computational resources.

The model processes 9×9 Sudoku grids (81 tokens) and predicts the correct digit for each cell through hierarchical reasoning cycles.
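For illustration, a puzzle can be flattened row by row into the 81-token sequence the model consumes. The sketch below assumes plain row-major ordering with digits used directly as token ids; check the checkpoint's preprocessing for the authoritative encoding.

import torch

# A 9x9 puzzle, rows listed top to bottom; 0 marks an empty cell.
puzzle = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]

# Flatten row-major into the (1, 81) token sequence the model expects.
input_ids = torch.tensor(puzzle, dtype=torch.long).reshape(1, 81)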

  • Developed by: Sapient Inc.
  • Model type: Hierarchical Reasoning Model (HRM)
  • Language(s): Symbolic reasoning (digits 0-9)
  • License: Apache 2.0
  • Original checkpoint: sapientinc/HRM-checkpoint-sudoku-extreme

Model Sources

  • Original checkpoint: sapientinc/HRM-checkpoint-sudoku-extreme (Hugging Face Hub)
  • Paper: Hierarchical Reasoning Model (arXiv:2506.21734)

Uses

Direct Use

This model is designed for solving extreme difficulty Sudoku puzzles. It can:

  • Solve complex 9×9 Sudoku grids that require advanced reasoning techniques
  • Process partial grids and predict missing digits
  • Demonstrate hierarchical reasoning strategies for constraint satisfaction problems

Downstream Use

The model can be used as:

  • A component in puzzle-solving applications
  • A baseline for research in hierarchical reasoning and adaptive computation
  • An example of applying neural networks to combinatorial optimization problems

Recommendations

Users should be aware that:

  • The model is specialized for Sudoku and should not be used for general reasoning tasks
  • Input must be properly formatted as 9×9 grids with digits 0-9 (0 for empty cells); a minimal validation sketch follows this list
  • Inference time may vary due to the adaptive computation mechanism
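A small validation helper along these lines can catch malformed inputs before inference (a hypothetical helper, not part of the model's API):

import torch

def validate_sudoku_input(input_ids: torch.Tensor) -> None:
    """Raise if a batch of grids is not in the expected 81-token format."""
    if input_ids.ndim != 2 or input_ids.shape[1] != 81:
        raise ValueError(f"Expected shape (batch, 81), got {tuple(input_ids.shape)}")
    if input_ids.min() < 0 or input_ids.max() > 9:
        raise ValueError("Cells must be digits 0-9 (0 = empty cell)")

validate_sudoku_input(torch.zeros(1, 81, dtype=torch.long))  # passes: an all-empty grid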

How to Get Started with the Model

import torch
from transformers import HrmForCausalLM

# Load the model
model = HrmForCausalLM.from_pretrained("zbloss/HRM-sudoku-extreme")
model.eval()

# Prepare a Sudoku grid flattened to 81 tokens
# (0 represents an empty cell; 1-9 are the given digits)
sudoku_grid = torch.randint(0, 10, (1, 81))  # Random placeholder; substitute a real puzzle
puzzle_ids = torch.zeros(1, dtype=torch.long)  # Puzzle identifiers expected by the forward pass

# Run inference
with torch.no_grad():
    outputs = model(input_ids=sudoku_grid, puzzle_identifiers=puzzle_ids)

# Get per-cell digit predictions, shape (batch, 81)
predictions = torch.argmax(outputs.logits, dim=-1)
print(f"Predicted solution: {predictions}")

Training Details

Training Data

The model was trained on a dataset of extreme difficulty Sudoku puzzles. These puzzles require advanced solving techniques beyond basic constraint propagation.

Training Procedure

The model uses a hierarchical architecture (a schematic code sketch follows this list) with:

  • High-level (H) module: 4 transformer layers for abstract planning
  • Low-level (L) module: 4 transformer layers for detailed computation
  • H-cycles: 2 high-level reasoning cycles
  • L-cycles: 2 low-level computation cycles per H-cycle
  • ACT mechanism: Q-learning based adaptive halting with max 16 steps
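The nested cycle structure can be sketched as follows. TinyBlock, hrm_cycles, and the shapes here are illustrative stand-ins for the real 4-layer transformer modules; see the paper for the exact recurrence:

import torch
import torch.nn as nn

class TinyBlock(nn.Module):
    """Stand-in for a 4-layer transformer module (hypothetical, for shape illustration)."""
    def __init__(self, hidden_size=512):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, state, *contexts):
        # Real HRM blocks attend over the sequence; here we simply mix the inputs.
        return torch.tanh(self.proj(state + sum(contexts)))

def hrm_cycles(z_h, z_l, x_emb, h_module, l_module, h_cycles=2, l_cycles=2):
    """Nested update loop: L refines quickly against the current H plan,
    then H updates slowly from the L result (schematic, not the real code)."""
    for _ in range(h_cycles):
        for _ in range(l_cycles):
            z_l = l_module(z_l, z_h, x_emb)   # fast, detailed computation
        z_h = h_module(z_h, z_l)              # slow, abstract planning update
    return z_h, z_l

# Example with batch=1, 81 cells, hidden size 512
x = torch.randn(1, 81, 512)
z_h, z_l = hrm_cycles(torch.zeros_like(x), torch.zeros_like(x), x, TinyBlock(), TinyBlock())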

Training Hyperparameters

  • Training regime: bfloat16 mixed precision
  • Architecture: 4 H-layers, 4 L-layers, 8 attention heads
  • Hidden size: 512
  • Intermediate size: 1536
  • Max position embeddings: 900
  • Vocabulary size: 11 (digits 0-9 plus a padding token; see the config sketch below)
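For orientation, these values correspond to a configuration roughly like the dictionary below. The field names are illustrative; the checkpoint's config.json is the authoritative source:

hrm_config = {
    "hidden_size": 512,
    "intermediate_size": 1536,
    "num_attention_heads": 8,
    "h_layers": 4,
    "l_layers": 4,
    "h_cycles": 2,
    "l_cycles": 2,
    "max_halting_steps": 16,
    "max_position_embeddings": 900,
    "vocab_size": 11,
    "torch_dtype": "bfloat16",
}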

Model Architecture

Technical Specifications

  • Total parameters: 27,275,778 (27.3M)
  • Model size: 109.11 MB
  • Vocabulary size: 11
  • Hidden size: 512
  • Intermediate size: 1536
  • H-level layers: 4
  • L-level layers: 4
  • Attention heads: 8
  • H-cycles: 2
  • L-cycles: 2
  • Max halting steps: 16
  • Position encoding: RoPE (Rotary Position Embeddings)
  • Activation: SwiGLU

Model Architecture and Objective

The Hierarchical Reasoning Model (HRM) features:

  1. Two-level Hierarchical Processing:

    • H-level (High-level): Performs slow, abstract planning and strategy formulation
    • L-level (Low-level): Executes fast, detailed computations
  2. Adaptive Computation Time (ACT):

    • Q-learning based halting mechanism
    • Dynamically determines when sufficient computation has been performed
    • Allows variable computational depth based on problem difficulty (see the halting sketch after this list)
  3. Recurrent Carry State:

    • Maintains H and L hidden states across reasoning cycles
    • Enables iterative refinement of solutions
  4. Positional Encoding:

    • RoPE (Rotary Position Embeddings) for position-aware attention
    • Supports up to 900 positions (30×30 grids)
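The ACT halting loop from point 2 can be sketched as follows. step_fn, the carry object, and the Q-value layout are hypothetical placeholders for the model's internals:

import torch

def act_halting_sketch(step_fn, carry, max_steps=16):
    """Schematic ACT outer loop: each step runs one reasoning segment and a
    Q-head scores halt vs. continue; stop when halt wins or the budget runs out."""
    logits = None
    for step in range(max_steps):
        carry, logits, q = step_fn(carry)  # q: (batch, 2) = [q_halt, q_continue]
        if (q[:, 0] > q[:, 1]).all():      # every sample in the batch chose to halt
            break
    return logits

# Dummy step function with random Q-values, for illustration only
def dummy_step(carry):
    return carry, torch.randn(1, 81, 11), torch.randn(1, 2)

out = act_halting_sketch(dummy_step, carry=None)
print(out.shape)  # torch.Size([1, 81, 11])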

Compute Infrastructure

Software

  • Framework: PyTorch with transformers library
  • Precision: bfloat16
  • Format: Safetensors

Citation

BibTeX:

@article{wang2025hierarchical,
  title={Hierarchical Reasoning Model},
  author={Wang, Guan and Li, Jin and Sun, Yuhao and Chen, Xing and Liu, Changling and Wu, Yue and Lu, Meng and Song, Sen and Yadkori, Yasin Abbasi},
  journal={arXiv preprint arXiv:2506.21734},
  year={2025}
}

APA:

Wang, G., Li, J., Sun, Y., Chen, X., Liu, C., Wu, Y., Lu, M., Song, S., & Yadkori, Y. A. (2025). Hierarchical Reasoning Model. arXiv preprint arXiv:2506.21734.

More Information

This checkpoint is a converted version of the original HRM checkpoint from sapientinc/HRM-checkpoint-sudoku-extreme, formatted for use with the HuggingFace transformers library.

For more details about the HRM architecture and training methodology, see the Hierarchical Reasoning Model paper (arXiv:2506.21734).

Model Card Contact

For questions or issues with this converted checkpoint, please open an issue in the transformers repository.
