RavenBERT

RavenBERT is a SentenceTransformers embedding model specialized for smart-contract invariants (e.g., require(...), assert(...), if (...) revert) extracted from Solidity/Vyper sources.
It starts from web3se/SmartBERT-v2 and is contrastively fine-tuned so that cosine similarity reflects the semantic intent of guards used in transaction-reverting checks.

  • Architecture: BERT-family encoder (SmartBERT-v2) β†’ MeanPooling β†’ L2 Normalize (see the check after this list)
  • Embedding dimension: 768
  • Normalization: Enabled (unit-norm vectors; cosine ≑ dot product)
  • Intended use: clustering / semantic search / dedup / taxonomy building for short guard predicates (and optional messages)
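
In SentenceTransformers terms, the stack is visible on the loaded model; a minimal check (the model id matches Quick start below):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("MojtabaEshghie/RavenBERT")
print(model)                                     # Transformer -> Pooling (mean) -> Normalize
print(model.get_sentence_embedding_dimension())  # 768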

Quick start

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("MojtabaEshghie/RavenBERT")
sentences = [
    "amountOut >= amountOutMin",
    "deadline >= block.timestamp",
    "balances[msg.sender] >= amount"
]
emb = model.encode(sentences, convert_to_numpy=True, show_progress_bar=False)
# emb is L2-normalized row-wise; use cosine similarity for comparisons
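
Because the vectors are unit-norm, cosine similarity reduces to a dot product; continuing the snippet above:

sim = emb @ emb.T            # pairwise cosine similarity matrix
print(sim[0, 1], sim[0, 2])  # first guard vs. the other two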

Training summary (contrastive)

  • Base model: web3se/SmartBERT-v2
  • Objective: CosineSimilarityLoss (positives near 1.0, negatives near 0.0)
  • Pair construction: seed texts are embedded and L2-normalized; among each item's top_k=10 nearest-neighbor candidates, pairs with cosine β‰₯ 0.80 are labeled positive (at most 5 per item) and pairs with cosine ≀ 0.20 negative (see the sketch after this list)
  • Stats for this release: 1,647 unique texts β†’ 16,470 pairs (8,235 positive / 8,235 negative)
  • Hyperparams: epochs=1, batch_size=16, max_seq_len=512
  • Saved as: canonical SentenceTransformers layout (0_Transformer/, 1_Pooling/, 2_Normalize/)
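
The sketch below illustrates this recipe end to end. It assumes the base checkpoint loads as a plain Hugging Face transformer; the toy corpus, variable names, and simplified pair mining are illustrative, not the actual training script.

import numpy as np
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, models

# Module stack matching the saved layout: 0_Transformer / 1_Pooling / 2_Normalize.
word = models.Transformer("web3se/SmartBERT-v2", max_seq_length=512)
pool = models.Pooling(word.get_word_embedding_dimension(), pooling_mode="mean")
model = SentenceTransformer(modules=[word, pool, models.Normalize()])

texts = [  # toy stand-in for the 1,647 unique invariant texts
    "amountOut >= amountOutMin",
    "deadline >= block.timestamp",
    "balances[msg.sender] >= amount",
    "msg.sender == owner",
]

emb = model.encode(texts, convert_to_numpy=True)  # already unit-norm
sim = emb @ emb.T                                 # pairwise cosine similarity

TOP_K, TAU_POS, TAU_NEG, MAX_POS = 10, 0.80, 0.20, 5
examples = []
for i in range(len(texts)):
    # Nearest-neighbor candidates by cosine, excluding self.
    neighbors = [j for j in np.argsort(-sim[i]) if j != i][:TOP_K]
    n_pos = 0
    for j in neighbors:
        if sim[i, j] >= TAU_POS and n_pos < MAX_POS:
            examples.append(InputExample(texts=[texts[i], texts[j]], label=1.0))
            n_pos += 1
        elif sim[i, j] <= TAU_NEG:
            examples.append(InputExample(texts=[texts[i], texts[j]], label=0.0))

loader = DataLoader(examples, shuffle=True, batch_size=16)
model.fit(train_objectives=[(loader, losses.CosineSimilarityLoss(model))], epochs=1)
model.save("RavenBERT")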

A more detailed methodology and evaluation appear in the RAVEN paper (semantic clustering of revert-inducing invariants).

Intended uses & limitations

Good for

  • Measuring semantic relatedness of short invariant predicates
  • Clustering guards by intent (e.g., access control, slippage, timeouts)
  • Deduplicating near-equivalent checks across contracts (see the sketch after this list)
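
A minimal dedup pass, assuming an illustrative merge cutoff of 0.95 (not a value from the paper):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("MojtabaEshghie/RavenBERT")
guards = [
    "balances[msg.sender] >= amount",
    "amount <= balances[msg.sender]",  # near-equivalent phrasing
    "deadline >= block.timestamp",
]
emb = model.encode(guards, convert_to_numpy=True)
sim = emb @ emb.T  # unit-norm vectors, so this is cosine similarity

THRESHOLD = 0.95  # assumed cutoff; tune on labeled pairs
kept = []
for i in range(len(guards)):
    # Keep a guard only if it is not a near-duplicate of one already kept.
    if all(sim[i, j] < THRESHOLD for j in kept):
        kept.append(i)
print([guards[i] for i in kept])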

Not ideal for

  • Long code blocks or whole-function embeddings
  • General code understanding outside invariant-style snippets
  • Non-EVM ecosystems without adaptation

Evaluation (paper)

When paired with DBSCAN on predicate-only text, RavenBERT produced compact, well-separated clusters (e.g., Silhouette β‰ˆ 0.93, S_Dbw β‰ˆ 0.043 at ~52% coverage), surfacing meaningful categories of defenses from reverted transactions. See the paper for the full protocol, ablations, and metrics.
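
A sketch of such a pipeline with scikit-learn; the predicates, eps, and min_samples below are placeholders, not the paper's data or settings:

from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("MojtabaEshghie/RavenBERT")
predicates = [
    "amountOut >= amountOutMin",
    "minReturn <= amountReturned",
    "deadline >= block.timestamp",
    "block.timestamp <= expiry",
    "msg.sender == owner",
    "hasRole(ADMIN_ROLE, msg.sender)",
]
emb = model.encode(predicates, convert_to_numpy=True)

labels = DBSCAN(eps=0.2, min_samples=2, metric="cosine").fit_predict(emb)

clustered = labels != -1  # DBSCAN marks unassigned points as noise (-1)
print("coverage:", clustered.mean())
if clustered.any() and len(set(labels[clustered])) > 1:
    print("silhouette:", silhouette_score(emb[clustered], labels[clustered], metric="cosine"))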

Reproducibility

  • Pair thresholds: Ο„β‚Š = 0.80, Ο„β‚‹ = 0.20
  • Normalization: L2 via sentence_transformers.models.Normalize() (see the check after this list)
  • Training log: ravenbert_training_stats.json (included in repo)
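
A quick check that the shipped Normalize module is active:

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("MojtabaEshghie/RavenBERT")
v = model.encode(["msg.sender == owner"], convert_to_numpy=True)
print(np.linalg.norm(v, axis=1))  # β‰ˆ 1.0: embeddings are unit-norm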

Citation

If you use RavenBERT, please cite the RAVEN paper and this model:

TBD

License

MIT
