AI Security Framework Crosswalk: Ensemble Classifier (vfinal)
Model Description
This model is a 3-model ensemble for ordinal tier classification of AI security control pairs across multiple frameworks. Given a pair of security controls (one from each framework), it assigns one of four ordinal relationship tiers:
| Tier | Meaning |
|---|---|
| UNRELATED | No meaningful overlap |
| PARTIAL | Some topical overlap but different scope or depth |
| RELATED | Substantial overlap; controls address similar threats |
| EQUIVALENT | Near-identical coverage; either control satisfies the other |
Built by Rock Lambros at the University of Denver as part of dissertation research on AI security framework alignment.
Intended Use
Primary use: Scoring cross-framework control mappings for AI security frameworks (NIST AI RMF, OWASP AI, MITRE ATLAS, ISO/IEC 42001, EU AI Act, ENISA, NIST SP 800-218A, NIST CSF 2.0, and others).
Out of scope: General natural language inference (NLI), non-security domains, or any application where automated output replaces human judgment without review.
Model Architecture
The ensemble combines three transformer-based encoders, each with a 4-class linear classification head:
| Component | Base Model | Parameters | Embedding Dim |
|---|---|---|---|
| RoBERTa-large | roberta-large | 355M | 1024-dim CLS |
| DeBERTa-v3-base | microsoft/deberta-v3-base | 86M | 768-dim CLS |
| BGE-large-v1.5 | BAAI/bge-large-en-v1.5 | 335M | 1024-dim CLS |
Combination strategy: Softmax averaging of the three models' output probability distributions. The combination layer has no learnable parameters; it is a purely post-hoc ensemble.
Each head is a single linear layer (`Linear(hidden_size, 4)`) trained independently per model.
Training Details
Dataset: 5,920 expert-labeled control pairs drawn from 9 AI security frameworks. Labels were assigned by Rock Lambros using a structured annotation rubric.
Data cleaning: Mapping-level deduplication removed cross-split leakage affecting 56% of the raw dataset before the train/val/test splits were finalized.
Loss functions: Ordinal-aware losses were used during training:
- KL divergence against soft ordinal label distributions
- CORN (conditional ordinal ranking net) loss
- Focal loss with class-balanced weighting
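As an illustration of the first loss above, the KL-divergence variant can be sketched by converting each hard tier label into a soft distribution that decays with ordinal distance. This is a minimal sketch: the function names and the distance-decay temperature are assumptions, not the values used in the dissertation's training runs.

```python
import torch
import torch.nn.functional as F

NUM_TIERS = 4  # UNRELATED < PARTIAL < RELATED < EQUIVALENT

def soft_ordinal_targets(labels: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Turn hard tier indices into distributions that decay with ordinal distance."""
    tiers = torch.arange(NUM_TIERS, dtype=torch.float32)
    # absolute ordinal distance of each tier from the true label, shape (batch, NUM_TIERS)
    dist = (tiers.unsqueeze(0) - labels.unsqueeze(1).float()).abs()
    return F.softmax(-dist / temperature, dim=-1)

def ordinal_kl_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """KL divergence between predicted log-probs and soft ordinal targets."""
    log_probs = F.log_softmax(logits, dim=-1)
    targets = soft_ordinal_targets(labels)
    return F.kl_div(log_probs, targets, reduction="batchmean")

# toy batch: one UNRELATED pair, one EQUIVALENT pair
labels = torch.tensor([0, 3])
logits = torch.randn(2, NUM_TIERS)
loss = ordinal_kl_loss(logits, labels)
```

Unlike plain cross-entropy, this penalizes a RELATED prediction for an EQUIVALENT label less than an UNRELATED one, preserving the tier ordering in the gradient signal.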
Compute: 3x NVIDIA H100 80GB SXM GPUs, BF16 mixed precision, ~4 hours total wall-clock time.
Hyperparameters: AdamW optimizer, linear warmup + cosine decay, per-model learning rates tuned via Optuna (tracked in Sacred).
Evaluation Results
Overall Metrics (179-pair test set)
| Metric | Value |
|---|---|
| Exact Accuracy | 79.9% |
| Adjacent Accuracy (±1 tier) | 92.2% |
| Macro F1 | 0.558 |
Per-Class F1 Scores
| Class | F1 | Support |
|---|---|---|
| UNRELATED | 0.928 | ~87 |
| PARTIAL | 0.526 | ~55 |
| RELATED | 0.378 | ~30 |
| EQUIVALENT | 0.400 | 7 |
Conformal Prediction Coverage
Conformal prediction sets (calibrated at 90% target coverage) achieve >90% empirical coverage for all four classes on the held-out test set.
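The card does not publish the calibration procedure itself; as an illustration, a standard split-conformal construction over softmax outputs looks like the sketch below. The function names and the synthetic calibration data are hypothetical, not the dissertation's actual code.

```python
import numpy as np

def conformal_threshold(cal_probs: np.ndarray, cal_labels: np.ndarray, alpha: float = 0.1) -> float:
    """Split-conformal quantile; nonconformity score = 1 - prob of the true class."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # finite-sample corrected quantile level
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return float(np.quantile(scores, min(q, 1.0), method="higher"))

def prediction_set(probs: np.ndarray, threshold: float) -> list:
    """All classes whose nonconformity score falls within the calibrated threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]

# synthetic calibration data for illustration only
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(4), size=200)
cal_labels = rng.integers(0, 4, size=200)
tau = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
sets = [prediction_set(p, tau) for p in cal_probs]
```

With `alpha=0.1`, the construction targets the 90% coverage reported above: sets grow on ambiguous pairs (often spanning adjacent tiers) and shrink to a single tier when the ensemble is confident.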
Limitations and Biases
- Small test set: 179 pairs total, with only 7 EQUIVALENT examples. Per-class metrics for EQUIVALENT are high-variance.
- English-only: All frameworks and controls are in English; no multilingual support.
- Framework coverage: Trained on 9 specific AI security frameworks. Performance on out-of-distribution frameworks (e.g., sector-specific CISA guidance) is unknown.
- Expert labeler bias: A single expert (Rock Lambros) labeled all training data; inter-annotator agreement was not formally measured.
- Ordinal collapsing: The PARTIAL/RELATED boundary is the hardest to learn (lowest F1s), reflecting genuine annotation ambiguity in the middle tiers.
Ethical Considerations
- No personal data of any kind was used in training or evaluation.
- All training data consists of publicly available security framework control text.
- This is a security-domain tool; outputs should be treated as advisory scores requiring human review before use in compliance or procurement decisions.
- The model does not generate free text, which limits its potential for content generation or harmful repurposing.
Environmental Impact
| Resource | Value |
|---|---|
| GPUs | 3x NVIDIA H100 80GB SXM |
| Estimated TDP per GPU | ~700W |
| Training wall time | ~4 hours |
| Estimated energy | ~8.4 kWh |
| Estimated CO2e | ~3.4 kg (US average grid) |
Estimate assumes 400g CO2/kWh US average grid intensity. Actual emissions may differ based on datacenter location and energy mix.
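The table's figures follow from a straightforward power-draw calculation, reproduced here as a sanity check (TDP is an upper bound on actual draw, so this is a conservative estimate):

```python
gpus = 3
tdp_w = 700            # watts per GPU (TDP, an upper bound on real draw)
hours = 4
grid_g_per_kwh = 400   # g CO2e per kWh, assumed US average grid intensity

energy_kwh = gpus * tdp_w * hours / 1000        # 3 * 700 W * 4 h = 8.4 kWh
co2e_kg = energy_kwh * grid_g_per_kwh / 1000    # 8.4 kWh * 400 g/kWh = 3.36 kg
```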
How to Use
```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load one encoder (RoBERTa shown; repeat for deberta_base and bge).
# Each encoder lives in a subfolder of the repo, so pass `subfolder`
# rather than appending the path to the repo id.
encoder = AutoModel.from_pretrained(
    "rockCO78/ai-security-crosswalk-vfinal", subfolder="roberta/encoder"
)
tokenizer = AutoTokenizer.from_pretrained(
    "rockCO78/ai-security-crosswalk-vfinal", subfolder="roberta/encoder"
)

# Load the classification head
head_state = torch.load("roberta/head.pt", map_location="cpu")
head = torch.nn.Linear(1024, 4)
head.load_state_dict(head_state)
head.eval()
encoder.eval()

LABELS = ["UNRELATED", "PARTIAL", "RELATED", "EQUIVALENT"]

def predict_pair(control_a: str, control_b: str) -> dict:
    """Predict ordinal relationship tier for a control pair."""
    text = f"{control_a} [SEP] {control_b}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state[:, 0, :]  # CLS token
        logits = head(hidden)
        probs = torch.softmax(logits, dim=-1).squeeze()
    return {label: round(prob.item(), 4) for label, prob in zip(LABELS, probs)}

# Example usage
result = predict_pair(
    "Implement data minimization for AI training datasets",
    "Limit collection of personal data to what is necessary for the stated purpose",
)
print(result)

# For full ensemble, average softmax outputs from all three models
```
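The averaging step mentioned in the final comment can be sketched as follows. The per-model probability dicts below are hypothetical placeholders for the outputs of `predict_pair` run once per encoder:

```python
def ensemble_predict(per_model_probs: list) -> dict:
    """Uniform softmax averaging across models (the card's combination strategy)."""
    labels = per_model_probs[0].keys()
    return {lab: sum(p[lab] for p in per_model_probs) / len(per_model_probs)
            for lab in labels}

# hypothetical outputs from the RoBERTa, DeBERTa, and BGE heads
probs = ensemble_predict([
    {"UNRELATED": 0.1, "PARTIAL": 0.6, "RELATED": 0.2, "EQUIVALENT": 0.1},
    {"UNRELATED": 0.2, "PARTIAL": 0.5, "RELATED": 0.2, "EQUIVALENT": 0.1},
    {"UNRELATED": 0.0, "PARTIAL": 0.7, "RELATED": 0.2, "EQUIVALENT": 0.1},
])
tier = max(probs, key=probs.get)  # highest averaged probability wins
```

Because the combination is a plain mean with no learnable weights, no extra training is needed to use the full ensemble.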
How to Train on New Frameworks
To fine-tune or extend this model on additional AI security frameworks:
- Collect control pairs. Enumerate all cross-framework control combinations (or sample strategically using embedding similarity pre-filtering).
- Label using the rubric. Assign UNRELATED / PARTIAL / RELATED / EQUIVALENT using the annotation guide in `scripts/` (see `predict_edges.py` for label definitions).
- Deduplicate at mapping level. Remove any pairs where the same mapping appears in both train and test splits to prevent leakage.
- Fine-tune each encoder independently. Use ordinal losses (KL soft-label, CORN, or focal) rather than standard cross-entropy to preserve tier ordering.
- Evaluate with adjacent accuracy. Exact accuracy understates model quality for ordinal tasks; report adjacent accuracy (Β±1 tier) alongside macro F1.
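The last step's metrics are simple to compute; a minimal sketch in plain NumPy (the example labels are hypothetical):

```python
import numpy as np

def exact_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of predictions matching the label exactly."""
    return float(np.mean(y_true == y_pred))

def adjacent_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Counts a prediction as correct if it is within one ordinal tier of the label."""
    return float(np.mean(np.abs(y_true - y_pred) <= 1))

# hypothetical tiers encoded 0..3 (UNRELATED..EQUIVALENT)
y_true = np.array([0, 1, 2, 3, 1, 0])
y_pred = np.array([0, 2, 2, 2, 1, 3])
exact = exact_accuracy(y_true, y_pred)       # 0.5: three exact matches out of six
adjacent = adjacent_accuracy(y_true, y_pred) # 5/6: only the 0-vs-3 miss is off by >1
```

The gap between the two numbers (here 0.5 vs. roughly 0.83, mirroring the 79.9% vs. 92.2% gap above) is exactly what adjacent accuracy is meant to surface: most errors on an ordinal task land in a neighboring tier.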
Citation
```bibtex
@misc{lambros2026crosswalk,
  author       = {Lambros, Rock},
  title        = {AI Security Framework Crosswalk: Ordinal Classification of Control Relationships},
  year         = {2026},
  institution  = {University of Denver},
  howpublished = {\url{https://huggingface.co/rockCO78/ai-security-crosswalk-vfinal}},
  note         = {Dissertation research; 3-model ensemble for ordinal tier classification across AI security frameworks}
}
```
Safety and Risk Assessment
Outputs are advisory scores, not authoritative compliance determinations.
- The model assigns probabilistic tiers; human expert review is required before using predictions to inform compliance decisions, procurement, or framework adoption.
- Tier predictions should be validated against primary framework documentation before any downstream use.
- The model has no knowledge of organizational context, implementation details, or regulatory jurisdiction; these factors are essential for real compliance assessment.
- Users operating in regulated environments (finance, healthcare, critical infrastructure) must apply additional human review commensurate with their risk tolerance.