OpenMed PII SuperClinical Small — ONNX

ONNX export of OpenMed/OpenMed-PII-SuperClinical-Small-44M-v1 — format-conversion derivative under Apache 2.0; no weights or labels modified. See the upstream model card for training data, intended use, and limitations.

Why an ONNX export?

The upstream model ships PyTorch weights. ONNX enables:

  • Inference outside the Python/PyTorch stack (Rust, Go, C++, .NET, mobile, browser via onnxruntime-web).
  • Embedding in self-contained binaries without a transformers/torch runtime dependency.
  • A smaller deployment footprint via FP16 (270 MB vs 566 MB FP32) with no observed loss in PII span extraction on the verification battery.

Model details

  • Architecture: DebertaV2ForTokenClassification (DeBERTa-v3-small backbone, ~44M parameters)
  • Task: Token classification for PII / PHI detection in clinical text
  • Format: ONNX (FP32 reference + FP16 production variant)
  • Max sequence length: 512
  • Tokenizer: SentencePiece (DeBERTa-v3 unigram, vocab 128k); fast-tokenizer tokenizer.json included
  • Number of labels: 106 (BIO scheme — 54 entity types: 51 with both B- and I-, 3 with B- only, plus O)

Labels

54 entity types, BIO-tagged:

| Category | Entity types |
| --- | --- |
| Identity | first_name, last_name, user_name, gender, age, race_ethnicity, religious_belief, sexuality, political_view |
| Contact | email, phone_number, fax_number, street_address, city, county, state, country, postcode, coordinate |
| Government / financial IDs | ssn, tax_id, account_number, bank_routing_number, swift_bic, credit_debit_card, cvv, pin, customer_id, unique_id |
| Healthcare-specific | medical_record_number, health_plan_beneficiary_number, blood_type |
| Credentials / device | password, api_key, http_cookie, mac_address, ipv4, ipv6, url, device_identifier, biometric_identifier |
| Documents / vehicles | certificate_license_number, license_plate, vehicle_identifier, employee_id |
| Demographics | occupation, education_level, employment_status, language, company_name |
| Temporal | date, date_of_birth, date_time, time |

O is the outside-any-entity label. See config.json for the canonical 106-entry id2label map.
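The label arithmetic above (51 × 2 + 3 + 1 = 106) can be checked against config.json with a short script. This is a minimal sketch: the helper names `load_id2label` and `summarize_labels` are hypothetical, and it assumes labels are spelled `B-<entity>` / `I-<entity>` with a single hyphen after the prefix.

```python
import json
from collections import defaultdict


def load_id2label(config_path="config.json"):
    """Read the id2label map from a Hugging Face config file."""
    with open(config_path, encoding="utf-8") as fh:
        return json.load(fh)["id2label"]


def summarize_labels(id2label):
    """Group BIO labels by entity type: {entity_type: {"B", "I"}}.

    Assumes labels look like "B-ssn" / "I-ssn"; "O" is skipped.
    """
    prefixes = defaultdict(set)
    for label in id2label.values():
        if label == "O":
            continue
        prefix, _, entity = label.partition("-")
        prefixes[entity].add(prefix)
    return dict(prefixes)
```

With the real config, `len(load_id2label())` should be 106 and `len(summarize_labels(load_id2label()))` should be 54, if the counts stated in this card hold.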

Usage

Python (optimum + transformers)

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForTokenClassification

REPO = "sidupadhyay/OpenMed-PII-SuperClinical-Small-44M-v1-ONNX"

tokenizer = AutoTokenizer.from_pretrained(REPO)

# FP16 (production, ~270 MB):
model = ORTModelForTokenClassification.from_pretrained(REPO, file_name="model_fp16.onnx")
# Or FP32 (accuracy reference, ~566 MB):
# model = ORTModelForTokenClassification.from_pretrained(REPO, file_name="model.onnx")

inputs = tokenizer(
    "The patient's date of birth is March 15, 1982 and her SSN is 123-45-6789.",
    return_tensors="pt",
)
outputs = model(**inputs)
predicted_label_ids = outputs.logits.argmax(-1)[0].tolist()
predicted_labels = [model.config.id2label[i] for i in predicted_label_ids]

Direct onnxruntime + tokenizers (no transformers / torch)

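import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

tok = Tokenizer.from_file("tokenizer.json")
sess = ort.InferenceSession("model_fp16.onnx", providers=["CPUExecutionProvider"])

enc = tok.encode("Patient John Doe's SSN is 123-45-6789.")
# onnxruntime expects NumPy arrays; both inputs are int64 with shape (batch, seq_len).
input_ids = np.array([enc.ids], dtype=np.int64)
attention_mask = np.ones_like(input_ids)

# logits has shape (batch, seq_len, num_labels); the output stays FP32 for both variants.
logits, = sess.run(["logits"], {"input_ids": input_ids, "attention_mask": attention_mask})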

The fast tokenizer returns a character offset pair for every token (the offsets attribute of the encoding). Combine these offsets with the per-token argmax to extract PII spans for redaction or annotation.
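One way to turn per-token labels and offsets into character spans is sketched below. The function name `extract_spans` is hypothetical and the merging rule is an assumption, not the upstream authors' decoding logic: a B- tag opens a span, a matching I- tag extends it, and anything else closes it. Tokens with empty offsets (such as [CLS] and [SEP]) are skipped.

```python
def extract_spans(text, labels, offsets):
    """Merge BIO-tagged tokens into (entity_type, start, end, surface) tuples.

    labels  -- one label string per token, e.g. "B-ssn", "I-ssn", "O"
    offsets -- one (start, end) character pair per token
    """
    spans = []
    current = None  # [entity_type, start, end] of the span being built

    for label, (start, end) in zip(labels, offsets):
        if start == end:
            # Special tokens carry empty offsets; they map to no text.
            continue
        prefix, _, entity = label.partition("-")
        if label == "O":
            if current:
                spans.append(current)
            current = None
        elif prefix == "I" and current and current[0] == entity:
            current[2] = end  # extend the open span
        else:
            # A B- tag, or an I- tag with no matching open span: start fresh.
            if current:
                spans.append(current)
            current = [entity, start, end]

    if current:
        spans.append(current)
    return [(entity, s, e, text[s:e]) for entity, s, e in spans]
```

In the onnxruntime example, `labels` would come from mapping `logits[0].argmax(-1)` through the id2label map in config.json, and `offsets` from `enc.offsets`. Depending on the tokenizer's pre-tokenization settings, a span may include a leading space, so trimming whitespace before redaction may be needed.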

Original model

A per-record training-data inventory has not been published; consult the paper and contact the upstream authors for training-data due diligence.

Conversion

  • Source: OpenMed/OpenMed-PII-SuperClinical-Small-44M-v1 (Apache 2.0)
  • Date: 2026-04-27
  • Tooling: optimum[onnxruntime]==2.1.0, transformers==4.57.6, onnx==1.21.0, onnxruntime==1.25.0, torch==2.11.0, onnxconverter-common==1.16.0. Python 3.11.

FP32 (model.onnx):

from optimum.onnxruntime import ORTModelForTokenClassification
m = ORTModelForTokenClassification.from_pretrained(SRC, export=True)
m.save_pretrained(OUT)

FP16 (model_fp16.onnx):

import onnx
from onnxconverter_common import float16
g = onnx.load("model.onnx")
g_fp16 = float16.convert_float_to_float16(g, keep_io_types=True)
onnx.save(g_fp16, "model_fp16.onnx")

keep_io_types=True preserves int64 input_ids / attention_mask inputs and the FP32 logits output, so tokenizer-side and downstream-consumer code paths are unchanged. No op_block_list entries were required.

A small post-processing pass aligns the "to" attribute of each Cast node with the converter's updated value_info dtypes. onnxconverter_common updates downstream tensor types but doesn't always rewrite the "to" attribute on existing Cast nodes. 40 nodes (including casts inside an If subgraph used by DeBERTa's relative-position bucket logic) needed patching to match their declared output dtype; without the patch, onnxruntime rejects the graph at session creation. The fix is a graph walk over model.graph.node (and recursively over If / Loop subgraphs in attribute.g) that, for each Cast node, sets the "to" attribute to the dtype declared on the node's output value_info. This is a known sharp edge with onnxconverter_common on transformer graphs containing explicit Cast nodes.

The exact conversion + verification script ships in this repo as verify.py.

License: Apache 2.0, inherited from the upstream model. Format conversion is permitted; no weights are altered.

Verification

verify.py reproduces the conversion-verification battery: 20 synthetic PII-containing sentences (no real personal data) compared against the upstream PyTorch model.

| Metric | FP32 threshold | FP32 result | FP16 threshold | FP16 result |
| --- | --- | --- | --- | --- |
| Max absolute logit drift (per token) | < 1e-4 | 2.10e-05 | informational | 5.86e-03 |
| Per-token argmax disagreement | 0% | 0 / 299 tokens | < 0.5% | 0 / 299 tokens |
| BIO span agreement (per sample) | 100% | 20 / 20 samples | 100% | 20 / 20 samples |

The FP32 export is lossless within fp32 numerical noise. The FP16 variant matches FP32 argmax decisions on every token in the battery, so all extracted PII spans are identical; logit drift is the expected ~1e-3 magnitude for half-precision and does not cross any decision boundary on this battery.

python verify.py                                       # default: any model.onnx / model_fp16.onnx siblings
python verify.py --model model.onnx --model model_fp16.onnx   # explicit
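For readers who want to reproduce the first two metrics in the table on their own inputs, the comparison reduces to a few NumPy operations. The sketch below is an assumption about how such a check can be written, not an excerpt from verify.py; `compare_logits` is a hypothetical name, and both arrays are expected to have the same shape (batch, seq_len, num_labels).

```python
import numpy as np


def compare_logits(reference, candidate):
    """Compare two logit tensors of identical shape.

    Returns the max absolute elementwise difference and the number of
    token positions where the argmax label differs.
    """
    reference = np.asarray(reference, dtype=np.float32)
    candidate = np.asarray(candidate, dtype=np.float32)
    if reference.shape != candidate.shape:
        raise ValueError("logit tensors must have the same shape")

    ref_ids = reference.argmax(-1)
    cand_ids = candidate.argmax(-1)
    return {
        "max_abs_drift": float(np.abs(reference - candidate).max()),
        "argmax_disagreements": int((ref_ids != cand_ids).sum()),
        "tokens": int(ref_ids.size),
    }
```

Here `reference` would be the upstream PyTorch logits (converted to NumPy) and `candidate` the onnxruntime output for the same tokenized sentence. Padding positions, if any, should be masked out before counting tokens.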

INT8 quantization (not shipped)

Dynamic INT8 quantization (onnxruntime.quantization.quantize_dynamic, QuantType.QInt8) was attempted to produce a smaller deployment artifact. Results on the same 20-sample battery:

| Metric | INT8 (all ops) | INT8 (MatMul-only) | INT8 acceptance threshold |
| --- | --- | --- | --- |
| Max absolute logit drift | 5.48e+00 | 4.24e+00 | (informational) |
| Per-token argmax disagreement | 10.37% | 10.14% | < 5% |
| Span agreement | 9 / 20 | 4 / 20 | 20 / 20 |

Both dynamic INT8 variants exceeded the 5% argmax-disagreement budget by roughly 2× (10.37% and 10.14%) and broke span-level predictions on more than half of the samples: 11 of 20 for the all-ops variant and 16 of 20 for the MatMul-only variant. DeBERTa-v3's heavy-tailed activation distributions are known to defeat dynamic per-tensor scales. The quantized artifact is therefore not shipped. The FP16 variant roughly halves the artifact size (566 MB to 270 MB) without the accuracy loss.

If a smaller artifact than FP16 is required, viable next steps are (a) static (calibration-based) INT8 with a representative PII corpus, or (b) QuantType.QUInt8 with QDQ pre-processing.

File inventory

| File | Size | Purpose |
| --- | --- | --- |
| model.onnx | 566 MB | FP32 ONNX graph + weights (accuracy reference) |
| model.onnx.sha256 | 77 B | Integrity hash for model.onnx |
| model_fp16.onnx | 270 MB | FP16 ONNX graph (production variant; FP32 I/O preserved) |
| model_fp16.onnx.sha256 | 81 B | Integrity hash for model_fp16.onnx |
| config.json | 6.3 KB | HF model config (architecture, id2label, label2id) |
| tokenizer.json | 8.6 MB | Fast tokenizer (Rust-compatible, with offsets) |
| tokenizer_config.json | 1.4 KB | Tokenizer settings |
| special_tokens_map.json | 970 B | Special-token map ([CLS], [SEP], [PAD], [MASK], [UNK]) |
| spm.model | 2.4 MB | SentencePiece model (slow tokenizer / interop) |
| added_tokens.json | 23 B | Added-token map |
| LICENSE | 11 KB | Apache License 2.0 |
| verify.py | 9 KB | Reproducible conversion-verification script |

After download, sha256sum -c model.onnx.sha256 and sha256sum -c model_fp16.onnx.sha256 should both pass.
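On platforms without sha256sum, the same integrity check can be done with the Python standard library. This is a minimal sketch: the helper names are hypothetical, and it assumes each .sha256 file uses the default sha256sum layout, with the hex digest as the first whitespace-separated token.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large models are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_sidecar(model_path):
    """Return True if <model_path>.sha256 matches the file's actual digest."""
    sidecar = Path(str(model_path) + ".sha256")
    expected = sidecar.read_text().split()[0].lower()
    return sha256_of(model_path) == expected
```

For example, `check_sidecar("model_fp16.onnx")` reads model_fp16.onnx.sha256 from the same directory and compares it with the computed digest.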

Citation

Please cite the upstream work — see the model card and arXiv:2508.01630 for the canonical citation.
