IrishCore-GlobalPointer-ContextPII-135M-v1-rc5
IrishCore-GlobalPointer-ContextPII-135M-v1-rc5 is an expanded-label raw-only PII masking model for Irish public-sector and citizen-support flows.
It keeps the same DistilBERT-size GlobalPointer span extractor family as temsa/IrishCore-GlobalPointer-135M-v1-rc4, but adds contextual labels that the core model did not serve:
- `STREET_ADDRESS`
- `CITY`
- `COUNTY`
- `DATE_OF_BIRTH`
- `AGE`
The model still covers the core Irish structured labels:
- `PPSN`
- `POSTCODE`
- `PHONE_NUMBER`
- `EMAIL`
- `PASSPORT_NUMBER`
- `ACCOUNT_NUMBER`
- `BANK_ROUTING_NUMBER`
- `SWIFT_BIC`
- `CREDIT_DEBIT_CARD`
- `FIRST_NAME`
- `LAST_NAME`
Positioning
rc5 is a decoder-hardening release over rc4: the weights and ONNX graph are unchanged, but the packaged decode policy now repairs contextual street-address and county spans that still leaked through in real Irish gov / HSE style text.
What changed in rc5:
- apartment / unit / house-name street-address prefixes are recovered into the full `STREET_ADDRESS` span
- explicit `County ...`, `Contae ...`, and `gContae ...` forms are recovered as `COUNTY`
- public-office street spans like `Allocation Centre: Kennedy Avenue, Carlow...` are recovered cleanly
- the added `globalpointer_context_redteam_v1` suite is now `1.0000` on both full and q8 paths
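To make the explicit county recovery concrete, here is a minimal sketch of how a cue-based pattern for `County ...`, `Contae ...`, and `gContae ...` could work. The regular expression and the `recover_county_spans` helper are hypothetical and written for illustration only; the packaged decoder in this repo may use different rules.

```python
import re

# Hypothetical pattern: a county cue word followed by one capitalised word.
# Multi-word county names are deliberately not handled in this sketch.
COUNTY_PHRASE = re.compile(r"\b(?:County|Contae|gContae)\s+[A-ZÁÉÍÓÚ][\w'-]*")

def recover_county_spans(text):
    """Return (start, end, label) character spans for explicit county phrases."""
    return [(m.start(), m.end(), "COUNTY") for m in COUNTY_PHRASE.finditer(text)]

text = "The office is in County Carlow and i gContae Chorcaí."
for start, end, label in recover_county_spans(text):
    print(label, repr(text[start:end]))
```

A pattern this simple would also fire on phrases such as "County Council", so a real decoder would likely need extra context checks before promoting a match to a `COUNTY` span.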
A model-level continuation from the rc4 weights was trained locally, but it did not beat the released weights on the gated suites. The promoted rc5 therefore keeps the stronger rc4 weights and improves the bundled inference policy instead.
Use this release when you need broader masking for Irish gov / HSE / citizen-support text, including user turns and assistant answers that contain:
- personal address fragments
- city / county
- date of birth
- age
- official callback numbers or public-service mailbox emails that still need masking in assistant output
If you only need the narrower Irish-core structured label set and want maximum CPU throughput, temsa/IrishCore-GlobalPointer-135M-v1-rc4 remains the faster option.
Architecture
- base encoder: `OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1`
- extractor head: GlobalPointer-style typed span matrix
- runtime: single-pass span extraction
- output policy: deterministic bracket masking, for example `[PII:PPSN]`
- deployment target: ONNX Runtime CPU with dynamic q8 per-channel quantization
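As a rough illustration of what a typed span matrix means, the sketch below decodes a `[label][start][end]` score grid into token spans in one pass. It uses plain Python lists, and both the `decode_span_matrix` name and the `0.0` threshold are assumptions for this example (a zero threshold is a common convention for GlobalPointer-style heads); it is not the code shipped in this repo.

```python
def decode_span_matrix(scores, labels, threshold=0.0):
    """Turn a [label][start][end] score grid into typed token spans.

    Only cells with start <= end are valid spans; a cell is kept when its
    score is above `threshold` (0.0 here is an assumed default).
    """
    spans = []
    for label_idx, grid in enumerate(scores):
        for start, row in enumerate(grid):
            for end in range(start, len(row)):
                if row[end] > threshold:
                    spans.append((start, end, labels[label_idx], row[end]))
    # Highest-scoring spans first, so later steps can resolve conflicts greedily.
    return sorted(spans, key=lambda s: s[3], reverse=True)
```

Because every candidate span is scored in the same forward pass, there is no separate tagging and merging stage: the decoder only has to filter and rank cells.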
This is not a generative model and does not rewrite text. It predicts typed spans and the provided inference scripts replace them with [PII:LABEL] placeholders.
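The replacement step itself is simple. The following sketch shows one way to apply `[PII:LABEL]` placeholders to character spans; `apply_mask` is a hypothetical helper written for this example, and it assumes spans are non-overlapping with an exclusive end offset.

```python
def apply_mask(text, spans):
    """Replace each (start, end, label) character span with [PII:LABEL].

    Spans are assumed to be non-overlapping, with `end` exclusive.
    """
    out = []
    cursor = 0
    for start, end, label in sorted(spans):
        out.append(text[cursor:start])  # untouched text before the span
        out.append(f"[PII:{label}]")    # deterministic placeholder
        cursor = end
    out.append(text[cursor:])           # trailing text after the last span
    return "".join(out)

text = "My PPSN is 1234567TW, email a@b.ie."
ppsn = text.index("1234567TW")
email = text.index("a@b.ie")
print(apply_mask(text, [(ppsn, ppsn + 9, "PPSN"), (email, email + 6, "EMAIL")]))
```

Text outside the predicted spans is copied through unchanged, which is what makes the output deterministic for a given set of spans.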
Decoder Policy
The serving path is still raw-only in the sense that there is no external scanner/validator service. The repo does include bundled decoder repairs for highly structured spans:
- PPSN normalization and overlap repair
- passport cue-based repair
- contextual date-of-birth repair
- full-email recovery
- contextual phone extension
- Eircode recovery
- contextual street-address recovery
- explicit county phrase recovery
These repairs live inside common.py and are part of the published inference path.
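For readers unfamiliar with overlap repair, the sketch below shows one generic way to reduce competing span candidates to a non-overlapping set. The greedy policy (prefer higher score, then longer span) and the `resolve_overlaps` name are assumptions made for illustration; the actual logic in `common.py` may differ.

```python
def resolve_overlaps(spans):
    """Keep a non-overlapping subset of (start, end, label, score) spans.

    Greedy policy (an assumption for illustration): prefer higher score, then
    longer span; drop any span that overlaps one already kept. `end` is
    exclusive.
    """
    kept = []
    ranked = sorted(spans, key=lambda s: (-s[3], -(s[1] - s[0])))
    for start, end, label, score in ranked:
        if all(end <= k_start or start >= k_end for k_start, k_end, _, _ in kept):
            kept.append((start, end, label, score))
    return sorted(kept)

candidates = [
    (11, 20, "PPSN", 0.98),
    (11, 18, "ACCOUNT_NUMBER", 0.61),  # overlaps the PPSN candidate
    (28, 34, "EMAIL", 0.99),
]
print(resolve_overlaps(candidates))
```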
Benchmarks
ONNX q8
| Suite | F1 | Examples/s |
|---|---|---|
| Irish core | 1.0000 | 84.4937 |
| Irish extended | 1.0000 | 104.2776 |
| Demographic holdout v2 | 1.0000 | 103.9646 |
| Gov contact policy v1 | 1.0000 | 67.0486 |
| Gov chatbot red-team v2 | 0.9861 | 88.9338 |
| Gov chatbot gap holdout v2 | 1.0000 | 76.5277 |
| Context red-team v1 | 1.0000 | 92.5221 |
| Multilingual PPSN overall | 0.9333 | 143.6130 |
| Multilingual PPSN label-only | 1.0000 | n/a |
Comparison
| Model | Core F1 | Demographic Holdout v2 F1 | Gov Contact Policy v1 F1 | Context Red-Team v1 F1 | Multilingual F1 | Core examples/s |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| ContextPII rc5 q8 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9333 | 84.4937 |
| ContextPII rc4 q8 | 0.9935 | 1.0000 | 1.0000 | n/a | 0.9333 | 61.4970 |
| GlobalPointer rc4 q8 | 1.0000 | 0.6938 | 0.7843 | n/a | 0.9333 | 221.5743 |
| DiffMask rc6 q8 | 0.9733 | n/a | n/a | n/a | 0.9274 | 130.3415 |
Tradeoff:
- this expanded-label line is materially better on contextual Irish masking tasks
- it is slower than the core-only GlobalPointer line on CPU because it carries a broader label inventory and more decoder work
Evaluation Notes
Additional q8 release checks shipped in this repo:
- `eval/q8_irish_numeric_qafix_v2.json`: numeric false-positive guardrail suite, now `1.0000` F1
- `eval/q8_irish_gov_chatbot_redteam_user_v2.json`: user-turn split, `1.0000` F1
- `eval/q8_irish_gov_chatbot_redteam_assistant_v2.json`: assistant-turn split, `0.9677` F1
- `eval/q8_globalpointer_context_redteam_v1.json`: contextual hardening suite for apartment-prefix street addresses, explicit County/Contae forms, and public-office address blocks, now `1.0000` F1
The remaining assistant split miss is a policy mismatch on a public office address/city block: this contextual line still masks Kennedy Avenue, Carlow while the legacy assistant v2 gold only scores postcode and phone in that row.
- `globalpointer_demographic_patch_v2_test` is the corrected held-out benchmark. The earlier v1 demographic patch contained invalid synthetic Eircodes in some rows.
- `irish_gov_contact_policy_v1` is the policy-aligned assistant-output benchmark for this release.
- `globalpointer_context_redteam_v1` is the new contextual hardening benchmark for the cases that previously leaked through in real apartment / county / public-office text.
- The legacy `irish_gov_chatbot_redteam_v2` negatives still assume some public assistant contact details should not be masked. That assumption does not match this release's target policy.
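The effect of a policy mismatch on a score can be shown with a small worked example. The sketch below assumes exact-match micro F1 over typed spans, which may not be exactly how the repo's evaluation harness scores rows, and the offsets are invented: the gold row only scores postcode and phone, while the prediction also masks the office street address and city.

```python
def span_f1(gold, predicted):
    """Exact-match micro precision, recall and F1 over sets of typed spans."""
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical row: gold scores only postcode and phone, while the model also
# masks the street address and city of the public office.
gold = [(40, 48, "POSTCODE"), (56, 68, "PHONE_NUMBER")]
predicted = gold + [(10, 24, "STREET_ADDRESS"), (26, 32, "CITY")]
precision, recall, f1 = span_f1(gold, predicted)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Recall stays perfect in this toy row, but the two extra masks count as false positives under the legacy gold, which is why a stricter masking policy can lower F1 without leaking anything.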
Usage
Full checkpoint:
python3 inference_mask.py --model temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc5 --text "I live at 14 Main Street, Dublin, Co. Dublin, D02 XY45 and my date of birth is 03/02/1991."
ONNX q8:
python3 inference_mask_onnx.py --model temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc5 --text "Employment Permit documents can be sent to EPStamp4@enterprise.gov.ie."
Expected masking style:
I live at [PII:STREET_ADDRESS], [PII:CITY], [PII:COUNTY], [PII:POSTCODE] and my date of birth is [PII:DATE_OF_BIRTH].
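When wiring the masked output into a downstream pipeline, it can help to check that every placeholder uses a label from the documented inventory. The sketch below is a hypothetical post-processing check, not part of the published scripts; the label set is copied from the lists in this model card.

```python
import re

# Label inventory as listed in this model card.
KNOWN_LABELS = {
    "PPSN", "POSTCODE", "PHONE_NUMBER", "EMAIL", "PASSPORT_NUMBER",
    "ACCOUNT_NUMBER", "BANK_ROUTING_NUMBER", "SWIFT_BIC", "CREDIT_DEBIT_CARD",
    "FIRST_NAME", "LAST_NAME", "STREET_ADDRESS", "CITY", "COUNTY",
    "DATE_OF_BIRTH", "AGE",
}
PLACEHOLDER = re.compile(r"\[PII:([A-Z_]+)\]")

def placeholder_labels(masked_text):
    """Return the labels found in masked text, in order of appearance."""
    return PLACEHOLDER.findall(masked_text)

def unknown_labels(masked_text):
    """Return placeholder labels that are not in the documented inventory."""
    return sorted(set(placeholder_labels(masked_text)) - KNOWN_LABELS)

masked = (
    "I live at [PII:STREET_ADDRESS], [PII:CITY], [PII:COUNTY], "
    "[PII:POSTCODE] and my date of birth is [PII:DATE_OF_BIRTH]."
)
print(placeholder_labels(masked))
print(unknown_labels(masked))
```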
Files
- `model.safetensors`: full checkpoint
- `onnx/model_quantized.onnx`: recommended CPU artifact
- `inference_mask.py`: full-checkpoint inference
- `inference_mask_onnx.py`: ONNX q8 inference
- `eval/benchmark_summary.json`: machine-readable benchmark summary
- `training_sources.json`: data provenance
Limitations
- This release is Irish-first. Multilingual overall precision is still pulled down by extra name detections outside the primary Irish target domain.
- The decoder deliberately prefers recall on structured Irish identifiers. The remaining multilingual overall gap is mostly driven by valid person-name masking in benchmark rows that only score PPSN. If you need a stricter non-Irish name policy, test on your own corpora before promoting beyond `rc`.
Portfolio Comparison
Updated: 2026-03-16.
Use this section for the fastest public comparison across the temsa PII masking portfolio.
- The first core table only includes public checkpoints that ship both comparable q8 accuracy and q8 CPU throughput.
- The first PPSN table only includes public artifacts that ship comparable PPSN accuracy and CPU throughput.
- Missing cells in the archive tables mean the older release did not ship that metric in its public bundle.
- DiffMask rows use the reconciled `clean_single_pass` harness that matches the deployed runtime.
- GlobalPointer rows use the public raw-only span-matrix release bundle and its packaged q8 ONNX artifact.
- The same content is shipped as `PORTFOLIO_COMPARISON.md` inside each public model repo.
Irish Core PII: Comparable Public Checkpoints
| Repo | Stack | Full Core F1 | Q8 Core F1 | Q8 Multilingual PPSN F1 | Q8 Core ex/s |
|---|---|---|---|---|---|
| temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc6 | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 282.9 |
| temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc5 | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 282.9 |
| temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc3 | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 317.9 |
| temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc2 | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 292.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc1 | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 337.3 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc29 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 232.7 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc28 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 232.7 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc25 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 212.1 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc24 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 278.9 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc23 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 237.6 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc22 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 106.8 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc21 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 150.8 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc20 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 181.9 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc19 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 73.1 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc18 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 126.2 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc17 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc16 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc15 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc14 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 119.2 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc13 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 126.1 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc12 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 73.6 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc11 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 94.1 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc10 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.8 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc9 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 119.8 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc8 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 128.9 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc7 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 89.0 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc6 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 89.0 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc5 | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 84.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc4 | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9333 | 61.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc3 | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9333 | 61.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc2 | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9222 | 61.5 |
| temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc1 | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9222 | 61.5 |
| temsa/IrishCore-GlobalPointer-135M-v1-rc4 | GlobalPointer raw-only span-matrix | 1.0000 | 1.0000 | 0.9333 | 221.6 |
| temsa/IrishCore-GlobalPointer-135M-v1-rc3 | GlobalPointer raw-only span-matrix | 1.0000 | 1.0000 | 0.9213 | 204.9 |
| temsa/IrishCore-GlobalPointer-135M-v1-rc2 | GlobalPointer raw-only span-matrix | 0.9934 | 0.9934 | 0.9326 | 231.2 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc8 | Raw-only token-span | 0.9737 | 0.9737 | 0.9176 | 46.1 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc7 | Hybrid classifier + generated scanner spec | 1.0000 | 0.9934 | 1.0000 | 30.0 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc6 | Hybrid classifier + repair decoders | 1.0000 | 0.9934 | 1.0000 | 29.5 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc5 | Hybrid classifier + repair decoders | 0.9737 | 0.9669 | 0.9333 | 34.4 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc4 | Hybrid classifier + repair decoders | 0.9870 | 0.9740 | 0.9600 | 114.2 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc3 | Hybrid classifier + repair decoders | 0.9806 | 0.9677 | 0.9333 | 44.9 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc2 | Hybrid classifier + repair decoders | 0.9554 | 0.9615 | 0.7887 | 119.1 |
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1 | Hybrid classifier baseline | 0.9530 | 0.9333 | 0.9882 | 103.3 |
| temsa/IrishCore-DiffMask-135M-v1-rc6 | DiffMask token-span, scanner-free | 0.9801 | 0.9733 | 0.9274 | 130.3 |
| temsa/IrishCore-DiffMask-135M-v1-rc5 | DiffMask token-span, scanner-free | 0.9733 | 0.9733 | 0.9379 | 249.2 |
| temsa/IrishCore-DiffMask-135M-v1-rc4 | DiffMask token-span, scanner-free | 0.9733 | 0.9733 | 0.9371 | 29.5 |
| temsa/IrishCore-DiffMask-135M-v1-rc3 | DiffMask token-span, scanner-free | 0.9664 | 0.9664 | 0.9591 | 30.0 |
| temsa/IrishCore-DiffMask-135M-v1-rc2 | DiffMask token-span, scanner-free | 0.9664 | 0.9664 | 0.9212 | 247.1 |
| temsa/IrishCore-DiffMask-135M-v1-rc1 | DiffMask token-span, scanner-free | 0.9801 | 0.9934 | 0.9412 | 251.2 |
Irish Core PII: Other Public Checkpoints
| Repo | Stack | Full Core F1 | Q8 Core F1 | Q8 Multilingual PPSN F1 | Notes |
|---|---|---|---|---|---|
| temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc1 | Hybrid classifier prototype | 0.9487 | n/a | n/a | Predates the public q8 artifact. |
Finance-boundary q8 F1 is 1.0000 for OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc6, OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc7, OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc8, and all public IrishCore-DiffMask releases from rc1 to rc6. OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc5 ships 0.8750 on that public q8 suite.
PPSN-Only: Comparable Public Artifacts
| Repo | Artifact | Irish Large F1 | Multilingual PPSN F1 | User Raw F1 | QA v8 F1 | CPU ex/s |
|---|---|---|---|---|---|---|
| temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1 | fp32 canonical checkpoint | 0.8979 | 0.9704 | 0.8000 | 0.7385 | 57.4 |
| temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1-fp16 | fp16 CPU/GPU artifact | n/a | 0.9704 | 0.8000 | 0.7385 | 45.8 |
| temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1-q8 | dynamic int8 CPU artifact | n/a | 0.9040 | n/a | n/a | 132.1 |
PPSN-Only: Historical Public Checkpoints
| Repo | Main Published Metrics | Notes |
|---|---|---|
| temsa/OpenMed-PPSN-mLiteClinical-v1 | same as canonical fp32 repo: multilingual 0.9704, user raw 0.8000 | Legacy alias; prefer temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1. |
| temsa/OpenMed-PPSN-v6-raw-rc2 | irish_reg_v5 0.8750; user_raw 0.8000; qa_v8 0.7385 | Raw PPSN-only research checkpoint; no packaged multilingual CPU benchmark row. |
| temsa/OpenMed-PPSN-v5_1 | irish_large_v2 raw 0.9285; qa_v6 hybrid strict 1.0000 | Hybrid PPSN-only checkpoint; predates the canonical multilingual suite packaging. |
| temsa/OpenMed-PPSN-v5 | irish_reg_v5 raw 0.8235; irish_reg_v5 hybrid strict 1.0000 | Hybrid PPSN-only checkpoint; predates the canonical multilingual suite packaging. |
| temsa/OpenMed-PPSN-v4 | synthetic non-PPSN drift check only | Predates the current PPSN eval suite; no packaged apples-to-apples multilingual CPU row. |
If you need the strongest current raw-only Irish core model, start with IrishCore-GlobalPointer-135M-v1-rc4. If you need the fastest CPU-first raw-only line, compare it against IrishCore-DiffMask-135M-v1-rc6. If you need a PPSN-only artifact, compare the canonical fp32, fp16, and q8 variants of OpenMed-mLiteClinical-IrishPPSN-135M-v1 directly in the table above.