sosa-pii-ner-sg-v1.0.0

Regional PII named entity recognition model for Singapore documents. Part of the SOSA DevOps Privacy Filter, a local-first, privacy-preserving AI runtime for developers. Weights are Apache 2.0. No cloud required.

🔗 Source code: SOSA DevOps on GitHub | 🌐 Product: sovereignsystems.cc


Model summary

Fine-tuned from urchade/gliner_large-v2.1 on synthetic Singapore PII data. Detects NRIC/FIN numbers, Unique Entity Numbers (UEN), Singapore local phone numbers, and healthcare reference numbers (HRN) in English documents.

Intended use: Local PII detection within the SOSA DevOps Privacy Filter sidecar. Text never leaves the user's machine.
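A minimal sketch of local inference and redaction follows. It assumes the weights load through the open-source `gliner` package in the same way as the base model, and that the Hugging Face repository id is `SovereignSystems-cc/sosa-pii-ner-sg-v1.0.0`; the `detect` and `redact` helpers and the 0.5 threshold are illustrative choices, not part of the SOSA DevOps sidecar.

```python
from typing import Dict, List

# Assumed repository id; adjust if the weights live elsewhere.
MODEL_ID = "SovereignSystems-cc/sosa-pii-ner-sg-v1.0.0"
SG_LABELS = ["sg_nric_fin", "sg_uen", "sg_phone_local", "sg_health_hrn"]


def detect(text: str, threshold: float = 0.5) -> List[Dict]:
    """Run the model locally and return GLiNER entity dicts."""
    # Imported lazily so redact() works without the gliner package installed.
    from gliner import GLiNER

    # Loading on every call keeps the sketch short; cache the model in real use.
    model = GLiNER.from_pretrained(MODEL_ID)
    return model.predict_entities(text, SG_LABELS, threshold=threshold)


def redact(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a [label] placeholder."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['label']}]" + text[ent["end"]:]
    return text
```

Calling `redact(text, detect(text))` would then mask every span the model reports, using the `start`, `end`, and `label` keys that `predict_entities` returns.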


Labels

| Label | Description | Format | Validator |
|---|---|---|---|
| sg_nric_fin | Singapore NRIC / FIN | Prefix letter (S/T/F/G/M) + 7 digits + check letter | Prefix + check character |
| sg_uen | Unique Entity Number | 9–10 chars (numeric + alpha suffix, or T-prefix) | Format validation |
| sg_phone_local | Singapore local phone | 8 digits, starting 3/6/8/9 | Prefix validation |
| sg_health_hrn | Healthcare reference number | Alphanumeric, 10–21 chars | Format validation |
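The format rules in this table can be approximated with small post-validators. The sketch below is an assumption about what such checks might look like, not the project's actual validator code. The NRIC/FIN check-letter routine uses the widely circulated (but not officially published) weighted-sum scheme for S/T/F/G prefixes; M-series values are checked for format only, and UEN is omitted because the table does not pin down its pattern precisely.

```python
import re

NRIC_RE = re.compile(r"^([STFGM])(\d{7})([A-Z])$")
PHONE_RE = re.compile(r"^[3689]\d{7}$")          # 8 digits, starting 3/6/8/9
HRN_RE = re.compile(r"^[A-Za-z0-9]{10,21}$")     # alphanumeric, 10-21 chars

# Commonly cited check-letter scheme (assumption: not an official specification).
_WEIGHTS = (2, 7, 6, 5, 4, 3, 2)
_CHECK_LETTERS = {
    "S": "JZIHGFEDCBA", "T": "JZIHGFEDCBA",
    "F": "XWUTRQPNMLK", "G": "XWUTRQPNMLK",
}


def valid_nric_fin(value: str) -> bool:
    """Check the prefix, the 7 digits, and the check letter where known."""
    match = NRIC_RE.match(value.strip().upper())
    if not match:
        return False
    prefix, digits, check = match.groups()
    if prefix == "M":
        # M-series check letters are outside this sketch: format only.
        return True
    total = sum(w * int(d) for w, d in zip(_WEIGHTS, digits))
    if prefix in "TG":
        total += 4  # offset applied to the T and G series
    return _CHECK_LETTERS[prefix][total % 11] == check


def valid_phone_local(value: str) -> bool:
    """Accept exactly 8 digits with a leading 3, 6, 8, or 9."""
    return PHONE_RE.match(value.strip()) is not None


def valid_hrn(value: str) -> bool:
    """Accept 10 to 21 alphanumeric characters."""
    return HRN_RE.match(value.strip()) is not None
```

Running a validator on each span the model returns gives a cheap second opinion: a span labelled `sg_nric_fin` whose check letter does not match is more likely a false positive.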

Global labels also carried (defence-in-depth): email, phone_e164, credit_card, passport_generic, ipv4_public


Evaluation: v1.0.0 gate results

| Label | F1 | Gate |
|---|---|---|
| sg_nric_fin | 0.9524 | ≥ 0.85 ✅ |
| sg_uen | 0.7143 | ≥ 0.70 ✅ |

First-run gate pass (T1). Training: D-SG-1 dataset (1,157 positive examples), 10,000 steps, A40 GPU, 2026-05-28/29.
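For readers who want to reproduce the gate logic, here is a small sketch. F1 is computed as 2·TP / (2·TP + FP + FN); the scores and thresholds are copied from the results above, while the function names are hypothetical and the underlying true/false positive counts are not published in this card.

```python
from typing import Dict

# Reported F1 scores and gate thresholds for v1.0.0.
REPORTED_F1 = {"sg_nric_fin": 0.9524, "sg_uen": 0.7143}
GATES = {"sg_nric_fin": 0.85, "sg_uen": 0.70}


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def gate_report(scores: Dict[str, float], gates: Dict[str, float]) -> Dict[str, bool]:
    """Map each gated label to True if its score meets the threshold."""
    return {label: scores[label] >= gates[label] for label in gates}


def release_passes(scores: Dict[str, float], gates: Dict[str, float]) -> bool:
    """A release passes only if every gated label passes."""
    return all(gate_report(scores, gates).values())
```

With the reported numbers, both labels clear their gates, and `sg_uen` does so by a margin of about 0.014.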


Limitations

  • NRIC/UEN alphanumeric collision: Both labels use alphanumeric formats of similar length, so the model relies on surrounding context to tell them apart: identity keywords (NRIC/FIN) versus company-registration keywords (UEN).
  • UEN accuracy: The F1 of 0.7143 reflects limited UEN training examples. It clears the 0.70 gate and is adequate as a defence-in-depth layer in production; NRIC/FIN detection, which scores higher (F1 = 0.9524), is the primary detection surface.
  • Context-gated: Detection of bare values without surrounding context is unreliable.
  • Language: English only. Context templates are Singapore-specific (healthcare, HR, business registration).
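The context dependence described above can be illustrated with a simple keyword check around a candidate span. This is only a sketch of the idea: the model learns context implicitly, and the keyword lists, window size, and `context_label` helper below are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical keyword lists; the model's learned context cues are not published.
CONTEXT_KEYWORDS = {
    "sg_nric_fin": {"nric", "fin", "identity"},
    "sg_uen": {"uen", "company", "registration"},
}


def context_label(text: str, start: int, end: int, window: int = 40) -> Optional[str]:
    """Guess a label from keywords near text[start:end], or None if unclear."""
    nearby = text[max(0, start - window): end + window].lower()
    # Whole-word tokens only, so "fin" does not match inside "finance".
    tokens = set(re.findall(r"[a-z]+", nearby))
    hits = {label: len(tokens & kws) for label, kws in CONTEXT_KEYWORDS.items()}
    ranked = sorted(hits.items(), key=lambda item: item[1], reverse=True)
    # No keywords at all, or a tie between labels, counts as ambiguous.
    if ranked[0][1] == 0 or ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

A bare value such as `201912345K` with no nearby keywords yields `None` here, which mirrors why bare values are flagged as unreliable.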

Training data

Synthetic Singapore PII examples only. No real resident or patient data used.


Integrity

pytorch_model.bin SHA-256:

070db3ae5b22c7f40cf8037a604262f58a404d17bbf79dd69a26fe9f98edd91f
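One way to check a downloaded copy against this digest is sketched below using Python's standard `hashlib`. The helper names and the chunk size are illustrative; only the expected digest comes from this card.

```python
import hashlib

# Published SHA-256 of pytorch_model.bin for v1.0.0.
EXPECTED_SHA256 = "070db3ae5b22c7f40cf8037a604262f58a404d17bbf79dd69a26fe9f98edd91f"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's digest matches the expected value."""
    return sha256_of(path) == expected.lower()
```

For example, `verify_weights("pytorch_model.bin")` should return `True` for an unmodified download, assuming the file sits in the current directory.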

License

Apache 2.0, inherited from urchade/gliner_large-v2.1. Fine-tuned by Sovereign Systems. See LICENSE.
