sosa-pii-ner-sg-v1.0.0
Regional PII named entity recognition model for Singapore documents. Part of the SOSA DevOps Privacy Filter, a local-first, privacy-preserving AI runtime for developers. Weights are Apache 2.0. No cloud required.
Source code: SOSA DevOps on GitHub
Product: sovereignsystems.cc
Model summary
Fine-tuned from urchade/gliner_large-v2.1 on synthetic Singapore PII data.
Detects NRIC/FIN numbers, Unique Entity Numbers (UEN), Singapore local phone
numbers, and healthcare reference numbers (HRN) in English documents.
Intended use: Local PII detection within the SOSA DevOps Privacy Filter sidecar. Text never leaves the user's machine.
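A minimal local-inference sketch, assuming the `gliner` Python package is installed and that the label names listed in this card are passed verbatim as GLiNER prompts (the card does not state this explicitly). `detect_pii` is a hypothetical helper name and the `0.5` threshold is an arbitrary starting point, not a tuned value.

```python
from functools import lru_cache
from typing import Dict, List

MODEL_ID = "SovereignSystems-cc/sosa-pii-ner-sg-v1.0.0"

# Label names copied from this card; using them verbatim as prompts is an assumption.
SG_LABELS = ["sg_nric_fin", "sg_uen", "sg_phone_local", "sg_health_hrn"]


@lru_cache(maxsize=1)
def _load_model():
    """Load the weights once and reuse them across calls."""
    # Imported lazily so this module can be loaded where gliner is not installed.
    from gliner import GLiNER

    return GLiNER.from_pretrained(MODEL_ID)


def detect_pii(text: str, threshold: float = 0.5) -> List[Dict]:
    """Return GLiNER entity dicts (text, label, score, start, end) for SG_LABELS."""
    return _load_model().predict_entities(text, SG_LABELS, threshold=threshold)
```

Calling `detect_pii("Patient NRIC: S1234567D, contact 91234567.")` runs entirely on the local machine once the weights have been downloaded. The sample sentence keeps the value next to a context keyword because the model is context-gated.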
Labels
| Label | Description | Format | Validator |
|---|---|---|---|
| `sg_nric_fin` | Singapore NRIC / FIN | Prefix letter (S/T/F/G/M) + 7 digits + check letter | Prefix + check character |
| `sg_uen` | Unique Entity Number | 9–10 chars (numeric + alpha suffix, or T-prefix) | Format validation |
| `sg_phone_local` | Singapore local phone | 8 digits, starting 3/6/8/9 | Prefix validation |
| `sg_health_hrn` | Healthcare reference number | Alphanumeric, 10–21 chars | Format validation |
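The "Prefix + check character" validator for NRIC/FIN can be sketched with the widely circulated weighted-sum checksum. This is an illustration only: the weights and lookup tables below come from public descriptions rather than an official specification, the card does not say that SOSA's validator uses exactly this routine, and the M-prefixed series is deliberately left out. `nric_fin_checksum_ok` is a hypothetical name.

```python
import re

# Publicly circulated weights and lookup tables; not an official specification.
_WEIGHTS = (2, 7, 6, 5, 4, 3, 2)
_TABLE_ST = "JZIHGFEDCBA"  # check letters for the S and T series
_TABLE_FG = "XWUTRQPNMLK"  # check letters for the F and G series
_PATTERN = re.compile(r"^([STFG])(\d{7})([A-Z])$")


def nric_fin_checksum_ok(value: str) -> bool:
    """True if value is prefix + 7 digits + the expected check letter.

    M-prefixed values are not handled by this sketch and return False.
    """
    match = _PATTERN.match(value.strip().upper())
    if match is None:
        return False
    prefix, digits, check = match.groups()
    total = sum(int(d) * w for d, w in zip(digits, _WEIGHTS))
    if prefix in "TG":
        total += 4  # offset applied to the T and G series
    table = _TABLE_ST if prefix in "ST" else _TABLE_FG
    return table[total % 11] == check
```

A check like this is useful as a post-filter on model output: a span labelled `sg_nric_fin` whose check letter does not match is more likely a false positive.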
Global labels also carried (defence-in-depth):
email, phone_e164, credit_card, passport_generic, ipv4_public
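The format and prefix rules for the Singapore-specific labels in the table above can be approximated with regular expressions. This is a rough sketch transcribed from the Format column, not the filter's actual validators. The T-prefix UEN layout in particular is an assumption, and `format_ok` is a hypothetical helper.

```python
import re

# Patterns transcribed from the Format column; production rules may differ.
_FORMATS = {
    # 8 digits, starting with 3, 6, 8 or 9
    "sg_phone_local": re.compile(r"^[3689]\d{7}$"),
    # alphanumeric, 10 to 21 characters
    "sg_health_hrn": re.compile(r"^[A-Za-z0-9]{10,21}$"),
    # 8 or 9 digits plus one letter (9-10 chars), or a T-prefixed
    # 10-character form; the T-prefix layout is an assumption
    "sg_uen": re.compile(r"^(?:\d{8,9}[A-Z]|T\d{2}[A-Z]{2}\d{4}[A-Z])$"),
}


def format_ok(label: str, value: str) -> bool:
    """Return True if value matches the sketched format for label.

    Raises KeyError for labels this sketch does not cover.
    """
    return _FORMATS[label].match(value.strip()) is not None
```

Format checks like these only confirm shape. They cannot tell a phone number from any other 8-digit value, which is why the model's contextual prediction remains the primary signal.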
Evaluation: v1.0.0 gate results
| Label | F1 | Gate |
|---|---|---|
| `sg_nric_fin` | 0.9524 | ≥ 0.85 ✓ |
| `sg_uen` | 0.7143 | ≥ 0.70 ✓ |
First-run gate pass (T1). Training: D-SG-1 dataset (1,157 positive examples), 10,000 steps, A40 GPU, 2026-05-28/29.
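The gate logic amounts to comparing each label's F1 against its threshold. A small sketch using the figures from the table above follows. The inclusive comparison matches the "≥" shown in the Gate column, while `gate_report` and `gate_margins` are hypothetical names.

```python
# F1 scores and gate thresholds copied from the evaluation table.
RESULTS = {
    "sg_nric_fin": {"f1": 0.9524, "gate": 0.85},
    "sg_uen": {"f1": 0.7143, "gate": 0.70},
}


def gate_report(results):
    """Map each label to True if its F1 meets or exceeds its gate."""
    return {label: r["f1"] >= r["gate"] for label, r in results.items()}


def gate_margins(results):
    """Map each label to its F1 headroom above the gate, rounded to 4 places."""
    return {label: round(r["f1"] - r["gate"], 4) for label, r in results.items()}
```

The margins make the difference between the two labels explicit: `sg_nric_fin` clears its gate by about 0.10, whereas `sg_uen` clears by about 0.014.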
Limitations
- NRIC/UEN alphanumeric collision: NRIC/FIN and UEN values are both alphanumeric and can look alike. The model relies on surrounding context (NRIC/FIN identity keywords versus UEN/company-registration keywords) to tell them apart.
- UEN performance: The F1 of 0.7143 reflects the limited number of UEN training examples. This is adequate for production defence-in-depth; NRIC detection, which has higher recall, is the primary detection surface.
- Context-gated: Detection of bare values without surrounding context is unreliable.
- Language: English, with Singapore-specific context templates (healthcare, HR, business registration).
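To make the context dependence concrete, the following toy heuristic looks for keywords near a candidate span. It illustrates the idea only: the model learns context from training data rather than from a keyword list, and the keywords, the 40-character window, and the `context_hint` name are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical keyword lists for illustration; not taken from the model or its data.
_NRIC_HINTS = re.compile(r"\b(nric|fin|identity)\b", re.IGNORECASE)
_UEN_HINTS = re.compile(r"\b(uen|company|registration)\b", re.IGNORECASE)


def context_hint(text: str, start: int, end: int, window: int = 40) -> Optional[str]:
    """Guess a label from keywords within `window` characters of text[start:end].

    Returns None when there is no hint or when both kinds of hint appear.
    """
    context = text[max(0, start - window):start] + " " + text[end:end + window]
    has_nric = _NRIC_HINTS.search(context) is not None
    has_uen = _UEN_HINTS.search(context) is not None
    if has_nric == has_uen:
        return None
    return "sg_nric_fin" if has_nric else "sg_uen"
```

A bare value with no nearby keyword yields `None`, mirroring the limitation above: without context there is little basis for choosing between the two labels.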
Training data
Synthetic Singapore PII examples only. No real resident or patient data used.
Integrity
pytorch_model.bin SHA-256:
070db3ae5b22c7f40cf8037a604262f58a404d17bbf79dd69a26fe9f98edd91f
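One way to check a downloaded copy against this digest, sketched with the Python standard library. The local file path is whatever location the weights were saved to, and `verify_weights` is a hypothetical helper name.

```python
import hashlib

# Digest published in this card for pytorch_model.bin.
EXPECTED_SHA256 = "070db3ae5b22c7f40cf8037a604262f58a404d17bbf79dd69a26fe9f98edd91f"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file at path hashes to the expected digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify_weights("pytorch_model.bin")` should return `True` for an unmodified download. A `False` result indicates a corrupted or altered file.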
License
Apache 2.0, inherited from urchade/gliner_large-v2.1.
Fine-tuned by Sovereign Systems.
See LICENSE.