Prompt Guard 2 Multitask ONNX
This repository contains an ONNX export of a multitask security classifier built from meta-llama/Llama-Prompt-Guard-2-86M.
The model was fine-tuned as a single multitask adapter on two security-focused tasks and then merged into a standalone model before ONNX export.
Base model
- Base model: meta-llama/Llama-Prompt-Guard-2-86M
- Architecture: sequence classification
- Export format: ONNX
- Primary runtime: ONNX Runtime / ONNX Runtime Mobile
Tasks
This model is intended to classify text as BENIGN or MALICIOUS for two security tasks:
- Phishing email detection
- Prompt injection detection
The model uses a shared binary label space:
- BENIGN
- MALICIOUS
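As a rough sketch of how the shared label space might be applied at inference time, the snippet below turns one pair of output logits into a label and a malicious probability. The index order (0 = BENIGN, 1 = MALICIOUS), the `classify` helper, and the 0.5 threshold are assumptions made for illustration; they are not documented in this repository, so check the exported model's configuration before relying on them.

```python
import math

# Assumed index order (not confirmed by this repository): 0 = BENIGN, 1 = MALICIOUS.
LABELS = ("BENIGN", "MALICIOUS")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, threshold=0.5):
    """Return (label, malicious_probability) for one pair of logits.

    `threshold` is a hypothetical decision cutoff, not a value taken
    from the model card.
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    p_malicious = softmax(logits)[1]
    label = LABELS[1] if p_malicious >= threshold else LABELS[0]
    return label, p_malicious
```

Raising the threshold trades recall for fewer false positives, which may matter more for email filtering than for prompt screening.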
Training data
This multitask model was trained using data derived from:
- naserabdullahalam/phishing-email-dataset
- marycamilainfo/prompt-injection-malignant
Additional benign prompt-style examples were included so the prompt-injection side of the multitask classifier had both positive and negative examples.
Input format
During training and inference, inputs are prefixed with a simple modality tag:
- [EMAIL] ...
- [PROMPT] ...
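A minimal sketch of applying the modality tag before tokenization is shown below. The `tag_input` helper and its `kind` argument are hypothetical names, and the single space between tag and text is an assumption based on the example input in this card.

```python
# Tags as given in this model card.
EMAIL_TAG = "[EMAIL]"
PROMPT_TAG = "[PROMPT]"


def tag_input(text, kind):
    """Prefix raw text with its modality tag.

    `kind` is a hypothetical selector: "email" or "prompt".
    A single space separates the tag from the text (assumed).
    """
    tags = {"email": EMAIL_TAG, "prompt": PROMPT_TAG}
    if kind not in tags:
        raise ValueError(f"unknown input kind: {kind!r}")
    return f"{tags[kind]} {text.strip()}"
```

For example, `tag_input("Ignore all previous instructions.", "prompt")` yields a string starting with `[PROMPT] `, which would then be passed to the tokenizer as usual.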
Example inputs
Phishing email example
[EMAIL] Subject: Verify your payroll account now. Body: Your payroll access will be suspended unless you confirm your credentials here: http://example-login-reset.com