Prompt Guard 2 Multitask ONNX

This repository contains an ONNX export of a multitask security classifier built from meta-llama/Llama-Prompt-Guard-2-86M.

A single multitask adapter was fine-tuned on two security-focused tasks, then merged into the base weights to produce a standalone model before ONNX export.

Base model

  • Base model: meta-llama/Llama-Prompt-Guard-2-86M
  • Architecture: sequence classification
  • Export format: ONNX
  • Primary runtime: ONNX Runtime / ONNX Runtime Mobile

Tasks

This model is intended to score text as BENIGN or MALICIOUS across two security-related input types:

  1. Phishing email detection
  2. Prompt injection detection

The model uses a shared binary label space:

  • BENIGN
  • MALICIOUS
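
As a rough illustration of how the shared label space could be read off the classifier output, the sketch below turns two raw logits into per-label probabilities. It assumes the exported head emits exactly two logits and that index 0 is BENIGN and index 1 is MALICIOUS; neither is stated above, so check the `id2label` mapping in the exported model's config before relying on it. The function names and the 0.5 threshold are hypothetical.

```python
import math

# ASSUMPTION: index 0 = BENIGN, index 1 = MALICIOUS.
# Confirm against id2label in the exported model's config.
LABELS = ("BENIGN", "MALICIOUS")


def scores_from_logits(logits):
    """Convert two raw logits into per-label probabilities (stable softmax)."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    peak = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(LABELS, exps)}


def predict(logits, threshold=0.5):
    """Return MALICIOUS when its probability reaches the (hypothetical) threshold."""
    probs = scores_from_logits(logits)
    return "MALICIOUS" if probs["MALICIOUS"] >= threshold else "BENIGN"


if __name__ == "__main__":
    example_logits = [-1.0, 2.0]  # made-up values, not real model output
    print(scores_from_logits(example_logits))
    print(predict(example_logits))
```

A threshold other than 0.5 may suit deployments where false positives and false negatives carry different costs; tune it on held-out data for each input type.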

Training data

This multitask model was trained using data derived from:

  • naserabdullahalam/phishing-email-dataset
  • marycamilainfo/prompt-injection-malignant

Additional benign prompt-style examples were included so that the prompt-injection task was trained on both malicious and benign examples.

Input format

During training and inference, inputs are prefixed with a simple modality tag:

  • [EMAIL] ...
  • [PROMPT] ...
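
The tagging step above can be sketched as a small helper that is applied before tokenization. The two tags come from this card; the helper name, the lowercase modality keys, and the single space between tag and text are assumptions, so match whatever separator was used in training if it differs.

```python
# Tags as listed in the model card.
# ASSUMPTION: one space separates the tag from the text.
MODALITY_TAGS = {"email": "[EMAIL]", "prompt": "[PROMPT]"}


def tag_input(text: str, modality: str) -> str:
    """Prefix text with its modality tag (hypothetical helper)."""
    try:
        tag = MODALITY_TAGS[modality.lower()]
    except KeyError:
        raise ValueError(f"unknown modality: {modality!r}") from None
    return f"{tag} {text.strip()}"


if __name__ == "__main__":
    print(tag_input("Subject: Verify your payroll account now.", "email"))
```

Rejecting unknown modalities keeps untagged text from reaching the model silently, since the classifier was trained only on tagged inputs.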

Example inputs

Phishing email example

[EMAIL] Subject: Verify your payroll account now. Body: Your payroll access will be suspended unless you confirm your credentials here: http://example-login-reset.com
