ZDCSlab's Collections
Rubrics as an Attack Surface (RIPD)

Updated 8 days ago

This collection releases the official artifacts accompanying “Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges.”

  • Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges

    Paper • 2602.13576 • Published 14 days ago • 2

  • ZDCSlab/ripd-dataset

    Preview • Updated 6 days ago • 38

  • ZDCSlab/ripd-ultra-real-llama3-8b-instruct-biased-bt

    Text Generation • Updated 6 days ago • 5

  • ZDCSlab/ripd-ultra-real-llama3-8b-instruct-seed-bt

    Text Generation • Updated 6 days ago • 8

  • ZDCSlab/ripd-anthropic-saferlhf-dolphin3-llama31-8b-biased-bt

    Text Generation • Updated 6 days ago • 6

  • ZDCSlab/ripd-anthropic-saferlhf-dolphin3-llama31-8b-seed-bt

    Text Generation • Updated 6 days ago • 8

  • ZDCSlab/ripd-ultra-real-gemma2-2b-it-biased-bt

    Text Generation • 3B • Updated 6 days ago • 14

  • ZDCSlab/ripd-ultra-real-gemma2-2b-it-seed-bt

    Text Generation • 3B • Updated 6 days ago • 11

  • ZDCSlab/ripd-anthropic-saferlhf-gemma-2b-uncensored-v1-biased-bt

    Text Generation • 3B • Updated 7 days ago • 5

  • ZDCSlab/ripd-anthropic-saferlhf-gemma-2b-uncensored-v1-seed-bt

    Text Generation • 3B • Updated 7 days ago • 7