stereoplegic's Collections: Adversarial
• LTD: Low Temperature Distillation for Robust Adversarial Training (arXiv:2111.02331)
• Interpolated Adversarial Training: Achieving Robust Neural Networks without Sacrificing Too Much Accuracy (arXiv:1906.06784)
• Pruning Adversarially Robust Neural Networks without Adversarial Examples (arXiv:2210.04311)
• Mitigating the Accuracy-Robustness Trade-off via Multi-Teacher Adversarial Distillation (arXiv:2306.16170)
• Mutual Adversarial Training: Learning together is better than going alone (arXiv:2112.05005)
• Towards Adversarially Robust Continual Learning (arXiv:2303.17764)
• Privacy-Preserving Prompt Tuning for Large Language Model Services (arXiv:2305.06212)
• Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! (arXiv:2310.03693)
• Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment (arXiv:2308.09662)
• PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts (arXiv:2306.04528)
• On the Adversarial Robustness of Mixture of Experts (arXiv:2210.10253)
• CodeAttack: Code-Based Adversarial Attacks for Pre-trained Programming Language Models (arXiv:2206.00052)
• Fake Alignment: Are LLMs Really Aligned Well? (arXiv:2311.05915)
• Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?" (arXiv:2311.07587)