Platform - Model Jailbreaking / Safeguarding

In this session, the readings cover:

Required Readings:

Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming

A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models

More Readings:

Auditing Prompt Caching in Language Model APIs

New GenAI simulation and evaluation tools in Azure AI Studio

LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

Beyond Benchmarks: On The False Promise of AI Regulation