Agent Alignment - PPO

In this session, our readings cover:

Required PPO Readings:

A simple blog post: Preference Tuning LLMs: PPO, DPO, GRPO — A Simple Guide

A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Towards a Unified View of Preference Learning for Large Language Models: A Survey

Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks

More Readings:

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
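Since every required reading builds on PPO's clipped surrogate objective, a minimal sketch may help orient first-time readers before diving in. The PyTorch snippet below is our own illustration, not code from any listed paper or framework; the function name `ppo_clip_loss`, the `clip_eps` default of 0.2, and the toy tensors are all assumptions chosen for clarity.

```python
import torch

def ppo_clip_loss(logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (illustrative sketch).

    logprobs:     log-probs of the taken actions under the current policy
    old_logprobs: log-probs under the policy that generated the rollouts
    advantages:   advantage estimates for those actions
    """
    # Importance ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = torch.exp(logprobs - old_logprobs)
    unclipped = ratio * advantages
    # Clipping the ratio to [1 - eps, 1 + eps] limits how far one update
    # can move the policy away from the data-generating policy.
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the pessimistic (elementwise minimum) surrogate,
    # so we return its negative mean for gradient descent.
    return -torch.min(unclipped, clipped).mean()

# Toy usage with random tensors standing in for a rollout batch:
lp = torch.randn(8)
old_lp = lp + 0.1 * torch.randn(8)
adv = torch.randn(8)
print(ppo_clip_loss(lp, old_lp, adv))
```

In the RLHF setting the readings describe, the "actions" are generated tokens, the advantages come from a learned reward/value model, and frameworks such as OpenRLHF add further terms (e.g., a KL penalty against the reference model) on top of this core loss.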