Platform - Model Serving + PPO

Customization Serving

In this session, our readings cover:

Required Readings:

Efficient Memory Management for Large Language Model Serving with PagedAttention

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

A Survey on Large Language Model Acceleration based on KV Cache Management
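The serving readings above center on the paged KV cache introduced by PagedAttention. As a minimal sketch (not vLLM's actual implementation), the idea can be illustrated as: the KV cache is split into fixed-size physical blocks, each sequence keeps a block table mapping its logical token positions to physical blocks, and blocks are allocated on demand rather than reserved as one contiguous region per sequence. All names and sizes below are illustrative.

```python
BLOCK_SIZE = 4  # tokens per KV-cache block (illustrative)

class PagedKVCache:
    """Toy block-table allocator in the spirit of PagedAttention."""

    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))  # pool of physical blocks
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.lengths = {}       # seq_id -> number of cached tokens

    def append_token(self, seq_id):
        """Reserve cache space for one new token of a sequence."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:  # current block is full (or none yet)
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted")
            table.append(self.free_blocks.pop())  # allocate one block on demand
        self.lengths[seq_id] = length + 1

    def free(self, seq_id):
        """Return a finished sequence's blocks to the pool for reuse."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8)
for _ in range(6):  # 6 tokens need ceil(6/4) = 2 blocks
    cache.append_token("seq0")
print(len(cache.block_tables["seq0"]))  # 2
cache.free("seq0")
print(len(cache.free_blocks))  # 8
```

The point of the block table is that internal fragmentation is bounded by one block per sequence, and freed blocks are immediately reusable by other requests.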

PPO readings

A short blog post: Preference Tuning LLMs: PPO, DPO, GRPO — A Simple Guide

A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
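The PPO readings above all build on PPO's clipped surrogate objective. A minimal per-token sketch is below; the inputs are illustrative stand-ins, since a real RLHF loop would compute log-probabilities from the policy model and advantages from a reward/value model.

```python
import math

def ppo_clip_loss(logp_new, logp_old, advantage, eps=0.2):
    """Per-token PPO loss: -min(r*A, clip(r, 1-eps, 1+eps)*A), with r = pi_new/pi_old."""
    ratio = math.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1 + eps), 1 - eps) * advantage
    return -min(unclipped, clipped)

# With a positive advantage, the gain is capped once the ratio exceeds 1 + eps:
print(ppo_clip_loss(logp_new=0.5, logp_old=0.0, advantage=1.0))  # -1.2 (clipped)
```

The clipping keeps each policy update close to the old policy, which is why PPO remains a common choice for the RLHF stage surveyed in these readings.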

More reading:

Multiple ML systems readings