Posts by Year

2025

Agent Safety

9 minute read

In this session, our readings cover:

2024

Privacy

1 minute read

Beyond Memorization: Violating Privacy Via Inference with Large Language Models

Safety Benchmark WMDP

1 minute read

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatt...

More FM risk

38 minute read

In this session, our readings cover:

LLM basics

less than 1 minute read

Required Readings:

2022

RLHF + InstructGPT

less than 1 minute read

Papers: Training language models to follow instructions with human feedback ...

Emergent Abilities of LLM

1 minute read

Emergent Abilities of Large Language Models: “an ability to be emergent if it is not present in smaller models but is present in larger models. Thus...

DiffDock + ESMfold

less than 1 minute read

Papers: Evolutionary-scale prediction of atomic level protein structure with a language mo...

Decision Transformers

1 minute read

Decision Transformer: Reinforcement Learning via Sequence Modeling Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Piet...
