Posts by Tag

Adversarial

Safety Benchmark WMDP

1 minute read

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatt...

More FM risk

38 minute read

In this session, our readings cover:

Back to Top ↑

Safety

Agent Safety

9 minute read

In this session, our readings cover:

Safety Benchmark WMDP

1 minute read

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatt...

More FM risk

38 minute read

In this session, our readings cover:

Back to Top ↑

Efficiency

Privacy

1 minute read

Beyond Memorization: Violating Privacy Via Inference with Large Language Models

Back to Top ↑

LLMEvaluate

Privacy

1 minute read

Beyond Memorization: Violating Privacy Via Inference with Large Language Models

Safety Benchmark WMDP

1 minute read

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatt...

Back to Top ↑

BasicLLM

LLM basics

less than 1 minute read

Required Readings:

Back to Top ↑

Agent

Agent Safety

9 minute read

In this session, our readings cover:

Back to Top ↑

Mitigate

Safety Benchmark WMDP

1 minute read

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatt...

Back to Top ↑

Applications

Back to Top ↑

RL

RLHF + InstructGPT

less than 1 minute read

Papers Paper URL Abstract Training language models to follow instructions with human feedback URL ...

Decision Transformers

1 minute read

Decision Transformer: Reinforcement Learning via Sequence Modeling Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Piet...

Back to Top ↑

AGI

RLHF + InstructGPT

less than 1 minute read

Papers Paper URL Abstract Training language models to follow instructions with human feedback URL ...

Decision Transformers

1 minute read

Decision Transformer: Reinforcement Learning via Sequence Modeling Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Piet...

Back to Top ↑

language model

RLHF + InstructGPT

less than 1 minute read

Papers Paper URL Abstract Training language models to follow instructions with human feedback URL ...

Emergent Abilities of LLM

1 minute read

Emergent Abilities of Large Language Models URL “an ability to be emergent if it is not present in smaller models but is present in larger models. Thus...

DiffDock + ESMfold

less than 1 minute read

Papers Paper URL Abstract Evolutionary-scale prediction of atomic level protein structure with a language mo...

Back to Top ↑

RAG

Back to Top ↑

Reasoning

Back to Top ↑

Train

Privacy

1 minute read

Beyond Memorization: Violating Privacy Via Inference with Large Language Models

Back to Top ↑

Customization

Back to Top ↑

Alignment

Back to Top ↑

Prompting

Back to Top ↑

Jailbreaking

Back to Top ↑

Serving

Back to Top ↑

Scaling

Back to Top ↑

Protein

DiffDock + ESMfold

less than 1 minute read

Papers Paper URL Abstract Evolutionary-scale prediction of atomic level protein structure with a language mo...

Back to Top ↑

Diffusion

Back to Top ↑

Image synthesis

Back to Top ↑

Human Alignment

RLHF + InstructGPT

less than 1 minute read

Papers Paper URL Abstract Training language models to follow instructions with human feedback URL ...

Back to Top ↑

Bias

Back to Top ↑

Hallucination

Back to Top ↑

DomainAdapt

Back to Top ↑

ModelEdit

Back to Top ↑

Interpretability

Back to Top ↑

LongContext

Back to Top ↑

Planning

Back to Top ↑

Multiagent

Back to Top ↑

Multimodal

Back to Top ↑