Agent Brain - Reasoning
- Notes: world model
In this session, our readings cover:
Required Readings: REASONING & COGNITION
Core Component: Advanced Reasoning Capabilities of the Agent Brain
Exploring how agents reason through complex problems, including code generation, mathematical reasoning, and domain-specific reasoning.
Key Concepts: Chain-of-thought reasoning, code generation, mathematical reasoning, self-examination, test-time compute scaling
| Topic | Slide Deck | Previous Semester |
|---|---|---|
| Advanced LLM - Code Reasoning | W4.1-Gen AI-code | 25course |
| Advanced LLM - Math Reasoning | W4.2-LLM-Math-Reasoning | 25course |
| Inference Test Time Scaling Law | Week14.1-T5-Test-Time-Scaling | 25course |
| Self-exam LLM and Reasoning | W12-team-2-self-exam-LLM | 24course |
2025 HIGH-IMPACT PAPERS on this topic
- a. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (January 2025)
- Authors: DeepSeek-AI (198 authors)
- Venue: Nature (September 2025) + arXiv
- arXiv: https://arxiv.org/abs/2501.12948
- Nature: https://www.nature.com/articles/s41586-025-09422-z
- HuggingFace: https://huggingface.co/papers/2501.12948
- GitHub: https://github.com/deepseek-ai/DeepSeek-R1
- Pure RL approach: shows that reasoning capability emerges without supervised demonstrations
- Remarkable results: AIME 2024 pass@1 accuracy jumped from 15.6% to 71.0%, reaching 86.7% with majority voting, matching OpenAI o1
- Emergent behaviors: Self-reflection, verification, strategy adaptation, “aha moments”
- Open source: Released models from 1.5B to 671B parameters
- Industry impact: Triggered the “reasoning model” race across all major labs
- Key Innovation: Demonstrates that advanced reasoning patterns emerge naturally through GRPO (Group Relative Policy Optimization) without human-labeled trajectories. The paper also shows that performance scales with thinking time: the model learns to “think longer” on harder problems.
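The core mechanism behind GRPO named above can be illustrated with a small sketch (an assumption-laden simplification: it shows only the group-relative advantage computation and omits the clipped policy-gradient objective and KL penalty used in the paper). For each prompt, a group of responses is sampled, scored with a verifiable reward, and rewards are normalized within the group, so no learned value network (critic) is needed:

```python
# Sketch of GRPO's group-relative advantage (simplified; hypothetical helper,
# not the paper's implementation).
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Advantage of each sampled response relative to its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:  # all responses scored the same; no learning signal
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]

# Example: 4 sampled answers to one problem, reward 1.0 if correct else 0.0.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# → [1.0, -1.0, -1.0, 1.0]: correct answers are reinforced, wrong ones penalized
```

Because the baseline is the group mean rather than a critic's value estimate, the method stays cheap at the scale of RL training runs like DeepSeek-R1's.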
- b. Reasoning Language Models: A Blueprint (January 2025)
- https://arxiv.org/abs/2501.11223
- Surveys reinforcement learning approaches for reasoning
- Connects to DeepSeek-R1, Kimi k1.5, and other reasoning models
- Comprehensive taxonomy of RLVR (Reinforcement Learning with Verifiable Rewards)
- Discusses emergent reasoning patterns and distillation to smaller models
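The RLVR idea mentioned above can be sketched in a few lines (an illustrative assumption, not the blueprint's actual code): the reward is computed by a deterministic checker rather than a learned reward model, so it cannot be gamed by plausible-sounding but wrong outputs. Here the assumed convention is that the model ends its chain of thought with a final `Answer:` line:

```python
# Minimal verifiable-reward sketch for math-style tasks (hypothetical
# convention: the completion ends with "Answer: <value>").
import re

def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Return 1.0 if the extracted final answer matches the gold answer, else 0.0."""
    match = re.search(r"Answer:\s*(.+?)\s*$", completion.strip())
    if match is None:
        return 0.0  # no parseable final answer
    return 1.0 if match.group(1).strip() == gold_answer.strip() else 0.0

reward = verifiable_reward("Let x = 2, so 3x + 1 = 7.\nAnswer: 7", "7")
# → 1.0
```

Binary rewards of this kind are what make the "verifiable" setting tractable for large-scale RL, and they are also the scoring signal behind pass@1 and majority-voting metrics such as those reported for DeepSeek-R1.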
- c. Kimi k1.5: Scaling Reinforcement Learning with LLMs (January 2025)
- Link: https://arxiv.org/abs/2501.12599
- Contribution: An alternative approach to scaling reasoning via RL
- Complements DeepSeek-R1 with different architectural choices
- Emphasizes scaling strategies for RL training
- Addresses computational efficiency in large-scale RL
