Agent - Planning / World Model

SlideDeck: W10.1-Team 3-Planning
Version: current
Lead team: team-3

Planning

In this session, our readings cover:

Required Readings:

NVIDIA World Foundation Models

https://www.nvidia.com/en-us/glossary/world-models/
https://blogs.nvidia.com/blog/openusd-advances-physical-ai/
https://www.nvidia.com/en-us/ai/cosmos/
https://www.nvidia.com/en-us/glossary/synthetic-data-generation/?ncid=no-ncid

AI Planning: A Primer and Survey (Preliminary Report)

Dillon Z. Chen, Pulkit Verma, Siddharth Srivastava, Michael Katz, Sylvie Thiébaux
[Submitted on 7 Dec 2024]
Automated decision-making is a fundamental topic that spans multiple sub-disciplines in AI: reinforcement learning (RL), AI planning (AP), foundation models, and operations research, among others. Despite recent efforts to "bridge the gaps" between these communities, there remain many insights that have not yet transcended the boundaries. Our goal in this paper is to provide a brief and non-exhaustive primer on ideas well-known in AP, but less so in other sub-disciplines. We do so by introducing the classical AP problem and representation, and extensions that handle uncertainty and time through the Markov Decision Process formalism. Next, we survey state-of-the-art techniques and ideas for solving AP problems, focusing on their ability to exploit problem structure. Lastly, we cover subfields within AP for learning structure from unstructured inputs and learning to generalise to unseen scenarios and situations.
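
The "classical AP problem and representation" mentioned in the abstract can be illustrated with a small sketch. The code below is not taken from the paper: it assumes a STRIPS-style encoding (states as sets of facts, actions with preconditions, add effects, and delete effects) and uses plain breadth-first search. The `Action` type, the `plan_bfs` function, and the robot-and-box domain are all hypothetical names chosen for illustration.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    pre: frozenset      # facts that must hold before the action applies
    add: frozenset      # facts the action makes true
    delete: frozenset   # facts the action makes false


def plan_bfs(init, goal, actions):
    """Return a shortest list of action names that reaches the goal, or None."""
    start, goal = frozenset(init), frozenset(goal)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:                      # every goal fact holds
            return plan
        for a in actions:
            if a.pre <= state:                 # action is applicable
                nxt = (state - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [a.name]))
    return None


# Hypothetical toy domain: a robot carries a box from room A to room B.
ACTIONS = [
    Action("move_A_B", frozenset({"robot_at_A"}), frozenset({"robot_at_B"}), frozenset({"robot_at_A"})),
    Action("move_B_A", frozenset({"robot_at_B"}), frozenset({"robot_at_A"}), frozenset({"robot_at_B"})),
    Action("pick_A", frozenset({"robot_at_A", "box_at_A"}), frozenset({"holding"}), frozenset({"box_at_A"})),
    Action("drop_B", frozenset({"robot_at_B", "holding"}), frozenset({"box_at_B"}), frozenset({"holding"})),
]

plan = plan_bfs({"robot_at_A", "box_at_A"}, {"box_at_B"}, ACTIONS)
print(plan)
```

Breadth-first search is used here only because it is short and returns a shortest plan; it ignores problem structure, which is what the techniques surveyed in the paper are designed to exploit.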

Reasoning with Language Model is Planning with World Model

Shibo Hao, Yi Gu, Haodi Ma, Joshua Jiahua Hong, Zhen Wang, Daisy Zhe Wang, Zhiting Hu
[Submitted on 24 May 2023 (v1), last revised 23 Oct 2023 (this version, v2)]
Large language models (LLMs) have shown remarkable reasoning capabilities, especially when prompted to generate intermediate reasoning steps (e.g., Chain-of-Thought, CoT). However, LLMs can still struggle with problems that are easy for humans, such as generating action plans for executing tasks in a given environment, or performing complex math, logical, and commonsense reasoning. The deficiency stems from the key fact that LLMs lack an internal world model to predict the world state (e.g., environment status, intermediate variable values) and simulate long-term outcomes of actions. This prevents LLMs from performing deliberate planning akin to human brains, which involves exploring alternative reasoning paths, anticipating future states and rewards, and iteratively refining existing reasoning steps. To overcome the limitations, we propose a new LLM reasoning framework, Reasoning via Planning (RAP). RAP repurposes the LLM as both a world model and a reasoning agent, and incorporates a principled planning algorithm (based on Monte Carlo Tree Search) for strategic exploration in the vast reasoning space. During reasoning, the LLM (as agent) incrementally builds a reasoning tree under the guidance of the LLM (as world model) and task-specific rewards, and obtains a high-reward reasoning path efficiently with a proper balance between exploration vs. exploitation. We apply RAP to a variety of challenging reasoning problems including plan generation, math reasoning, and logical inference. Empirical results on these tasks demonstrate the superiority of RAP over various strong baselines, including CoT and least-to-most prompting with self-consistency. RAP on LLAMA-33B surpasses CoT on GPT-4 with 33% relative improvement in a plan generation setting.
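
To make the planning loop in the abstract concrete, here is a minimal sketch of Monte Carlo Tree Search guided by a world model and a task-specific reward. It is not the RAP implementation: in RAP an LLM plays both the agent and the world model, whereas this sketch assumes a hypothetical toy task (reach the number 10 from 1 using "+1" and "*2") with a hand-written `world_model` and `reward`. The constants, the `Node` class, and the UCB constant `c=1.4` are all illustrative assumptions.

```python
import math
import random

ACTIONS = ("+1", "*2")        # hypothetical toy action set
TARGET, MAX_DEPTH = 10, 4     # hypothetical goal and search horizon


def world_model(state, action):
    """Predict the next state (in RAP, an LLM plays this role)."""
    return state + 1 if action == "+1" else state * 2


def reward(state):
    """Toy task-specific reward: 1.0 at the target, decaying with distance."""
    return 1.0 / (1.0 + abs(TARGET - state))


class Node:
    def __init__(self, state, depth, parent=None, action=None):
        self.state, self.depth = state, depth
        self.parent, self.action = parent, action
        self.children, self.visits, self.value = [], 0, 0.0

    def ucb(self, c=1.4):
        # Mean reward (exploitation) plus a bonus for rarely visited nodes (exploration).
        bonus = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return self.value / self.visits + bonus


def mcts(root_state, iterations=500, rng=None):
    rng = rng or random.Random(0)
    root = Node(root_state, 0)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes by UCB score.
        while node.depth < MAX_DEPTH and len(node.children) == len(ACTIONS):
            node = max(node.children, key=Node.ucb)
        # 2. Expansion: try one untried action, predicting its outcome with the world model.
        if node.depth < MAX_DEPTH:
            action = ACTIONS[len(node.children)]
            child = Node(world_model(node.state, action), node.depth + 1, node, action)
            node.children.append(child)
            node = child
        # 3. Simulation: random rollout up to the depth limit.
        state = node.state
        for _ in range(MAX_DEPTH - node.depth):
            state = world_model(state, rng.choice(ACTIONS))
        r = reward(state)
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    # Read out a plan by following the most visited child at each level.
    plan, node = [], root
    while node.children:
        node = max(node.children, key=lambda n: n.visits)
        plan.append(node.action)
    return plan, node.state


plan, final_state = mcts(1)
print(plan, final_state)
```

The UCB term is where the "balance between exploration vs. exploitation" enters: the first summand favors branches with high average reward, the second favors branches that have been tried less often.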

2025 Spring UVA CS - GenAI-Overview

More Readings:

Agent Planning with World Knowledge Model

O1 Replication Journey: A Strategic Progress Report – Part 1

Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective

Improving Transformer World Models for Data-Efficient RL