Agent - latent space model
- SlideDeck: 2026-SP-S9.2-latent_space.pdf
- Version: current
- Notes: Understanding environments for Agents
In this session, our readings cover:
Required Readings: WORLD MODELS & ENVIRONMENT UNDERSTANDING
Core Component: Internal Representations - How Agents Model Their Environment
World models enable agents to build internal representations of their environment, predict outcomes, and simulate consequences before taking action. This bridges perception and planning.
Key Concepts: Environment modeling, state representation, predictive models, simulation-based planning, model-based reasoning
World Model Role in Agent Architecture:
- Input: Receives data from Perception (Phase 3) and Memory (Phase 4)
- Function: Builds internal representation of environment dynamics and causal relationships
- Output: Informs Planning (Phase 7) by enabling agents to predict action consequences
- Use Cases: Robotics, game playing, strategic decision-making, healthcare interventions
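The role described above can be boiled down to a loop: the agent simulates candidate action sequences inside its internal model and only then acts. The sketch below is a minimal illustration of that idea with a toy deterministic transition model on a number line; all names (`transition`, `plan_with_model`) are illustrative, not from any specific paper.

```python
from itertools import product

def transition(state, action):
    """Toy deterministic world model: agent moves on a number line."""
    return state + {"left": -1, "right": +1, "stay": 0}[action]

def reward(state, goal):
    """Closer to the goal is better."""
    return -abs(goal - state)

def plan_with_model(state, goal, actions=("left", "right", "stay"), horizon=3):
    """Exhaustively simulate action sequences inside the model and
    return the first action of the best imagined rollout, so no real
    action is taken until its consequences have been predicted."""
    best_first, best_return = None, float("-inf")
    for seq in product(actions, repeat=horizon):
        s, ret = state, 0.0
        for a in seq:
            s = transition(s, a)
            ret += reward(s, goal)
        if ret > best_return:
            best_return, best_first = ret, seq[0]
    return best_first
```

Real systems replace the exhaustive search with learned models and smarter planners (e.g. sampling-based or gradient-based), but the structure of the loop is the same.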
| Topic | Slide Deck | Previous Semester |
|---|---|---|
| Agent - Planning / World Model | W10.1-Team 3-Planning | 25course |
2025 HIGH-IMPACT PAPERS on this topic
Agent Planning with World Knowledge Model
- Shuofei Qiao, Runnan Fang, Ningyu Zhang, Yuqi Zhu, Xiang Chen, Shumin Deng, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen
- [Submitted on 23 May 2024 (v1), last revised 3 Jan 2026 (this version, v4)]
- NeurIPS 2024
Recent endeavors towards directly using large language models (LLMs) as agent models to execute interactive planning tasks have shown commendable results. Despite their achievements, however, they still struggle with brainless trial-and-error in global planning and generating hallucinatory actions in local planning due to their poor understanding of the "real" physical world. Imitating humans' mental world knowledge model, which provides global prior knowledge before the task and maintains local dynamic knowledge during the task, in this paper we introduce a parametric World Knowledge Model (WKM) to facilitate agent planning. Concretely, we steer the agent model to self-synthesize knowledge from both expert and sampled trajectories. Then we develop WKM, providing prior task knowledge to guide global planning and dynamic state knowledge to assist local planning. Experimental results on three complex real-world simulated datasets with three state-of-the-art open-source LLMs (Mistral-7B, Gemma-7B, and Llama-3-8B) demonstrate that our method achieves superior performance compared to various strong baselines. We also analyze how WKM effectively alleviates the blind trial-and-error and hallucinatory-action issues, providing strong support for the agent's understanding of the world. Other interesting findings include: 1) instance-level task knowledge generalizes better to unseen tasks, 2) a weak WKM can guide a strong agent model's planning, and 3) unified WKM training has promising potential for further development. The code is available at this https URL.
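The abstract's split between "prior task knowledge" (global, before the task) and "dynamic state knowledge" (local, during the task) can be sketched as a prompt-building step. The knowledge tables, similarity function, and prompt format below are toy stand-ins, not the paper's actual WKM implementation.

```python
# Toy stand-in knowledge bases (illustrative, not from the paper).
TASK_KNOWLEDGE = {
    "clean-room": "Pick up objects before wiping surfaces.",
}
STATE_KNOWLEDGE = [
    ({"holding": None, "dirty": True}, "A free hand is needed to pick up."),
    ({"holding": "cloth", "dirty": True}, "Wipe the nearest dirty surface."),
]

def similarity(a, b):
    """Fraction of matching key-value pairs between two state dicts."""
    keys = set(a) | set(b)
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys)

def build_prompt(task, state):
    """Combine global prior task knowledge with the most similar piece
    of local state knowledge, mirroring WKM's two knowledge types."""
    prior = TASK_KNOWLEDGE[task]
    hint = max(STATE_KNOWLEDGE, key=lambda kv: similarity(kv[0], state))[1]
    return f"Task knowledge: {prior}\nState knowledge: {hint}"
```

In the paper both kinds of knowledge come from a trained parametric model rather than a lookup table; the sketch only shows how the two signals combine at planning time.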
- a. AgentGym-RL: Training Agents for Long-Horizon Decision Making (September 2025)
- https://github.com/WooooDyy/LLM-Agent-Paper-List
- RL version of AgentGym for learning from interactive environments
- Interactive frontend for trajectory visualization, multi-turn RL
- b. DreamerV3: Mastering Diverse Control Tasks through World Models
- Nature (April 2025) / arXiv / GitHub
- A general reinforcement-learning algorithm that outperforms specialized expert algorithms across diverse tasks by learning a model of the environment and improving its behaviour by imagining future scenarios.
- Dreamer succeeds across domains ranging from robot locomotion and manipulation tasks, through Atari games, procedurally generated ProcGen levels, and DMLab tasks, to the complex and infinite world of Minecraft.
- First algorithm to collect diamonds in Minecraft from scratch without human data or curricula
- Uses Recurrent State-Space Model (RSSM) for latent imagination and planning
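The "latent imagination" idea in the RSSM can be illustrated in a few lines: a deterministic recurrent state and a stochastic latent are rolled forward by the model alone, with no environment calls. The update rules below are toy scalar stand-ins for DreamerV3's learned networks, kept only to show the structure of an imagined rollout.

```python
import random

def rssm_step(h, z, action, rng):
    """One imagined step: update the deterministic state from (h, z, action),
    then sample the next stochastic latent from a state-dependent prior.
    The linear update is a toy stand-in for the GRU and prior networks."""
    h_next = 0.9 * h + 0.1 * (z + action)
    z_next = h_next + rng.gauss(0.0, 0.1)
    return h_next, z_next

def imagine(h0, z0, actions, seed=0):
    """Roll a trajectory entirely inside the latent model, so behaviour
    can be evaluated and improved without touching the environment."""
    rng = random.Random(seed)
    h, z, traj = h0, z0, []
    for a in actions:
        h, z = rssm_step(h, z, a, rng)
        traj.append((h, z))
    return traj
```

In DreamerV3 the actor and critic are trained on exactly such imagined trajectories, which is what makes the method sample-efficient.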
- c. V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
- arXiv / GitHub / Meta AI
- The first world model trained on video that achieves state-of-the-art visual understanding and prediction, enabling zero-shot robot control in new environments.
- Post-training a latent action-conditioned world model, V-JEPA 2-AC, using less than 62 hours of unlabeled robot videos from the Droid dataset enables zero-shot deployment on Franka arms without collecting any data from those environments.
- V-JEPA 2-AC achieves reach = 100%, manipulation = 60–80% compared to Cosmos’s reach = 80%, manipulation = 0–20%, while being 15× faster (16 seconds/action vs 4 minutes).
- Predicts in representation space rather than pixel space—key innovation for efficient planning
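Planning in representation space means candidate actions are scored by how close their *predicted embedding* lands to a goal embedding, with no pixel reconstruction. The encoder and predictor below are toy stand-ins (observations are 2-D points used directly), not V-JEPA 2's actual networks.

```python
def encode(obs):
    """Toy encoder: observations are already 2-D points."""
    return obs

def predict_latent(z, action):
    """Toy action-conditioned predictor operating in latent space."""
    return (z[0] + action[0], z[1] + action[1])

def l2(a, b):
    """Euclidean distance between two latents."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def best_action(obs, goal_obs, candidates):
    """Choose the action whose predicted next latent is closest to the
    goal's latent; no pixels are ever generated or compared."""
    z, z_goal = encode(obs), encode(goal_obs)
    return min(candidates, key=lambda a: l2(predict_latent(z, a), z_goal))
```

Because the distance is computed between compact embeddings rather than full images, many candidate actions can be evaluated cheaply, which is where the reported speedup over pixel-space world models comes from.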
More Readings:
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
- DeepMind Blog
- RT-2 shows that vision-language models (VLMs) can be transformed into powerful vision-language-action (VLA) models, which can directly control a robot by combining VLM pre-training with robotic data.
- Thanks to its VLM backbone, RT-2 can plan from both image and text commands, enabling visually grounded planning, whereas current plan-and-act approaches like SayCan cannot see the real world and rely entirely on language.
- Uses PaLM-E and PaLI-X backbones; demonstrates chain-of-thought reasoning for multi-stage semantic reasoning
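RT-2 represents robot actions as text tokens by discretizing each continuous action dimension into 256 bins, so the VLM can emit actions the same way it emits words. The round-trip sketch below illustrates that encoding; the value ranges are illustrative.

```python
N_BINS = 256  # RT-2 uses 256 bins per action dimension

def to_token(value, lo, hi):
    """Map a continuous value in [lo, hi] to a bin index in [0, N_BINS-1]."""
    frac = (value - lo) / (hi - lo)
    return min(N_BINS - 1, max(0, int(frac * N_BINS)))

def from_token(token, lo, hi):
    """Decode a bin index back to the center of its bin."""
    return lo + (token + 0.5) * (hi - lo) / N_BINS
```

The quantization error is bounded by half a bin width, which for a range of width 2 is under 0.004 per dimension, so fine enough for arm control while keeping the action vocabulary small.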
Video Understanding with Large Language Models: A Survey
- Yunlong Tang, Jing Bi, Siting Xu, Luchuan Song, Susan Liang, Teng Wang, Daoan Zhang, Jie An, Jingyang Lin, Rongyi Zhu, Ali Vosoughi, Chao Huang, Zeliang Zhang, Pinxin Liu, Mingqian Feng, Feng Zheng, Jianguo Zhang, Ping Luo, Jiebo Luo, Chenliang Xu
- [Submitted on 29 Dec 2023 (v1), last revised 24 Jul 2024 (this version, v4)]
- With the burgeoning growth of online video platforms and the escalating volume of video content, the demand for proficient video understanding tools has intensified markedly. Given the remarkable capabilities of large language models (LLMs) in language and multimodal tasks, this survey provides a detailed overview of recent advancements in video understanding that harness the power of LLMs (Vid-LLMs). The emergent capabilities of Vid-LLMs are surprisingly advanced, particularly their ability for open-ended multi-granularity (general, temporal, and spatiotemporal) reasoning combined with commonsense knowledge, suggesting a promising path for future video understanding. We examine the unique characteristics and capabilities of Vid-LLMs, categorizing the approaches into three main types: Video Analyzer x LLM, Video Embedder x LLM, and (Analyzer + Embedder) x LLM. Furthermore, we identify five sub-types based on the functions of LLMs in Vid-LLMs: LLM as Summarizer, LLM as Manager, LLM as Text Decoder, LLM as Regressor, and LLM as Hidden Layer. Furthermore, this survey presents a comprehensive study of the tasks, datasets, benchmarks, and evaluation methodologies for Vid-LLMs. Additionally, it explores the expansive applications of Vid-LLMs across various domains, highlighting their remarkable scalability and versatility in real-world video understanding challenges. Finally, it summarizes the limitations of existing Vid-LLMs and outlines directions for future research. For more information, readers are recommended to visit the repository at this https URL.
