R4A: Spatial, Temporal, and Symbolic Reasoning for Agents

There has been significant interest in learning generalist policies that execute diverse locomotion and manipulation tasks in open-world environments. Recent vision-language-action (VLA) models (e.g., π0.5, OpenVLA, RT-2) excel at learning diverse tasks and handling many real-world edge cases, thanks to internet-scale pre-training. However, state-of-the-art VLA models are still limited to learning semantically simple and atomic tasks (e.g., "fold laundry") that do not involve long-term spatial or temporal reasoning or complex subtask compositions. We identify three open challenges that hinder existing agent policies from achieving higher-order, human-like behaviors.

This workshop explores the next frontier of embodied AI with the goal of designing generalist agents capable of human-level spatial, temporal, and symbolic reasoning. Current approaches can be broadly categorized as model-free or model-based. Model-free methods, such as state-of-the-art VLA models, infer actions directly from sensory inputs using pre-trained foundation models. In contrast, model-based methods construct and reason over intermediate representations such as 3D maps, dynamics models, or symbolic programs. A central theme of this workshop is the comparison of, and interaction between, these paradigms. We will consider fundamental questions such as: (1) How much can we achieve by further scaling existing model-free approaches? (2) Can model-based approaches improve the generalization and data efficiency of VLA and reinforcement learning methods, and what are their limitations? The consideration of model-based approaches also raises new questions for architecture design: (3) How can we efficiently tokenize the 3D/4D features used by model-based environment representations for policy learning? (4) Is explicit encoding of long-term, scene-level, and physically realistic dynamics feasible and competitive with implicit model-free reasoning? (5) How can agents automatically synthesize layered and functional abstractions from continuous states and controls? Lastly, these questions call for renewed discussion of training and evaluation: (6) What roles do simulators and simulation-based methods (e.g., Sim2Real and Real2Sim) play in learning agent policies with extended spatial, temporal, and symbolic reasoning capabilities? (7) How can the community progress toward unified metrics and evaluation interfaces for consistent benchmarking?

Call for Papers

We invite submissions of short papers (up to 5 pages in NeurIPS format, excluding references and supplementary material). Each submission should outline the results being presented, their novelty, and their relevance to the workshop questions. Example topics include but are not limited to the following areas:

The shorter submission format is preferred to encourage contributions on brand-new ideas and work in progress. Submissions will be reviewed by a Program Committee consisting of experts in related fields. The review process will prioritize new work over already finalized papers. In particular, work that is presented at the main NeurIPS conference will not be accepted by this workshop. Accepted contributions will be made available on the workshop website as non-archival reports, and the authors will be invited to present their work during the poster session. More details and the submission link will be made available.

Call for Talks

We invite junior researchers, who are PhD candidates or recent graduates (±2 years from PhD degree), to share their PhD research work and research vision on spatial, temporal, and symbolic reasoning for agents at our workshop as a 20-minute talk.

Applicants are invited to submit a talk proposal in the form of an extended abstract of up to 3 pages in NeurIPS format (excluding references) summarizing their PhD research on a topic of interest to the workshop.

The extended abstract is expected to contain and will be evaluated on the following aspects:

The submitted talk proposals will be reviewed by the workshop Program Committee following the same timeline as regular paper contributions. One proposal will be selected for presentation based on quality and relevance to the workshop topic. The corresponding junior researcher will share the stage with other invited speakers to present their PhD research. Any submitted talk proposal will also be considered for a poster presentation by default.

Important Dates

Chelsea Finn
Stanford University
Xiaolong Wang
UC San Diego
Katerina Fragkiadaki
Carnegie Mellon University
Yue Wang
University of Southern California
Jan Peters
TU Darmstadt
Jiajun Wu
Stanford University
Fabio Ramos
NVIDIA / University of Sydney
Junior Researcher
Junior Researcher Talk
To be announced

Morning

08:45 -- 09:00  Opening Remarks
09:00 -- 09:40  Invited Talk 1
09:40 -- 10:20  Invited Talk 2
10:20 -- 10:50  Coffee Break
10:50 -- 11:30  Invited Talk 3
11:30 -- 12:00  Debate Session
12:00 -- 13:00  Lunch Break

Afternoon

13:00 -- 13:40  Invited Talk 4
13:40 -- 14:00  Junior Researcher Talk
14:00 -- 15:00  Poster Session / Coffee Break
15:00 -- 15:40  Invited Talk 5
15:40 -- 16:20  Invited Talk 6
16:20 -- 17:00  Invited Talk 7
17:00 -- 17:15  Closing Remarks
Yulun Tian
University of Michigan
Ki Myung Brian Lee
University of California, San Diego
Kehan Long
University of California, San Diego
Yunzhu Li
Columbia University
Nikolay Atanasov
University of California, San Diego

Should you have any questions, please do not hesitate to contact the organizers: Yulun Tian or Kehan Long.