Optimal Scene Graph Planning with Large Language Model Guidance

1Existential Robotics Lab, University of California San Diego
2GRASP Lab, University of Pennsylvania, Philadelphia
ICRA 2024
Abstract: Recent advances in metric, semantic, and topological mapping have equipped autonomous robots with semantic concept grounding capabilities to interpret natural language tasks. This work aims to leverage these new capabilities with an efficient task planning algorithm for hierarchical metric-semantic models. We consider a scene graph representation of the environment and utilize a large language model (LLM) to convert a natural language task into a linear temporal logic (LTL) automaton. Our main contribution is to enable optimal hierarchical LTL planning with LLM guidance over scene graphs. To achieve efficiency, we construct a hierarchical planning domain that captures the attributes and connectivity of the scene graph and the task automaton, and provide semantic guidance via an LLM heuristic function. To guarantee optimality, we design an LTL heuristic function that is provably consistent and supplements the potentially inadmissible LLM guidance in multi-heuristic planning. We demonstrate efficient planning of complex natural language tasks in scene graphs of virtualized real environments.

Method Overview

Method Overview
Planning a natural language mission, μ : “Reach the oven in the kitchen”, in a scene graph G of the Gibson environment Benevolence with object, room, and floor attributes. The terms “oven” and “kitchen” in μ belong to the object and room attributes of the scene graph, respectively. The scene graph G is described to the LLM using the connectivity of its attributes (attribute hierarchy) and the LLM is used to translate μ to LTL formula φμ and associated Automaton Mφ. We construct a hierarchical planning domain from the scene graph, and use multi-resolution multi-heuristic planning to plan the mission execution. In addition to mission translation, the LLM is used to provide heuristic guidance to accelerate the planning, while an LTL heuristic is used to guarantees optimality.

Video

Scene Graph and LTL Automaton

Scene Graph Automaton
We represent the environment as a scene graph (left) where nodes correspond to locations with associated attributes (e.g., object, room, floor) and edges represent belonging or connectivity relations. The natural language task μ is translated by the LLM to an LTL formula φμ that is represented as a finite state automaton Mφ (right).

Scene Graph Hierarchy Description

Planning Domain
Due to the complexity of scene graphs, directly describing them to LLMs can exceed their context window size and token rate limits. To address this, we extract the attributes and hierarchy of the scene graph and present it as an attribute hierarchy in YAML format. This hierarchical description allows the LLM to effectively understand and utilize the scene graph information for task planning.

LLM Task Translation

LLM Task Translation
We prompt the LLM with the hierarchical description of the scene graph and a natural language task μ (e.g., “Reach the oven in the kitchen”) to generate an LTL formula φμ that captures the task requirements. The LTL formula is then converted to a finite state automaton Mφ for use in planning.

Hierarchical Planning Domain

Planning Domain
Figure 1 Figure 2 Figure 3
We construct a hierarchical planning domain from the scene graph and the LTL automaton. Each level in the hierarchical planning domain is a graph where nodes are the product of scene graph nodes and automaton states. Edges in the hierarchical planning domain represent feasible transitions between nodes, which should satisfy three conditions: 1) the edge is a transition from one node in the scene graph to the boundary of another node; 2) the transition should respect the automaton state transitions; 3) the transition should have the minimal cost among all possible transitions between the two nodes in the scene graph. This hierarchical structure allows efficient planning by enabling the planner to operate at multiple levels of abstraction.

LTL Heuristics

To guarantee the optimality of the planning process, we propose a consistent LTL heuristic:

LTL Heuristic

where the cost function c of label transition and the function g of cost to reach the accepting state set are defined as:

LTL Heuristic 2 LTL Heuristic 3

LLM Heuristics

LLM Heuristic
We design an LLM heuristic function that leverages the LLM's understanding of the scene graph and task requirements to provide semantic guidance during planning. The LLM heuristic estimates the cost to reach the goal by considering the attributes of the scene graph nodes and their relevance to the task at hand. Given the current state (s, q) in the planning domain with the task context, we prompt the LLM to obtain a possible task solution, which can be used to estimate the cost to reach the goal from the current state. This heuristic aims to accelerate the planning process by guiding the search towards promising regions of the state space based on semantic information. It may make inaccurate estimations, so it is used in conjunction with the consistent LTL heuristic to ensure optimality.

Experimental Results

Expansion-Iterations Results1
Expansion-Iterations Results1
Expansion-Iterations Results1
Expansion-Iterations Results1
Expansion-Iterations Results1
We evaluate our proposed hierarchical LTL planning approach with LLM guidance in various virtualized environments represented as scene graphs. The results demonstrate that our method is able to efficiently plan complex natural language tasks while ensuring optimality. The combination of the LLM heuristic and the consistent LTL heuristic significantly reduces the number of state expansions and planning time compared to using either heuristic alone or no heuristics. For detailed experimental results and analysis, please refer to the paper.

BibTeX

@inproceedings{dai2024optimal,
        author={Dai, Zhirui and Asgharivaskasi, Arash and Duong, Thai and Lin, Shusen and Tzes, Maria-Elizabeth and Pappas, George and Atanasov, Nikolay},
        booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
        title={{Optimal Scene Graph Planning with Large Language Model Guidance}},
        year={2024},
        pages={14062-14069},
        keywords={Measurement;Grounding;Large language models;Semantics;Automata;Planning;Logic},
        doi={10.1109/ICRA57147.2024.10610599}}
}