
We evaluate our system in a mock indoor environment with eight object categories and four tiers of tasks, ranging from simple single-object missions to complex, context-based instructions. Across 45 scenarios, we execute LTL specifications generated by either LTLCodeGen or NL2LTL, and measure success rate, semantic and syntactic error rates, and LLM runtime. Results show that LTLCodeGen consistently outperforms NL2LTL, producing more correct specifications and handling sequential tasks better, at the cost of longer inference times. An ablation study reveals that removing task explanations significantly degrades performance on complex tasks, removing code comments has only a minor effect, and using a smaller model (GPT-4o-mini) increases difficulty with ambiguity and sequencing constraints.
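To make the metric definitions concrete, the sketch below shows one way per-scenario outcomes could be classified and aggregated: a generated formula that fails to parse counts as a syntactic error, a well-formed formula that is not equivalent to the ground-truth specification counts as a semantic error, and LLM runtime is the wall-clock time of the translation call. This is illustrative only; `llm_translate`, `parse_ltl`, `ltl_equivalent`, and `run_planner` are hypothetical stand-ins for the actual translation, parsing, equivalence-checking, and planning components, which are not specified here.

```python
import time
from dataclasses import dataclass

@dataclass
class ScenarioResult:
    success: bool
    syntactic_error: bool
    semantic_error: bool
    llm_seconds: float

def evaluate_scenario(instruction, ground_truth_ltl, llm_translate,
                      parse_ltl, ltl_equivalent, run_planner):
    """Classify one scenario into success / syntactic error / semantic error.

    All callable arguments are hypothetical placeholders for the system's
    actual components.
    """
    start = time.perf_counter()
    generated = llm_translate(instruction)        # NL -> LTL via the LLM
    llm_seconds = time.perf_counter() - start

    try:
        formula = parse_ltl(generated)            # syntax check
    except SyntaxError:
        return ScenarioResult(False, True, False, llm_seconds)

    if not ltl_equivalent(formula, parse_ltl(ground_truth_ltl)):
        # Parses, but does not capture the intended task: semantic error.
        return ScenarioResult(False, False, True, llm_seconds)

    # Well-formed and semantically correct; success depends on the planner.
    return ScenarioResult(run_planner(formula), False, False, llm_seconds)

def aggregate(results):
    n = len(results)
    return {
        "success_rate": sum(r.success for r in results) / n,
        "syntactic_error_rate": sum(r.syntactic_error for r in results) / n,
        "semantic_error_rate": sum(r.semantic_error for r in results) / n,
        "mean_llm_runtime_s": sum(r.llm_seconds for r in results) / n,
    }
```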

We evaluate our task planning approach on the GameTraversalBenchmark (GTB), a dataset of 150 LLM-generated 2D maps with objects, stories, and task objectives. For each map, we create a 2D semantic occupancy map and an instruction combining all objectives in order. Our system uses the map and instruction to generate a path that completes all tasks, and we measure performance with Accuracy (success rate) and Mean Path Length (path efficiency). Results show our system outperforms the GPT-4o-based planner from GTB.
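The sketch below illustrates this evaluation loop under assumed data structures: each GTB record is taken to provide a character grid with a legend and an ordered list of objectives, and `plan_path` and `completes_all_tasks` are hypothetical helpers for the planner and the success check. The benchmark's actual schema may differ; the sketch only makes the two metrics explicit.

```python
def build_semantic_map(record):
    """Turn a character grid into a 2D list of semantic labels (assumed schema)."""
    legend = record["legend"]   # e.g. {"#": "wall", ".": "floor", "K": "key"}
    return [[legend.get(cell, "unknown") for cell in row]
            for row in record["grid"]]

def build_instruction(record):
    """Concatenate all task objectives into a single ordered instruction."""
    steps = [f"{i + 1}. {obj}" for i, obj in enumerate(record["objectives"])]
    return "Complete the following objectives in order: " + " ".join(steps)

def evaluate_gtb(records, plan_path, completes_all_tasks):
    """Accuracy = fraction of maps solved; Mean Path Length = average path length."""
    successes, path_lengths = 0, []
    for record in records:
        semantic_map = build_semantic_map(record)
        instruction = build_instruction(record)
        path = plan_path(semantic_map, instruction)   # hypothetical planner call
        path_lengths.append(len(path))
        successes += completes_all_tasks(record, path)
    return {
        "accuracy": successes / len(records),
        "mean_path_length": sum(path_lengths) / len(path_lengths),
    }
```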

We evaluate LTLCodeGen against three baselines (NL2LTL, BART-FT-RAW-human, and BART-FT-RAW-synthetic) on the Drone, Cleanup, and Pick datasets, using GPT-4o. For each dataset, prompts are tuned with a small number of examples, and both human-written and synthetic instructions are tested. Results show that LTLCodeGen significantly outperforms all baselines, demonstrating strong generalization and robust LTL generation. NL2LTL performs better than the fine-tuned BART baselines but worse than LTLCodeGen, mainly due to LTL syntax errors and semantic mismatches that it fails to correct.
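For reference, the sketch below shows a generic few-shot prompting setup for natural-language-to-LTL translation with GPT-4o. It is not LTLCodeGen's actual code-generation prompt nor the per-dataset prompts used above; the system message and the example pairs are placeholders, and only the standard OpenAI chat-completions call is real.

```python
from openai import OpenAI

# Placeholder few-shot pairs; in practice these would be drawn from the
# target dataset (Drone, Cleanup, or Pick).
FEW_SHOT_EXAMPLES = [
    ("Go to the red room and then the blue room.", "F (red_room & F blue_room)"),
    ("Always avoid the landmark.", "G !landmark"),
]

def build_messages(instruction):
    """Assemble a few-shot chat prompt for NL -> LTL translation."""
    messages = [{
        "role": "system",
        "content": "Translate the instruction into an LTL formula over the "
                   "given atomic propositions. Output only the formula.",
    }]
    for nl, ltl in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": nl})
        messages.append({"role": "assistant", "content": ltl})
    messages.append({"role": "user", "content": instruction})
    return messages

def translate_to_ltl(instruction, model="gpt-4o"):
    client = OpenAI()                       # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=build_messages(instruction),
        temperature=0,                      # deterministic decoding for evaluation
    )
    return response.choices[0].message.content.strip()
```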