Zero-shot Minecraft Planning Agent: KB-Enhanced LLM-based PDDL Generation

We tackle long-horizon planning in Minecraft with KB-enhanced PDDL generation: retrieve prior knowledge from the Wiki, construct or refine a PDDL domain, and reuse that domain for planning. Reusing the domain reduces repeated LLM calls, improves success rate, and lowers token usage and execution time.

1. Background: Why long-horizon planning in Minecraft is hard

Minecraft tasks often require multi-step dependencies (e.g., crafting chains). The technology tree creates long plans with complex prerequisites, making zero-shot success difficult.

2. LLMs in task planning: Planner vs. World Model

LLMs can be used in two roles:

LLM as Planner: generate plans / high-level action sequences.
LLM as World Model: model environment dynamics to support planning.

[Figure panels: LLM as Planner | LLM as World Model]

3. Jarvis-1 overview: Hierarchical planning with memory

Jarvis-1 is a hierarchical approach where a memory-augmented MLLM retrieves reference trajectories and uses in-context learning to generate the next high-level action. The controller (e.g., Steve-1) maps high-level actions to real-time mouse/keyboard actions via a pretrained skill library.

4. Motivation: Rethinking Jarvis-1

Q1: Where does prior knowledge come from?

Jarvis-1 relies heavily on a trajectory memory database (human-crafted, oracle, or agent-generated). In a strict zero-shot setting, missing dependency knowledge can cause failures, for example trying to mine raw iron without the required tool equipped.

Q2: Is “LLM as planner” the best choice?

Querying an LLM for every action in every episode is expensive. If we run many episodes of the same task, LLM-as-policy scales like:

#episodes × #actions × tokens/action

By contrast, PDDL planning reuses one domain across all episodes and pays only for a small number of domain improvements:

1 × (#PDDL improvements)
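
To make the comparison concrete, here is a back-of-the-envelope sketch in Python. Only the two scaling formulas come from the text above. Every number is a hypothetical placeholder rather than a measurement, and converting "#PDDL improvements" into tokens via a per-improvement cost is an assumption made for illustration.

```python
def llm_policy_tokens(episodes: int, actions_per_episode: int, tokens_per_action: int) -> int:
    """Token cost when the LLM is queried for every action of every episode."""
    return episodes * actions_per_episode * tokens_per_action


def pddl_reuse_tokens(improvements: int, tokens_per_improvement: int) -> int:
    """Token cost when the LLM only builds or refines the domain, paid once for all episodes."""
    return 1 * improvements * tokens_per_improvement


# Hypothetical numbers, chosen only to illustrate how the two costs scale.
policy_cost = llm_policy_tokens(episodes=30, actions_per_episode=20, tokens_per_action=1500)
pddl_cost = pddl_reuse_tokens(improvements=5, tokens_per_improvement=3000)
print(policy_cost, pddl_cost)
```

The point is the shape of the formulas rather than the specific values: the first cost grows with the number of episodes, the second does not.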

5. Method: KB-enhanced PDDL Generation

We answer the two questions above as follows:

Prior knowledge can be retrieved from the Wiki.
This knowledge can be converted into a reusable PDDL domain file, so we can plan without querying the LLM at every step.
The resulting planned actions are executed by the controller (e.g., Steve-1).
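
As a rough illustration of the second point, the sketch below renders one crafting recipe as a PDDL action. It is not the project's actual generator: the function name `recipe_to_pddl_action`, the `have_<item>` predicate naming, and the recipe format are all assumptions. Ingredients are not consumed in this encoding, which keeps it simple.

```python
from typing import List, Optional


def recipe_to_pddl_action(item: str, ingredients: List[str], station: Optional[str] = None) -> str:
    """Render one recipe as a propositional PDDL action (hypothetical encoding, no quantities)."""
    preconditions = [f"(have_{name})" for name in ingredients]
    if station is not None:
        # The station (e.g. a crafting table) is required but, like the ingredients, not consumed.
        preconditions.append(f"(have_{station})")
    return (
        f"(:action craft_{item}\n"
        f"  :parameters ()\n"
        f"  :precondition (and {' '.join(preconditions)})\n"
        f"  :effect (have_{item}))"
    )


action_text = recipe_to_pddl_action("wooden_pickaxe", ["planks", "stick"], station="crafting_table")
print(action_text)
```

Once such actions are collected into a domain file, the same file can be handed to a planner for every episode of the task.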

5.1 Pipeline (high-level)

Retrieve relevant crafting/dependency knowledge from the Wiki (KB).
Construct/Refine a PDDL domain (actions, preconditions, effects).
Plan with a PDDL planner to obtain a high-level action sequence.
Execute planned actions via the controller in the Minecraft environment.
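
The planning step of the pipeline above can be sketched as follows. This is a toy stand-in, not the project's implementation: a real pipeline would pass the generated domain to an off-the-shelf PDDL planner, whereas this sketch uses a small breadth-first search. The `DOMAIN` table is hand-written here for illustration (in the pipeline it would come from the KB-enhanced generation step), and delete effects are omitted.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Tuple

# action name -> (preconditions, add effects); hypothetical hand-written domain
Action = Tuple[FrozenSet[str], FrozenSet[str]]

DOMAIN: Dict[str, Action] = {
    "mine_log": (frozenset(), frozenset({"log"})),
    "craft_planks": (frozenset({"log"}), frozenset({"planks"})),
    "craft_stick": (frozenset({"planks"}), frozenset({"stick"})),
    "craft_crafting_table": (frozenset({"planks"}), frozenset({"crafting_table"})),
    "craft_wooden_pickaxe": (
        frozenset({"planks", "stick", "crafting_table"}),
        frozenset({"wooden_pickaxe"}),
    ),
    "mine_cobblestone": (frozenset({"wooden_pickaxe"}), frozenset({"cobblestone"})),
    "craft_stone_pickaxe": (
        frozenset({"cobblestone", "stick", "crafting_table"}),
        frozenset({"stone_pickaxe"}),
    ),
}


def plan(goal: str, domain: Dict[str, Action]) -> Optional[List[str]]:
    """Breadth-first forward search from an empty inventory to a state containing `goal`."""
    start: FrozenSet[str] = frozenset()
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goal in state:
            return steps
        for name, (pre, add) in domain.items():
            # Apply an action only if it is applicable and adds something new.
            if pre <= state and not add <= state:
                nxt = state | add
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [name]))
    return None  # goal unreachable with the known actions


print(plan("stone_pickaxe", DOMAIN))
```

No LLM call happens inside `plan`, so the search can be repeated for every episode at no token cost. Each returned step would then be dispatched to the controller.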

6. Experiments

6.1 Task settings

We evaluate on increasingly challenging tasks:

Get crafting table
Get wooden pickaxe
Get stone pickaxe
Get iron pickaxe (most challenging)

6.2 Method settings (baselines)

We compare against:

Baseline: Jarvis-1 without memory
Generate Domain: the LLM generates the full domain.pddl
Generate Precond & Effect: the LLM generates the preconditions and effects of the domain's actions
Zero-shot: Jarvis-1 without memory and without dependency knowledge

6.3 Metrics

We report:

Success rate
Token usage (with a breakdown into plan tokens and skill tokens)
Execution time
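
For reference, here is one way the three metrics could be aggregated over a batch of episodes. This is a minimal sketch under assumed names: the `Episode` record and its fields are hypothetical and do not describe the project's actual logging format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Episode:
    success: bool
    plan_tokens: int   # tokens attributed to planning (assumed field)
    skill_tokens: int  # tokens attributed to skill selection (assumed field)
    seconds: float     # wall-clock execution time


def summarize(episodes: List[Episode]) -> Dict[str, float]:
    """Average the per-episode records into the reported metrics."""
    n = len(episodes)
    return {
        "success_rate": sum(e.success for e in episodes) / n,
        "mean_plan_tokens": sum(e.plan_tokens for e in episodes) / n,
        "mean_skill_tokens": sum(e.skill_tokens for e in episodes) / n,
        "mean_seconds": sum(e.seconds for e in episodes) / n,
    }


# Made-up records, only to show the call shape.
summary = summarize([Episode(True, 1000, 200, 60.0), Episode(False, 3000, 400, 120.0)])
print(summary)
```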

7. Results & Discussion

From the presentation results:

Higher success rate with KB-enhanced PDDL
Lower token usage, especially by reducing repeated planning-time LLM queries
Faster execution time
More controllable and interpretable planning, reducing hallucination or erroneous actions

8. Demo: In-game execution

We include an in-game execution demonstration comparing:

Jarvis-1 using LLM getskills
Jarvis-1 using KB-enhanced PDDL

[Demo panels: Baseline | KB-enhanced PDDL]

9. Advantages

Our approach:

Uses fewer tokens while achieving higher success rate
Reduces hallucination / erroneous actions
Speeds up execution time
Makes planning more controllable and interpretable

10. Limitations & Future Work

10.1 Quantity-insensitive planning

A plan can be procedurally correct but still fail due to insufficient item quantities. A possible solution is quantity-aware planning.
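
One possible shape for quantity-aware planning is to expand a goal into the raw materials it needs before acting. The sketch below is only an illustration of that idea, not a method from this project. The `RECIPES` table and the function `raw_requirements` are hypothetical names, though the recipe counts follow the standard Minecraft crafting recipes.

```python
import math
from collections import Counter
from typing import Dict, Tuple

# item -> (units produced per craft, ingredients consumed per craft)
RECIPES: Dict[str, Tuple[int, Dict[str, int]]] = {
    "planks": (4, {"log": 1}),
    "stick": (4, {"planks": 2}),
    "crafting_table": (1, {"planks": 4}),
    "wooden_pickaxe": (1, {"planks": 3, "stick": 2}),
}


def raw_requirements(item: str, count: int, stock: Counter, raw: Counter) -> None:
    """Add to `raw` the base materials needed for `count` of `item`, reusing leftovers in `stock`."""
    used = min(stock[item], count)
    stock[item] -= used
    count -= used
    if count == 0:
        return
    if item not in RECIPES:
        raw[item] += count  # base material that must be gathered
        return
    per_craft, ingredients = RECIPES[item]
    crafts = math.ceil(count / per_craft)
    for ingredient, qty in ingredients.items():
        raw_requirements(ingredient, qty * crafts, stock, raw)
    stock[item] += crafts * per_craft - count  # keep surplus for later steps


stock, raw = Counter(), Counter()
raw_requirements("crafting_table", 1, stock, raw)
raw_requirements("wooden_pickaxe", 1, stock, raw)
print(dict(raw))  # {'log': 3}
```

A quantity-insensitive plan would mine a single log here and then run out of planks partway through the crafting chain.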

10.2 Text-only action decisions (no visual cues) + luck-driven mining

Current decisions rely on text-only signals; mining can be luck-driven, so success can depend on chance. A future direction is action decisions based on both text and visual information.

References

Fan, Linxi, et al. MineDojo: Building open-ended embodied agents with internet-scale knowledge. NeurIPS 2022.
Wang, Zihao, et al. Jarvis-1: Open-world multi-task agents with memory-augmented multimodal language models. TPAMI 2024.
Guan, Lin, et al. Leveraging pre-trained large language models to construct and utilize world models for model-based task planning. NeurIPS 2023.
Lifshitz, S., et al. Steve-1: A generative model for text-to-behavior in Minecraft. NeurIPS 2023.

Liang Junyi

https://liangjunyi010.github.io/liangjunyi.github.io/2025/04/18/project-zero-shot-minecraft-planning-agent/

Unless otherwise stated, all articles in this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source: Liang Junyi.
