

Zero-shot Minecraft Planning Agent: KB-Enhanced LLM-based PDDL Generation


We tackle long-horizon planning in Minecraft with KB-enhanced PDDL: retrieve prior knowledge from the Minecraft Wiki, construct or refine a PDDL domain, and reuse it for planning. This reduces repeated LLM calls, improves success rate, and lowers token usage and execution time.


1. Background: Why long-horizon planning in Minecraft is hard

Minecraft tasks often require multi-step dependencies (e.g., crafting chains). The technology tree creates long plans with complex prerequisites, making zero-shot success difficult.
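As a concrete illustration, the pickaxe progression can be modeled as a small dependency graph, and a topological sort recovers a valid acquisition order. The item names and prerequisites below are a simplified sketch of the technology tree, not the full game data:

```python
from graphlib import TopologicalSorter

# Simplified prerequisite graph for the iron-pickaxe chain:
# each item maps to the set of items that must be obtained first.
prereqs = {
    "log": set(),
    "planks": {"log"},
    "stick": {"planks"},
    "crafting_table": {"planks"},
    "wooden_pickaxe": {"stick", "planks", "crafting_table"},
    "cobblestone": {"wooden_pickaxe"},
    "stone_pickaxe": {"stick", "cobblestone", "crafting_table"},
    "iron_ore": {"stone_pickaxe"},
    "furnace": {"cobblestone"},
    "iron_ingot": {"iron_ore", "furnace"},
    "iron_pickaxe": {"stick", "iron_ingot", "crafting_table"},
}

# A valid long-horizon acquisition order, ending at the goal item.
order = list(TopologicalSorter(prereqs).static_order())
print(order)
```

Even in this toy version, the goal item sits roughly ten dependency steps from a raw log, which is exactly what makes zero-shot planning brittle: one missing edge anywhere in the chain breaks the whole plan.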



2. LLMs in task planning: Planner vs. World Model

LLMs can be used in two roles:

  • LLM as Planner: generate plans / high-level action sequences.
  • LLM as World Model: model environment dynamics to support planning.
(Figures: LLM as Planner vs. LLM as World Model)
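The two roles differ in what the LLM is asked to produce. A schematic sketch, where `llm` stands in for any chat-completion call (the function names and prompts are illustrative, not a real API):

```python
# LLM as Planner: the model itself emits the high-level action sequence.
def plan_with_llm(llm, goal: str) -> list[str]:
    return llm(f"List the Minecraft actions to achieve: {goal}").splitlines()

# LLM as World Model: the model only predicts environment dynamics;
# an external planner/search strings those predictions into a plan.
def predict_next_state(llm, state: str, action: str) -> str:
    return llm(f"In state '{state}', after action '{action}', the new state is:")
```

In the first role, every plan costs LLM tokens; in the second, the LLM's knowledge is extracted once into a model that a symbolic planner can query for free.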

3. Jarvis-1 overview: Hierarchical planning with memory

Jarvis-1 is a hierarchical approach where a memory-augmented MLLM retrieves reference trajectories and uses in-context learning to generate the next high-level action. The controller (e.g., Steve-1) maps high-level actions to real-time mouse/keyboard actions via a pretrained skill library.


4. Motivation: Rethinking Jarvis-1

Q1: Where does prior knowledge come from?

Jarvis-1 relies heavily on a trajectory memory database (human-crafted, oracle, or agent-generated). In a strict zero-shot setting, missing dependency knowledge can cause failures: for example, the agent may try to mine raw iron without being equipped with the required tool.


Q2: Is “LLM as planner” the best choice?

Querying an LLM for every action in every episode is expensive. Over many episodes of the same task, the token cost of LLM-as-planner scales as:

  • #episodes × #actions per episode × tokens per action

A PDDL domain, by contrast, is built once and reused across episodes, so we only pay the cost of a small number of domain improvements:

  • 1 × (#PDDL improvements)
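A toy back-of-envelope comparison makes the gap concrete; all numbers below are assumed purely for illustration:

```python
# Assumed workload: 30 episodes of a task, ~20 high-level actions each.
episodes, actions_per_episode, tokens_per_action = 30, 20, 500

# LLM-as-planner: pay for an LLM query at every action of every episode.
llm_as_planner_tokens = episodes * actions_per_episode * tokens_per_action

# KB-enhanced PDDL: pay only for a handful of domain constructions/repairs,
# then plan symbolically at zero LLM cost across all episodes.
domain_refinements, tokens_per_refinement = 5, 2000
pddl_tokens = domain_refinements * tokens_per_refinement

print(llm_as_planner_tokens, pddl_tokens)  # 300000 vs. 10000
```

Under these assumptions the symbolic route uses 30× fewer tokens, and the gap widens with every additional episode, since the domain file amortizes while per-action queries do not.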



5. Method: KB-enhanced PDDL Generation

We answer the two questions above as follows:

  • Prior knowledge can be retrieved from Wiki.
  • This knowledge can be converted into a reusable PDDL domain file, so we can plan without querying the LLM at every step.
  • The resulting planned actions are executed by the controller (e.g., Steve-1).
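For instance, a mining action in such a domain file might look like the following PDDL fragment. The predicate and action names here are illustrative, not the exact output of our pipeline:

```pddl
(:action mine-iron-ore
  :parameters (?agent - agent)
  :precondition (and (has ?agent stone-pickaxe)
                     (near ?agent iron-ore))
  :effect (and (has ?agent raw-iron)
               (not (near ?agent iron-ore))))
```

Encoding the tool requirement as an explicit precondition is what prevents the "mine iron without a pickaxe" failure: a symbolic planner simply cannot emit this action before a stone pickaxe is in hand.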

5.1 Pipeline (high-level)

  1. Retrieve relevant crafting/dependency knowledge from Wiki (KB).
  2. Construct/Refine a PDDL domain (actions, preconditions, effects).
  3. Plan with a PDDL planner to obtain a high-level action sequence.
  4. Execute planned actions via the controller in the Minecraft environment.
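The four steps above can be sketched as a single loop; every component name here (`kb`, `llm`, `planner`, `controller` and their methods) is a hypothetical placeholder for the actual modules, not our real interface:

```python
def run_task(task: str, kb, llm, planner, controller):
    # 1. Retrieve crafting/dependency knowledge for the task from the Wiki KB.
    facts = kb.retrieve(task)

    # 2. Ask the LLM (once, not per step) to turn the facts into a PDDL
    #    domain; the domain is cached and reused across later episodes.
    domain = llm.generate_pddl_domain(facts)

    # 3. Run an off-the-shelf symbolic planner -- no LLM calls here.
    high_level_plan = planner.solve(domain, goal=task)

    # 4. Hand each high-level action to the low-level controller (e.g. Steve-1).
    for action in high_level_plan:
        controller.execute(action)
    return high_level_plan
```

The key design choice is that the LLM appears only in step 2; steps 3 and 4 are token-free, which is where the cost savings of Section 4 come from.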



6. Experiments

6.1 Task settings

We evaluate on increasingly challenging tasks:

  • Get crafting table
  • Get wooden pickaxe
  • Get stone pickaxe
  • Get iron pickaxe (most challenging)


6.2 Method settings (baselines)

We compare against:

  • Baseline: Jarvis without memory
  • Generate Domain: LLM generates domain.pddl
  • Generate Precond & Effect: LLM generates domain preconditions & effects
  • Zero-shot: Jarvis without memory and without dependency knowledge

6.3 Metrics

We report:

  • Success rate
  • Token usage (including plan token vs skill token breakdown)
  • Execution time

7. Results & Discussion

From the presentation results:

  • Higher success rate with KB-enhanced PDDL
  • Lower token usage, especially by reducing repeated planning-time LLM queries
  • Faster execution time
  • More controllable and interpretable planning, reducing hallucination or erroneous actions



8. Demo: In-game execution

We include an in-game execution demonstration comparing:

  • Jarvis-1 using LLM-generated skills (baseline)
  • Jarvis-1 using KB-enhanced PDDL

(Demo videos: baseline vs. KB-enhanced PDDL, side by side)

9. Advantages

Our approach:

  • Uses fewer tokens while achieving higher success rate
  • Reduces hallucination / erroneous actions
  • Speeds up execution time
  • Makes planning more controllable and interpretable

10. Limitations & Future Work

10.1 Quantity-insensitive planning

A plan can be procedurally correct but still fail due to insufficient item quantities. A possible solution is quantity-aware planning.
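One way to catch such failures before execution is to simulate the plan against per-step resource requirements and yields. A minimal sketch, with recipe numbers simplified for illustration:

```python
# Each step is (consumed, produced): dicts of item counts. A plan that is
# order-correct can still fail if a step consumes more of an item than the
# inventory holds at that point.
def plan_is_quantity_feasible(plan, inventory=None):
    inv = dict(inventory or {})
    for consumed, produced in plan:
        for item, n in consumed.items():
            if inv.get(item, 0) < n:
                return False          # not enough of `item` at this step
            inv[item] -= n
        for item, n in produced.items():
            inv[item] = inv.get(item, 0) + n
    return True

# One log yields 4 planks; the crafting table spends all of them,
# so the later stick step starves even though the order is valid.
plan = [
    ({}, {"log": 1}),                     # chop one log
    ({"log": 1}, {"planks": 4}),          # craft planks
    ({"planks": 4}, {"crafting_table": 1}),
    ({"planks": 2}, {"stick": 4}),        # fails: planks already spent
]
print(plan_is_quantity_feasible(plan))  # False
```

A quantity-aware planner would instead track these counts as numeric fluents and chop a second log up front.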

10.2 Text-only action decisions (no visual cues) + luck-driven mining

Current action decisions rely on text-only signals, and mining is partly luck-driven, so success can depend on chance. A future direction is to base action decisions on both text and visual information.


References

  • Fan, Linxi, et al. MineDojo: Building open-ended embodied agents with internet-scale knowledge. NeurIPS 2022.
  • Wang, Zihao, et al. Jarvis-1: Open-world multi-task agents with memory-augmented multimodal language models. TPAMI 2024.
  • Guan, Lin, et al. Leveraging pre-trained large language models to construct and utilize world models for model-based task planning. NeurIPS 2023.
  • Lifshitz, S., et al. Steve-1: A generative model for text-to-behavior in Minecraft. NeurIPS 2023.

Author: Liang Junyi
Reprint policy: Unless otherwise stated, all articles in this blog are licensed under CC BY 4.0. If reproduced, please indicate the source: Liang Junyi!