# minecraft-rl-gathering
A Minecraft RL agent trained with PPO (Proximal Policy Optimization) using Stable-Baselines3.
This agent was trained to gather resources in Minecraft using a distributed training architecture across RTX 5090 (training), Jetson Orin AGX (environment), and DGX Spark (LLM reward shaping).
## Training Details
| Metric | Value |
|---|---|
| Total Steps | 230,920 |
| Episodes | ~85 |
| Mean Reward | -141.5 |
| Best Reward | +50.7 |
| Reward Scheme | gathering |
| Learning Rate | 0.0003 |
## Hardware
- Training: NVIDIA RTX 5090 (32GB VRAM)
- Environment: NVIDIA Jetson Orin AGX (64GB RAM)
- LLM Server: NVIDIA DGX Spark - GPT-OSS-20B (vLLM)
## Architecture
- Algorithm: PPO (Proximal Policy Optimization)
- Policy: MLP with [512, 512] hidden layers
- Observation Space: 82 dimensions (position, velocity, vitals, hotbar, craftable flags)
- Action Space: 37 discrete actions (movement, mining, crafting, inventory)
## Observation Space (82 dimensions)
| Component | Dimensions | Description |
|---|---|---|
| Position | 3 | x, y, z normalized |
| Velocity | 3 | vx, vy, vz |
| Orientation | 2 | yaw, pitch normalized |
| Vitals | 4 | health, food, saturation, oxygen |
| Flags | 2 | is_on_ground, is_day |
| Time | 1 | time_of_day normalized |
| Hotbar | 18 | 9 slots × (item_type + count) |
| Held Item | 3 | type, count, durability |
| Craftable | 8 | can_craft flags for key items |
| Block Grid | 27 | 3×3×3 nearby blocks |
| Nearby Entities | 11 | closest entity info |
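As a sanity check, the components above can be laid out as index offsets into the flat 82-dimensional vector. This is a minimal sketch: the field ordering is assumed to match the table, and `obs_slices` is a hypothetical helper, not part of the environment's API.

```python
# Per-component dimensions, copied from the observation-space table above.
OBS_LAYOUT = [
    ("position", 3),
    ("velocity", 3),
    ("orientation", 2),
    ("vitals", 4),
    ("flags", 2),
    ("time", 1),
    ("hotbar", 18),        # 9 slots x (item_type + count)
    ("held_item", 3),
    ("craftable", 8),
    ("block_grid", 27),    # 3x3x3 nearby blocks
    ("nearby_entities", 11),
]

def obs_slices(layout=OBS_LAYOUT):
    """Map each component name to its slice in the flat observation vector."""
    slices, start = {}, 0
    for name, dim in layout:
        slices[name] = slice(start, start + dim)
        start += dim
    assert start == 82, f"expected 82 dims, got {start}"
    return slices

print(obs_slices()["hotbar"])  # slice(15, 33, None)
```

The dimensions sum to exactly 82, matching the policy's input size.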
## Action Space (37 discrete actions)
| Category | Actions |
|---|---|
| Movement (0-7) | forward, back, left, right, jump, jump_forward, sprint_forward, forward_long |
| Looking (8-11) | look_left, look_right, look_up, look_down |
| Mining (12-16) | mine, attack, mine_forward, mine_up, jump_mine_up |
| Hotbar (17-25) | select_slot_0 through select_slot_8 |
| Items (26-28) | place_block, eat_food, use_item |
| Crafting (29-36) | craft_planks, craft_sticks, craft_crafting_table, craft_wooden_pickaxe, craft_stone_pickaxe, craft_wooden_sword, craft_furnace, craft_torch |
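For debugging rollouts, the index ranges in the table can be turned into a small lookup from a discrete action to its category. This is a hypothetical helper based on the ranges above, not part of the environment's API.

```python
# Action-index ranges, copied from the action-space table above (37 actions total).
ACTION_CATEGORIES = [
    (range(0, 8), "movement"),
    (range(8, 12), "looking"),
    (range(12, 17), "mining"),
    (range(17, 26), "hotbar"),
    (range(26, 29), "items"),
    (range(29, 37), "crafting"),
]

def action_category(action: int) -> str:
    """Return the category name for a discrete action index 0-36."""
    for r, name in ACTION_CATEGORIES:
        if action in r:
            return name
    raise ValueError(f"action {action} outside 0-36")

print(action_category(16))  # mining
```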
## Usage
```python
from huggingface_hub import hf_hub_download
from stable_baselines3 import PPO

# Download the model checkpoint from the Hub
model_path = hf_hub_download(
    repo_id='CahlenLee/minecraft-rl-gathering',
    filename='model.zip',
    local_dir='./models'
)

# Load the trained policy
model = PPO.load(model_path)

# Run inference (env is the custom Minecraft Gymnasium environment,
# set up as described below — it is not created by this snippet)
obs, info = env.reset()
action, _ = model.predict(obs, deterministic=True)
```
## Environment Setup
This model was trained on a custom Minecraft environment using:
- Mineflayer for bot control
- Custom Gymnasium wrapper for RL interface
- LLM-based reward shaping (GPT-OSS-20B via vLLM)
- Dense rewards for resource gathering
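The LLM reward-shaping step might look like the sketch below: the trainer summarizes an episode event and queries the vLLM server, which serves an OpenAI-compatible API. The prompt wording, model name, and response handling here are illustrative assumptions, not the actual prompts used in training.

```python
import json

def build_reward_query(event_summary: str, model: str = "gpt-oss-20b") -> str:
    """Build the JSON body for a chat-completions reward query (hypothetical prompt)."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Score this Minecraft gathering step from -1 to 1."},
            {"role": "user", "content": event_summary},
        ],
        "max_tokens": 8,
        "temperature": 0.0,
    }
    return json.dumps(payload)

body = build_reward_query("Agent mined 1 oak_log; inventory: 3 logs.")
# POST this body asynchronously to the DGX Spark's
# OpenAI-compatible endpoint, e.g. /v1/chat/completions.
print(json.loads(body)["model"])  # gpt-oss-20b
```

Keeping the query asynchronous lets PPO rollouts proceed while shaped rewards arrive with a short delay.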
## Training Configuration
```python
PPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,
    n_steps=256,
    batch_size=256,
    n_epochs=15,
    gamma=0.99,
    gae_lambda=0.95,
    ent_coef=0.02,
    clip_range=0.2,
    max_grad_norm=0.5,
    policy_kwargs={"net_arch": {"pi": [512, 512], "vf": [512, 512]}},
)
```
## Distributed Architecture
```
┌──────────────┐    HTTP/REST    ┌──────────────────────────┐
│   RTX 5090   │────────────────▶│     Jetson Orin AGX      │
│   Training   │  Bot Steps/Obs  │    4x Minecraft Bots     │
│   Server     │                 │  Dashboard (port 3000)   │
└──────┬───────┘                 └──────────────────────────┘
       │
       │ Async LLM Queries
       ▼
┌──────────────┐
│  DGX Spark   │
│ vLLM Server  │
│  (20B model) │
└──────────────┘
```
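The training-server side of the HTTP/REST link can be sketched as a pair of encode/decode helpers for remote environment steps. The endpoint path and JSON schema below are assumptions for illustration, not the actual wire format.

```python
import json

# Hypothetical step endpoint on the Jetson bot server.
STEP_URL = "http://jetson-orin:3000/api/step"

def encode_step_request(bot_id: int, action: int) -> bytes:
    """Serialize one discrete action for one of the 4 bots."""
    return json.dumps({"bot_id": bot_id, "action": action}).encode()

def decode_step_response(body: bytes):
    """Parse the bot server's reply into (obs, reward, done)."""
    msg = json.loads(body)
    return msg["obs"], msg["reward"], msg["done"]

# e.g. with urllib.request:
#   urlopen(Request(STEP_URL, data=encode_step_request(0, 12))).read()
obs, reward, done = decode_step_response(
    b'{"obs": [0.0], "reward": 0.5, "done": false}'
)
print(reward)  # 0.5
```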
## License
MIT
## Citation
If you use this model, please cite:
```bibtex
@misc{minecraft_rl_gathering,
  author = {Cahlen Humphreys},
  title = {minecraft-rl-gathering},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/cahlen/minecraft-rl-gathering}}
}
```