sharedcare/OpenRA-Bot
OpenRA.Bot

OpenRA.Bot is a Python-side RL/control package for OpenRA. It uses pythonnet to load the built game assemblies, calls the engine-side PythonAPI, and exposes a Gym-style environment for random agents, rule-based agents, and a baseline PPO training loop.

This repository currently contains a usable end-to-end baseline, but it is still closer to a research scaffold than a polished RL framework. Recent work has made the PPO stack, the remote-lobby control path, and action masking substantially more reliable, but observation design and trainer quality are still evolving.

What Is In Scope

  • Python environment wrapper around the in-engine API
  • Engine-side PythonAPI bridge for local game start, stepping, state extraction, and action dispatch
  • Rule-based and random agents for smoke testing
  • A custom ActorCritic + PPOAgent baseline
  • Local game, local hosted lobby, and remote lobby connection helpers

Current Layout

  • envs/openra_env.py: main Gym environment
  • utils/engine.py: loads OpenRA.Game.dll and PythonAPI through pythonnet
  • utils/obs.py: converts PythonAPI.GetState() output into Python dictionaries
  • utils/actions.py: encodes Python action dicts into RLAction
  • utils/net.py: local host / remote join / lobby helpers
  • utils/PythonAPI.cs: engine bridge source used by the Python side
  • agent/agent.py: RandomMoveAgent, RuleBasedAgent, PPOAgent
  • models/actor.py: encoders and ActorCritic
  • models/buffer.py: rollout buffer for PPO
  • scripts/example_usage.py: rule-based / random control example
  • scripts/train_rl.py: baseline PPO training entry
  • scripts/rl_smoke_test.py: quick RL smoke test
  • scripts/remote_rule_based.py: join a remote lobby and run RuleBasedAgent
  • scripts/remote_ppo.py: join a remote lobby and inspect PPOAgent actions, masks, and queue state

Architecture Overview

The current execution path is:

  1. envs/openra_env.py calls utils/engine.py to load the OpenRA assemblies.
  2. PythonAPI.StartLocalGame(...) or the lobby helpers initialize a match.
  3. PythonAPI.GetState() returns a simplified RLState.
  4. utils/obs.py converts that state into Python dicts.
  5. OpenRAEnv converts the raw dict into feature, vector, or image observations.
  6. An agent chooses either a legacy dict action list or a MultiDiscrete action.
  7. utils/actions.py and PythonAPI.SendActions(...) translate that into OpenRA orders.
  8. PythonAPI.Step() advances the simulation.
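The steps above can be sketched with a stand-in bridge object (the `FakeBridge` class and its method names are hypothetical; the real calls go through pythonnet into the compiled PythonAPI):

```python
class FakeBridge:
    """Stand-in for the PythonAPI bridge, only to illustrate the control flow."""
    def __init__(self):
        self.tick = 0

    def get_state(self):
        # Step 3: GetState() returns a simplified state snapshot.
        return {"tick": self.tick, "actors": []}

    def send_actions(self, actions):
        # Step 7: SendActions(...) translates actions into engine orders.
        pass

    def step(self, ticks=10):
        # Step 8: Step() advances the simulation by a number of ticks.
        self.tick += ticks

bridge = FakeBridge()
for _ in range(3):
    state = bridge.get_state()   # observe
    actions = []                 # step 6: agent chooses actions (noop here)
    bridge.send_actions(actions) # dispatch
    bridge.step(10)              # advance

assert bridge.tick == 30
```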

Prerequisites

  • A platform supported by your OpenRA build and pythonnet
  • Python 3.8+
  • A built OpenRA tree with OpenRA.Game.dll and OpenRA.runtimeconfig.json
  • A mod and map that can be started from code, for example ra

Engine Bridge Setup

The Python package expects the compiled PythonAPI type to be available from OpenRA.Game.dll.

Recommended workflow:

  1. Keep the bridge source in OpenRA.Bot/utils/PythonAPI.cs.
  2. Add or sync that file into the OpenRA.Game project in your OpenRA solution.
  3. Build OpenRA so that the Python side can load the resulting assemblies from bin_dir.

OpenRAApiBridge.cs is deprecated and should not be used by new code.

Python Setup

cd F:\Projects\OpenRA\OpenRA.Bot
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

Running Existing Scripts

Rule-based / manual smoke test:

python scripts/example_usage.py

Baseline PPO training:

python scripts/train_rl.py

Remote rule-based control:

python scripts/remote_rule_based.py --host 127.0.0.1 --port 1234 --slot Multi0

Remote PPO action debugging:

python scripts/remote_ppo.py --host 127.0.0.1 --port 1234 --slot Multi0

Remote PPO training:

python scripts/train_rl.py --remote-host 127.0.0.1 --remote-port 1234 --remote-slot Multi0

Minimal Usage

from envs.openra_env import make_env

env = make_env(
    bin_dir="F:/Projects/OpenRA/bin",
    mod_id="ra",
    map_uid="b53e25e007666442dbf62b87eec7bfbe8160ef3f",
    ticks_per_step=10,
    observation_type="vector",
    enable_actions=["noop", "move", "attack", "produce", "build", "deploy"],
)

obs, info = env.reset()

for _ in range(1000):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break

env.close()

Observation Modes

OpenRAEnv currently supports three observation modes:

  • feature: returns the raw Python dict built from PythonAPI.GetState()
  • vector: flattened numeric observation for MLP policies
  • image: 128 x 128 x 10 semantic map for CNN-style policies

feature

This is the most complete mode and is the best option for debugging. It includes:

  • actors
  • resources
  • production
  • producible_catalog
  • placeable_areas
  • cash
  • resources_total
  • resource_capacity
  • power
  • my_owner

Each actor now exposes two order-related fields:

  • available_orders: a filtered list intended for bot/RL logic
  • available_order_ids: the raw order ids exposed by engine traits

Important nuance: available_order_ids is closer to "what traits are present on this actor", while available_orders is the safer field to use for decision logic. For example, transformable buildings may expose a raw Move order id even when they should not be treated as currently mobile.
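As an illustration of that distinction (the actor dict below is hypothetical sample data, not real engine output):

```python
# Hypothetical actor entry shaped like a feature-observation record.
actor = {
    "id": 42,
    "type": "fact",
    "available_orders": ["DeployTransform"],             # filtered, safe for control logic
    "available_order_ids": ["Move", "DeployTransform"],  # raw trait-level order ids
}

def can_issue(actor, order):
    """Decide from the filtered list, not the raw trait ids."""
    return order in actor["available_orders"]

# The raw ids include Move because a transform trait is present,
# but the building should not be treated as currently mobile.
assert not can_issue(actor, "Move")
assert can_issue(actor, "DeployTransform")
```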

vector

Current vector layout:

  • Up to 100 friendly units, 6 features each
  • Up to 100 enemy units, 5 features each
  • 7 resource/power slots
  • 2 map-size slots

Important caveat: the resource/power section is currently placeholder-filled in envs/openra_env.py, so the vector observation does not yet fully use the economy state already available from PythonAPI.GetState().
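The layout above implies a fixed vector size of 100*6 + 100*5 + 7 + 2 = 1109. A minimal flattening sketch under that assumption (constant and function names are illustrative, not the actual code in envs/openra_env.py):

```python
import numpy as np

# Layout constants taken from the description above; names are hypothetical.
MAX_FRIENDLY, FRIENDLY_FEATS = 100, 6
MAX_ENEMY, ENEMY_FEATS = 100, 5
ECON_SLOTS, MAP_SLOTS = 7, 2
VEC_SIZE = MAX_FRIENDLY * FRIENDLY_FEATS + MAX_ENEMY * ENEMY_FEATS + ECON_SLOTS + MAP_SLOTS

def flatten(friendly, enemy, econ, map_size):
    """Pad/truncate per-unit features into one fixed-size float vector."""
    out = np.zeros(VEC_SIZE, dtype=np.float32)
    f = np.asarray(friendly, dtype=np.float32)[:MAX_FRIENDLY]
    out[: f.size] = f.ravel()
    e = np.asarray(enemy, dtype=np.float32)[:MAX_ENEMY]
    off = MAX_FRIENDLY * FRIENDLY_FEATS
    out[off : off + e.size] = e.ravel()
    out[-(ECON_SLOTS + MAP_SLOTS) : -MAP_SLOTS] = econ
    out[-MAP_SLOTS:] = map_size
    return out
```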

image

Current image layout:

  • Shape: (128, 128, 10)
  • Channels currently used reliably:
    • friendly infantry / non-infantry
    • enemy infantry / non-infantry
  • Channels for resources, cash, and power are not fully populated yet
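A sketch of how the reliably populated unit channels could be rasterized (channel indices and field names here are assumptions for illustration, not the actual channel order):

```python
import numpy as np

H, W, C = 128, 128, 10
# Hypothetical channel assignments for the four reliably used channels.
FRIENDLY_INF, FRIENDLY_OTHER, ENEMY_INF, ENEMY_OTHER = 0, 1, 2, 3

def paint(units, my_owner):
    """Rasterize unit positions into the semantic map channels."""
    img = np.zeros((H, W, C), dtype=np.float32)
    for u in units:
        x, y = u["x"], u["y"]
        if not (0 <= x < W and 0 <= y < H):
            continue  # skip off-map positions
        friendly = u["owner"] == my_owner
        infantry = u.get("is_infantry", False)
        ch = (FRIENDLY_INF if infantry else FRIENDLY_OTHER) if friendly \
             else (ENEMY_INF if infantry else ENEMY_OTHER)
        img[y, x, ch] = 1.0
    return img
```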

Action Space

The RL action space is currently:

[action_type, unit_idx, target_x, target_y, target_idx, unit_type_idx]

Supported action types depend on enable_actions, but the usual set is:

  • noop
  • move
  • attack
  • produce
  • build
  • deploy

Semantics:

  • move: uses unit_idx, target_x, target_y
  • attack: uses unit_idx, target_idx
  • produce: uses queue actor index + unit_type_idx
  • build: uses queue actor index + unit_type_idx + target cell
  • deploy: uses unit_idx

The environment also accepts legacy Python dict actions, which is what the rule-based agent uses.
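A minimal decoder for the 6-slot layout above (the helper and dict keys are hypothetical; the real translation lives in utils/actions.py):

```python
# Hypothetical decoder mirroring the semantics described above.
ACTION_TYPES = ["noop", "move", "attack", "produce", "build", "deploy"]

def decode(action):
    """Turn a MultiDiscrete action tuple into a readable action dict."""
    a_type, unit_idx, tx, ty, target_idx, unit_type_idx = action
    name = ACTION_TYPES[a_type]
    if name == "move":
        return {"type": "move", "unit_idx": unit_idx, "target": (tx, ty)}
    if name == "attack":
        return {"type": "attack", "unit_idx": unit_idx, "target_idx": target_idx}
    if name == "produce":
        return {"type": "produce", "queue_idx": unit_idx, "unit_type_idx": unit_type_idx}
    if name == "build":
        return {"type": "build", "queue_idx": unit_idx,
                "unit_type_idx": unit_type_idx, "cell": (tx, ty)}
    if name == "deploy":
        return {"type": "deploy", "unit_idx": unit_idx}
    return {"type": "noop"}
```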

Action Masks

info["action_mask"] currently includes some of the following fields:

  • action_type
  • move_mask
  • attack_mask
  • deploy_mask
  • produce_queue_mask
  • produce_unit_type_mask
  • build_mask
  • build_unit_type_mask
  • unit_idx
  • target_idx
  • target_x
  • target_y
  • unit_type

These masks are no longer only heuristic action-type hints. The current implementation mixes:

  • engine-side feasibility checks for move, attack, and deploy
  • queue-state- and placement-driven checks for produce and build
  • per-head masks consumed by PPOAgent during both sampling and training

Current behavior:

  • move_mask: only set when the actor has a feasible move in a nearby neighborhood
  • attack_mask: per-attacker / per-target feasibility matrix
  • deploy_mask: checked through engine feasibility
  • produce_queue_mask: only queues that are enabled, empty, and can actually produce something in the current catalog
  • build_mask: only queues with a completed item and a currently available placement area
  • target_x / target_y: conditioned on the selected actor or queue
    • move targets are restricted to a local neighborhood around the selected actor
    • build targets are restricted to coordinates present in placeable_areas

Remaining limitation: target_x and target_y are still masked independently rather than as a joint (x, y) cell distribution, so some invalid coordinate pairs can still be sampled.
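A sketch of how a per-head mask is typically applied before sampling, and why factorized coordinate masks leak invalid pairs (a numpy illustration, not the actual PPOAgent code):

```python
import numpy as np

def masked_probs(logits, mask):
    """Suppress invalid entries with a large negative logit, then softmax.
    Assumes the mask marks at least one entry as valid."""
    logits = np.where(np.asarray(mask, dtype=bool), logits, -1e9)
    z = np.exp(logits - logits.max())
    return z / z.sum()

# The factorized-coordinate caveat: if only cells (0, 1) and (1, 0) are
# placeable, both x values and both y values are individually valid, so
# independent x- and y-heads can still jointly sample the invalid cell (0, 0).
p = masked_probs(np.array([0.0, 0.0, 0.0]), [1, 0, 1])
assert p[1] < 1e-6 and abs(p.sum() - 1.0) < 1e-6
```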

Reward Shaping

The default reward in envs/openra_env.py is development-oriented, not combat-oriented. It currently rewards and penalizes:

  • increase in owned unit count
  • increase in owned building count
  • starting new production items
  • canceling in-progress production
  • staying below a minimum cash reserve

This is useful for bootstrapping a macro baseline, but it is not enough on its own for strong tactical play.
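The shaping terms above can be sketched as a delta-based function (the weights, field names, and cash threshold below are illustrative assumptions, not the values used in envs/openra_env.py):

```python
# Hedged sketch of development-oriented reward shaping; all constants are hypothetical.
def shaped_reward(prev, cur, min_cash=500):
    r = 0.0
    r += 0.1 * max(0, cur["units"] - prev["units"])          # new units
    r += 0.5 * max(0, cur["buildings"] - prev["buildings"])  # new buildings
    r += 0.2 * cur["production_started"]                     # starting production
    r -= 0.3 * cur["production_canceled"]                    # canceling production
    if cur["cash"] < min_cash:
        r -= 0.05                                            # penalize a drained reserve
    return r
```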

Connection Modes

OpenRAEnv.reset() supports three startup modes:

  • local single-player start through PythonAPI.StartLocalGame(...)
  • host-local lobby flow through env.configure_host(...)
  • remote server join through env.configure_remote(...)

See utils/net.py for the exact lobby helper flow.

For remote control, the current flow is:

  1. env.configure_remote(...)
  2. reset() joins the server
  3. the client claims a slot, acknowledges the selected map, and marks itself ready
  4. lobby/network state is pumped until the host starts the game
  5. once the world exists, normal observation / action stepping begins

Recent bridge changes were specifically made to keep network traffic progressing while still in the lobby, so remote clients can stay synchronized through the lobby-to-game transition.

Known Limitations

  • The PPO baseline is still a baseline, not a polished trainer.
  • The current training loop is effectively single-environment, even though some APIs are written as if vectorized training were supported.
  • PythonAPI.GetState() is expensive because it scans a lot of world state, especially for production and build placement data.
  • Observation building and action-mask generation are more consistent than before, but still relatively expensive.
  • target_x / target_y masking is improved but still factorized rather than fully cell-joint.
  • Several scripts still assume a local development workflow and should be treated as baseline utilities rather than final UX.

Known Hacks / TODOs

  • Deploy mask blocks building undeploy (openra_env.py): The deploy action mask excludes known building types (fact, afld, weap, etc.) from undeploying via DeployTransform. This prevents the agent from constantly undeploying the Construction Yard and canceling in-progress production. In a real game, undeploying to relocate the base is a valid strategy. TODO: Remove the building-type blocklist and let the agent learn the cost of interrupting production via reward penalties (e.g. production-cancel penalty, time-waste penalty).
  • Building queue single-item guard (PythonAPI.cs): SendActions suppresses StartProduction for building-type items when the target queue already has an item (in-progress or Done). This prevents the agent from accidentally overwriting completed items before they can be placed. Unit-type queues (infantry, vehicle) are unaffected. TODO: Consider whether this should eventually be relaxed for advanced queue management strategies.
  • Per-category produce mask (openra_env.py): The produce_unit_type_mask blocks unit types whose production queue category (e.g. "building") is already occupied. This is the soft counterpart of the C#-side guard above. TODO: Re-evaluate once the agent can reliably complete the produce→build cycle.

Practical Recommendations

  • Use feature observations first when debugging action execution.
  • For remote debugging, start with scripts/remote_rule_based.py or scripts/remote_ppo.py before running long PPO training jobs.
  • Treat the current PPO stack as a baseline to iterate on, not a final trainer.
  • If you are improving sample efficiency, first optimize state extraction and observation consistency before making the policy larger.
  • If you are improving policy quality, prioritize better state encoding and stricter action masking before switching to a more complex network.

Troubleshooting

  • Local start fails: check bin_dir, mod_id, and map_uid, and confirm the OpenRA build artifacts exist.
  • Python cannot load the engine: verify OpenRA.runtimeconfig.json and the required assemblies are present in bin_dir.
  • Remote join enters the lobby but does not stay synchronized: rebuild OpenRA after syncing utils/PythonAPI.cs, because remote-lobby behavior depends on the latest bridge code.
  • Production/build actions appear invalid: inspect production, placeable_areas, available_orders, and available_order_ids from feature observations first.
  • Actor indices behave strangely: remember that unit_idx and target_idx are mapped through the latest cached unit-id lists, not raw actor IDs.
  • A transformable building appears to have Move: check available_order_ids vs available_orders. The raw field may still include transform-related move orders, while the filtered field is the one intended for control logic.
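The index-mapping point can be made concrete with a small sketch (the list contents and helper are hypothetical):

```python
# Illustrative: unit_idx indexes the cached per-step unit-id list, not raw actor IDs.
cached_unit_ids = [107, 93, 241]  # hypothetical ordering from the last observation

def resolve(unit_idx):
    """Map a policy's unit_idx back to an engine actor id, if still in range."""
    if 0 <= unit_idx < len(cached_unit_ids):
        return cached_unit_ids[unit_idx]
    return None  # stale or out-of-range index: safest to treat as a noop

assert resolve(1) == 93
assert resolve(5) is None
```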

License

MIT

About

Build an intelligent bot for OpenRA.
