OpenRA.Bot is a Python-side RL/control package for OpenRA. It uses pythonnet to load the built game assemblies, calls the engine-side PythonAPI, and exposes a Gym-style environment for random agents, rule-based agents, and a baseline PPO training loop.
This repository currently contains a usable end-to-end baseline, but it is still closer to a research scaffold than a polished RL framework. Recent work has made the PPO stack, remote-lobby control path, and action masking substantially more reliable, but observation design and trainer quality are still evolving.
- Python environment wrapper around the in-engine API
- Engine-side `PythonAPI` bridge for local game start, stepping, state extraction, and action dispatch
- Rule-based and random agents for smoke testing
- A custom `ActorCritic` + `PPOAgent` baseline
- Local game, local hosted lobby, and remote lobby connection helpers
- `envs/openra_env.py`: main Gym environment
- `utils/engine.py`: loads `OpenRA.Game.dll` and `PythonAPI` through `pythonnet`
- `utils/obs.py`: converts `PythonAPI.GetState()` output into Python dictionaries
- `utils/actions.py`: encodes Python action dicts into `RLAction`
- `utils/net.py`: local host / remote join / lobby helpers
- `utils/PythonAPI.cs`: engine bridge source used by the Python side
- `agent/agent.py`: `RandomMoveAgent`, `RuleBasedAgent`, `PPOAgent`
- `models/actor.py`: encoders and `ActorCritic`
- `models/buffer.py`: rollout buffer for PPO
- `scripts/example_usage.py`: rule-based / random control example
- `scripts/train_rl.py`: baseline PPO training entry
- `scripts/rl_smoke_test.py`: quick RL smoke test
- `scripts/remote_rule_based.py`: join a remote lobby and run `RuleBasedAgent`
- `scripts/remote_ppo.py`: join a remote lobby and inspect `PPOAgent` actions, masks, and queue state
The current execution path is:
1. `envs/openra_env.py` calls `utils/engine.py` to load the OpenRA assemblies.
2. `PythonAPI.StartLocalGame(...)` or the lobby helpers initialize a match.
3. `PythonAPI.GetState()` returns a simplified `RLState`.
4. `utils/obs.py` converts that state into Python dicts.
5. `OpenRAEnv` converts the raw dict into `feature`, `vector`, or `image` observations.
6. An agent chooses either a legacy dict action list or a `MultiDiscrete` action.
7. `utils/actions.py` and `PythonAPI.SendActions(...)` translate that into OpenRA orders.
8. `PythonAPI.Step()` advances the simulation.
- A platform supported by your OpenRA build and `pythonnet`
- Python 3.8+
- A built OpenRA tree with `OpenRA.Game.dll` and `OpenRA.runtimeconfig.json`
- A mod and map that can be started from code, for example `ra`
The Python package expects the compiled `PythonAPI` type to be available from `OpenRA.Game.dll`.
Recommended workflow:
- Keep the bridge source in `OpenRA.Bot/utils/PythonAPI.cs`.
- Add or sync that file into the `OpenRA.Game` project in your OpenRA solution.
- Build OpenRA so that the Python side can load the resulting assemblies from `bin_dir`.
`OpenRAApiBridge.cs` is deprecated and should not be used by new code.
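As a rough sketch of what the engine-loading step looks like from Python: the paths, and the pythonnet 3.x `load(...)` / `clr.AddReference(...)` sequence, are assumptions for illustration; `utils/engine.py` is the authoritative implementation.

```python
from pathlib import Path

def engine_paths(bin_dir):
    """Return the assembly and runtime-config paths expected in bin_dir.

    Hypothetical helper; the real lookup lives in utils/engine.py.
    """
    bin_dir = Path(bin_dir)
    return bin_dir / "OpenRA.Game.dll", bin_dir / "OpenRA.runtimeconfig.json"

dll, runtime_config = engine_paths("F:/Projects/OpenRA/bin")

if dll.exists() and runtime_config.exists():
    # pythonnet 3.x: select the CoreCLR runtime *before* importing clr.
    from pythonnet import load
    load("coreclr", runtime_config=str(runtime_config))

    import sys
    import clr
    sys.path.append(str(dll.parent))
    clr.AddReference("OpenRA.Game")  # makes the PythonAPI type importable
```

If the `load` step is skipped, pythonnet falls back to its default runtime selection, which may not match the runtime OpenRA was built against.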
```powershell
cd F:\Projects\OpenRA\OpenRA.Bot
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
```

Rule-based / manual smoke test:

```powershell
python scripts/example_usage.py
```

Baseline PPO training:

```powershell
python scripts/train_rl.py
```

Remote rule-based control:

```powershell
python scripts/remote_rule_based.py --host 127.0.0.1 --port 1234 --slot Multi0
```

Remote PPO action debugging:

```powershell
python scripts/remote_ppo.py --host 127.0.0.1 --port 1234 --slot Multi0
```

Remote PPO training:

```powershell
python scripts/train_rl.py --remote-host 127.0.0.1 --remote-port 1234 --remote-slot Multi0
```

Minimal Python usage:

```python
from envs.openra_env import make_env

env = make_env(
    bin_dir="F:/Projects/OpenRA/bin",
    mod_id="ra",
    map_uid="b53e25e007666442dbf62b87eec7bfbe8160ef3f",
    ticks_per_step=10,
    observation_type="vector",
    enable_actions=["noop", "move", "attack", "produce", "build", "deploy"],
)

obs, info = env.reset()
for _ in range(1000):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break
env.close()
```

`OpenRAEnv` currently supports three observation modes:
- `feature`: returns the raw Python dict built from `PythonAPI.GetState()`
- `vector`: flattened numeric observation for MLP policies
- `image`: `128 x 128 x 10` semantic map for CNN-style policies
This is the most complete mode and is the best option for debugging. It includes:
- `actors`
- `resources`
- `production`
- `producible_catalog`
- `placeable_areas`
- `cash`
- `resources_total`
- `resource_capacity`
- `power`
- `my_owner`
Each actor now exposes two order-related fields:
- `available_orders`: a filtered list intended for bot/RL logic
- `available_order_ids`: the raw order ids exposed by engine traits
Important nuance: `available_order_ids` is closer to "what traits are present on this actor", while `available_orders` is the safer field to use for decision logic. For example, transformable buildings may expose a raw `Move` order id even when they should not be treated as currently mobile.
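A small illustration of why decision logic should read the filtered field. The state shape mirrors the `feature` observation; the example actors and their order lists are invented for this sketch.

```python
def movable_actors(state):
    """Return actors that are safe to issue Move orders to.

    Uses the filtered `available_orders` field rather than the raw
    `available_order_ids`, so transformable buildings that merely expose a
    raw Move order id are excluded.
    """
    return [a for a in state["actors"] if "Move" in a.get("available_orders", [])]

state = {
    "actors": [
        # A rifleman: genuinely mobile, Move appears in both fields.
        {"id": 7, "type": "e1",
         "available_order_ids": ["Move", "AttackMove"],
         "available_orders": ["Move", "AttackMove"]},
        # A transformable building: the raw trait exposes Move,
        # but the filtered list does not.
        {"id": 12, "type": "fact",
         "available_order_ids": ["Move", "DeployTransform"],
         "available_orders": ["DeployTransform"]},
    ]
}

print([a["id"] for a in movable_actors(state)])  # → [7]
```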
Current vector layout:
- Up to 100 friendly units, 6 features each
- Up to 100 enemy units, 5 features each
- 7 resource/power slots
- 2 map-size slots
Important caveat: the resource/power section is currently placeholder-filled in `envs/openra_env.py`, so the vector observation does not yet fully use the economy state already available from `PythonAPI.GetState()`.
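The layout above implies a fixed vector length; as a back-of-the-envelope check (the per-section sizes come from this README, and the total is an illustration of the layout rather than a constant read from the code):

```python
# Sizes taken from the vector-layout description above.
FRIENDLY_SLOTS, FRIENDLY_FEATURES = 100, 6
ENEMY_SLOTS, ENEMY_FEATURES = 100, 5
RESOURCE_SLOTS = 7   # currently placeholder-filled
MAP_SLOTS = 2

vector_len = (FRIENDLY_SLOTS * FRIENDLY_FEATURES
              + ENEMY_SLOTS * ENEMY_FEATURES
              + RESOURCE_SLOTS
              + MAP_SLOTS)
print(vector_len)  # → 1109
```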
Current image layout:
- Shape: `(128, 128, 10)`
- Channels currently used reliably:
  - friendly infantry / non-infantry
  - enemy infantry / non-infantry
- Channels for resources, cash, and power are not fully populated yet
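A sketch of how the reliably populated channels could be filled from a `feature`-style actor list. The channel indices, the `cell` field layout, and the `is_infantry` flag are assumptions for illustration; see `envs/openra_env.py` for the actual mapping.

```python
import numpy as np

H, W, C = 128, 128, 10
# Hypothetical channel assignment for the four reliable channels.
CH_FRIENDLY_INF, CH_FRIENDLY_OTHER, CH_ENEMY_INF, CH_ENEMY_OTHER = 0, 1, 2, 3

def build_image(actors, my_owner):
    """Rasterize actors into a one-hot semantic map."""
    img = np.zeros((H, W, C), dtype=np.float32)
    for a in actors:
        x, y = a["cell"]
        if not (0 <= x < W and 0 <= y < H):
            continue  # skip actors outside the map crop
        friendly = a["owner"] == my_owner
        infantry = a.get("is_infantry", False)
        if friendly:
            ch = CH_FRIENDLY_INF if infantry else CH_FRIENDLY_OTHER
        else:
            ch = CH_ENEMY_INF if infantry else CH_ENEMY_OTHER
        img[y, x, ch] = 1.0
    return img

actors = [
    {"cell": (10, 12), "owner": "Multi0", "is_infantry": True},
    {"cell": (40, 40), "owner": "Multi1", "is_infantry": False},
]
img = build_image(actors, my_owner="Multi0")
print(img[12, 10, 0], img[40, 40, 3])  # → 1.0 1.0
```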
The RL action space is currently:
`[action_type, unit_idx, target_x, target_y, target_idx, unit_type_idx]`
Supported action types depend on `enable_actions`, but the usual set is:

- `noop`
- `move`
- `attack`
- `produce`
- `build`
- `deploy`
Semantics:
- `move`: uses `unit_idx`, `target_x`, `target_y`
- `attack`: uses `unit_idx`, `target_idx`
- `produce`: uses queue actor index + `unit_type_idx`
- `build`: uses queue actor index + `unit_type_idx` + target cell
- `deploy`: uses `unit_idx`
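The semantics above can be made concrete with a small helper that composes flat `MultiDiscrete` actions. The `action_type` ordering (`noop=0`, `move=1`, ...) mirrors the default `enable_actions` list but is an assumption; check `env.action_space` in code before relying on it.

```python
# Hypothetical ordering, matching the default enable_actions list.
ACTION_TYPES = ["noop", "move", "attack", "produce", "build", "deploy"]

def make_action(action_type, unit_idx=0, target_x=0, target_y=0,
                target_idx=0, unit_type_idx=0):
    """Compose [action_type, unit_idx, target_x, target_y, target_idx, unit_type_idx]."""
    return [ACTION_TYPES.index(action_type),
            unit_idx, target_x, target_y, target_idx, unit_type_idx]

move = make_action("move", unit_idx=3, target_x=60, target_y=58)
attack = make_action("attack", unit_idx=3, target_idx=5)
print(move)    # → [1, 3, 60, 58, 0, 0]
print(attack)  # → [2, 3, 0, 0, 5, 0]
```

Unused slots are simply left at zero; which slots an action reads is determined by its `action_type`, per the semantics list above.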
The environment also accepts legacy Python dict actions, which is what the rule-based agent uses.
info["action_mask"] currently includes some of the following fields:
action_typemove_maskattack_maskdeploy_maskproduce_queue_maskproduce_unit_type_maskbuild_maskbuild_unit_type_maskunit_idxtarget_idxtarget_xtarget_yunit_type
These masks are no longer only heuristic action-type hints. The current implementation mixes:
- engine-side feasibility checks for `move`, `attack`, and `deploy`
- queue-state- and placement-driven checks for `produce` and `build`
- per-head masks consumed by `PPOAgent` during both sampling and training
Current behavior:
- `move_mask`: only set when the actor has a feasible move in a nearby neighborhood
- `attack_mask`: per-attacker / per-target feasibility matrix
- `deploy_mask`: checked through engine feasibility
- `produce_queue_mask`: only queues that are enabled, empty, and can actually produce something in the current catalog
- `build_mask`: only queues with a completed item and a currently available placement area
- `target_x` / `target_y`: conditioned on the selected actor or queue
- move targets are restricted to a local neighborhood around the selected actor
- build targets are restricted to coordinates present in `placeable_areas`
Remaining limitation: `target_x` and `target_y` are still masked independently rather than as a joint (x, y) cell distribution, so some invalid coordinate pairs can still be sampled.
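A minimal sketch of the per-head masked sampling that `PPOAgent` relies on: masked entries get `-inf` logits before the softmax, so they can never be sampled. The mask values here are stand-ins, not real engine output.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_sample(logits, mask):
    """Sample an index from softmax(logits), restricted to unmasked entries."""
    logits = np.where(mask.astype(bool), logits, -np.inf)
    probs = np.exp(logits - logits.max())  # exp(-inf) == 0 for masked entries
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# e.g. only noop, move, and deploy are currently feasible
action_type_mask = np.array([1, 1, 0, 0, 0, 1])
a = masked_sample(rng.standard_normal(6), action_type_mask)
assert action_type_mask[a] == 1  # a masked head index is never sampled
```

Note that because `target_x` and `target_y` are separate heads, sampling each independently in this style can still yield an invalid (x, y) pair, which is exactly the remaining limitation described above.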
The default reward in `envs/openra_env.py` is development-oriented, not combat-oriented. It currently rewards and penalizes:
- increase in owned unit count
- increase in owned building count
- starting new production items
- canceling in-progress production
- staying below a minimum cash reserve
This is useful for bootstrapping a macro baseline, but it is not enough on its own for strong tactical play.
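An illustrative reward shaped like the development-oriented default described above. The coefficients, field names, and cash threshold are placeholders for this sketch; the real weights live in `envs/openra_env.py`.

```python
def dev_reward(prev, cur, min_cash_reserve=100):
    """Toy macro reward: reward growth and production, penalize cancels
    and an empty treasury. All coefficients are made-up examples."""
    r = 0.0
    r += 0.1 * max(0, cur["unit_count"] - prev["unit_count"])
    r += 0.5 * max(0, cur["building_count"] - prev["building_count"])
    r += 0.05 * max(0, cur["production_started"] - prev["production_started"])
    r -= 0.2 * max(0, cur["production_canceled"] - prev["production_canceled"])
    if cur["cash"] < min_cash_reserve:
        r -= 0.01  # discourage running the treasury dry
    return r

prev = {"unit_count": 5, "building_count": 2, "production_started": 1,
        "production_canceled": 0, "cash": 500}
cur = {"unit_count": 6, "building_count": 2, "production_started": 2,
       "production_canceled": 0, "cash": 450}
print(round(dev_reward(prev, cur), 3))  # → 0.15
```

Because every term is about economy and production, a policy trained on this signal alone will not learn micro or target selection; that is the combat gap noted above.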
`OpenRAEnv.reset()` supports three startup modes:

- local single-player start through `PythonAPI.StartLocalGame(...)`
- host-local lobby flow through `env.configure_host(...)`
- remote server join through `env.configure_remote(...)`
See `utils/net.py` for the exact lobby helper flow.
For remote control, the current flow is:
- `env.configure_remote(...)` registers the connection parameters
- `reset()` joins the server
- the client claims a slot, acknowledges the selected map, and marks itself ready
- lobby/network state is pumped until the host starts the game
- once the world exists, normal observation / action stepping begins
Recent bridge changes were specifically made to keep network traffic progressing while still in the lobby, so remote clients can stay synchronized through the lobby-to-game transition.
- The PPO baseline is still a baseline, not a polished trainer.
- The current training loop is effectively single-environment, even though some APIs are written as if vectorized training were supported.
- `PythonAPI.GetState()` is expensive because it scans a lot of world state, especially for production and build placement data.
- Observation building and action-mask generation are more consistent than before, but still relatively expensive.
- `target_x` / `target_y` masking is improved but still factorized rather than fully cell-joint.
- Several scripts still assume a local development workflow and should be treated as baseline utilities rather than final UX.
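Since `PythonAPI.GetState()` is the expensive call, one mitigation (a sketch, not the current implementation) is to fetch state once per environment step and derive the observation and all masks from the same snapshot:

```python
class StateCache:
    """Cache one state snapshot per simulation tick.

    `fetch` would wrap PythonAPI.GetState() in practice; here it is any
    zero-argument callable so the sketch stays self-contained.
    """

    def __init__(self, fetch):
        self._fetch = fetch
        self._tick = None
        self._state = None
        self.fetch_count = 0  # for demonstration only

    def get(self, tick):
        if tick != self._tick:  # refresh only when the world advanced
            self._state = self._fetch()
            self._tick = tick
            self.fetch_count += 1
        return self._state

cache = StateCache(lambda: {"actors": []})
cache.get(0)  # fetches
cache.get(0)  # cached: obs building and mask building share one snapshot
cache.get(1)  # world advanced, fetches again
print(cache.fetch_count)  # → 2
```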
- Deploy mask blocks building undeploy (`openra_env.py`): the deploy action mask excludes known building types (`fact`, `afld`, `weap`, etc.) from undeploying via `DeployTransform`. This prevents the agent from constantly undeploying the Construction Yard and canceling in-progress production. In a real game, undeploying to relocate the base is a valid strategy. TODO: remove the building-type blocklist and let the agent learn the cost of interrupting production via reward penalties (e.g. a production-cancel penalty or a time-waste penalty).
- Building queue single-item guard (`PythonAPI.cs`): `SendActions` suppresses `StartProduction` for building-type items when the target queue already has an item (in-progress or Done). This prevents the agent from accidentally overwriting completed items before they can be placed. Unit-type queues (infantry, vehicle) are unaffected. TODO: consider whether this should eventually be relaxed for advanced queue-management strategies.
- Per-category produce mask (`openra_env.py`): the `produce_unit_type_mask` blocks unit types whose production queue category (e.g. "building") is already occupied. This is the soft counterpart of the C#-side guard above. TODO: re-evaluate once the agent can reliably complete the produce→build cycle.
- Use `feature` observations first when debugging action execution.
- For remote debugging, start with `scripts/remote_rule_based.py` or `scripts/remote_ppo.py` before running long PPO training jobs.
- Treat the current PPO stack as a baseline to iterate on, not a final trainer.
- If you are improving sample efficiency, first optimize state extraction and observation consistency before making the policy larger.
- If you are improving policy quality, prioritize better state encoding and stricter action masking before switching to a more complex network.
- Local start fails: check `bin_dir`, `mod_id`, and `map_uid`, and confirm the OpenRA build artifacts exist.
- Python cannot load the engine: verify `OpenRA.runtimeconfig.json` and the required assemblies are present in `bin_dir`.
- Remote join enters the lobby but does not stay synchronized: rebuild OpenRA after syncing `utils/PythonAPI.cs`, because remote-lobby behavior depends on the latest bridge code.
- Production/build actions appear invalid: inspect `production`, `placeable_areas`, `available_orders`, and `available_order_ids` from `feature` observations first.
- Actor indices behave strangely: remember that `unit_idx` and `target_idx` are mapped through the latest cached unit-id lists, not raw actor IDs.
- A transformable building appears to have `Move`: check `available_order_ids` vs `available_orders`. The raw field may still include transform-related move orders, while the filtered field is the one intended for control logic.
MIT