🚀 ASCII Tanks — Teaching LLMs Physics

🎮 What is ASCII Tanks?

ASCII Tanks is a fully terminal-based, API-driven artillery game (inspired by Scorched Earth / Worms) where LLMs battle each other using purely ASCII visuals and JSON payloads!

It features destructible terrain, a physics engine for trajectory simulation, and lives entirely in your terminal with live ANSI streaming (like parrot.live). It was built specifically to test and fine-tune the spatial reasoning of Large Language Models. The models "see" the game through a compact text endpoint and respond with JSON coordinate actions containing the move, angle, and power.

For more details on the game mechanics, setup, and API, please refer to the ASCII Tanks Details.

⚔️ The Baseline: A Struggle with Space

Initially, we pitted the baseline Ministral 3B against the much larger Mistral Small model without any fine-tuning. Since text-based LLMs naturally struggle with spatial and physics-based reasoning, it was no surprise that the smaller model struggled to calculate accurate projectile trajectories or consistently hit targets compared to its larger counterpart.

📊 Results (No Fine-Tuning)

Over a 20-match series, the baseline results were:

Mistral Small: 15 Wins
Ministral 3B: 4 Wins
Draws: 1

Average Performance (Ministral 3B):

Average Damage Given per match: 19.25
Average Damage Taken per match: 44.65

For a detailed turn-by-turn breakdown, see battle_logs/unfinetuned_llm_battle.log. Complete result metrics are tracked in battle_results/unfinetuned_llm_battle_results.csv.

🎥 Unfinetuned Battle Video

🧠 Our Moat: Deep Spatial Intuition from Pure Text

Our unique innovation—and our biggest moat—is that we strictly rely on text models, using zero visual or multi-modal capabilities. The models are fed the raw ASCII representation of the game directly as a prompt. They must learn to "see" a 2D grid of characters, understand column alignments, calculate trajectories, and derive spatial relationships entirely from text tokens!

Here is an example of what the model actually "sees" and has to reason over:

=== YOUR TURN ===
You are [A] Ministral 3B
  HP: 100  |  Angle: 45  |  Power: 50  |  Fuel: 20
Enemy [B] Mistral Small
  HP: 100

=== BATTLEFIELD ===

           +

         [A]
██   ██████   ████
███ ████████ ██████     ██
████████████████████   ███
█████████████████████ █████
███████████████████████████
████████████████████████████    ██
█████████████████████████████  ████
███████████████████████████████████
████████████████████████████████████
█████████████████████████████████████    ██
█████████████████████████████████████  ████
████████████████████████████████████████████
████████████████████████████████████████████
█████████████████████████████████████████████    ██
██████████████████████████████████████████████  ████
████████████████████████████████████████████████████                 +
█████████████████████████████████████████████████████     ██
██████████████████████████████████████████████████████   ████      █ [B]   ██
████████████████████████████████████████████████████████████████████████████████
████████████████████████████████████████████████████████████████████████████████
0·········1·········2·········3·········4·········5·········6·········7·········

Respond ONLY with a JSON object: { "move": <int>, "angle": <int>, "power": <int> }

To teach the 3B model this intricate spatial reasoning without relying on expensive human-labeled data or visual inputs, we used GRPO (Group Relative Policy Optimization) paired with an offline physics simulator.

Dataset Generation: We ran generate_dataset.py to produce thousands of randomized game states (varying terrain and tank positions) and extracted the compact ASCII state alongside the hidden "game_data" used for simulation. You can explore the data we generated here: ASCII Tanks Offline Dataset on HuggingFace.
Offline Reward Function: Instead of playing the game live, the GRPO reward function (grpo_rewards.py) evaluates candidate actions by simulating the full projectile trajectory offline. The reward signal combines:
- Structural Reward: +1.0 for valid JSON output.
- Continuous Physics Reward: A gradient from -5.0 to +11.0 based on how close the simulated shot landed to the enemy tank.
Training: Using TRL and Unsloth, we trained Ministral 3B with LoRA. Because GRPO generates multiple candidate actions per state and uses relative ranking, the model slowly internalized the physics relationships directly from the simulated rewards.

The result is a tiny model that develops genuine spatial intuition entirely through reinforcement learning! You can view the full training metrics in our Weights & Biases (WandB) Logs and download the final merged model on Hugging Face.

🏆 Results After Fine-Tuning

After fine-tuning via GRPO, we ran the same head-to-head battle between the fine-tuned Ministral 3B and the baseline Mistral Small.

📊 Fine-Tuned Results

In the new 20-match series, the results improved dramatically:

Mistral Small: 10 Wins
Ministral 3B (Fine-Tuned): 10 Wins
Draws: 0

Average Performance (Ministral 3B):

Average Damage Given per match: 34.75 (+80.5% improvement)
Average Damage Taken per match: 34.45 (-22.8% reduction)

🚀 Overall Improvement Summary

Metric	Pre-Finetuning	Post-Finetuning (GRPO)	Net Improvement
Win Rate	20%	50%	+150.0%
Avg. Damage Given	19.25	34.75	+80.5%
Avg. Damage Taken	44.65	34.45	-22.8%

The 3B model learned to accurately estimate distances, calculate parabolic trajectories, and directly hit targets, matching the win rate of a model significantly larger than itself! Dive into the logs in battle_logs/finetuned_llm_battle.log and the full summary in battle_results/finetuned_llm_battle_results.csv.

🎥 Fine-Tuned Battle Video

👥 Contributors

Yogesh Singla — @yogesh1801
Simran Srivastava — @dcube0710

Built with 🎯 at the Mistral AI Hackathon — proving that small models can learn big physics.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
battle_logs		battle_logs
battle_results		battle_results
battle_videos		battle_videos
docs		docs
.gitignore		.gitignore
README.md		README.md
ascii_tanks_grpo_dataset.jsonl		ascii_tanks_grpo_dataset.jsonl
generate_dataset.py		generate_dataset.py
grpo_rewards.py		grpo_rewards.py
llm_battle.py		llm_battle.py
llm_battle_finetuned.py		llm_battle_finetuned.py
mistral_finetuning.ipynb		mistral_finetuning.ipynb
prompt.py		prompt.py
requirements.txt		requirements.txt
server.py		server.py
verify_dataset.py		verify_dataset.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🚀 ASCII Tanks — Teaching LLMs Physics

🎮 What is ASCII Tanks?

⚔️ The Baseline: A Struggle with Space

📊 Results (No Fine-Tuning)

🎥 Unfinetuned Battle Video

🧠 Our Moat: Deep Spatial Intuition from Pure Text

🏆 Results After Fine-Tuning

📊 Fine-Tuned Results

🚀 Overall Improvement Summary

🎥 Fine-Tuned Battle Video

👥 Contributors

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🚀 ASCII Tanks — Teaching LLMs Physics

🎮 What is ASCII Tanks?

⚔️ The Baseline: A Struggle with Space

📊 Results (No Fine-Tuning)

🎥 Unfinetuned Battle Video

🧠 Our Moat: Deep Spatial Intuition from Pure Text

🏆 Results After Fine-Tuning

📊 Fine-Tuned Results

🚀 Overall Improvement Summary

🎥 Fine-Tuned Battle Video

👥 Contributors

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages