
Ludo + LLM Behavioral Evaluation

This repository contains the code and evaluation assets for Ludo behavioral analysis with:

LLM agents (llm_agent.py + real_llm.py)
deterministic baselines (agents.py)
game-theory (GT) search agents (game_theory_agent.py, game_theory_multiplayer_agent.py)

Core Pipelines

Full-game experiments: run_experiments.py
Spot single-case evaluation: run_spot_evaluation.py
Spot batch (LLM vs heuristic): run_all_spots.py
Spot batch (GT vs heuristic): run_all_spots_gt.py
Persona comparison: analysis_scripts/compare_personas.py
LLM-vs-GT comparison on spots: analysis_scripts/compare_llm_vs_gt.py
Archive snapshots: archive_results.py

Public Benchmark Scope

Final public benchmark subset: spots_40/
Broader source set used during construction: spots/
Temporary/deprecated sets: spots_temporary/

Typical Commands

LLM run on final benchmark subset:

python3 run_all_spots.py --spots-glob "spots_40/spots_*.json" --out-dir spot_results_40
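
Before launching a run, it can help to check which spot files the pattern selects. The following is a minimal sketch, assuming that --spots-glob follows the same matching rules as Python's standard glob module (the source does not confirm how run_all_spots.py expands the pattern). The helper name preview_spot_files is hypothetical and is not part of the repository.

```python
import glob
import os


def preview_spot_files(pattern="spots_40/spots_*.json"):
    """Return the sorted paths matching a spots glob pattern.

    Hypothetical helper: assumes --spots-glob uses standard glob semantics.
    """
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    matches = preview_spot_files()
    print(f"{len(matches)} spot file(s) matched")
    for path in matches:
        print("  " + os.path.basename(path))
```

An empty result usually means the script was started outside the repository root or the pattern was not quoted in the shell.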

LLM run with all personas on the final benchmark subset:

python3 run_all_spots.py --spots-glob "spots_40/spots_*.json" --all-personas --out-dir spot_results_40

GT run on final benchmark subset:

python3 run_all_spots_gt.py --spots-glob "spots_40/spots_*.json" --out-dir spot_results_gt --depth 2

Persona aggregation:

python3 analysis_scripts/compare_personas.py --root spot_results_40 --personas aggressive,greedy,safe,unforgiving,none
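
The --personas list, together with the spot_results_40/none path passed as --llm-root in the comparison command, suggests one subdirectory per persona under the results root. Assuming that layout (inferred from the commands, not confirmed by the source), a small pre-flight check can report which persona results are still missing before aggregation. The function name missing_persona_dirs is hypothetical.

```python
from pathlib import Path

# Persona names taken from the --personas argument above.
PERSONAS = ["aggressive", "greedy", "safe", "unforgiving", "none"]


def missing_persona_dirs(root, personas=PERSONAS):
    """List personas that have no subdirectory under the results root.

    Assumes a <root>/<persona>/ layout, which is an inference from the
    documented commands rather than a documented guarantee.
    """
    root = Path(root)
    return [name for name in personas if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_persona_dirs("spot_results_40")
    if missing:
        print("No results directory for: " + ", ".join(missing))
    else:
        print("All persona directories are present.")
```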

LLM vs GT category comparison:

python3 analysis_scripts/compare_llm_vs_gt.py --llm-root spot_results_40/none --gt-root spot_results_gt --out-csv spot_results_gt/llm_vs_gt_comparison.csv --out-actions-json spot_results_gt/llm_vs_gt_action_transitions.json
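
The commands above can be chained into a single driver. This is a sketch only: the script names and flags are copied from the commands listed here, while the ordering, the build_commands and run_pipeline helpers, and the assumption that the all-personas run populates spot_results_40/none are illustrative and not confirmed by the source. The default is a dry run that prints each command without executing it.

```python
import shlex
import subprocess
import sys

SPOTS_GLOB = "spots_40/spots_*.json"
PERSONAS = "aggressive,greedy,safe,unforgiving,none"


def build_commands(python=sys.executable):
    """Return the documented commands as argv lists, in a plausible order."""
    return [
        # LLM run with all personas on the final benchmark subset.
        [python, "run_all_spots.py", "--spots-glob", SPOTS_GLOB,
         "--all-personas", "--out-dir", "spot_results_40"],
        # GT run on the same subset.
        [python, "run_all_spots_gt.py", "--spots-glob", SPOTS_GLOB,
         "--out-dir", "spot_results_gt", "--depth", "2"],
        # Persona aggregation.
        [python, "analysis_scripts/compare_personas.py",
         "--root", "spot_results_40", "--personas", PERSONAS],
        # LLM vs GT category comparison.
        [python, "analysis_scripts/compare_llm_vs_gt.py",
         "--llm-root", "spot_results_40/none",
         "--gt-root", "spot_results_gt",
         "--out-csv", "spot_results_gt/llm_vs_gt_comparison.csv",
         "--out-actions-json",
         "spot_results_gt/llm_vs_gt_action_transitions.json"],
    ]


def run_pipeline(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the chain at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Passing the glob as a single argv element has the same effect as quoting it in the shell: the pattern reaches the script unexpanded.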

Primary Documentation

read.txt - architecture overview
overall.txt - file responsibility map
rules.txt - implemented game rules
evaluation.txt - metric definitions
spot_creation.txt - benchmark subset and source-set construction notes
spot_docs/README.txt - active spot-doc conventions for spots_40/
