University of Oklahoma C S-4273-001 Capstone Design Project
Spring 2026
Group J | Project #17
Organization: OU Mobility Intelligence Lab (MiLa)
Mentor/Client: Dr. Bin Xu
Authors:
Product Owner: Jake Kistler
Quality Assurance: Brendan Ford
Sprint Masters: Jason Khuu, Trinh Nhan, Shiloh Sells
World-Model is a commaVQ-powered generative driving simulator that takes a frame of camera and vehicle data and "simulates" (generates) the next frame of data as a prediction.
This world model is trained on real camera and vehicle data from MiLa's vehicles driving around the University of Oklahoma campus. The goal is to use this world model as a driving simulator to train AI agent-drivers to operate MiLa's vehicles autonomously around the OU campus.
Languages: Python
- Collect surrounding camera data and vehicle driving data (steering, acceleration, braking, etc.) from manual operation of MiLa's 2025 Nissan Leaf around the University of Oklahoma campus.
- Train a world model from camera and driving data to predict how the world and vehicle states evolve based on previous frames.
- Evaluate the world model across varying driving conditions.
- Use the world model as a simulator to train an AI driver to drive around Norman.
- Deploy the trained AI driver on a real-world autonomous vehicle to drive the car around Norman.
This project requires Ubuntu 24.04 or macOS to run properly. If you are on Windows, you will need to use an Ubuntu 24.04 Windows Subsystem for Linux (WSL) environment; WSL setup instructions can be found in the next section. If you are on macOS or Linux, the project should run natively, so you can skip to the 'Project Setup' section.
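Before choosing a setup path, you can check which case applies to your machine. The sketch below is a hypothetical helper (check_supported_platform is not part of the repo) that compares Python's platform.system() and, on Linux, platform.freedesktop_os_release() (Python 3.10+) against the requirements stated above. Inside WSL it reports Linux, so it can also confirm that your WSL distribution is Ubuntu 24.04.

```python
import platform


def check_supported_platform(system: str, os_release: dict) -> tuple[bool, str]:
    """Return (supported, reason) for the platforms listed in this README.

    Hypothetical helper: system is a platform.system() value and os_release
    is the dict returned by platform.freedesktop_os_release() on Linux.
    """
    if system == "Darwin":
        return True, "macOS: run natively"
    if system == "Linux":
        distro = os_release.get("ID", "")
        version = os_release.get("VERSION_ID", "")
        if distro == "ubuntu" and version == "24.04":
            return True, "Ubuntu 24.04: run natively (or inside WSL)"
        return False, f"Linux distribution {distro} {version} is not Ubuntu 24.04"
    if system == "Windows":
        return False, "Windows: use an Ubuntu 24.04 WSL environment"
    return False, f"Unrecognized platform: {system}"


if __name__ == "__main__":
    system = platform.system()
    os_release = {}
    if system == "Linux":
        try:
            # Reads /etc/os-release; the function only exists on Python 3.10+.
            os_release = platform.freedesktop_os_release()
        except (OSError, AttributeError):
            pass
    print(check_supported_platform(system, os_release))
```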
For this project, we will need to install WSL with the Ubuntu-24.04 distribution. Microsoft's full instructions for installing WSL can be found here: https://learn.microsoft.com/en-us/windows/wsl/install. By default, WSL installs the generic Ubuntu distribution, so we will need to tell it to install the Ubuntu-24.04 distribution specifically and make sure it is the default. To install WSL, you will need to open PowerShell as an administrator.
First, check to make sure that the Ubuntu-24.04 distribution is available:
wsl --list --online
This will output a list of distributions that are available to install. You should see Ubuntu-24.04 listed here.
Once you have verified that Ubuntu-24.04 is available, run the following command to install it:
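wsl --install -d Ubuntu-24.04
The -d (--distribution) flag selects which distribution to install. If this is your first WSL distribution, it becomes the default automatically; if you already have others installed, make it the default with wsl --set-default Ubuntu-24.04. Restart your computer once the installation has finished.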
Once your computer has finished restarting, open PowerShell as an administrator and start WSL:
wsl
On your first login, WSL will ask you to set a username and password for your Ubuntu user account; these can be anything you like. Once you have set them and logged in, your command line should look something like linux_user@device_name:/mnt/c/Users/windows_user$ (where c/Users/windows_user is whatever drive letter and directory you ran the wsl command from). This is because WSL mounts your Windows drives under /mnt and starts in the directory you launched it from. However, you should not store your local project repository on the Windows filesystem, because file access across the Windows/Linux boundary is much slower than on the Linux filesystem. From here, enter this command:
cd ~
This will take you to your home directory, /home/linux_user, and your command line should look like linux_user@device_name:~$. To verify that it took you to your home directory, run:
pwd
This should output /home/linux_user. If it does not, run:
echo ~ && echo $HOME
This prints the directory that ~ points to, followed by the value of $HOME; both should be your home directory.
Make sure everything is up to date using:
sudo apt update && sudo apt upgrade
You will want to run this command often to keep your Ubuntu installation up to date.
Now that you are in your home directory and everything is up to date, this is where you will clone the World-Model repo, so that it is located at /home/linux_user/World-Model. Move on to the 'Project Setup' section for instructions on cloning the repo into your WSL Ubuntu installation.
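If you are unsure whether your shell is still on the Windows filesystem, one quick check is whether the current directory sits under a mounted Windows drive. This is a minimal sketch with a hypothetical helper name, assuming WSL's default automount root of /mnt (which can be changed in /etc/wsl.conf):

```python
import os
import re
from pathlib import PurePosixPath

# Assumes the default WSL automount root: Windows drives appear as /mnt/<letter>/...
_WINDOWS_MOUNT = re.compile(r"^/mnt/[a-zA-Z](/|$)")


def is_on_windows_mount(path: str) -> bool:
    """Return True if a WSL path lives on a mounted Windows drive."""
    return bool(_WINDOWS_MOUNT.match(str(PurePosixPath(path))))


if __name__ == "__main__":
    cwd = os.getcwd()
    if is_on_windows_mount(cwd):
        print(f"{cwd} is on the Windows filesystem; run `cd ~` before cloning.")
    else:
        print(f"{cwd} is on the Linux filesystem.")
```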
Microsoft also has an article with additional resources on setting up WSL as a development environment: https://learn.microsoft.com/en-us/windows/wsl/setup/environment.
If this is your first time cloning this repo, use the --recurse-submodules flag to clone the submodules found in the external folder:
git clone --recurse-submodules https://github.com/Aximilius/World-Model.git
Alternatively:
git clone https://github.com/Aximilius/World-Model.git
cd World-Model
git submodule update --init --recursive
Once the submodules have been initialized by either git clone --recurse-submodules or git submodule update --init --recursive, you can use the git pull command with the --recurse-submodules flag:
git pull --recurse-submodules origin <branch>
If a new submodule is added to the remote, you will need to initialize it in your local repo:
git submodule update --init --recursive
Or, if needed, run both:
git pull --recurse-submodules origin <branch> && git submodule update --init --recursive

The modern pipeline is action-conditioned: the GPT predicts future frames given both visual context and real driving actions (steering, throttle, brake).
# Full pipeline (build dataset → fine-tune → generate demo)
python scripts/run_training_pipeline.py
# Custom configuration
python scripts/run_training_pipeline.py --epochs 10 --device cuda --skip-demo
# Re-run only fine-tuning
python scripts/run_training_pipeline.py --skip-dataset --skip-demo
For detailed documentation, see PIPELINE.md.
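To make the skip flags concrete, here is a hypothetical sketch of how an orchestrator could chain the three stage scripts (build_dataset_all.py, finetune_gpt_action.py, demo_action.py). It only builds command lines from the flags shown above (--epochs, --device, --skip-dataset, --skip-demo); the real run_training_pipeline.py may invoke the stages differently and apply its own defaults, so PIPELINE.md remains the authoritative reference.

```python
import argparse
import sys

# Hypothetical sketch: mirrors the documented flags, not the real script's internals.
CHECKPOINT = "outputs/finetune_action/gpt_best.pt"


def build_stage_commands(args: argparse.Namespace) -> list[list[str]]:
    """Return the per-stage commands, in run order, implied by the flags."""
    device = ["--device", args.device] if args.device else []
    commands = []
    if not args.skip_dataset:
        commands.append([sys.executable, "scripts/build_dataset_all.py", *device])
    epochs = ["--epochs", str(args.epochs)] if args.epochs is not None else []
    commands.append([sys.executable, "scripts/finetune_gpt_action.py", *epochs, *device])
    if not args.skip_demo:
        commands.append([sys.executable, "scripts/demo_action.py",
                         "--checkpoint", CHECKPOINT, *device])
    return commands


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Sketch of the three-stage pipeline")
    parser.add_argument("--epochs", type=int, default=None)  # unset: let the stage decide
    parser.add_argument("--device", default=None)
    parser.add_argument("--skip-dataset", action="store_true")
    parser.add_argument("--skip-demo", action="store_true")
    return parser.parse_args(argv)


if __name__ == "__main__":
    for cmd in build_stage_commands(parse_args()):
        # A real orchestrator would run each stage, e.g. subprocess.run(cmd, check=True).
        print(" ".join(cmd))
```

Building the commands separately from running them keeps the flag logic easy to test without a GPU or any data.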
The full pipeline runs three stages in order, and each stage can also be run on its own:
- Build RL Dataset: encode drives and extract actions from logs
  python scripts/build_dataset_all.py --device cuda
- Fine-tune GPT: train on action-conditioned sequences
  python scripts/finetune_gpt_action.py --epochs 5 --device cuda
- Generate Demo: validate with a 3-panel video (context | ground truth (GT) | predicted)
  python scripts/demo_action.py --checkpoint outputs/finetune_action/gpt_best.pt --device cuda
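The fine-tuning stage trains on action-conditioned sequences. As an illustration only, here is one way such samples could be sliced from a single drive, assuming each frame has already been encoded into a row of discrete commaVQ tokens and aligned with one [steering, throttle, brake] action vector. make_action_windows, the array shapes, and the choice to pair each context window with its own actions are assumptions for this sketch, not the layout of rl_dataset_all.npz or the logic of finetune_gpt_action.py.

```python
import numpy as np


def make_action_windows(tokens: np.ndarray, actions: np.ndarray, context: int):
    """Slice one drive into (context frames, context actions, next frame) samples.

    Hypothetical helper. Assumed inputs:
      tokens:  (T, tokens_per_frame) integer frame tokens, one row per frame
      actions: (T, 3) float actions per frame, ordered [steering, throttle, brake]
    Returns arrays shaped (N, context, tokens_per_frame), (N, context, 3),
    and (N, tokens_per_frame), where N = T - context.
    """
    if len(tokens) != len(actions):
        raise ValueError("tokens and actions must be aligned frame-for-frame")
    n = len(tokens) - context
    if n <= 0:
        raise ValueError("drive is too short for the requested context length")
    ctx_tokens = np.stack([tokens[i:i + context] for i in range(n)])
    ctx_actions = np.stack([actions[i:i + context] for i in range(n)])
    # Each target is the frame immediately after its context window, so the
    # action at the last context step is the one that leads into the target.
    targets = tokens[context:context + n]
    return ctx_tokens, ctx_actions, targets
```

Because the actions travel with every window, the model sees what the driver did, not only what the camera saw, which is the distinction between this pipeline and the legacy vision-only one.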
Raw driving data and processed outputs must be placed in the correct directories for the pipeline to find them.
World-Model/
├── data/
│ ├── raw/
│ │ └── drives/ ← Raw .hevc video + .rlog/.rlog.bz2 log files
│ └── processed/
│ └── (outputs, auto-created)
├── outputs/
│ ├── rl_dataset_all.npz ← Built by build_dataset_all.py
│ ├── finetune_action/ ← Training checkpoints, created by finetune_gpt_action.py
│ └── demo_action/ ← Demo videos, created by demo_action.py
└── scripts/
├── run_training_pipeline.py ← Unified orchestrator
├── build_dataset_all.py
├── finetune_gpt_action.py
└── demo_action.py
- Download segments from comma connect:
  - {id}_{route}--{segment}--fcamera.hevc (video)
  - {id}_{route}--{segment}--rlog.bz2 (driving actions/CAN data)
- Place both files in data/raw/drives/:
  data/raw/drives/
  ├── d34c14daa88a1e86_00000047--f24b2c0ccd--0--fcamera.hevc
  ├── d34c14daa88a1e86_00000047--f24b2c0ccd--0--rlog.bz2
  ├── d34c14daa88a1e86_0000005d--de3f98f677--0--fcamera.hevc
  └── d34c14daa88a1e86_0000005d--de3f98f677--0--rlog.bz2
- Run the pipeline:
  python scripts/run_training_pipeline.py
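In the example filenames, the {route} field itself contains a double dash (00000047--f24b2c0ccd), so splitting a name on "--" alone does not recover the fields. A minimal parser sketch, assuming the {id}_{route}--{segment}--{kind} pattern shown above and a numeric segment index (parse_segment_filename is a hypothetical helper, not part of the repo):

```python
import re
from typing import NamedTuple

# {route} may contain "--", so the greedy route group backs off until the
# trailing "--<digits>--<kind>" matches; the segment is that numeric field.
_SEGMENT_FILE = re.compile(
    r"^(?P<id>[^_]+)_(?P<route>.+)--(?P<segment>\d+)--"
    r"(?P<kind>fcamera\.hevc|rlog(?:\.bz2)?)$"
)


class SegmentFile(NamedTuple):
    device_id: str
    route: str
    segment: int
    kind: str


def parse_segment_filename(name: str) -> SegmentFile:
    """Split a downloaded segment filename into its naming-scheme fields."""
    match = _SEGMENT_FILE.match(name)
    if match is None:
        raise ValueError(f"unexpected segment filename: {name}")
    return SegmentFile(match["id"], match["route"], int(match["segment"]), match["kind"])
```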
Important: Keep each .rlog/.rlog.bz2 file alongside its .hevc video, because the action-conditioned pipeline extracts the steering, throttle, and brake values from the log.
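A quick way to catch a missing log before running the pipeline is to list any video that has no matching log. This is a hypothetical helper, assuming logs share the video's prefix and end in --rlog.bz2 as in the example files (or --rlog if uncompressed):

```python
from pathlib import Path

VIDEO_SUFFIX = "--fcamera.hevc"
# Assumed log names, based on the example files above.
LOG_SUFFIXES = ("--rlog.bz2", "--rlog")


def find_unpaired_videos(filenames) -> list[str]:
    """Return the .hevc files that have no matching rlog among filenames."""
    names = set(filenames)
    missing = []
    for name in sorted(names):
        if not name.endswith(VIDEO_SUFFIX):
            continue
        prefix = name[: -len(VIDEO_SUFFIX)]
        if not any(prefix + suffix in names for suffix in LOG_SUFFIXES):
            missing.append(name)
    return missing


if __name__ == "__main__":
    drives = Path("data/raw/drives")
    names = [p.name for p in drives.iterdir()] if drives.is_dir() else []
    for name in find_unpaired_videos(names):
        print(f"missing rlog for {name}")
```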
For detailed data guidelines, see DATASET.md.
- PIPELINE.md: current action-conditioned pipeline guide
- DATASET.md: data collection, organization, and guidelines

- docs/legacy/VISION_PIPELINE.md: vision-only pipeline (reference only)
- docs/legacy/DEMO.md: write-up on demo mapping and functions
- docs/legacy/NOTES.md: Jake's very own world model diary
- docs/legacy/PROGRESS.md: March 30th, 2026 progress report

- docs/openpilot/OPENPILOT_SETUP.md: setup
- docs/openpilot/OPENPILOT_REPLAY.md: replay tooling