Generative World Model

University of Oklahoma C S-4273-001 Capstone Design Project
Spring 2026
Group J | Project #17
Organization: OU Mobility Intelligence Lab (MiLa)
Mentor/Client: Dr. Bin Xu
Authors:
    Product Owner: Jake Kistler
    Quality Assurance: Brendan Ford
    Sprint Masters: Jason Khuu, Trinh Nhan, Shiloh Sells


World-Model is a commaVQ-powered generative driving simulator that takes a frame of camera and vehicle data and generates ("simulates") the next frame of data as a prediction.

This world model is trained on real camera and vehicle data from MiLa's vehicles driving around the University of Oklahoma campus. The goal is to use this world model as a driving simulator to train AI agent-drivers to operate MiLa's vehicles autonomously around the OU campus.

Languages: Python

Goals

  1. Collect surrounding camera data and vehicle driving data (steering, acceleration, braking, etc.) from manual operation of MiLa's 2025 Nissan Leaf around the University of Oklahoma campus.
  2. Train a world model from camera and driving data to predict how the world and vehicle states evolve based on previous frames.
  3. Evaluate the world model across varying driving conditions.
  4. Use the world model as a simulator to train an AI driver to drive around Norman.
  5. Implement the trained AI driver into a real-world autonomous vehicle to drive the car around Norman.

Getting Started

Environment

This project requires Ubuntu 24.04 or macOS. If you are on Windows, you will need to use an Ubuntu 24.04 Windows Subsystem for Linux (WSL) environment; WSL setup instructions can be found in the next section. If you are on macOS or Linux, the project should run natively, so you can skip to the 'Project Setup' section.
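As a quick sanity check of the requirement above, the following minimal sketch classifies the host platform using Python's standard platform module. The function name classify_platform and the category labels are hypothetical, and the WSL check relies on the common heuristic that WSL kernel release strings contain "microsoft". It does not verify the Ubuntu version.

```python
import platform


def classify_platform(system: str, release: str) -> str:
    """Classify the host from platform.system() and platform.release() values."""
    if system == "Darwin":
        return "macos"
    if system == "Linux":
        # Heuristic (assumption): WSL kernels include "microsoft" in the release string.
        return "wsl" if "microsoft" in release.lower() else "linux"
    return "unsupported"


if __name__ == "__main__":
    kind = classify_platform(platform.system(), platform.release())
    if kind == "unsupported":
        print("Unsupported platform: on Windows, use Ubuntu 24.04 under WSL.")
    else:
        print(f"Detected platform: {kind}")
```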


WSL

For this project, you will need to install WSL with the Ubuntu-24.04 distribution. Microsoft's full instructions for installing WSL can be found here: https://learn.microsoft.com/en-us/windows/wsl/install. By default, WSL installs the generic Ubuntu distribution, so you need to request Ubuntu-24.04 explicitly and make sure it is your default distribution. To install WSL, open PowerShell as an administrator.

First, check to make sure that the Ubuntu-24.04 distribution is available:

wsl --list --online

This will output a list of distributions that are available for install. You should be able to see Ubuntu-24.04 listed here.

Once you have verified that Ubuntu-24.04 is available, run the following command to install it:

wsl --install -d Ubuntu-24.04

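The -d flag selects which distribution to install. If Ubuntu-24.04 is the first distribution you install, it becomes the WSL default automatically; otherwise, you can make it the default with wsl --set-default Ubuntu-24.04. Restart your computer once it has finished installing.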

Once your computer has finished restarting, open PowerShell as an administrator and start WSL:

wsl

Upon your first login, it will ask you to set a username and password for your Ubuntu user account, which you can set to anything you like. Once you have set a username and password and logged in, your command line should look something like linux_user@device_name:/mnt/c/Users/windows_user$ (replace c/Users/windows_user with whatever drive letter and directory you ran the wsl command from). This is WSL mounting your Windows user folder into your Ubuntu installation. However, you do not want to store your local project repository on the Windows filepath. From here, enter this command:

cd ~

This will take you to your home directory, /home/linux_user, and your command line should look like linux_user@device_name:~$. To verify that it took you to your home directory, run:

pwd

This should output /home/linux_user. If it does not, run:

echo ~ && echo $HOME

This outputs the directory that ~ points to, followed by your home directory ($HOME).

Make sure everything is up to date using:

sudo apt update && sudo apt upgrade

You will want to run this command often to keep your Ubuntu installation up to date.

Now that you are in your home directory and everything is up to date, this is where you will clone the World-Model repo, so that it is located in /home/linux_user/World-Model. Move on to the 'Project Setup' section for instructions on getting the repo cloned to your WSL Ubuntu install.
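To confirm that a clone location is not on the mounted Windows filesystem, a small check like the one below can help. This is a minimal sketch: the helper name is_on_windows_mount is hypothetical, and it assumes WSL's default layout, where Windows drives appear as /mnt/<drive letter>.

```python
import os
from pathlib import PurePosixPath


def is_on_windows_mount(path: str) -> bool:
    """Return True if an absolute WSL path lives under /mnt/<drive letter>."""
    parts = PurePosixPath(path).parts
    # For "/mnt/c/Users/me", parts is ("/", "mnt", "c", "Users", "me").
    return (
        len(parts) >= 3
        and parts[0] == "/"
        and parts[1] == "mnt"
        and len(parts[2]) == 1
        and parts[2].isalpha()
    )


if __name__ == "__main__":
    here = os.getcwd()
    if is_on_windows_mount(here):
        print(f"{here} is on a Windows drive; run 'cd ~' before cloning.")
    else:
        print(f"{here} is not on a Windows drive.")
```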

Microsoft has an article with additional resources on setting up WSL as a development environment: https://learn.microsoft.com/en-us/windows/wsl/setup/environment.


Project Setup

If this is your first time cloning this repo, use the --recurse-submodules flag to clone the submodules found in the external folder:

git clone --recurse-submodules https://github.com/Aximilius/World-Model.git

Alternatively:

git clone https://github.com/Aximilius/World-Model.git
git submodule update --init --recursive

Keeping the Project Up to Date

Once the submodules have been initialized by either git clone --recurse-submodules or git submodule update --init --recursive, you can update the repo and its submodules together by running git pull with the --recurse-submodules flag:

git pull --recurse-submodules origin <branch>

If a new submodule is added to the remote, you will need to initialize it in your local repo:

git submodule update --init --recursive

Or, if needed, run them both:

git pull --recurse-submodules origin <branch> && git submodule update --init --recursive
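To see whether any submodule still needs initializing, you can inspect the output of git submodule status, which prefixes uninitialized submodules with a '-' character. The sketch below parses that output. The function name uninitialized_submodules is hypothetical, and the parser assumes submodule paths contain no spaces.

```python
import subprocess


def uninitialized_submodules(status_output: str) -> list[str]:
    """Return paths of submodules that `git submodule status` marks with '-'."""
    paths = []
    for line in status_output.splitlines():
        if line.startswith("-"):
            # Uninitialized entries look like "-<sha1> <path>".
            fields = line[1:].split()
            if len(fields) >= 2:
                paths.append(fields[1])
    return paths


if __name__ == "__main__":
    result = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        capture_output=True, text=True, check=True,
    )
    missing = uninitialized_submodules(result.stdout)
    if missing:
        print("Run 'git submodule update --init --recursive' for:", missing)
    else:
        print("All submodules are initialized.")
```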

Training Pipeline

The current pipeline is action-conditioned: the GPT predicts future frames given both visual context and real driving actions (steering, throttle, brake).
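To illustrate what "action-conditioned" means in terms of data, the sketch below pairs each window of context frames and their driving actions with the frame that follows. It is illustrative only and is not the repo's actual dataset code: the function make_examples, the dictionary keys, and the token lists are all hypothetical stand-ins.

```python
def make_examples(frames, actions, context_len):
    """Pair each window of context frames and actions with the next frame.

    frames:  list of per-frame token lists (hypothetical encoder output)
    actions: list of (steering, throttle, brake) tuples, one per frame
    """
    if len(frames) != len(actions):
        raise ValueError("need exactly one action per frame")
    examples = []
    for t in range(context_len, len(frames)):
        examples.append({
            "context_frames": frames[t - context_len:t],
            "context_actions": actions[t - context_len:t],
            "target_frame": frames[t],  # what the model should predict
        })
    return examples


if __name__ == "__main__":
    frames = [[0, 1], [2, 3], [4, 5], [6, 7]]  # toy tokens, not real data
    actions = [(0.0, 0.1, 0.0), (0.1, 0.2, 0.0), (0.0, 0.0, 0.5), (0.0, 0.0, 0.0)]
    for example in make_examples(frames, actions, context_len=2):
        print(example)
```

A model trained without the context_actions field could only extrapolate from what it has seen; including the actions lets the prediction depend on what the driver did.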

Quick Start

# Full pipeline (build dataset → fine-tune → generate demo)
python scripts/run_training_pipeline.py

# Custom configuration
python scripts/run_training_pipeline.py --epochs 10 --device cuda --skip-demo

# Re-run only fine-tuning
python scripts/run_training_pipeline.py --skip-dataset --skip-demo

For detailed documentation, see PIPELINE.md.

Pipeline Steps

  1. Build RL Dataset - Encode drives + extract actions from logs

    python scripts/build_dataset_all.py --device cuda
  2. Fine-tune GPT - Train on action-conditioned sequences

    python scripts/finetune_gpt_action.py --epochs 5 --device cuda
  3. Generate Demo - Validate with 3-panel video (context | ground truth | predicted)

    python scripts/demo_action.py --checkpoint outputs/finetune_action/gpt_best.pt --device cuda
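The three steps above can be chained from Python as sketched below. This is not the implementation of scripts/run_training_pipeline.py: the functions plan_steps and run_steps are hypothetical, and the default of 5 epochs is taken from the step 2 example rather than from the orchestrator. Only script paths and flags shown in this README are used.

```python
import subprocess
import sys


def plan_steps(epochs=5, device="cuda", skip_dataset=False, skip_demo=False):
    """Return the commands for the pipeline steps, honoring the skip flags."""
    steps = []
    if not skip_dataset:
        steps.append([sys.executable, "scripts/build_dataset_all.py",
                      "--device", device])
    steps.append([sys.executable, "scripts/finetune_gpt_action.py",
                  "--epochs", str(epochs), "--device", device])
    if not skip_demo:
        steps.append([sys.executable, "scripts/demo_action.py",
                      "--checkpoint", "outputs/finetune_action/gpt_best.pt",
                      "--device", device])
    return steps


def run_steps(steps):
    """Run each command in order and stop at the first failure."""
    for cmd in steps:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the plan instead of running it, so the sketch is safe to try.
    for cmd in plan_steps(epochs=10, skip_demo=True):
        print(" ".join(cmd))
```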

Data Organization

Raw driving data and processed outputs must be placed in the correct directories for the pipeline to find them.

Directory Structure

World-Model/
├── data/
│   ├── raw/
│   │   └── drives/              ← Raw .hevc video + .rlog/.rlog.bz2 log files
│   └── processed/
│       └── (outputs, auto-created)
├── outputs/
│   ├── rl_dataset_all.npz       ← Built by build_dataset_all.py
│   ├── finetune_action/         ← Training checkpoints, created by finetune_gpt_action.py
│   └── demo_action/             ← Demo videos, created by demo_action.py
└── scripts/
    ├── run_training_pipeline.py ← Unified orchestrator
    ├── build_dataset_all.py
    ├── finetune_gpt_action.py
    └── demo_action.py
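Before running the pipeline, you may want to confirm that your checkout matches the layout above. The following minimal sketch reports which of the expected inputs are missing. The name missing_paths and the EXPECTED_PATHS list are hypothetical, and the outputs/ entries are left out because the scripts create them.

```python
from pathlib import Path

# Paths from the directory structure above that must exist before a run.
EXPECTED_PATHS = [
    "data/raw/drives",
    "scripts/run_training_pipeline.py",
    "scripts/build_dataset_all.py",
    "scripts/finetune_gpt_action.py",
    "scripts/demo_action.py",
]


def missing_paths(repo_root):
    """Return the expected relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    print("Missing:", missing if missing else "nothing")
```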

Adding Data

  1. Download segments from comma connect:

    • {id}_{route}--{segment}--fcamera.hevc (video)
    • {id}_{route}--{segment}--rlog.bz2 (driving actions/CAN data)
  2. Place both files in data/raw/drives/:

    data/raw/drives/
    ├── d34c14daa88a1e86_00000047--f24b2c0ccd--0--fcamera.hevc
    ├── d34c14daa88a1e86_00000047--f24b2c0ccd--0--rlog.bz2
    ├── d34c14daa88a1e86_0000005d--de3f98f677--0--fcamera.hevc
    └── d34c14daa88a1e86_0000005d--de3f98f677--0--rlog.bz2
    
  3. Run the pipeline:

    python scripts/run_training_pipeline.py

Important: Keep each rlog.bz2 log file alongside its matching fcamera.hevc video - the action-conditioned pipeline extracts steering, throttle, and brake state values from the log.
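The pairing rule above can be checked before a run with a short script. This is a sketch under stated assumptions: find_unpaired is a hypothetical helper, it matches files purely by the --fcamera.hevc and --rlog.bz2 suffixes shown in the naming pattern, and it ignores uncompressed logs.

```python
import os

VIDEO_SUFFIX = "--fcamera.hevc"
LOG_SUFFIX = "--rlog.bz2"


def find_unpaired(filenames):
    """Return (videos_without_log, logs_without_video) as sorted segment prefixes."""
    videos = {f[:-len(VIDEO_SUFFIX)] for f in filenames if f.endswith(VIDEO_SUFFIX)}
    logs = {f[:-len(LOG_SUFFIX)] for f in filenames if f.endswith(LOG_SUFFIX)}
    return sorted(videos - logs), sorted(logs - videos)


if __name__ == "__main__":
    drives_dir = "data/raw/drives"
    if os.path.isdir(drives_dir):
        no_log, no_video = find_unpaired(os.listdir(drives_dir))
        print("Videos missing a log:", no_log)
        print("Logs missing a video:", no_video)
    else:
        print(f"{drives_dir} does not exist yet.")
```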

For detailed data guidelines, see DATASET.md.


Documentation

Pipeline (current)

  • PIPELINE.md - Current action-conditioned pipeline guide
  • DATASET.md - Data collection, organization, and guidelines

Legacy documentation (deprecated)

openpilot (for data collection/replay only)
