johnnycode8/airplane_boarding
Applying Reinforcement Learning: Creating and Training a Custom RL Environment - Beginner Tutorials

This repository contains the code referenced in my YouTube (@johnnycode) tutorial series. In this video series, you'll learn how to apply reinforcement learning (RL) to a real-world problem: optimizing airplane passenger boarding. Starting with the basics, we'll model the boarding process as an RL problem, create the simulation environment from scratch, and train an agent with the PPO algorithm. If you find the code and tutorials helpful, please consider supporting me:

Buy Me A Coffee

0. One Minute Series Overview

End-to-End Example of Applying Reinforcement Learning - Overview

1. Modeling Real-World Problems as Reinforcement Learning Problems

Our example real-world problem is airplane passenger boarding efficiency, or more accurately, inefficiency. Before RL can be applied, we need to frame it as an RL problem. For example, it is not immediately apparent who or what our learning agent is.

Modeling Real-World Problems as RL Problems
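One hedged way to sketch this framing before writing any environment code: lay out the agent, state, action, and reward in plain Python. All names below are illustrative assumptions, not the tutorial's exact design.

```python
# Hypothetical framing of airplane boarding as an RL problem.
# Every value here is an assumption for illustration purposes.
mdp = {
    # The learning agent is the gate controller deciding who boards next.
    "agent": "gate controller",
    # State: which seats are filled and where boarding passengers are.
    "state": "seat occupancy + aisle positions of boarding passengers",
    # Action: pick the next passenger (by seat assignment) to send down the aisle.
    "action": "index of the next passenger to board",
    # Reward: penalize elapsed time so shorter total boarding is preferred.
    "reward": "-1 per simulation step until everyone is seated",
}
```

Writing the four pieces down like this makes the rest of the series concrete: the environment simulates the state, the action space and reward function implement the last two entries.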

2. Creating a Reinforcement Learning Simulation Environment with Python

Once our real-world problem can be expressed as an RL problem, we need a simulation environment in order to apply RL. This video creates the skeletal code for the RL environment using Python, adhering to the Gymnasium framework, a standard interface for reinforcement learning environments.

Creating a RL Environment
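The Gymnasium interface boils down to a `reset()` that returns `(observation, info)` and a `step(action)` that returns `(observation, reward, terminated, truncated, info)`. The plain-Python sketch below mimics that interface without importing Gymnasium (so it runs anywhere); the boarding model itself is a deliberate oversimplification, not the tutorial's actual environment.

```python
import random

class BoardingEnvSketch:
    """Minimal sketch of a Gymnasium-style environment interface.

    Hypothetical simplification: n passengers board n seats, and an
    episode ends when everyone is seated.
    """

    def __init__(self, n_passengers=4):
        self.n = n_passengers
        self.seated = None

    def reset(self, seed=None):
        # Gymnasium's reset() returns (observation, info).
        random.seed(seed)
        self.seated = [False] * self.n
        return self._obs(), {}

    def step(self, action):
        # Gymnasium's step() returns (obs, reward, terminated, truncated, info).
        reward = -1  # each step costs one unit of boarding time
        if not self.seated[action]:
            self.seated[action] = True
        terminated = all(self.seated)
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        return tuple(self.seated)
```

In the real environment the class would subclass `gymnasium.Env` and declare `action_space` and `observation_space`, but the calling convention is the same.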

3. Coding the Action Space and Observation Space

We'll define the possible actions our learning agent can take and a way to represent the states (observations) of the environment.

Action and Observation Space
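As a hedged sketch of what those two spaces might look like: in Gymnasium they would typically be declared with `gymnasium.spaces.Discrete` and `gymnasium.spaces.MultiBinary` (an assumption about a reasonable design, not a confirmed detail of the tutorial). The same idea in plain Python, so it runs without Gymnasium installed:

```python
# Hypothetical spaces for the boarding sketch.
N_PASSENGERS = 4

# Action space: choose which of the N queued passengers boards next.
# In Gymnasium this would be spaces.Discrete(N_PASSENGERS).
action_space = list(range(N_PASSENGERS))

# Observation space: one occupancy bit per seat.
# In Gymnasium this would be spaces.MultiBinary(N_PASSENGERS).
def encode_observation(seated):
    """Encode seat occupancy as a tuple of 0/1 bits."""
    return tuple(1 if s else 0 for s in seated)
```

The key design question the video addresses is how much of the world state the observation needs to capture for the agent to learn a good boarding order.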

4. Coding the Reward Function

We'll define the reward function, which is key to our agent's learning. If everything is coded correctly but the agent is not learning, the reward function is usually the problem.

Reward Function
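For intuition, here is one hypothetical shape such a reward function could take (the specific constants and terms are illustrative assumptions, not the tutorial's actual reward): a per-step time penalty, a small shaping bonus for progress, and a terminal bonus for finishing.

```python
def reward_fn(seated_before, seated_after):
    """Hypothetical boarding reward (all constants are illustrative).

    -1.0 time penalty per step, +0.5 shaping bonus when a passenger is
    newly seated, +10.0 terminal bonus when boarding completes.
    """
    r = -1.0
    if sum(seated_after) > sum(seated_before):
        r += 0.5
    if all(seated_after):
        r += 10.0
    return r
```

Because the per-step penalty accumulates, the agent is pushed toward boarding orders that finish in fewer steps, which is exactly the efficiency objective from section 1.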

5. Action Masking

Depending on the state, some actions are not performable by the agent. Action masking is how we tell the agent which actions are available. This wraps up the creation of our custom environment.

Action Masking
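The mask is typically a boolean array with one entry per action, `True` where the action is currently valid. Against the simplified boarding sketch (an assumption, not the tutorial's exact environment), a passenger who is already seated can no longer be chosen:

```python
def action_masks(seated):
    """Return a MaskablePPO-style mask: True where the action is allowed.

    Here the only invalid action is picking a passenger who is
    already seated (a simplifying assumption for illustration).
    """
    return [not s for s in seated]

# Only passengers 1 and 3 may still be chosen.
mask = action_masks([True, False, True, False])
```

In the real environment this logic would live in an `action_masks()` method so the training code can query it each step.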

6. Training a Custom Reinforcement Learning Environment with MaskablePPO

We'll train our agent on our custom RL environment using the MaskablePPO algorithm from Stable-Baselines3's contrib package (sb3-contrib). MaskablePPO is identical to the regular PPO algorithm except that it honors action masks.

MaskablePPO

7. Vectorized Environments and TensorBoard

Speed up training by using parallel vectorized environments. We'll also observe training progress in TensorBoard.

Vectorized Environments & TensorBoard
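A vectorized environment just steps several independent copies of the environment with one call, so sample collection parallelizes. The stdlib-only sketch below shows the idea, including the auto-reset behavior Gymnasium-style vector wrappers perform when a sub-environment finishes; the toy environment and class names are illustrative assumptions.

```python
class TinyEnv:
    """Toy environment whose episode ends after 3 steps."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return self.t, {}
    def step(self, action):
        self.t += 1
        return self.t, -1.0, self.t >= 3, False, {}

class SyncVectorSketch:
    """Plain-Python sketch of a synchronous vectorized wrapper:
    one step() call advances every sub-environment."""
    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]
    def reset(self):
        return [env.reset() for env in self.envs]
    def step(self, actions):
        results = []
        for env, a in zip(self.envs, actions):
            obs, r, term, trunc, info = env.step(a)
            if term or trunc:
                # Auto-reset, as Gymnasium vector wrappers do, so
                # training never stalls on a finished sub-environment.
                obs, info = env.reset()
            results.append((obs, r, term, trunc, info))
        return results
```

In practice Stable-Baselines3 handles this for you (e.g. via its vectorized env utilities), and the throughput gain is what makes the longer training runs in this video feasible.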

8. Visualize the Custom Environment with Pygame

With some basic Pygame, we can visualize the RL environment with animations, which makes debugging much easier.

Video not yet available.

