This repository contains the code referenced in my YouTube (@johnnycode) tutorial series. In this video series, you'll learn how to apply reinforcement learning (RL) to an example real-world problem: optimizing airplane passenger boarding. Starting with the basics, we'll model the boarding process as a reinforcement learning problem, create the simulation environment from scratch, and train an agent with the PPO algorithm. If you find the code and tutorials helpful, please consider supporting me:
Our example real-world problem is airplane passenger boarding efficiency, or more accurately, inefficiency. Before RL can be applied to optimize this process, we need to frame it as an RL problem. For example, it is not immediately apparent who or what our learning agent is.
Once our real-world problem can be expressed as an RL problem, we need a simulation environment in which to apply RL. This video creates the skeletal code for the RL environment using Python, adhering to the Gymnasium framework, a standard interface for reinforcement learning environments.
We'll define the possible actions that our learning agent can take and a way to represent the states (observations) of the environment.
We'll define the reward function, which is key to our agent's learning. If everything is coded correctly but the agent still isn't learning, the reward function is usually the problem.
Depending on the state, some actions cannot be performed by the agent. Action masking is how we tell the agent which actions are currently available. This wraps up the creation of our custom environment.
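In sb3-contrib, a mask is just a boolean array over the action space, typically supplied by a function that inspects the environment. A minimal sketch, assuming the environment tracks a `waiting` array of passengers still in the queue (a hypothetical attribute, not necessarily the one used in the series):

```python
import numpy as np

def mask_fn(env):
    """Return a boolean array over actions; True = action currently valid.
    Here, only passengers still waiting in the queue may be selected."""
    return env.waiting.astype(bool)

# With sb3-contrib, the environment is typically wrapped so MaskablePPO
# can query the mask at every step:
#   from sb3_contrib.common.wrappers import ActionMasker
#   env = ActionMasker(env, mask_fn)
```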
We'll train our agent on our custom RL environment using the MaskablePPO algorithm from Stable-Baselines3's contrib package (sb3-contrib). MaskablePPO is identical to the regular PPO algorithm except that it honors action masks.
We'll speed up training by running parallel vectorized environments, and we'll monitor training progress in TensorBoard.
With some basic Pygame, we can visualize the RL environment with animations, which makes debugging much easier.
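A minimal flavor of that kind of rendering, drawing a seat map onto an off-screen surface (the layout and colors here are illustrative, not the series' actual visuals):

```python
import pygame

CELL = 20  # pixel size of one seat cell

def render_cabin(seated, cols=6):
    """Draw a hypothetical seat map onto an off-screen Pygame surface:
    green = occupied seat, grey = empty. `seated` is a flat list of 0/1."""
    rows = (len(seated) + cols - 1) // cols
    surface = pygame.Surface((cols * CELL, rows * CELL))
    surface.fill((30, 30, 30))
    for i, occupied in enumerate(seated):
        r, c = divmod(i, cols)
        color = (0, 180, 0) if occupied else (120, 120, 120)
        rect = pygame.Rect(c * CELL + 1, r * CELL + 1, CELL - 2, CELL - 2)
        pygame.draw.rect(surface, color, rect)
    return surface

# For live animation, this surface would be blitted to a window created with
# pygame.display.set_mode(...), with pygame.time.Clock() limiting the frame rate.
```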
Video not yet available.







