A classic Pong game built with Pygame, in which an AI agent learns to play using tabular Q-learning. The agent trains in real time against a human-controlled paddle.
The right paddle is controlled by the Q-learning agent. The left paddle is player-controlled (W/S or ↑/↓).
The agent observes a 6-dimensional state space (ball position, ball velocity, and both paddle positions) and selects one of three actions — stay, move up, or move down — using an epsilon-greedy policy.
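A minimal sketch of epsilon-greedy action selection follows. The action ordering (`stay`, `up`, `down`) and the function name `epsilon_greedy` are illustrative assumptions and may not match `agent.py`.

```python
import numpy as np

# Hypothetical action encoding; the real ordering in agent.py may differ.
ACTIONS = ("stay", "up", "down")

def epsilon_greedy(q_values: np.ndarray, epsilon: float, rng: np.random.Generator) -> int:
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore: uniform random action
    return int(np.argmax(q_values))              # exploit: highest estimated Q-value

rng = np.random.default_rng(0)
q = np.array([0.1, 0.5, -0.2])   # example Q-values for one discretized state
print(ACTIONS[epsilon_greedy(q, epsilon=0.0, rng=rng)])  # up
```

With `epsilon=0.0` the agent always exploits. Early in training, when ε is close to 1.0, almost every action is random.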
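The continuous state is discretized into bins so that it can index a tabular Q-table. The table is updated every frame with the Q-learning update rule, a sample-based form of the Bellman optimality equation:

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$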
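The binning and the update rule can be sketched as below. The bin count, the per-dimension value ranges (an 800×600 screen, velocities in ±10), and the helper names `discretize` and `q_update` are hypothetical. Only α = 0.001 and γ = 0.95 come from the hyperparameters listed in this README. Treating the end of a rally as terminal (no bootstrapped term) is a common convention that the README does not state.

```python
import numpy as np

N_BINS = 6                  # hypothetical number of bins per state dimension
N_ACTIONS = 3               # stay, up, down
ALPHA, GAMMA = 0.001, 0.95  # learning rate and discount factor from the README

# Hypothetical ranges: ball x, ball y, ball vx, ball vy, left paddle y, right paddle y
LOW = np.array([0.0, 0.0, -10.0, -10.0, 0.0, 0.0])
HIGH = np.array([800.0, 600.0, 10.0, 10.0, 600.0, 600.0])

def discretize(obs):
    """Map a continuous 6-D observation to a tuple of bin indices."""
    ratios = (np.asarray(obs, dtype=float) - LOW) / (HIGH - LOW)
    bins = np.clip((ratios * N_BINS).astype(int), 0, N_BINS - 1)
    return tuple(int(b) for b in bins)

# One Q-value per (discretized state, action) pair.
q_table = np.zeros((N_BINS,) * 6 + (N_ACTIONS,))

def q_update(s, a, r, s_next, done):
    """Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    # Assumption: no bootstrapping from s_next when the rally has ended.
    target = r if done else r + GAMMA * np.max(q_table[s_next])
    q_table[s + (a,)] += ALPHA * (target - q_table[s + (a,)])

s = discretize([400, 300, 5, -3, 250, 280])
q_update(s, 1, 1.0, s, done=False)
print(q_table[s + (1,)])  # 0.001
```

Even with six bins per dimension the table already holds 6⁶ × 3 ≈ 140,000 entries, which is why finer bins quickly become impractical for a tabular agent.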
Key hyperparameters:
- Discount factor γ = 0.95
- Learning rate α = 0.001
- Epsilon decay: 1.0 → 0.01 over training
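The README gives only the endpoints of the epsilon schedule. A multiplicative per-episode decay with a floor is one common way to reach them. The decay factor 0.995 below is a hypothetical value, not taken from the project.

```python
EPS_START, EPS_MIN = 1.0, 0.01  # endpoints from the README
EPS_DECAY = 0.995               # hypothetical per-episode multiplier

def epsilon_at(episode: int) -> float:
    """Exponentially decayed exploration rate, clamped at EPS_MIN."""
    return max(EPS_MIN, EPS_START * EPS_DECAY ** episode)

for ep in (0, 100, 500, 1000):
    print(ep, round(epsilon_at(ep), 3))
```

With this factor, ε reaches its 0.01 floor after roughly 920 episodes. A faster or slower decay trades early exploration against how soon the agent starts exploiting what it has learned.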
```text
pong-rl/
├── game.py          # Pygame environment (PongEnv, Paddle, Ball)
├── agent.py         # DQNAgent (Q-learning) + simple rule-based opponent
├── main.py          # Game loop, training/play mode toggle
├── pong_model.npy   # Saved Q-table (auto-generated after training)
└── README.md
```
Install dependencies:

```bash
pip install pygame numpy
```

Run the game:

```bash
python main.py
```

Controls:

| Key | Action |
|---|---|
| W / ↑ | Move player paddle up |
| S / ↓ | Move player paddle down |
| T | Toggle training mode on/off |
| ESC | Quit |
In training mode, the agent learns live and saves the model every 100 episodes. In play mode, it uses the saved Q-table.
After ~1000 episodes of training, the agent consistently tracks the ball and returns most shots. Performance is logged to the terminal every 10 episodes.
Possible extensions:
- Replace tabular Q-learning with a deep neural network (true DQN)
- Add experience replay buffer
- Train a self-play agent (both paddles learn)
- Add reward shaping for more aggressive play
License: MIT