Trained agent playing Tetris up to 10,000 placed pieces, clearing 4000 lines
Tetris-playing agent optimized via the Noisy Cross-Entropy Method (CEM), replicating Szita & Lőrincz (2006). A linear evaluation function scores board states using the 22 Bertsekas & Tsitsiklis features, and CEM evolves the weight vector to maximize lines cleared. Notably, CEM converges to a strong Tetris agent within ~20 minutes of training while reaching a higher performance ceiling than DQN-based methods.
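As a minimal sketch of the linear evaluation: the agent scores each candidate board as a dot product between a feature vector and the CEM-optimized weights. The `bt_features` helper below is illustrative, not the repo's actual extractor; the assumed layout (10 column heights, 9 adjacent height differences, max height, hole count, plus a constant bias to round out 22 dimensions) follows the common reading of the Bertsekas & Tsitsiklis feature set.

```python
import numpy as np

def bt_features(heights: np.ndarray, holes: int) -> np.ndarray:
    """Assemble a 22-dim feature vector for a 10-column board:
    10 column heights, 9 adjacent height differences, the maximum
    height, the number of holes, and a constant bias term
    (assumed here to round out the 22 dimensions)."""
    diffs = np.abs(np.diff(heights))
    return np.concatenate([heights, diffs, [heights.max(), holes, 1.0]])

def evaluate_board(features: np.ndarray, weights: np.ndarray) -> float:
    """Score a board state as a weighted sum of its features."""
    return float(features @ weights)
```

The agent then picks, among all legal placements of the current piece, the one whose resulting board has the highest score.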
Install the required dependencies (numpy, gymnasium, tetris-gymnasium, tqdm, pyyaml):

```bash
pip install -r requirements.txt
```

Run CEM optimization for 200 generations, saving weight checkpoints to `./models`:

```bash
python3 -m src.main --mode train --c src/config/tetris.yaml --o ./models --verbose
```

Play 10 games with the best learned weights and report the average number of lines cleared:

```bash
python3 -m src.main --mode test --c src/config/tetris.yaml --o ./models --num_episodes 10 --verbose --w src/models/best_weights.npy
```

Record the agent playing a full game as an MP4 video:
```bash
python3 -m src.visualize --weights src/models/best_weights.npy --output ./videos
```

CEM maintains a Gaussian distribution over the weight space and iteratively refines it. At generation $t$, candidate weight vectors are drawn from a diagonal Gaussian $f_t = \mathcal{N}(\boldsymbol{\mu}_t, \boldsymbol{\sigma}_t^2)$.
Each generation proceeds as follows:

- Sample $n = 100$ weight vectors $\mathbf{w}_1, \ldots, \mathbf{w}_n$ from $f_t$
- Evaluate each $\mathbf{w}_i$ by playing a single game, obtaining fitness $S(\mathbf{w}_i)$ = lines cleared
- Select the top $\rho \cdot n$ samples (with $\rho = 0.1$), denoting their index set as $I$
- Update the distribution parameters:

$$
\boldsymbol{\mu}_{t+1} = \frac{1}{|I|} \sum_{i \in I} \mathbf{w}_i, \qquad
\boldsymbol{\sigma}_{t+1}^2 = \frac{1}{|I|} \sum_{i \in I} \left(\mathbf{w}_i - \boldsymbol{\mu}_{t+1}\right)^2 + Z_{t+1}
$$

where the added noise $Z_{t+1} = \max(5 - t/10,\ 0)$ decays over training and prevents premature convergence — the "noisy" in Noisy CEM.
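The per-generation loop can be sketched as follows. This is a simplified standalone version, not the repo's implementation: `fitness` stands in for playing a full Tetris game, and the noise schedule `max(5 - t/10, 0)` is the decreasing schedule reported by Szita & Lőrincz.

```python
import numpy as np

def cem_generation(mu, sigma2, fitness, t, n=100, rho=0.1, rng=None):
    """One generation of the Noisy Cross-Entropy Method.

    mu, sigma2 : current mean and per-coordinate variance, shape (d,)
    fitness    : callable scoring a weight vector (lines cleared in Tetris)
    t          : generation index, drives the decreasing noise schedule
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sample n candidate weight vectors from the diagonal Gaussian f_t
    samples = rng.normal(mu, np.sqrt(sigma2), size=(n, mu.size))
    scores = np.array([fitness(w) for w in samples])
    # Keep the top rho*n elite samples by fitness
    elite = samples[np.argsort(scores)[-int(rho * n):]]
    # Refit the Gaussian to the elites, adding the decaying noise Z_t
    noise = max(5.0 - t / 10.0, 0.0)
    return elite.mean(axis=0), elite.var(axis=0) + noise
```

On a toy quadratic objective this loop drives the mean toward the optimum once the noise term has decayed, which is the same dynamic that lets the Tetris weights keep exploring early and converge late.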