Market-Making Simulator

Agent-based market-making simulator built on the Avellaneda-Stoikov microstructure model. Compares a naive fixed-spread maker, the closed-form Avellaneda-Stoikov benchmark maker, and a Q-learning agent — with PnL, inventory-risk, and adverse-selection (markout) diagnostics. Installable, CI-tested (8 tests).

Author: Hatef Tabbakhian (Leo) · GitHub · LinkedIn

This project simulates the core problem a single-asset market maker faces: quote a bid and an ask, earn the spread on round-trips, but manage the inventory risk that builds up when one side fills more than the other, and the adverse selection that comes from trading against informed flow. It is a market-microstructure project — the order-flow modelling and risk diagnostics are the point, not low-latency systems engineering (see Scope & limitations).

The model

The environment follows Avellaneda & Stoikov (2008), the canonical model of market making:

Mid-price. The mid follows an arithmetic Brownian motion, dS = σ dW. Over the short market-making horizon, drift is negligible, and it is absolute price moves relative to the quotes that trigger fills.
Order flow. The maker posts a bid at distance δ_b below the mid and an ask at δ_a above it. Market orders that can fill a quote arrive with an intensity that decays exponentially with that distance, and Poisson arrivals give the probability of a fill within one step of length dt:
```
λ(δ) = A · exp(−k · δ),     P(fill in dt) = 1 − exp(−λ dt)
```
Tight quotes fill often but capture little edge; wide quotes capture more per fill but fill rarely. A (baseline arrival intensity, a proxy for liquidity) and k (how fast that intensity decays with distance) are the parameters a desk would estimate from its own trade data.
Adverse selection (optional). A configurable fraction of fills is informed: right after such a fill, the mid moves in the direction that hurts the maker. This is what drives the markout metric negative. It is off by default, so the base case reproduces the classic result.
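A minimal sketch of one step of the order-flow model above, written as standalone functions rather than the package's MarketMakingEnv (whose internals are not shown here). The names `fill_probability` and `step`, the order of events within a step (fills are drawn first, then the mid moves), and the defaults A=140, k=1.5, dt=0.005 are illustrative assumptions; those defaults echo the numerical example commonly quoted from Avellaneda & Stoikov (2008) but may not match MarketParams.

```python
import numpy as np


def fill_probability(delta: float, A: float, k: float, dt: float) -> float:
    """P(at least one market order reaches a quote posted delta away from the mid in dt)."""
    intensity = A * np.exp(-k * delta)           # lambda(delta) = A * exp(-k * delta)
    return float(1.0 - np.exp(-intensity * dt))  # Poisson arrivals: 1 - exp(-lambda * dt)


def step(mid: float, delta_b: float, delta_a: float, rng: np.random.Generator,
         sigma: float = 2.0, A: float = 140.0, k: float = 1.5, dt: float = 0.005,
         informed_frac: float = 0.0, informed_impact: float = 0.0):
    """Hypothetical single step: draw fills at the current quotes, then move the mid."""
    bid_filled = bool(rng.random() < fill_probability(delta_b, A, k, dt))
    ask_filled = bool(rng.random() < fill_probability(delta_a, A, k, dt))
    # Arithmetic Brownian motion: dS = sigma dW with dW ~ N(0, dt)
    new_mid = mid + sigma * np.sqrt(dt) * rng.standard_normal()
    # Reduced-form adverse selection: an informed fill pushes the mid against the maker
    if bid_filled and rng.random() < informed_frac:
        new_mid -= informed_impact  # maker just bought, so the price drops
    if ask_filled and rng.random() < informed_frac:
        new_mid += informed_impact  # maker just sold, so the price rises
    return float(new_mid), bid_filled, ask_filled
```

With `informed_frac=0.0` (the default here) the step reduces to the base case with no adverse selection.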

The MarketMakingEnv exposes a Gym-style reset() / step() loop shared by the hand-coded agents and the RL agent.

Results

1. Inventory-aware quoting controls risk for the same edge

2,000 simulated sessions per strategy, evaluated on identical seeds (paired comparison). Sharpe is mean session PnL divided by its standard deviation across sessions (not annualised). The Avellaneda-Stoikov maker earns essentially the same average PnL as the naive fixed-spread maker, with roughly half the PnL volatility and half the peak inventory, nearly doubling the Sharpe.

| Strategy | Mean PnL | PnL σ | Sharpe | 5% PnL (downside) | Mean max \|inv\| | Mean markout | Mean fills |
|---|---:|---:|---:|---:|---:|---:|---:|
| Fixed spread (baseline) | 57.4 | 12.40 | 4.63 | 37.6 | 8.49 | +0.997 | 57.7 |
| Avellaneda-Stoikov | 57.1 | 6.51 | 8.77 | 46.2 | 4.10 | +0.669 | 85.4 |
| Q-learning (learned) | 54.7 | 8.12 | 6.74 | 41.8 | 5.36 | +0.732 | 74.4 |

The fixed-spread maker's PnL has a much fatter lower tail (5% PnL of 37.6 versus 46.2 for A-S). That extra dispersion comes from uncontrolled directional inventory risk, not skill.
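The Sharpe column is consistent with mean session PnL divided by PnL σ (for example 57.4 / 12.40 ≈ 4.63). A minimal sketch of how one row of the table could be computed from per-session results, assuming a hypothetical `risk_summary` helper that takes per-session PnLs and per-session inventory paths; the sample standard deviation (ddof=1) and NumPy's default linear quantile interpolation are assumptions, not necessarily what `metrics.py` does.

```python
import numpy as np


def risk_summary(pnl, inventory_paths):
    """Hypothetical helper: one results-table row from per-session PnL and inventory paths."""
    pnl = np.asarray(pnl, dtype=float)
    inv = np.asarray(inventory_paths, dtype=float)  # shape (n_sessions, n_steps)
    mean = pnl.mean()
    std = pnl.std(ddof=1)                           # sample std across sessions (assumed ddof)
    return {
        "mean_pnl": mean,
        "pnl_std": std,
        "sharpe": mean / std,                       # per-session ratio, not annualised
        "pnl_5pct": np.quantile(pnl, 0.05),         # downside: 5th percentile of session PnL
        "mean_max_abs_inv": np.abs(inv).max(axis=1).mean(),  # peak |inventory| per session, averaged
    }
```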

2. Inventory trajectories: drift vs mean-reversion

The naive maker lets inventory random-walk away from zero; the A-S maker skews its quotes by inventory and pulls the position back toward flat.
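The skew comes from the closed-form Avellaneda-Stoikov policy: quotes are centred on a reservation price r = s − qγσ²(T − t), which moves away from the mid in proportion to inventory q, with total spread δ_a + δ_b = γσ²(T − t) + (2/γ) ln(1 + γ/k). A minimal sketch of those two formulas, assuming the function name `as_quotes` and the defaults shown (γ=0.1, σ=2, k=1.5, T=1); the package's AvellanedaStoikovAgent may add clipping or other adjustments that are not reproduced here.

```python
import math


def as_quotes(mid, q, t, gamma=0.1, sigma=2.0, k=1.5, T=1.0):
    """Hypothetical helper: Avellaneda-Stoikov bid/ask for inventory q at time t."""
    tau = T - t                                        # time remaining to the close
    reservation = mid - q * gamma * sigma**2 * tau     # below the mid when long (q > 0)
    spread = gamma * sigma**2 * tau + (2.0 / gamma) * math.log(1.0 + gamma / k)
    return reservation - spread / 2.0, reservation + spread / 2.0
```

With q = 2 the whole quote pair shifts down, so the ask sits closer to the mid (keener to sell) and the bid further away; as t approaches T both the skew and the inventory term of the spread shrink.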

3. Risk / return

Up and to the left (lower risk, higher return) is better. A-S dominates the baseline (same return, far less risk); the Q-learning agent gives up a little return in exchange for much lower risk than the naive maker.

4. Adverse selection: edge erodes against toxic flow

With the quoted spread held fixed, increasing the price impact of informed flow lowers both mean PnL and per-fill markout, and markout crosses zero at roughly the point where informed impact equals the captured spread. This is the textbook adverse-selection signature, and the reason makers widen their quotes when flow looks informed.
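Markout is what makes this effect measurable. A minimal sketch, assuming a hypothetical `markouts` helper that marks each fill to the mid a fixed number of steps later; the 5-step horizon is an assumption standing in for "a few steps", and fills near the end of a session are marked to the last mid.

```python
import numpy as np


def markouts(fill_steps, fill_prices, sides, mid, horizon=5):
    """Hypothetical helper: per-fill PnL marked to the mid `horizon` steps later.

    sides: +1 when the maker bought (bid filled), -1 when it sold (ask filled).
    """
    mid = np.asarray(mid, dtype=float)
    later = np.minimum(np.asarray(fill_steps) + horizon, len(mid) - 1)  # clip at session end
    return np.asarray(sides) * (mid[later] - np.asarray(fill_prices, dtype=float))
```

A negative average markout means the mid tends to move against the maker after it trades, i.e. it is being picked off.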

5. The Q-learning agent rediscovers inventory control

A tabular Q-learner (state = inventory × time-to-close, action = a grid of quote distances, reward = mark-to-market wealth change minus an inventory penalty) is never told the closed-form solution. After training, it learns to skew on its own: it quotes a tighter ask when long (keen to sell) and a tighter bid when short. In the policy heatmap, grey cells are states the agent rarely visited; they are masked rather than shown as noise.

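Its inventory control (mean max|inv| 5.4) lands much closer to the Avellaneda-Stoikov benchmark (4.1) than to the naive baseline (8.5), and its Sharpe (6.7) sits about midway between the two (8.8 and 4.6). A model-free agent thus recovers roughly 70% of the benchmark's reduction in peak inventory and about half of its Sharpe improvement over the naive maker.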
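A minimal sketch of a tabular learner of this kind, assuming a hypothetical `TabularQuoter` class rather than the package's QuotingQLearningAgent: the state is an (inventory bucket, time bucket) pair, each action is a (bid distance, ask distance) pair from a small grid, and the update is standard one-step Q-learning with ε-greedy exploration. The quadratic form of the inventory penalty and all hyperparameters are assumptions; the description above only says "an inventory penalty".

```python
import numpy as np


class TabularQuoter:
    """Hypothetical tabular Q-learner over (inventory bucket, time bucket) states."""

    def __init__(self, n_inv, n_time, deltas, alpha=0.1, discount=0.99, epsilon=0.1, seed=0):
        # Each action is a (bid distance, ask distance) pair from the quote grid.
        self.actions = [(b, a) for b in deltas for a in deltas]
        self.q = np.zeros((n_inv, n_time, len(self.actions)))
        self.alpha, self.discount, self.epsilon = alpha, discount, epsilon
        self.rng = np.random.default_rng(seed)

    def act(self, s):
        """Epsilon-greedy action index for state tuple s = (inv_idx, time_idx)."""
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(len(self.actions)))  # explore
        return int(np.argmax(self.q[s]))                        # exploit

    def update(self, s, a, reward, s_next, done):
        """One-step Q-learning: Q(s, a) += alpha * (target - Q(s, a))."""
        target = reward if done else reward + self.discount * self.q[s_next].max()
        self.q[s + (a,)] += self.alpha * (target - self.q[s + (a,)])


def shaped_reward(wealth_change, inventory, phi=0.01):
    """Mark-to-market wealth change minus an (assumed) quadratic inventory penalty."""
    return wealth_change - phi * inventory**2
```

Reading the skew off such a table amounts to taking `argmax` over actions in each state and comparing the chosen bid and ask distances across inventory buckets.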

Design notes & decisions

Why Avellaneda-Stoikov. It is the reference model for inventory-aware market making and has a closed-form policy, which gives the RL agent a concrete benchmark to be measured against rather than a vibe check. (It is optimal under the model's continuous-time assumptions; this simulator is discrete and simplified, so I call it a benchmark rather than "optimal".)
Why a custom tabular Q-learner, not stable-baselines3. The state space (inventory × time) is small and discrete, so a lookup table is sufficient, fully interpretable (you can read the policy off the heatmap), and keeps the whole project — RL included — runnable in CI in seconds with only NumPy.
Paired evaluation. Every agent is run on the same seeds, so strategy differences are not swamped by Monte Carlo noise.
Markout as the adverse-selection metric. Markout, each fill's PnL marked to the mid a few steps later, is the standard desk diagnostic for being picked off. The informed-flow toggle makes it actually move, so the metric is demonstrated, not just defined.
A bug worth mentioning. The first inventory-skew test failed because each env.step() advances the RNG, so agents run one after another on a shared environment see different paths. Running three agents on "the same" paths requires a fresh, identically seeded environment for each. The simulator re-seeds on construction for exactly this reason.
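The seeding pitfall can be reproduced in a few lines. A minimal sketch, assuming a hypothetical `mid_path` helper that builds a mid-price path from a fresh NumPy generator (σ=2, dt=0.005 and 200 steps are illustrative): paths built from the same seed are identical across strategies, while two runs drawn from one shared generator are not.

```python
import numpy as np


def mid_path(seed, n_steps=200, sigma=2.0, dt=0.005, s0=100.0):
    """Hypothetical helper: arithmetic-BM mid path from a fresh generator seeded with `seed`."""
    rng = np.random.default_rng(seed)
    return s0 + np.cumsum(sigma * np.sqrt(dt) * rng.standard_normal(n_steps))


# Paired evaluation: each strategy gets a path rebuilt from the same seed.
paths_a = [mid_path(seed) for seed in range(3)]
paths_b = [mid_path(seed) for seed in range(3)]

# The pitfall: one shared generator keeps advancing, so a second run sees a new path.
shared = np.random.default_rng(0)
first = 100.0 + np.cumsum(2.0 * np.sqrt(0.005) * shared.standard_normal(200))
second = 100.0 + np.cumsum(2.0 * np.sqrt(0.005) * shared.standard_normal(200))
```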

Project structure

```
Market-Making-Simulator/
├── README.md
├── pyproject.toml               # installable package (pip install -e .)
├── requirements.txt
├── LICENSE
├── .gitignore
├── .github/workflows/ci.yml     # GitHub Actions: pytest on 3.9 / 3.11 / 3.12
├── src/
│   └── market_maker/
│       ├── __init__.py
│       ├── simulation.py        # AS order-flow environment (mid BM + fill model)
│       ├── agents.py            # fixed-spread baseline + Avellaneda-Stoikov maker
│       ├── rl_agent.py          # tabular Q-learning quoting agent (the bonus)
│       ├── metrics.py           # PnL, inventory risk, adverse-selection markout
│       ├── backtest.py          # Monte Carlo backtest harness + comparison
│       ├── visualization.py     # matplotlib helpers (house style)
│       └── utils.py             # logging + Sharpe helper
├── notebooks/
│   ├── 01_simulation_and_fills.ipynb
│   ├── 02_strategy_comparison.ipynb
│   └── 03_rl_quoting_agent.ipynb
├── scripts/
│   └── generate_outputs.py      # reproduces every figure + dataset
├── tests/
│   └── test_market_maker.py     # 8 pytest checks
├── data/
│   ├── strategy_comparison.csv
│   └── adverse_selection.csv
└── images/                      # generated plots used in this README
```

Quickstart

```bash
git clone https://github.com/Leotaby/Market-Making-Simulator.git
cd Market-Making-Simulator

python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -e ".[dev]"

pytest -q                            # 8 tests
python scripts/generate_outputs.py   # regenerate every figure + dataset (~30s)
jupyter lab notebooks/
```

Minimal API example

```python
from market_maker import (MarketParams, FixedSpreadAgent,
                          AvellanedaStoikovAgent, backtest, compare)

params = MarketParams()                       # AS base case: S0=100, sigma=2, T=1
fixed    = backtest(FixedSpreadAgent(half_spread=1.0), params, n_episodes=2000)
as_maker = backtest(AvellanedaStoikovAgent(gamma=0.1), params, n_episodes=2000)
print(compare([fixed, as_maker]))             # risk/return table

# Train the reinforcement-learning quoter
from market_maker import QuotingQLearningAgent
agent = QuotingQLearningAgent(MarketParams(), seed=0)
rewards = agent.train(n_episodes=3500)
print(agent.policy_skew_grid())               # learned inventory-skew policy
```

Scope & limitations

Not an HFT/low-latency system. There is no real limit-order book with queue priority, no market-data feed, and no latency model. "Fills" come from a stochastic intensity, not from matching against resting orders. Any runtimes quoted reflect algorithmic cost, not latency.
Single asset, single level. The maker quotes one bid and one ask; no multi-level book, no cross-asset hedging.
Informed flow is a reduced-form model. Adverse selection is injected as a post-fill drift, which captures the effect (negative markout) without modelling the informed trader's decision explicitly.
Tabular RL. The Q-learner discretises inventory and time; it would not scale to a richer state (book imbalance, volatility regime) without function approximation.

Next steps

A proper event-driven limit-order book with queue position (this is the bridge toward genuine HFT/microstructure work).
Order-book-imbalance and short-term-volatility features in the RL state, with a DQN/actor-critic once the state space grows.
Multi-asset inventory with correlation-aware hedging.

How this maps to quant roles

Relevant to market-making, quant trading, and quant research interviews:

A working grasp of the inventory-vs-spread trade-off and the Avellaneda-Stoikov solution, demonstrated rather than asserted.
Adverse selection / markout — the metric desks live by — implemented and shown to respond to informed flow.
Reinforcement learning applied to a real trading objective, with an interpretable, benchmarked result instead of a black box.
Engineering basics: installable package, green CI matrix, type hints, logging, tests (including the inventory-skew and adverse-selection properties), and reproducible figures.

It is not positioned as a low-latency HFT system; the limit-order-book extension in Next steps is the path toward that.

References

M. Avellaneda and S. Stoikov, High-frequency trading in a limit order book, Quantitative Finance, 2008.

License

MIT — see LICENSE.
