Real training metrics and performance analysis using Binance Bitcoin (BTCUSDT) orderbook data. The DQN agent learns to minimize execution cost while managing market impact and inventory risk.
The agent improves over 200 episodes using deep Q-learning. Each episode represents one full execution of 1 BTC across ~100 trading steps (~100 minutes).
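The update driving this improvement is the standard Q-learning temporal-difference step. A minimal sketch with a tabular stand-in for the Q-network (the state discretization, bucket counts, and hyperparameters below are illustrative assumptions, not the project's actual settings):

```python
import numpy as np

# Hypothetical discretization: inventory buckets x time buckets, 4 action sizes.
N_INV, N_TIME, N_ACTIONS = 10, 10, 4
Q = np.zeros((N_INV, N_TIME, N_ACTIONS))

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, done=False):
    """One TD update toward the target r + gamma * max_a' Q(s', a')."""
    target = r if done else r + gamma * Q[s_next].max()
    Q[s][a] += alpha * (target - Q[s][a])  # move Q(s, a) toward the target
    return Q

# One step of an episode: pay a cost (negative reward) and update.
q_update(Q, s=(5, 5), a=2, r=-1.0, s_next=(4, 6))
```

In the full agent the table is replaced by a neural network trained on the same target, but the update has this shape.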
Comparison of the DQN agent against classical execution strategies: TWAP, Passive, Aggressive, and Random. Higher (less negative) reward indicates better execution quality, i.e. lower cost.
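As a concrete reference point, the TWAP baseline can be sketched as an equal-slice schedule (a minimal sketch; the other baselines and their exact implementations are not shown here):

```python
def twap_schedule(total_qty: float, n_steps: int) -> list[float]:
    """Time-Weighted Average Price: trade the same quantity every step."""
    return [total_qty / n_steps] * n_steps

# 1 BTC over 100 steps -> 0.01 BTC per step.
sched = twap_schedule(1.0, 100)
```

Passive and Aggressive differ only in order placement (resting limit orders vs. crossing the spread), and Random draws its slice sizes stochastically.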
The DQN agent learns that execution cost has three components:
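A per-step cost can be sketched assuming the three components are spread cost, temporary market impact, and an inventory-risk penalty (the latter two are named in the overview; the functional forms and coefficients here are illustrative assumptions, not the repo's exact reward):

```python
def execution_cost(qty: float, spread: float, k_impact: float,
                   inventory: float, lam: float) -> float:
    """Illustrative per-step execution cost (the negated reward).

    qty       -- BTC traded this step
    spread    -- current bid-ask spread
    k_impact  -- assumed temporary-impact coefficient
    inventory -- BTC still unexecuted after this step
    lam       -- assumed risk-aversion weight
    """
    spread_cost = 0.5 * spread * qty      # pay half the spread to cross
    impact_cost = k_impact * qty ** 2     # temporary impact, quadratic in size
    risk_penalty = lam * inventory ** 2   # penalty for unexecuted inventory
    return spread_cost + impact_cost + risk_penalty

cost = execution_cost(qty=0.01, spread=2.0, k_impact=5.0, inventory=0.5, lam=0.1)
```

The quadratic impact term makes large slices disproportionately expensive, while the inventory term punishes trading too slowly; the agent's reward is the tension between the two.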
After training, the agent develops adaptive behavior:
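One way to illustrate such adaptive behavior is a hand-written heuristic that mimics what a trained execution agent tends to do: trade the TWAP slice by default, size up when liquidity is cheap, and always finish by the deadline. This is a heuristic illustration, not the learned policy; the threshold and scaling factor are assumptions:

```python
def adaptive_slice(remaining_qty: float, steps_left: int,
                   spread: float, tight_spread: float = 1.0) -> float:
    """Heuristic slice size mimicking adaptive execution behavior."""
    if steps_left <= 1:
        return remaining_qty  # deadline reached: execute everything left
    base = remaining_qty / steps_left  # TWAP-like default slice
    if spread <= tight_spread:
        # Opportunistic: liquidity is cheap, trade up to twice the base slice.
        return min(2.0 * base, remaining_qty)
    return base
```

The learned policy expresses the same trade-offs implicitly through its Q-values rather than through explicit thresholds.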