Real training metrics and performance analysis using Binance Bitcoin (BTCUSDT) orderbook data. The DQN agent learns to minimize execution cost while managing market impact and inventory risk.
The agent improves over 200 episodes using deep Q-learning. Each episode represents one full execution of 1 BTC across ~100 trading steps (~100 minutes).
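The update driving this improvement is the standard Q-learning temporal-difference step. A minimal sketch with a tabular stand-in for the Q-network (the state discretization, bucket counts, and hyperparameters below are illustrative assumptions, not the project's actual settings):

```python
import numpy as np

# Hypothetical discretization: inventory buckets x time buckets, 4 action sizes.
N_INV, N_TIME, N_ACTIONS = 10, 10, 4
Q = np.zeros((N_INV, N_TIME, N_ACTIONS))

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, done=False):
    """One TD update toward the target r + gamma * max_a' Q(s', a')."""
    target = r if done else r + gamma * Q[s_next].max()
    Q[s][a] += alpha * (target - Q[s][a])  # move Q(s, a) toward the target
    return Q

# One step of an episode: pay a cost (negative reward) and update.
q_update(Q, s=(5, 5), a=2, r=-1.0, s_next=(4, 6))
```

In the full agent the table is replaced by a neural network trained on the same target, but the update has this shape.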
Comparison of the DQN agent against classical execution strategies: TWAP, Passive, Aggressive, and Random. Higher (less negative) reward indicates better execution quality, i.e. lower cost.
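As a concrete reference point, the TWAP baseline can be sketched as an equal-slice schedule (a minimal sketch; the other baselines and their exact implementations are not shown here):

```python
def twap_schedule(total_qty: float, n_steps: int) -> list[float]:
    """Time-Weighted Average Price: trade the same quantity every step."""
    return [total_qty / n_steps] * n_steps

# 1 BTC over 100 steps -> 0.01 BTC per step.
sched = twap_schedule(1.0, 100)
```

Passive and Aggressive differ only in order placement (resting limit orders vs. crossing the spread), and Random draws its slice sizes stochastically.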
The DQN agent learns that execution cost has three components:
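A per-step cost can be sketched assuming the three components are spread cost, temporary market impact, and an inventory-risk penalty (the latter two are named in the overview; the functional forms and coefficients here are illustrative assumptions, not the repo's exact reward):

```python
def execution_cost(qty: float, spread: float, k_impact: float,
                   inventory: float, lam: float) -> float:
    """Illustrative per-step execution cost (the negated reward).

    qty       -- BTC traded this step
    spread    -- current bid-ask spread
    k_impact  -- assumed temporary-impact coefficient
    inventory -- BTC still unexecuted after this step
    lam       -- assumed risk-aversion weight
    """
    spread_cost = 0.5 * spread * qty      # pay half the spread to cross
    impact_cost = k_impact * qty ** 2     # temporary impact, quadratic in size
    risk_penalty = lam * inventory ** 2   # penalty for unexecuted inventory
    return spread_cost + impact_cost + risk_penalty

cost = execution_cost(qty=0.01, spread=2.0, k_impact=5.0, inventory=0.5, lam=0.1)
```

The quadratic impact term makes large slices disproportionately expensive, while the inventory term punishes trading too slowly; the agent's reward is the tension between the two.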
After training, the agent develops adaptive behavior:
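One way to illustrate such adaptive behavior is a hand-written heuristic that mimics what a trained execution agent tends to do: trade the TWAP slice by default, size up when liquidity is cheap, and always finish by the deadline. This is a heuristic illustration, not the learned policy; the threshold and scaling factor are assumptions:

```python
def adaptive_slice(remaining_qty: float, steps_left: int,
                   spread: float, tight_spread: float = 1.0) -> float:
    """Heuristic slice size mimicking adaptive execution behavior."""
    if steps_left <= 1:
        return remaining_qty  # deadline reached: execute everything left
    base = remaining_qty / steps_left  # TWAP-like default slice
    if spread <= tight_spread:
        # Opportunistic: liquidity is cheap, trade up to twice the base slice.
        return min(2.0 * base, remaining_qty)
    return base
```

The learned policy expresses the same trade-offs implicitly through its Q-values rather than through explicit thresholds.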