Gymnasium Overview
Train RL trading agents with tradeready-gym, a fully Gymnasium-compliant package
tradeready-gym is a Python package that wraps the TradeReady backtesting engine in a standard Gymnasium interface. You can plug it into any RL framework that speaks Gymnasium — Stable-Baselines3, RLlib, CleanRL, and others.
What Is It?
The package gives you training environments where your model controls a virtual trading agent. At each step, the model receives market observations and account state, chooses an action (buy, sell, hold, or a continuous position size), and receives a reward signal.
The environments are backed by the same sandbox used for manual backtesting: real Binance candle data, 0.1% trading fees, and realistic slippage. There is no look-ahead bias — the DataReplayer enforces a strict WHERE bucket <= virtual_clock constraint.
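To make the no-look-ahead guarantee concrete, here is a minimal sketch of the filtering rule. The function name and candle layout are illustrative, not part of the DataReplayer's actual API:

```python
from datetime import datetime, timezone

def visible_candles(candles, virtual_clock):
    """Return only candles whose bucket (open time) is at or before the
    virtual clock -- the same rule the WHERE bucket <= virtual_clock
    query enforces on the server side."""
    return [c for c in candles if c["bucket"] <= virtual_clock]

# Six hourly candles; the virtual clock sits at 03:00.
candles = [
    {"bucket": datetime(2025, 1, 1, h, tzinfo=timezone.utc), "close": 42000 + h}
    for h in range(6)
]
clock = datetime(2025, 1, 1, 3, tzinfo=timezone.utc)

window = visible_candles(candles, clock)
# The agent sees the four candles from 00:00 through 03:00; the 04:00 and
# 05:00 candles are invisible until the clock advances past them.
```

Because each step only ever advances the virtual clock, the observation at step t can never include a candle from step t+1.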
What you get:
- 7 pre-registered environments for single-asset, multi-asset, and live trading
- 5 reward functions (PnL, log return, Sharpe, Sortino, drawdown penalty)
- 3 environment wrappers (feature engineering, normalization, batch stepping)
- Automatic training progress reporting to the platform dashboard via TrainingTracker
Installation
```shell
pip install tradeready-gym

# With RL frameworks
pip install tradeready-gym stable-baselines3 torch
```
Requirements: Python 3.12+, Gymnasium >= 0.29, numpy >= 1.26, httpx >= 0.28, and a running TradeReady platform instance.
Quick Start
```python
import gymnasium as gym
import tradeready_gym  # registers all environments

env = gym.make(
    "TradeReady-BTC-v0",
    api_key="ak_live_...",
    starting_balance=10000,
    timeframe="1h",
    start_time="2025-01-01T00:00:00Z",
    end_time="2025-03-01T00:00:00Z",
)

obs, info = env.reset()
for _ in range(1000):
    action = env.action_space.sample()  # replace with your model
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```
The 5-Tuple Step API
Every call to env.step(action) returns:
| Value | Type | Description |
|---|---|---|
| obs | np.ndarray | Market observations and account state |
| reward | float | Signal from the reward function |
| terminated | bool | True when the episode ends at the final candle |
| truncated | bool | True if the episode ended early (e.g. account blown out) |
| info | dict | Step details: equity, filled orders, virtual time |
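The terminated/truncated distinction can be illustrated with a toy stand-in. This class is purely illustrative and shares nothing with the real environment beyond the 5-tuple contract:

```python
class ToyTradingEnv:
    """Toy stand-in (not the real environment) that mimics the 5-tuple
    contract: terminated at the final candle, truncated on blow-out."""

    def __init__(self, total_steps=5, starting_equity=100.0):
        self.total_steps = total_steps
        self.starting_equity = starting_equity

    def reset(self):
        self.step_n = 0
        self.equity = self.starting_equity
        return [self.equity], {"step": 0}

    def step(self, action):
        self.step_n += 1
        self.equity += action  # toy PnL: action is the equity change
        reward = float(action)
        terminated = self.step_n >= self.total_steps  # final candle reached
        truncated = self.equity <= 0.0                # account blown out
        info = {"equity": self.equity, "step": self.step_n}
        return [self.equity], reward, terminated, truncated, info

env = ToyTradingEnv()
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(-100.0)
# truncated is True here: equity hit zero before the final candle,
# while terminated stays False.
```

In a training loop you treat both flags as episode boundaries, but they let logging and evaluation distinguish a full run from an early blow-out.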
The info dictionary includes:
```json
{
  "equity": 10245.30,
  "available_cash": 8000.00,
  "position_value": 2245.30,
  "unrealized_pnl": 245.30,
  "virtual_time": "2025-01-15T14:00:00Z",
  "step": 336,
  "total_steps": 1464,
  "filled_orders": [],
  "progress_pct": 22.95
}
```
Gymnasium Compliance
tradeready-gym implements the full Gymnasium Env interface:
```python
env.reset(seed=42)     # reproducible episode starts
env.step(action)       # 5-tuple return
env.action_space       # gymnasium.spaces.Space subclass
env.observation_space  # gymnasium.spaces.Box
env.render()           # optional — not implemented (trading has no visual render)
env.close()            # completes the training run and cleans up the backtest session
```
All action and observation spaces are properly defined as gymnasium.spaces objects, so env.action_space.sample() always returns a valid action and env.observation_space.contains(obs) always returns True.
Always call env.close() when training is finished. This marks the training run as complete in the platform dashboard and releases the backtest session.
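One way to guarantee that call even when training code raises is contextlib.closing, shown here with a minimal stand-in object. DummyEnv is illustrative; any object with a close() method works the same way:

```python
from contextlib import closing

class DummyEnv:
    """Minimal stand-in that records whether close() was called."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

env = DummyEnv()
with closing(env):  # close() runs even if the body raises
    pass            # ... training loop would go here ...

# env.closed is now True, so the run would be marked complete
# on the platform even after an exception mid-training.
```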
Three Workflow Types
The gym supports three distinct workflows:
| Workflow | Use Case | Environment |
|---|---|---|
| Historical training | Train on past data, evaluate on held-out periods | TradeReady-BTC-v0, TradeReady-BTC-Continuous-v0, TradeReady-Portfolio-v0 |
| Live paper trading | Deploy a trained model to trade in real time | TradeReady-Live-v0 |
| RL training pipeline | Full cycle: train → track → compare → deploy best | Any environment with track_training=True |
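For the historical-training workflow, a common pattern is to carve one date range into a training window and a held-out evaluation window, then pass each to its own environment via start_time/end_time. A sketch under assumptions — split_period and the 80/20 split are illustrative, not part of tradeready-gym:

```python
from datetime import datetime, timezone

def split_period(start, end, train_frac=0.8):
    """Split a historical window into contiguous train and held-out
    evaluation periods (an illustrative partitioning helper)."""
    cut = start + (end - start) * train_frac
    return (start, cut), (cut, end)

train, test = split_period(
    datetime(2025, 1, 1, tzinfo=timezone.utc),
    datetime(2025, 3, 1, tzinfo=timezone.utc),
)
# Pass train[0]/train[1] as start_time/end_time to the training env,
# and test[0]/test[1] to a separate evaluation env.
```

Keeping the evaluation window strictly after the training window avoids leaking future market regimes into training.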
Training Tracking
Set track_training=True (the default) to automatically sync training progress with the platform dashboard:
```python
env = gym.make(
    "TradeReady-BTC-v0",
    api_key="ak_live_...",
    track_training=True,
    strategy_label="ppo_v1",
)
```
The tracker:
- Registers a training run on the first reset() call
- Reports episode metrics (ROI, Sharpe, drawdown, trades, cumulative reward) after each completed episode
- Marks the run as complete on env.close()
Training runs are visible at /training in the dashboard. See Training Tracking for full details.
Next Steps
- Environments — all 7 registered environments, action spaces, observation space
- Rewards — 5 reward functions and how to write custom ones
- Training Tracking — syncing runs to the dashboard
- Examples — complete working scripts