
Environments

All 7 registered Gymnasium environments — action spaces, observation spaces, and configuration


Import tradeready_gym to register all environments before calling gym.make():

import gymnasium as gym
import tradeready_gym  # registers all 7 environments

Registered Environments

| Environment ID | Action Space | Assets | Mode |
|---|---|---|---|
| TradeReady-BTC-v0 | Discrete(3) | BTC | Historical |
| TradeReady-ETH-v0 | Discrete(3) | ETH | Historical |
| TradeReady-SOL-v0 | Discrete(3) | SOL | Historical |
| TradeReady-BTC-Continuous-v0 | Box(-1, 1, (1,)) | BTC | Historical |
| TradeReady-ETH-Continuous-v0 | Box(-1, 1, (1,)) | ETH | Historical |
| TradeReady-Portfolio-v0 | Box(0, 1, (N,)) | Any N pairs | Historical |
| TradeReady-Live-v0 | Discrete(3) | Any pairs | Live paper trading |

SingleAssetTradingEnv — Discrete

The three discrete environments (TradeReady-BTC-v0, TradeReady-ETH-v0, TradeReady-SOL-v0) use a three-action space:

| Action | Value | Effect |
|---|---|---|
| Hold | 0 | Do nothing |
| Buy | 1 | Buy with 10% of current equity |
| Sell | 2 | Close the current position |

env = gym.make(
    "TradeReady-BTC-v0",
    api_key="ak_live_...",
    starting_balance=10000,
    timeframe="1h",
    lookback_window=30,
    start_time="2025-01-01T00:00:00Z",
    end_time="2025-03-01T00:00:00Z",
)

# action_space = Discrete(3)
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(1)  # Buy

Configuration parameters:

| Parameter | Default | Description |
|---|---|---|
| api_key | required | Your agent's API key |
| starting_balance | 10000 | Starting USDT balance |
| timeframe | "1h" | Candle interval for indicators |
| lookback_window | 30 | Number of past candles in each observation |
| start_time | required | ISO timestamp for the episode start |
| end_time | required | ISO timestamp for the episode end |
| observation_features | ["ohlcv"] | Which features to include in observations |
| reward_function | PnLReward() | Reward function instance |
| track_training | True | Auto-report to the training dashboard |
| strategy_label | None | Label for the training run |
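The discrete action semantics above (Buy uses 10% of current equity; Sell closes the position) can be sketched as a standalone function. This is an illustration of the documented behavior, not the library's internal implementation, and `apply_discrete_action` is a hypothetical helper name:

```python
def apply_discrete_action(action: int, equity: float,
                          position_qty: float, price: float) -> float:
    """Map a Discrete(3) action to a signed order quantity (sketch).

    Returns the base-asset quantity to trade: positive = buy, negative = sell.
    """
    if action == 0:                       # Hold: do nothing
        return 0.0
    if action == 1:                       # Buy with 10% of current equity
        return (0.10 * equity) / price
    if action == 2:                       # Sell: close the current position
        return -position_qty
    raise ValueError(f"invalid action: {action}")
```

For example, with 10,000 USDT of equity and BTC at 50,000, action 1 buys roughly 0.02 BTC.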

SingleAssetTradingEnv — Continuous

The continuous environments (TradeReady-BTC-Continuous-v0, TradeReady-ETH-Continuous-v0) use a Box action space where the value represents both direction and magnitude:

| Signal range | Interpretation |
|---|---|
| -0.05 to 0.05 | Dead zone: Hold |
| > 0.05 | Buy: quantity = signal * position_size_pct * equity / price |
| < -0.05 | Sell: same formula with the absolute value of the signal |

env = gym.make(
    "TradeReady-BTC-Continuous-v0",
    api_key="ak_live_...",
    starting_balance=10000,
)

# action_space = Box(-1.0, 1.0, shape=(1,), dtype=float32)
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step([0.7])  # Buy 70% of position size

Continuous environments are preferred for algorithms such as PPO, SAC, and TD3, where the policy can learn nuanced position sizing rather than fixed-size entries.
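The signal interpretation above can be sketched in isolation. This is an illustrative reading of the documented dead zone and sizing formula, not library code; the assumption that sells are capped at the currently open position is flagged in the comments:

```python
DEAD_ZONE = 0.05

def interpret_signal(signal: float, position_size_pct: float, equity: float,
                     price: float, position_qty: float) -> tuple[str, float]:
    """Sketch of the continuous-action semantics (hypothetical helper)."""
    if -DEAD_ZONE <= signal <= DEAD_ZONE:
        return ("hold", 0.0)
    # quantity = |signal| * position_size_pct * equity / price
    qty = abs(signal) * position_size_pct * equity / price
    if signal > DEAD_ZONE:
        return ("buy", qty)
    # Assumption: a sell cannot exceed the currently open position.
    return ("sell", min(qty, position_qty))
```

With a signal of 0.7, a 10% position size, 10,000 USDT of equity, and a price of 50,000, this buys about 0.014 BTC.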


MultiAssetTradingEnv — Portfolio

TradeReady-Portfolio-v0 takes target portfolio weights as actions. The environment rebalances to match the targets on each step:

env = gym.make(
    "TradeReady-Portfolio-v0",
    api_key="ak_live_...",
    pairs=["BTCUSDT", "ETHUSDT", "SOLUSDT"],
    starting_balance=50000,
)

# action_space = Box(0.0, 1.0, shape=(3,), dtype=float32)
obs, info = env.reset()

# Allocate 50% BTC, 30% ETH, 20% SOL
obs, reward, terminated, truncated, info = env.step([0.5, 0.3, 0.2])

If the weights sum to more than 1.0, they are normalized. The remainder of equity stays as cash (USDT).
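The normalization rule can be sketched as follows. This is an illustration of the documented behavior (`normalize_weights` is a hypothetical helper, not a tradeready_gym function):

```python
def normalize_weights(weights: list[float]) -> tuple[list[float], float]:
    """Return (asset weights, cash fraction) per the documented rule (sketch).

    Weights summing to more than 1.0 are scaled down proportionally;
    any shortfall below 1.0 is left as cash (USDT).
    """
    total = sum(weights)
    if total > 1.0:
        weights = [w / total for w in weights]
        total = 1.0
    return weights, 1.0 - total
```

For example, [0.5, 0.3] leaves 20% of equity in USDT, while [0.8, 0.6, 0.6] is scaled down to [0.4, 0.3, 0.3] with no cash remainder.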


LiveTradingEnv

TradeReady-Live-v0 connects to real-time Binance prices instead of historical data. It never terminates on its own — it runs until env.close() is called.

env = gym.make(
    "TradeReady-Live-v0",
    api_key="ak_live_...",
    pairs=["BTCUSDT"],
    step_interval_sec=60,  # Wait 60 seconds between steps
)

obs, info = env.reset()
while True:
    action, _ = model.predict(obs)
    obs, reward, _, _, info = env.step(action)
    # Loop forever — never sets terminated=True

The live environment uses your agent's actual virtual balance. Unlike the historical environments, there is no isolated sandbox — trades affect your real agent account.
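Because the live environment only stops when env.close() is called, it is worth wrapping the loop so the close always happens, even on Ctrl-C. The pattern below is a sketch; StubLiveEnv is a stand-in with the Gymnasium step/reset/close surface so the shutdown logic can be shown without a live connection, and run_live is a hypothetical helper:

```python
class StubLiveEnv:
    """Stand-in for TradeReady-Live-v0 (illustration only)."""
    def __init__(self):
        self.closed = False
    def reset(self):
        return [0.0], {}
    def step(self, action):
        # Live env never sets terminated/truncated on its own
        return [0.0], 0.0, False, False, {}
    def close(self):
        self.closed = True

def run_live(env, policy, max_steps=None):
    """Run the live loop, always closing the env on exit or interrupt."""
    obs, info = env.reset()
    steps = 0
    try:
        while max_steps is None or steps < max_steps:
            obs, reward, _, _, info = env.step(policy(obs))
            steps += 1
    except KeyboardInterrupt:
        pass  # graceful shutdown on Ctrl-C
    finally:
        env.close()
    return steps

env = StubLiveEnv()
n = run_live(env, policy=lambda obs: 0, max_steps=5)
```

With a real model, policy would be `lambda obs: model.predict(obs)[0]` and max_steps would be omitted.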


Observation Space

All environments share the same observation builder. Configure what your model sees via observation_features:

env = gym.make(
    "TradeReady-BTC-v0",
    api_key="ak_live_...",
    lookback_window=30,
    observation_features=[
        "ohlcv",           # Open, High, Low, Close, Volume — 5 dims per candle
        "rsi_14",          # RSI normalized to [0, 1] — 1 dim per candle
        "macd",            # MACD line, signal, histogram — 3 dims per candle
        "bollinger",       # Upper, middle, lower bands — 3 dims per candle
        "volume",          # Raw volume — 1 dim per candle
        "adx",             # Trend strength — 1 dim per candle
        "atr",             # Average True Range — 1 dim per candle
        "balance",         # Cash / starting_balance — 1 scalar
        "position",        # Position value / equity — 1 scalar
        "unrealized_pnl",  # Unrealized PnL / equity — 1 scalar
    ]
)

Feature Dimensions

| Feature | Dims per candle | Type |
|---|---|---|
| ohlcv | 5 | Windowed (repeated for each candle in lookback_window) |
| rsi_14 | 1 | Windowed |
| macd | 3 | Windowed |
| bollinger | 3 | Windowed |
| volume | 1 | Windowed |
| adx | 1 | Windowed |
| atr | 1 | Windowed |
| balance | 1 | Scalar (appended once at the end) |
| position | 1 | Scalar |
| unrealized_pnl | 1 | Scalar |

Observation shape formula:

obs_size = (lookback_window × windowed_dims × n_assets) + scalar_dims

Example (BTC only, all features, window=30):
  = (30 × 15 × 1) + 3 = 453

The observation space is always a Box(shape=(obs_size,), dtype=float32) with range [-inf, inf]. Apply NormalizationWrapper to bring values into [-1, 1] before feeding to a neural network.
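The shape formula can be computed directly from the feature table. The sketch below encodes the documented per-feature dimensions (`obs_size` is a hypothetical helper, useful for sizing a network's input layer before constructing the env):

```python
# Per-candle dimensions for windowed features, per the table above
WINDOWED_DIMS = {
    "ohlcv": 5, "rsi_14": 1, "macd": 3, "bollinger": 3,
    "volume": 1, "adx": 1, "atr": 1,
}
# Scalar features appended once at the end of the observation
SCALAR_DIMS = {"balance": 1, "position": 1, "unrealized_pnl": 1}

def obs_size(features: list[str], lookback_window: int, n_assets: int = 1) -> int:
    """obs_size = (lookback_window * windowed_dims * n_assets) + scalar_dims"""
    windowed = sum(WINDOWED_DIMS.get(f, 0) for f in features)
    scalars = sum(SCALAR_DIMS.get(f, 0) for f in features)
    return lookback_window * windowed * n_assets + scalars
```

With all ten features, a 30-candle window, and one asset, this reproduces the 453-dimension example above.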


Wrappers

Three wrappers are available to enhance environments:

from tradeready_gym.wrappers import (
    FeatureEngineeringWrapper,
    NormalizationWrapper,
    BatchStepWrapper,
)

env = gym.make("TradeReady-BTC-v0", api_key="ak_live_...")

# Add SMA ratios and momentum to observations
env = FeatureEngineeringWrapper(env, periods=[5, 10, 20])

# Normalize observations to [-1, 1] using online z-score
env = NormalizationWrapper(env, clip=1.0)

# Execute 5 underlying steps per action (reduces HTTP overhead)
env = BatchStepWrapper(env, n_steps=5)

| Wrapper | Effect | When to use |
|---|---|---|
| FeatureEngineeringWrapper | Adds SMA ratios and price momentum to the observation | When you want derived features without a custom observation space |
| NormalizationWrapper | Online z-score normalization, clipped to [-1, 1] | Always recommended for neural network training |
| BatchStepWrapper | Executes N underlying steps per agent action, summing the rewards | Reducing API call overhead during training |
