Strategy Testing

Multi-episode testing and the recommendation engine

Testing runs your strategy against multiple randomized historical periods and gives you aggregate performance statistics. It is the step between creating a strategy and deploying it live.


What Testing Does

A single backtest on one time period can be misleading: the strategy might simply have been lucky in that particular window. Multi-episode testing solves this.

When you start a test run with N episodes, the platform:

  1. Randomly selects N non-overlapping date ranges within your specified window
  2. Runs a full backtest for each episode (same strategy definition, different dates)
  3. Aggregates the results across all episodes
  4. Runs the Recommendation Engine on the aggregated data
  5. Saves per-episode metrics and per-pair breakdowns

The result is a statistically meaningful picture of how your strategy behaves across different market conditions.

Test runs are executed by Celery workers. Depending on episode count and date range, a run may take seconds to several minutes. Poll for completion or watch the UI.


Running a Test

from agentexchange import AgentExchangeClient
import time

client = AgentExchangeClient(api_key="ak_live_...")

# Start the test (20 non-overlapping 30-day episodes need at least
# 600 days of history, so the date range must be wide enough to hold them)
test = client.run_test(
    strategy_id="strat_abc123",
    version=1,
    episodes=20,
    date_range={"start": "2023-01-01", "end": "2025-08-01"},
    episode_duration_days=30,
)

# Poll for completion
while True:
    status = client.get_test_status("strat_abc123", test["test_run_id"])
    print(f"Progress: {status['progress_pct']:.0f}%")
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(5)

# Read results
results = client.get_test_results("strat_abc123", test["test_run_id"])
# Start a test run
curl -X POST http://localhost:8000/api/v1/strategies/strat_abc123/test \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "version": 1,
    "episodes": 20,
    "date_range": {"start": "2023-01-01", "end": "2025-08-01"},
    "episode_duration_days": 30
  }'

# Poll status
curl http://localhost:8000/api/v1/strategies/strat_abc123/tests/{test_id} \
  -H "Authorization: Bearer $JWT"

# Get results
curl http://localhost:8000/api/v1/strategies/strat_abc123/test-results \
  -H "Authorization: Bearer $JWT"

Test Configuration

| Parameter | Default | Description |
| --- | --- | --- |
| version | required | Strategy version number to test |
| episodes | 10 | Number of test episodes to run |
| date_range.start | required | Earliest date for episode selection |
| date_range.end | required | Latest date for episode selection |
| episode_duration_days | 30 | Length of each episode in days |
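Because episodes are non-overlapping, episodes × episode_duration_days cannot exceed the span of date_range: a seven-month window holds only 7 non-overlapping 30-day episodes, so a 20-episode run needs roughly two years of history. A quick pre-flight check (plain arithmetic, not a platform API) saves a failed run:

```python
from datetime import date

def max_episodes(start: date, end: date, duration_days: int) -> int:
    """Largest number of non-overlapping episodes the range can hold."""
    return (end - start).days // duration_days

# A 2023-01-01 .. 2025-08-01 window spans 943 days:
print(max_episodes(date(2023, 1, 1), date(2025, 8, 1), 30))  # → 31
```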

Monitoring Test Progress

GET /api/v1/strategies/{id}/tests/{test_id}
{
  "test_run_id": "run_xyz789",
  "strategy_id": "strat_abc123",
  "version": 1,
  "status": "running",
  "progress_pct": 65.0,
  "episodes_completed": 13,
  "episodes_total": 20
}

Status values: pending, running, completed, failed, cancelled.
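The simple polling loop in the example above treats failed the same as completed. A more defensive wrapper handles every terminal status and enforces a timeout. This is a sketch, not an SDK function; `get_status` stands in for a call to `client.get_test_status`:

```python
import time

def wait_for_test(get_status, timeout_s: float = 600, poll_s: float = 5):
    """Poll get_status() until the run reaches a terminal state.

    get_status: zero-arg callable returning the status dict shown above.
    Returns the final status dict on success; raises on failure,
    cancellation, or timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status["status"] == "completed":
            return status
        if status["status"] in ("failed", "cancelled"):
            raise RuntimeError(f"test run ended as {status['status']}")
        time.sleep(poll_s)
    raise TimeoutError("test run did not finish in time")
```

Call it as `wait_for_test(lambda: client.get_test_status(strategy_id, test_run_id))`.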


Test Results

Once status is completed, fetch the full results:

{
  "test_run_id": "run_xyz789",
  "strategy_id": "strat_abc123",
  "version": 1,
  "status": "completed",
  "results": {
    "episodes_completed": 20,
    "episodes_profitable": 14,
    "episodes_profitable_pct": 70.0,
    "avg_roi_pct": 4.2,
    "median_roi_pct": 3.8,
    "best_roi_pct": 12.1,
    "worst_roi_pct": -4.5,
    "std_roi_pct": 3.1,
    "avg_sharpe": 1.4,
    "avg_max_drawdown_pct": 6.8,
    "avg_trades_per_episode": 18,
    "total_trades": 360
  },
  "by_pair": [
    {
      "symbol": "BTCUSDT",
      "avg_roi_pct": 5.1,
      "avg_sharpe": 1.6,
      "episodes_profitable_pct": 75.0
    },
    {
      "symbol": "ETHUSDT",
      "avg_roi_pct": 3.3,
      "avg_sharpe": 1.2,
      "episodes_profitable_pct": 65.0
    }
  ],
  "recommendations": [
    "ETHUSDT underperforms BTCUSDT by 1.8% avg ROI — consider removing it",
    "TP/SL ratio is 2.7:1 — good risk/reward balance"
  ]
}

Aggregate Metrics

| Metric | Description |
| --- | --- |
| episodes_completed | Number of episodes that ran to completion |
| episodes_profitable | Episodes with positive final ROI |
| episodes_profitable_pct | Win rate across episodes |
| avg_roi_pct | Average ROI across all episodes |
| median_roi_pct | Median ROI — less sensitive to outliers |
| best_roi_pct / worst_roi_pct | Best and worst single-episode ROI |
| std_roi_pct | Standard deviation of ROI — measures consistency |
| avg_sharpe | Average Sharpe ratio across episodes |
| avg_max_drawdown_pct | Average worst drawdown per episode |
| avg_trades_per_episode | Average trade count per episode |
| total_trades | Total trades across all episodes |

Per-Pair Breakdown

Results also include per-pair performance, so you can identify which pairs in your pairs list are contributing vs dragging down performance. Each entry has the same metrics as the aggregate, grouped by symbol.
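For instance, ranking pairs by average ROI makes the laggards obvious. The helper and its thresholds below are illustrative, and the sample values are made up rather than taken from a real run:

```python
def rank_pairs(by_pair, min_roi_pct=0.0, min_profitable_pct=50.0):
    """Sort pairs best-first and list those that look like a drag."""
    ranked = sorted(by_pair, key=lambda p: p["avg_roi_pct"], reverse=True)
    laggards = [p["symbol"] for p in ranked
                if p["avg_roi_pct"] < min_roi_pct
                or p["episodes_profitable_pct"] < min_profitable_pct]
    return ranked, laggards

by_pair = [
    {"symbol": "BTCUSDT", "avg_roi_pct": 5.1, "episodes_profitable_pct": 75.0},
    {"symbol": "ETHUSDT", "avg_roi_pct": -0.4, "episodes_profitable_pct": 45.0},
]
ranked, laggards = rank_pairs(by_pair)
print(laggards)  # → ['ETHUSDT']
```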


The Recommendation Engine

After a test run completes, the Recommendation Engine analyzes the results and generates plain-English suggestions. There are 11 rules:

| Trigger | Recommendation |
| --- | --- |
| Pair ROI disparity > 5% | Remove the underperforming pair |
| Win rate < 50% | Tighten entry conditions or widen take-profit |
| Win rate > 75% | Relax entry conditions to capture more opportunities |
| Max drawdown > 15% | Tighten stop-loss |
| Max drawdown < 3% | Stop-loss may be too tight — consider loosening |
| Avg trades < 3 per episode | Entry conditions too restrictive — loosen them |
| Avg trades > 50 per episode | Add ADX filter to reduce overtrading |
| Sharpe < 0.5 | Reduce position size or improve entry timing |
| ADX threshold > 30 | Consider lowering to 20–25 |
| ADX threshold < 15 | Raise ADX threshold to 20+ for better trend filtering |
| TP/SL ratio < 1.5:1 | Widen take-profit or tighten stop-loss |
To print the recommendations from a completed run:

results = client.get_test_results(strategy_id, test_run_id)
for rec in results["recommendations"]:
    print(f"  - {rec}")

Recommendations are advisory — you decide whether to apply them. Create a new version for each change and compare test results before committing to a direction.


Comparing Versions

After testing multiple versions, compare them side by side:

comparison = client.compare_versions(
    strategy_id="strat_abc123",
    v1=1,
    v2=2
)

print(comparison["v1"])       # aggregate metrics for version 1
print(comparison["v2"])       # aggregate metrics for version 2
print(comparison["improvements"])  # % improvement per metric
print(comparison["verdict"])  # "Version 2 outperforms on 3/4 metrics"
The equivalent HTTP call:

GET /api/v1/strategies/strat_abc123/compare-versions?v1=1&v2=2

The Testing Workflow

1. Create strategy (version 1)
         |
         v
2. Run test (20 episodes of 30 days, spanning 2+ years of history)
         |
         v
3. Check aggregate results:
   - episodes_profitable_pct > 60%? Good baseline.
   - avg_sharpe > 1.0? Acceptable risk-adjusted return.
   - avg_max_drawdown_pct < 10%? Manageable risk.
         |
         v
4. Read recommendations
         |
         v
5. Create version 2 with improvements
         |
         v
6. Run test on version 2 with same date range
         |
         v
7. Compare versions → deploy the winner

Always test on the same date range when comparing versions. Different periods introduce market regime bias and make comparisons meaningless.
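The step-3 baseline checks above are easy to codify. The thresholds mirror the workflow; `baseline_checks` itself is a sketch, not an SDK function, and `results` is the aggregate `results` object from the test-results response:

```python
def baseline_checks(results: dict) -> dict:
    """Apply the step-3 thresholds to an aggregate results dict."""
    return {
        "win_rate_ok": results["episodes_profitable_pct"] > 60.0,
        "sharpe_ok": results["avg_sharpe"] > 1.0,
        "drawdown_ok": results["avg_max_drawdown_pct"] < 10.0,
    }

# Aggregate numbers from the sample run shown earlier:
sample = {"episodes_profitable_pct": 70.0, "avg_sharpe": 1.4,
          "avg_max_drawdown_pct": 6.8}
print(baseline_checks(sample))  # every check passes for this run
```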


Test Endpoint Reference

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/v1/strategies/{id}/test | Start a test run |
| GET | /api/v1/strategies/{id}/tests | List all test runs |
| GET | /api/v1/strategies/{id}/tests/{test_id} | Get status and results for a run |
| POST | /api/v1/strategies/{id}/tests/{test_id}/cancel | Cancel a running test |
| GET | /api/v1/strategies/{id}/test-results | Latest completed test results |
| GET | /api/v1/strategies/{id}/compare-versions?v1=N&v2=M | Side-by-side version comparison |
