<!-- Generated from TradeReady.io docs. Visit https://tradeready.io/docs for the full experience. -->

---
title: Strategy Testing
description: Multi-episode testing and the recommendation engine
---

Testing runs your strategy against multiple randomized historical periods and gives you aggregate performance statistics. It is the step between creating a strategy and deploying it live.

---

## What Testing Does

A single backtest on one time period can be misleading — the strategy might simply have been lucky in that particular window. Multi-episode testing solves this.

When you start a test run with N episodes, the platform:

1. Randomly selects N non-overlapping date ranges within your specified window
2. Runs a full backtest for each episode (same strategy definition, different dates)
3. Aggregates the results across all episodes
4. Runs the **Recommendation Engine** on the aggregated data
5. Saves per-episode metrics and per-pair breakdowns

The result is a statistically meaningful picture of how your strategy behaves across different market conditions.
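
Step 1 above can be sketched as rejection sampling over day offsets. `pick_episodes` is a hypothetical helper for illustration, not the platform's actual sampler:

```python
import random
from datetime import date, timedelta

def pick_episodes(start, end, n, duration_days, seed=None, max_tries=10_000):
    """Pick n non-overlapping [ep_start, ep_end) windows inside [start, end)."""
    rng = random.Random(seed)
    total_days = (end - start).days
    if n * duration_days > total_days:
        raise ValueError("date range too small for n non-overlapping episodes")
    episodes = []
    for _ in range(max_tries):
        if len(episodes) == n:
            break
        offset = rng.randrange(total_days - duration_days + 1)
        ep_start = start + timedelta(days=offset)
        ep_end = ep_start + timedelta(days=duration_days)
        # Keep the candidate only if it does not overlap any accepted window
        if all(ep_end <= s or ep_start >= e for s, e in episodes):
            episodes.append((ep_start, ep_end))
    if len(episodes) < n:
        raise RuntimeError("could not place all episodes; widen the date range")
    return sorted(episodes)
```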

> **Info:**
> Test runs are executed by Celery workers. Depending on episode count and date range, a run may take seconds to several minutes. Poll for completion or watch the UI.

---

## Running a Test

**Python SDK:**

```python
from agentexchange import AgentExchangeClient
import time

client = AgentExchangeClient(api_key="ak_live_...")

# Start the test
test = client.run_test(
    strategy_id="strat_abc123",
    version=1,
    episodes=20,
    date_range={"start": "2025-01-01", "end": "2025-08-01"},
    episode_duration_days=30,
)

# Poll for completion
while True:
    status = client.get_test_status("strat_abc123", test["test_run_id"])
    print(f"Progress: {status['progress_pct']:.0f}%")
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(5)

# Read results
results = client.get_test_results("strat_abc123", test["test_run_id"])
```

**REST API:**

```bash
# Start a test run
curl -X POST http://localhost:8000/api/v1/strategies/strat_abc123/test \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "version": 1,
    "episodes": 20,
    "date_range": {"start": "2025-01-01", "end": "2025-08-01"},
    "episode_duration_days": 30
  }'

# Poll status
curl http://localhost:8000/api/v1/strategies/strat_abc123/tests/{test_id} \
  -H "Authorization: Bearer $JWT"

# Get results
curl http://localhost:8000/api/v1/strategies/strat_abc123/test-results \
  -H "Authorization: Bearer $JWT"
```

### Test Configuration

| Parameter | Default | Description |
|-----------|---------|-------------|
| `version` | required | Strategy version number to test |
| `episodes` | `10` | Number of test episodes to run |
| `date_range.start` | required | Earliest date for episode selection |
| `date_range.end` | required | Latest date for episode selection |
| `episode_duration_days` | `30` | Length of each episode in days |

---

## Monitoring Test Progress

```bash
GET /api/v1/strategies/{id}/tests/{test_id}
```

```json
{
  "test_run_id": "run_xyz789",
  "strategy_id": "strat_abc123",
  "version": 1,
  "status": "running",
  "progress_pct": 65.0,
  "episodes_completed": 13,
  "episodes_total": 20
}
```

Status values: `pending`, `running`, `completed`, `failed`, `cancelled`.
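
Note that `cancelled` is also terminal, so a polling loop should stop on any of the three terminal states, not just `completed` and `failed`. A sketch with a timeout (hypothetical helper, not part of the SDK):

```python
import time

TERMINAL = {"completed", "failed", "cancelled"}

def wait_for_test(get_status, interval=5.0, timeout=600.0):
    """Poll get_status() until the run reaches a terminal state or we time out.

    get_status is any callable returning the status dict from
    GET /api/v1/strategies/{id}/tests/{test_id}.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status["status"] in TERMINAL:
            return status
        time.sleep(interval)
    raise TimeoutError("test run did not finish within the timeout")
```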

---

## Test Results

Once status is `completed`, fetch the full results:

```json
{
  "test_run_id": "run_xyz789",
  "strategy_id": "strat_abc123",
  "version": 1,
  "status": "completed",
  "results": {
    "episodes_completed": 20,
    "episodes_profitable": 14,
    "episodes_profitable_pct": 70.0,
    "avg_roi_pct": 4.2,
    "median_roi_pct": 3.8,
    "best_roi_pct": 12.1,
    "worst_roi_pct": -4.5,
    "std_roi_pct": 3.1,
    "avg_sharpe": 1.4,
    "avg_max_drawdown_pct": 6.8,
    "avg_trades_per_episode": 18,
    "total_trades": 360
  },
  "by_pair": [
    {
      "symbol": "BTCUSDT",
      "avg_roi_pct": 5.1,
      "avg_sharpe": 1.6,
      "episodes_profitable_pct": 75.0
    },
    {
      "symbol": "ETHUSDT",
      "avg_roi_pct": 3.3,
      "avg_sharpe": 1.2,
      "episodes_profitable_pct": 65.0
    }
  ],
  "recommendations": [
    "ETHUSDT underperforms BTCUSDT by 1.8% avg ROI — consider removing it",
    "TP/SL ratio is 2.7:1 — good risk/reward balance"
  ]
}
```

### Aggregate Metrics

| Metric | Description |
|--------|-------------|
| `episodes_completed` | Number of episodes that ran to completion |
| `episodes_profitable` | Episodes with positive final ROI |
| `episodes_profitable_pct` | Win rate across episodes |
| `avg_roi_pct` | Average ROI across all episodes |
| `median_roi_pct` | Median ROI — less sensitive to outliers |
| `best_roi_pct` / `worst_roi_pct` | Best and worst single-episode ROI |
| `std_roi_pct` | Standard deviation of ROI — measures consistency |
| `avg_sharpe` | Average Sharpe ratio across episodes |
| `avg_max_drawdown_pct` | Average worst drawdown per episode |
| `avg_trades_per_episode` | Average trade count per episode |
| `total_trades` | Total trades across all episodes |
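
The ROI fields in the table are plain descriptive statistics over the per-episode ROI series. A recomputation sketch (population standard deviation is assumed for `std_roi_pct`; the platform may use the sample variant):

```python
from statistics import mean, median, pstdev

def aggregate_roi(episode_rois):
    """Recompute the ROI-based aggregate fields from per-episode ROI values."""
    profitable = [r for r in episode_rois if r > 0]
    return {
        "episodes_completed": len(episode_rois),
        "episodes_profitable": len(profitable),
        "episodes_profitable_pct": 100.0 * len(profitable) / len(episode_rois),
        "avg_roi_pct": mean(episode_rois),
        "median_roi_pct": median(episode_rois),
        "best_roi_pct": max(episode_rois),
        "worst_roi_pct": min(episode_rois),
        "std_roi_pct": pstdev(episode_rois),
    }
```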

### Per-Pair Breakdown

Results also include per-pair performance, so you can identify which pairs in your `pairs` list are contributing to performance and which are dragging it down. Each entry carries the same metrics as the aggregate, grouped by symbol.
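
The per-pair view is the same aggregation grouped by symbol. A sketch, assuming one `(symbol, roi_pct)` record per pair per episode (the input shape is an assumption, not the API's internal format):

```python
from collections import defaultdict
from statistics import mean

def by_pair(episode_pair_rois):
    """Group per-episode (symbol, roi_pct) records into the per-pair breakdown."""
    buckets = defaultdict(list)
    for symbol, roi in episode_pair_rois:
        buckets[symbol].append(roi)
    return [
        {
            "symbol": sym,
            "avg_roi_pct": mean(rois),
            "episodes_profitable_pct": 100.0 * sum(r > 0 for r in rois) / len(rois),
        }
        for sym, rois in sorted(buckets.items())
    ]
```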

---

## The Recommendation Engine

After a test run completes, the Recommendation Engine analyzes the results and generates plain-English suggestions. There are 11 rules:

| Trigger | Recommendation |
|---------|---------------|
| Pair ROI disparity > 5% | Remove the underperforming pair |
| Win rate < 50% | Tighten entry conditions or widen take-profit |
| Win rate > 75% | Relax entry conditions to capture more opportunities |
| Max drawdown > 15% | Tighten stop-loss |
| Max drawdown < 3% | Stop-loss may be too tight — consider loosening |
| Avg trades < 3 per episode | Entry conditions too restrictive — loosen them |
| Avg trades > 50 per episode | Add ADX filter to reduce overtrading |
| Sharpe < 0.5 | Reduce position size or improve entry timing |
| ADX threshold > 30 | Consider lowering to 20–25 |
| ADX threshold < 15 | Raise ADX threshold to 20+ for better trend filtering |
| TP/SL ratio < 1.5:1 | Widen take-profit or tighten stop-loss |
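
Each trigger in the table reduces to a threshold check against the aggregate metrics. A sketch of a handful of the rules (field names follow the aggregate-metrics table; this is not the platform's actual engine):

```python
def recommend(agg):
    """Generate advisory strings from aggregate test metrics."""
    recs = []
    if agg["episodes_profitable_pct"] < 50:
        recs.append("Win rate < 50% -- tighten entry conditions or widen take-profit")
    elif agg["episodes_profitable_pct"] > 75:
        recs.append("Win rate > 75% -- relax entry conditions to capture more opportunities")
    if agg["avg_max_drawdown_pct"] > 15:
        recs.append("Max drawdown > 15% -- tighten stop-loss")
    if agg["avg_sharpe"] < 0.5:
        recs.append("Sharpe < 0.5 -- reduce position size or improve entry timing")
    if agg["avg_trades_per_episode"] < 3:
        recs.append("Avg trades < 3 per episode -- loosen entry conditions")
    return recs
```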

```python
results = client.get_test_results(strategy_id, test_run_id)
for rec in results["recommendations"]:
    print(f"  - {rec}")
```

> **Info:**
> Recommendations are advisory — you decide whether to apply them. Create a new version for each change and compare test results before committing to a direction.

---

## Comparing Versions

After testing multiple versions, compare them side by side:

**Python SDK:**

```python
comparison = client.compare_versions(
    strategy_id="strat_abc123",
    v1=1,
    v2=2
)

print(comparison["v1"])       # aggregate metrics for version 1
print(comparison["v2"])       # aggregate metrics for version 2
print(comparison["improvements"])  # % improvement per metric
print(comparison["verdict"])  # "Version 2 outperforms on 3/4 metrics"
```

**REST API:**

```bash
GET /api/v1/strategies/strat_abc123/compare-versions?v1=1&v2=2
```

---

## The Testing Workflow

```
1. Create strategy (version 1)
         |
         v
2. Run test (20 episodes, 6-month date range)
         |
         v
3. Check aggregate results:
   - episodes_profitable_pct > 60%? Good baseline.
   - avg_sharpe > 1.0? Acceptable risk-adjusted return.
   - avg_max_drawdown_pct < 10%? Manageable risk.
         |
         v
4. Read recommendations
         |
         v
5. Create version 2 with improvements
         |
         v
6. Run test on version 2 with same date range
         |
         v
7. Compare versions → deploy the winner
```
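
The step-3 checks reduce to a boolean gate. The thresholds (60% win rate, Sharpe 1.0, 10% drawdown) are the heuristics from the diagram above, not hard platform limits:

```python
def passes_baseline(results):
    """True if aggregate results clear the workflow's step-3 heuristics."""
    return (
        results["episodes_profitable_pct"] > 60
        and results["avg_sharpe"] > 1.0
        and results["avg_max_drawdown_pct"] < 10
    )
```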

> **Warning:**
> Always test on the same date range when comparing versions. Different periods introduce market regime bias and make comparisons meaningless.

---

## Test Endpoint Reference

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/v1/strategies/{id}/test` | Start a test run |
| `GET` | `/api/v1/strategies/{id}/tests` | List all test runs |
| `GET` | `/api/v1/strategies/{id}/tests/{test_id}` | Get status and results for a run |
| `POST` | `/api/v1/strategies/{id}/tests/{test_id}/cancel` | Cancel a running test |
| `GET` | `/api/v1/strategies/{id}/test-results` | Latest completed test results |
| `GET` | `/api/v1/strategies/{id}/compare-versions?v1=N&v2=M` | Side-by-side version comparison |

---

## Next Steps

- [Deploying Strategies](/docs/strategies/deployment) — deploy a validated strategy to live trading
- [Gymnasium Environments](/docs/gym) — train an RL agent to optimize beyond rule-based conditions
