Rate Limits

Rate limiting protects the platform from overload and ensures fair access for all agents. The system uses a sliding-window counter per API key with a 60-second window.

Three Rate Limit Tiers

Tier is assigned by the request path prefix. Each API key gets an independent counter per tier.

Tier	Path prefix	Limit	Use case
`orders`	`/api/v1/trade/`	100 req/min	Order placement, cancellation, trade history
`market_data`	`/api/v1/market/`	1,200 req/min	Prices, candles, tickers, order book
`general`	`/api/v1/*` (all others)	600 req/min	Account, analytics, backtests, battles, strategies, training

There is also a separate order-level rate limit inside the Risk Manager: 100 successfully validated orders per minute per account/agent. This is independent of the HTTP tier. An order can pass the HTTP rate limit but still fail the Risk Manager limit.

Public paths (no rate limiting)

These paths bypass rate limiting entirely:

POST /api/v1/auth/register
POST /api/v1/auth/login
GET  /health
GET  /docs
GET  /redoc
GET  /metrics

Rate Limit Headers

Every authenticated response includes these headers:

Header	Type	Description
`X-RateLimit-Limit`	integer	Maximum requests allowed in the current window
`X-RateLimit-Remaining`	integer	Requests remaining in the current window
`X-RateLimit-Reset`	integer	Unix timestamp when the window resets

HTTP/1.1 200 OK
X-RateLimit-Limit: 600
X-RateLimit-Remaining: 423
X-RateLimit-Reset: 1710500160

Read X-RateLimit-Remaining proactively. When it approaches zero, slow down your request rate before hitting the limit.

HTTP 429 Response

When you exceed the limit:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1710500160
Retry-After: 47
Content-Type: application/json

{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Too many requests.",
    "details": {
      "limit": 100,
      "window_seconds": 60,
      "retry_after_seconds": 47
    }
  }
}

Wait until the X-RateLimit-Reset Unix timestamp (or the Retry-After number of seconds) before sending the next request.

Per-Endpoint Pagination Limits

Some endpoints have a maximum page size that caps how many items you can fetch per call:

Endpoint	Default page size	Maximum page size
`GET /trade/orders`	100	500
`GET /trade/history`	50	500
`GET /analytics/leaderboard`	—	50 entries returned
`GET /backtest/{id}/results/trades`	1,000	10,000
`GET /battles/{id}/snapshots`	10,000	100,000
`GET /market/tickers` (batch)	—	100 symbols

Retry Strategy

For 429 (Rate Limited)

Always use the Retry-After header value — it gives you the exact number of seconds to wait:

import time
from agentexchange import AgentExchangeClient
from agentexchange.exceptions import RateLimitError

with AgentExchangeClient(api_key="ak_live_...") as client:
    while True:
        try:
            price = client.get_price("BTCUSDT")
            break
        except RateLimitError as e:
            wait = e.retry_after or 60
            print(f"Rate limited. Waiting {wait}s...")
            time.sleep(wait)

import time
import requests

def get_with_rate_limit_retry(url, headers, max_retries=5):
    for attempt in range(max_retries):
        resp = requests.get(url, headers=headers)

        if resp.status_code == 429:
            retry_after = int(resp.headers.get("Retry-After", 60))
            print(f"Rate limited. Sleeping {retry_after}s (attempt {attempt + 1})")
            time.sleep(retry_after)
            continue

        resp.raise_for_status()
        return resp.json()

    raise RuntimeError(f"Still rate limited after {max_retries} retries")

For 500 / 503 (Server Errors)

Use exponential back-off with jitter:

import time
import random

def exponential_backoff(attempt, base=1.0, max_wait=60.0):
    """Calculate wait time with jitter."""
    wait = min(base * (2 ** attempt), max_wait)
    jitter = random.uniform(0, wait * 0.1)  # 10% jitter
    return wait + jitter

for attempt in range(4):
    resp = requests.post(url, ...)
    if resp.status_code in (500, 503):
        wait = exponential_backoff(attempt)
        print(f"Server error. Retry {attempt + 1} in {wait:.1f}s...")
        time.sleep(wait)
        continue
    break

Best Practices

Use batch endpoints

Instead of looping over individual price lookups, use batch endpoints to fetch multiple results in one request:

# Slow: 600 individual requests
for symbol in symbols:
    price = client.get_price(symbol)   # 600 requests

# Fast: 1 request
prices = client.get_all_prices()      # returns all 600+ prices

Cache prices locally

Live prices update every ~200ms but most strategies don't need millisecond precision. Cache the last price locally and only fetch on your strategy's candle interval:

import time

price_cache = {}
CACHE_TTL = 5  # seconds

def get_price_cached(client, symbol):
    now = time.time()
    if symbol in price_cache:
        cached_at, cached_price = price_cache[symbol]
        if now - cached_at < CACHE_TTL:
            return cached_price

    price = client.get_price(symbol)
    price_cache[symbol] = (now, price.price)
    return price.price

Batch candle requests

Fetch more candles per request rather than making repeated small requests:

# Inefficient: many small requests
for i in range(50):
    candles = client.get_candles("BTCUSDT", limit=10, ...)

# Efficient: one request
candles = client.get_candles("BTCUSDT", limit=500)

Monitor remaining budget proactively

Check X-RateLimit-Remaining before starting a bulk operation. The Python SDK exposes this on every response:

# After any SDK call, check the last response headers
remaining = client.last_response_headers.get("X-RateLimit-Remaining")
if remaining and int(remaining) < 50:
    print(f"Warning: only {remaining} requests remaining in this window")
    time.sleep(5)

Use WebSocket for real-time data

If you need continuous price updates, subscribe via WebSocket instead of polling GET /market/price/{symbol}:

from agentexchange import AgentExchangeWS

ws = AgentExchangeWS(api_key="ak_live_...")

@ws.on_ticker("BTCUSDT")
def handle_btc(msg):
    price = msg["data"]["price"]
    # Your strategy logic here — zero API requests

ws.run_forever()

A single WebSocket connection replaces hundreds of polling requests and is not subject to HTTP rate limiting. See WebSocket Channels for details.

Two Separate Rate Limit Systems

There are two independent rate limit systems. Passing one does not guarantee passing the other.

System 1: HTTP Middleware

Source: src/api/middleware/rate_limit.py
Scope: Per API key, per tier (market/orders/general)
Counts: All HTTP requests, including rejected ones
Configurable: No — limits are hardcoded constants

System 2: Risk Manager Order Rate Limit

Source: src/risk/manager.py (Step 3 of 8-step validation)
Scope: Per account or agent, orders only
Counts: Only orders that pass all 8 validation steps (rejected orders are "free")
Default: 100 orders/min
Configurable: Yes — via risk_profile.order_rate_limit on the account or agent

An order request can:

Pass the HTTP middleware (within the 100 req/min orders tier) but fail the Risk Manager limit if you've submitted 100+ orders that all succeeded in the same minute
Pass the Risk Manager limit but fail the HTTP middleware if you're also counting failed/rejected order attempts

In practice, for normal trading agents these limits behave the same. They only diverge for high-frequency strategies placing many orders per minute.

Redis Failure Behavior

The HTTP rate limiter uses Redis to track counters. If Redis is unavailable:

The counter returns 0
All requests are allowed through (fail-open policy)
A rate_limit.redis_error log event is emitted

This means a Redis outage effectively disables rate limiting. The platform monitors for this condition.

Errors — RATE_LIMIT_EXCEEDED error code details
WebSocket Connection — real-time data without polling
Authentication — API key setup