---
title: Reward Functions
description: 5 built-in reward functions and how to write custom ones
---

The reward function shapes what your agent learns to optimize. Choosing the right one is one of the most important training decisions.

---

## Available Reward Functions

```python
from tradeready_gym.rewards import (
    PnLReward,
    LogReturnReward,
    SharpeReward,
    SortinoReward,
    DrawdownPenaltyReward,
    CustomReward,
)
```

Pass a reward function instance to any environment via the `reward_function` parameter:

```python
import gymnasium as gym

env = gym.make(
    "TradeReady-BTC-v0",
    api_key="ak_live_...",
    reward_function=SharpeReward(window=50),
)
```

---

## PnLReward

**Formula:** `current_equity - previous_equity`

The simplest reward: the absolute dollar change in equity per step.

```python
env = gym.make("TradeReady-BTC-v0",
    reward_function=PnLReward()  # default
)
```

**Best for:** Baselines, sanity checks, and simple environments. Easy to debug because the reward is directly interpretable.

**Drawback:** An agent can achieve high PnL with very high variance — it learns to chase returns without penalizing risk. This often produces strategies with large drawdowns.
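The drawback is easy to demonstrate. In this sketch (equity values are made up for illustration), a smooth path and a wildly swinging path start and end at the same equity, so their cumulative per-step PnL rewards are identical:

```python
# Two hypothetical equity paths with the same start and end equity
smooth = [10_000, 10_050, 10_100, 10_150, 10_200]
wild = [10_000, 11_500, 9_200, 11_000, 10_200]

def total_pnl_reward(path):
    # Sum of per-step (current - previous) rewards telescopes to end - start
    return sum(b - a for a, b in zip(path, path[1:]))

# Both paths earn the same cumulative reward, so PnLReward gives the
# agent no incentive to prefer the low-variance path
assert total_pnl_reward(smooth) == total_pnl_reward(wild) == 200
```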

---

## LogReturnReward

**Formula:** `log(current_equity / previous_equity)`

Log returns produce more stable gradients than raw PnL because they are symmetric and additive across time steps.

```python
env = gym.make("TradeReady-BTC-v0",
    reward_function=LogReturnReward()
)
```

**Best for:** Initial training runs where gradient stability is a priority.

**Note:** Log returns require `current_equity > 0`. The environment handles the edge case of equity reaching zero by truncating the episode.
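The additivity claim can be checked directly: per-step log returns sum to the log of the whole-episode equity ratio (equity values here are illustrative):

```python
import math

# Hypothetical equity curve over four steps
equity = [10_000.0, 10_150.0, 9_980.0, 10_300.0]

# Per-step log returns, as LogReturnReward would emit them
step_rewards = [math.log(b / a) for a, b in zip(equity, equity[1:])]

# Summing the step rewards recovers the log of the total return
total = sum(step_rewards)
assert abs(total - math.log(equity[-1] / equity[0])) < 1e-12
```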

---

## SharpeReward

**Formula:** Rolling Sharpe ratio delta over the last N steps.

The agent is rewarded for improvements to its risk-adjusted return, not just raw return. An agent that earns 0.5% with low volatility gets a higher reward than one that earns 0.5% with high volatility.

```python
env = gym.make("TradeReady-BTC-v0",
    reward_function=SharpeReward(window=50)
)
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `window` | `50` | Number of steps in the rolling Sharpe calculation |

**Best for:** Training agents that should manage risk — strategies meant for live deployment where drawdown control matters.

**Drawback:** Can converge slowly because the reward signal is noisy at the start of training when the rolling window is not yet full.
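As a rough sketch of the mechanics (not the library's actual implementation), a rolling-Sharpe-delta reward keeps a window of recent returns and emits the *change* in the window's Sharpe ratio each step. The `RollingSharpeDelta` class and `sharpe` helper below are illustrative, not part of the `tradeready_gym` API:

```python
import math
from collections import deque

def sharpe(returns):
    # Sharpe ratio of per-step returns (risk-free rate assumed zero)
    if len(returns) < 2:
        return 0.0
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return mean / math.sqrt(var) if var > 0 else 0.0

class RollingSharpeDelta:
    """Illustrative stand-in for SharpeReward's reward signal."""

    def __init__(self, window: int = 50):
        self.returns = deque(maxlen=window)
        self._last_sharpe = 0.0

    def compute(self, prev_equity: float, curr_equity: float) -> float:
        self.returns.append(curr_equity / prev_equity - 1.0)
        new_sharpe = sharpe(list(self.returns))
        reward = new_sharpe - self._last_sharpe  # delta, not the ratio itself
        self._last_sharpe = new_sharpe
        return reward

reward_fn = RollingSharpeDelta(window=5)
rewards = [reward_fn.compute(a, b)
           for a, b in zip([100, 101, 102, 101, 103],
                           [101, 102, 101, 103, 104])]
```

Note how the first few rewards carry little signal while the window fills, which is exactly the slow-convergence drawback described above.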

---

## SortinoReward

**Formula:** Rolling Sortino ratio delta — like Sharpe but only penalizes downside volatility.

The Sortino ratio ignores upside variance, which means an agent is not penalized for large positive returns. Only downside swings reduce the reward.

```python
env = gym.make("TradeReady-BTC-v0",
    reward_function=SortinoReward(window=50)
)
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `window` | `50` | Rolling window size |

**Best for:** Strategies where the goal is specifically to minimize losses while allowing for asymmetric upside.
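The asymmetry can be seen in a standalone sketch of the Sortino calculation (the `sortino` helper and return values are illustrative, not the library's implementation): only returns below the target enter the denominator, so a large positive outlier does not hurt the score.

```python
import math

def sortino(returns, target=0.0):
    # Sortino ratio: mean excess return over downside deviation only
    mean = sum(returns) / len(returns)
    downside = [min(0.0, r - target) ** 2 for r in returns]
    dd = math.sqrt(sum(downside) / len(returns))
    return (mean - target) / dd if dd > 0 else float("inf")

# Two return streams: one with a large *positive* outlier,
# one with a large *negative* outlier
upside_spike = [0.01, 0.02, 0.06, -0.005]
downside_spike = [0.03, 0.03, -0.01, 0.03]

# Sortino ignores upside variance, so the first stream scores higher
assert sortino(upside_spike) > sortino(downside_spike)
```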

---

## DrawdownPenaltyReward

**Formula:** `PnL - penalty_coeff * current_drawdown * equity`

Combines raw PnL with a penalty proportional to the current drawdown from the equity peak. The agent is penalized more as the drawdown deepens.

```python
env = gym.make("TradeReady-BTC-v0",
    reward_function=DrawdownPenaltyReward(penalty_coeff=1.0)
)
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `penalty_coeff` | `1.0` | How much to penalize drawdowns. Higher = more conservative. |

**Best for:** Capital preservation strategies where surviving is more important than maximizing returns.

**Tuning:** A `penalty_coeff` of `0.5` mildly discourages drawdowns. A value of `2.0` aggressively punishes any drawdown and tends to produce very conservative agents.
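Plugging illustrative numbers into the formula shows how the coefficient shifts the balance:

```python
def drawdown_penalty_reward(pnl, drawdown, equity, penalty_coeff):
    # Reward = raw PnL minus a penalty proportional to current drawdown
    return pnl - penalty_coeff * drawdown * equity

# A $50 gain while sitting 5% below the equity peak, at $9,500 equity
mild = drawdown_penalty_reward(50.0, 0.05, 9_500.0, penalty_coeff=0.5)
aggressive = drawdown_penalty_reward(50.0, 0.05, 9_500.0, penalty_coeff=2.0)
```

With either coefficient the reward is negative despite the positive PnL: while in drawdown, the agent is pushed to recover the peak before anything else, and the higher coefficient pushes roughly four times harder.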

---

## Reward Comparison

| Reward | Signal type | Convergence speed | Risk awareness |
|--------|-------------|-------------------|----------------|
| `PnLReward` | Return | Fast | None |
| `LogReturnReward` | Return | Fast | None |
| `SharpeReward` | Risk-adjusted | Slow | High |
| `SortinoReward` | Downside risk | Slow | High (asymmetric) |
| `DrawdownPenaltyReward` | Return - penalty | Medium | Medium |

---

## Custom Rewards

Subclass `CustomReward` to define any reward you want. You must implement:

- `compute(prev_equity, curr_equity, info) -> float` — called on every step
- `reset()` — called at the start of each episode to clear accumulated state

```python
from tradeready_gym.rewards import CustomReward

class RiskAdjustedReward(CustomReward):
    def __init__(self, risk_penalty: float = 0.5):
        self.risk_penalty = risk_penalty
        self._peak = 0.0

    def compute(self, prev_equity: float, curr_equity: float, info: dict) -> float:
        pnl = curr_equity - prev_equity
        # Track running peak for drawdown calculation
        self._peak = max(self._peak, curr_equity)
        drawdown = (self._peak - curr_equity) / self._peak if self._peak > 0 else 0.0
        # Reward = PnL minus a drawdown penalty
        return pnl - self.risk_penalty * drawdown * curr_equity

    def reset(self) -> None:
        self._peak = 0.0

env = gym.make(
    "TradeReady-BTC-v0",
    api_key="ak_live_...",
    reward_function=RiskAdjustedReward(risk_penalty=0.5),
)
```

The `info` dict passed to `compute` contains the full step info including `filled_orders`, `unrealized_pnl`, `position_value`, and `virtual_time`. You can use any of these to craft reward signals.
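For example, a reward could use `position_value` from the info dict to discourage large open exposure. The sketch below shows only the `compute` logic in a plain class (the `InventoryAwareReward` name and `exposure_penalty` coefficient are hypothetical; in `tradeready_gym` this would be a `CustomReward` subclass):

```python
class InventoryAwareReward:
    """Illustrative compute() using step info from the environment."""

    def __init__(self, exposure_penalty: float = 0.001):
        self.exposure_penalty = exposure_penalty

    def compute(self, prev_equity: float, curr_equity: float, info: dict) -> float:
        pnl = curr_equity - prev_equity
        # Penalize large open exposure using position_value from the info dict
        exposure = abs(info.get("position_value", 0.0))
        return pnl - self.exposure_penalty * exposure

    def reset(self) -> None:
        pass  # no accumulated state to clear

fn = InventoryAwareReward(exposure_penalty=0.001)
r = fn.compute(10_000.0, 10_020.0, {"position_value": 5_000.0})
```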

> **Info:**
> The `reset()` method is called automatically by the environment at the start of each new episode. Forgetting to implement it causes state to bleed between episodes, which will corrupt training.

---

## Reward Shaping Tips

- **Start with `PnLReward`** to verify the environment and training loop are working before switching to more complex rewards.
- **Use `SharpeReward` for production-quality agents** — risk-adjusted returns produce more stable live performance.
- **Tune `DrawdownPenaltyReward` coefficient** by watching the equity curve in training. If the agent barely trades, lower the coefficient.
- **Combine ideas in a custom reward** — e.g. log return (stable gradient) plus a drawdown penalty (risk control).
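The last tip can be sketched as a standalone class (the class name and `risk_penalty` default are illustrative; in `tradeready_gym` this would subclass `CustomReward` and be passed via `reward_function` as shown in the Custom Rewards section):

```python
import math

class LogReturnWithDrawdownPenalty:
    """Log return for stable gradients plus a drawdown term for risk control."""

    def __init__(self, risk_penalty: float = 0.2):
        self.risk_penalty = risk_penalty
        self._peak = 0.0

    def compute(self, prev_equity: float, curr_equity: float, info: dict) -> float:
        # Stable gradient from the log return...
        log_ret = math.log(curr_equity / prev_equity)
        # ...minus a penalty that grows with drawdown from the equity peak
        self._peak = max(self._peak, curr_equity)
        drawdown = (self._peak - curr_equity) / self._peak if self._peak > 0 else 0.0
        return log_ret - self.risk_penalty * drawdown

    def reset(self) -> None:
        self._peak = 0.0

fn = LogReturnWithDrawdownPenalty(risk_penalty=0.2)
fn.reset()
r1 = fn.compute(10_000.0, 10_100.0, {})  # new peak: pure log return
r2 = fn.compute(10_100.0, 9_900.0, {})   # below peak: log return minus penalty
```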

---

## Next Steps

- [Training Tracking](/docs/gym/training-tracking) — visualize learning curves in the dashboard
- [Examples](/docs/gym/examples) — complete PPO and custom reward training scripts
