Skip to content

Health scoring

Each provider gets a score in [0.0, 1.0] recomputed every probe cycle. The router selects providers based on these scores.


Score formula

score = w_latency * latency_score
      + w_error   * (1 − error_rate)
      + w_slot    * slot_freshness_score
      + w_success * recent_success_rate

Weights are normalised before use — they don't need to sum to 1.

Component Default weight What it measures
latency_score 0.4 Round-trip time. ~20ms → near 1.0; ~500ms → near 0.0.
1 − error_rate 0.3 Fraction of probes that succeeded in the last window_secs (default 60s).
slot_freshness_score 0.2 1.0 when at the network tip; 0.0 when ≥ slot_drift_threshold slots behind (default 10).
recent_success_rate 0.1 Fraction of the last N probes that succeeded. Fast-reaction signal.

Background probes

Health probe (default: every 2s)

Sends getSlot + getHealth, measures round-trip latency, tracks success/failure, updates the rolling error rate and consecutive-failure counter. Triggers circuit breaker state transitions.

Slot tracker (default: every 1s)

Polls getSlot commitment=processed, records the result, computes the network tip (max slot across all providers), and calculates per-provider slot drift.


Circuit breaker

Closed ──(N failures or error_rate > threshold)──▶ Open
   ▲                                                │
   └───── probe succeeds ──── HalfOpen ◄────────────┘
                                  (after cooldown_secs)

Closed — normal routing. All requests are eligible.

Open — provider excluded from routing. Opens when either:

  • Consecutive probe failures ≥ circuit_open_failures (default 5), or
  • Rolling error rate ≥ circuit_error_threshold (default 0.5)

HalfOpen — after circuit_cooldown_secs (default 30s), one probe is sent. Success closes the circuit; failure reopens it and resets the cooldown.

Note

When all providers have open circuits, the proxy routes to all of them anyway — degraded behaviour is better than refusing to attempt.


Tuning for latency-sensitive workloads

For trading bots and latency-critical paths, tighten the probe interval and weight slot freshness heavily:

[health]
interval_ms            = 1000   # probe every second
window_secs            = 30     # react faster to errors
slot_drift_threshold   = 3      # aggressively deprioritise stale nodes
slot_interval_ms       = 500    # track slots twice per second
circuit_open_failures  = 3      # fail fast
circuit_error_threshold = 0.3   # open at 30% error rate
circuit_cooldown_secs  = 10     # recover quickly

w_latency = 0.3
w_error   = 0.2
w_slot    = 0.4   # slot freshness is critical for transaction landing
w_success = 0.1

See the trading-bot example in the repo.


Checking scores live

rpc-plane status
#   NAME        SCORE          SLOT   DRIFT     LATENCY  CIRCUIT
#   --------  -------  ------------  ------  ----------  -------
#   helius      0.912   341892471       0      23.4ms     closed
#   quicknode   0.841   341892469       2      31.1ms     closed
#   triton      0.000           —       —           —     open

Or via the health endpoint:

curl http://localhost:9401/health | jq