
Cache tuning

The cache is LRU by default; LFU (frequency-weighted) is an option. What to tune:

| Setting | Effect | How to choose |
| --- | --- | --- |
| cache.size_gb | How much disk to dedicate | Larger means a higher hit rate, up to diminishing returns; typical break-even is 200–1000 GB |
| cache.eviction | lru or lfu | LRU for bursty traffic; LFU for long-tail |
| cache.min_age_seconds | Soft lower bound on eviction | Prevents thrashing on a cold cache |
| cache.probe_hold_duration | Probe-triggered eviction hold | Do not reduce below the default; shrinking it risks phantom-announcement slashes |
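To make the eviction settings concrete, here is a minimal sketch of how cache.eviction and cache.min_age_seconds interact when choosing a victim. The CacheEntry fields and function name are hypothetical; only the two policies and the min-age guard come from the table above.

```python
import time
from dataclasses import dataclass

@dataclass
class CacheEntry:
    blob_hash: str
    inserted_at: float   # when the blob entered the cache
    last_access: float   # most recent read
    hit_count: int = 0   # reads since insertion

def eviction_candidate(entries, policy, min_age_seconds, now=None):
    """Pick the entry to evict, skipping anything younger than min_age_seconds.

    policy "lru" evicts the least recently used entry;
    policy "lfu" evicts the least frequently used entry.
    Returns None on a cold cache, which is exactly the anti-thrashing
    behavior the min-age soft bound is there to provide.
    """
    now = time.time() if now is None else now
    eligible = [e for e in entries if now - e.inserted_at >= min_age_seconds]
    if not eligible:
        return None
    if policy == "lru":
        return min(eligible, key=lambda e: e.last_access)
    if policy == "lfu":
        return min(eligible, key=lambda e: e.hit_count)
    raise ValueError(f"unknown eviction policy: {policy}")
```

Note how the same working set yields different victims under the two policies: LRU targets the stalest entry, LFU the coldest one by popularity.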

Probe-triggered eviction hold

When you sign has_blob: true in a ProbeResponse, the blob is held in cache for probe_hold_duration. This prevents the evil pattern “probe responded yes, client opened channel, cache pressure evicted the blob, stream request fails, node slashed for phantom announcement.” Monitor decdn_probe_hold_slots_used / decdn_probe_hold_slots_max. If saturation exceeds ~80%, increase cache size — you have more in-flight promises than your cache can safely hold.
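The hold mechanism can be sketched as a slot registry: signing has_blob: true pins the blob until the hold expires, and a full registry means the node should stop promising blobs it cannot pin. The class and method names are hypothetical; only the slot metrics and the has_blob semantics come from the text above.

```python
import time

class ProbeHoldRegistry:
    """Pins blobs promised in a signed ProbeResponse so cache pressure
    cannot evict them before probe_hold_duration elapses. Sketch only;
    slot counts mirror the decdn_probe_hold_slots_* metrics."""

    def __init__(self, max_slots, hold_duration_s):
        self.max_slots = max_slots
        self.hold_duration_s = hold_duration_s
        self._expiry = {}  # blob hash -> hold expiry timestamp

    def _sweep(self, now):
        self._expiry = {h: t for h, t in self._expiry.items() if t > now}

    def hold(self, blob_hash, now=None):
        """Returns False when all slots are in use; the caller should then
        answer has_blob: false rather than promise a blob it cannot pin."""
        now = time.time() if now is None else now
        self._sweep(now)
        if len(self._expiry) >= self.max_slots and blob_hash not in self._expiry:
            return False
        self._expiry[blob_hash] = now + self.hold_duration_s
        return True

    def is_pinned(self, blob_hash, now=None):
        now = time.time() if now is None else now
        return self._expiry.get(blob_hash, 0) > now

    def saturation(self, now=None):
        """Compare against the ~80% alert threshold from the text above."""
        now = time.time() if now is None else now
        self._sweep(now)
        return len(self._expiry) / self.max_slots
```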

Region selection

Your region determines:
  • Which regional gossip topic you subscribe to (cdn/region/{cc}/v1).
  • Which regional takedown entries bind you (takedown compliance).
  • Which clients prefer you in the selection score (clients in the same region see lower RTT to you, which improves your score).
Pick the region your node is physically in. Advertising a region you’re not in leaves you serving clients with worse RTT than advertised, which hurts reputation quickly.
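A small sketch of building the regional gossip topic from the template above. The lowercase normalization and the ISO 3166-1 alpha-2 validation are assumptions, not confirmed by the source.

```python
def regional_gossip_topic(country_code: str) -> str:
    """Build the cdn/region/{cc}/v1 topic string for a region.

    Assumes {cc} is a lowercase two-letter country code; adjust if the
    network uses a different region naming scheme.
    """
    cc = country_code.lower()
    if len(cc) != 2 or not cc.isalpha():
        raise ValueError(f"expected a two-letter country code, got {country_code!r}")
    return f"cdn/region/{cc}/v1"
```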

Pull-through behavior

On cache miss, a node with pull_through: true:
  1. Issues cdn/dht/v1 FIND_VALUE(hash).
  2. Probes returned candidates in parallel via cdn/probe/v1.
  3. Selects the best candidate by the unified selection score.
  4. Pulls via cdn/client/v1, paying the upstream node per MB.
  5. Streams bytes to the client while pulling, with BLAKE3 verification on every chunk.
  6. Caches locally for subsequent requests.
With pull_through: false, the node returns a redirect to another NodeId. The client opens a direct channel with that node. Pull-through generally earns more — you collect the per-MB revenue from the client and pay upstream a smaller per-MB cost. Redirect is useful when you explicitly want to limit your egress or when you are origin-backed and want pure one-hop service.
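The six-step miss path can be sketched as a single function. Every callable here is a hypothetical stand-in for the real cdn/dht/v1, cdn/probe/v1, and cdn/client/v1 interactions, and sha256 is used purely as an illustrative stand-in for BLAKE3 (which the standard library does not ship).

```python
import hashlib

def verify_chunk(chunk: bytes, expected_digest: bytes) -> bool:
    # Stand-in for per-chunk BLAKE3 verification.
    return hashlib.sha256(chunk).digest() == expected_digest

def pull_through(blob_hash, find_value, probe, score, open_stream, cache, send):
    """Sketch of the pull_through: true cache-miss path."""
    candidates = find_value(blob_hash)            # 1. cdn/dht/v1 FIND_VALUE
    responses = [probe(c) for c in candidates]    # 2. probe candidates
    best = max(responses, key=score)              # 3. unified selection score
    chunks = []
    for chunk, digest in open_stream(best, blob_hash):  # 4. pull, paying per MB
        if not verify_chunk(chunk, digest):
            raise IOError("chunk failed hash verification")
        send(chunk)                               # 5. stream while pulling
        chunks.append(chunk)
    cache[blob_hash] = b"".join(chunks)           # 6. cache for next time
```

The key property is step 5: the client starts receiving bytes before the upstream pull completes, so a miss costs latency, not a full round trip of the blob.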

Rate setting

Your rate is advertised in NodeAnnounce and ProbeResponse. Update via RateChange gossip message.
  • Must be within governance-set [MIN_RATE, MAX_RATE] for the token.
  • Rate changes are immediately effective for new streams; existing streams continue at the agreed rate.
  • Rate changes carry a slash_sig so the change can serve as counter-evidence in a rate-manipulation dispute.
Start around your backend egress cost plus a margin if origin-backed, or at 50–80% of the average advertised rate if pure-cache. Watch your hit rate and utilization, then tune.
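The starting-point heuristic above can be written down directly. The 50–80% band comes from the text; the function name, the default 20% margin, and the return shapes are illustrative assumptions.

```python
def starting_rate(peer_rates, origin_backed=False,
                  backend_egress_cost=None, margin=0.2,
                  cache_band=(0.5, 0.8)):
    """Heuristic starting per-MB rate.

    Origin-backed: backend egress cost plus a margin (scalar).
    Pure-cache: 50-80% of the average advertised peer rate (a band
    to start somewhere inside). Clamp the result to the governance
    [MIN_RATE, MAX_RATE] bounds before announcing it.
    """
    if origin_backed:
        if backend_egress_cost is None:
            raise ValueError("origin-backed nodes need their egress cost")
        return backend_egress_cost * (1 + margin)
    avg = sum(peer_rates) / len(peer_rates)
    lo, hi = cache_band
    return (avg * lo, avg * hi)
```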

Concurrency limits

Two limits interact:
  • max_concurrent_streams — total active cdn/client/v1 streams.
  • max_concurrent_channel_opens — active on-chain channel opens in the last block window.
Over-subscribing either limit degrades quality of service (timeouts) and hurts reputation. Under-subscribing leaves money on the table. Start at sensible defaults and tune based on load.
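A sketch of how the two limits interact at admission time. Only the two setting names come from the config; the class, counters, and method names are hypothetical.

```python
class AdmissionControl:
    """Enforces max_concurrent_streams and max_concurrent_channel_opens.

    A stream that needs a fresh on-chain channel consumes budget from
    both limits; one reusing an existing channel only counts against
    the stream limit.
    """

    def __init__(self, max_concurrent_streams, max_concurrent_channel_opens):
        self.max_streams = max_concurrent_streams
        self.max_opens = max_concurrent_channel_opens
        self.active_streams = 0
        self.opens_in_window = 0

    def try_accept_stream(self, needs_channel_open):
        if self.active_streams >= self.max_streams:
            return False  # over-subscribed streams -> timeouts, reputation hit
        if needs_channel_open and self.opens_in_window >= self.max_opens:
            return False  # channel-open budget for this block window exhausted
        self.active_streams += 1
        if needs_channel_open:
            self.opens_in_window += 1
        return True

    def stream_done(self):
        self.active_streams -= 1

    def new_block_window(self):
        self.opens_in_window = 0
```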

Prefetching

Enable prefetch to proactively pull popular content:
  • Local demand: prefetch.local_miss_threshold = 3 within 5m. Start prefetching when you see the same miss N times.
  • Network popularity: prefetch.popular_hash_peers = 3 within 10m. Start prefetching when N peers list the hash in their popular_hashes.
Prefetching costs real money (per-MB pull from upstream). Profitable only if your expected client demand for that hash exceeds the pull cost.
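Both triggers are sliding-window counters, which a short sketch makes explicit. The thresholds and window lengths mirror the defaults above; the class itself and its method names are hypothetical.

```python
import time
from collections import defaultdict, deque

class PrefetchTrigger:
    """Sliding-window counters for the two prefetch triggers."""

    def __init__(self, local_miss_threshold=3, local_window_s=300,
                 popular_hash_peers=3, popular_window_s=600):
        self.local_miss_threshold = local_miss_threshold
        self.local_window_s = local_window_s
        self.popular_hash_peers = popular_hash_peers
        self.popular_window_s = popular_window_s
        self.misses = defaultdict(deque)        # hash -> miss timestamps
        self.peer_reports = defaultdict(dict)   # hash -> {peer: last seen}

    def record_miss(self, blob_hash, now=None):
        """True when the local-demand trigger fires for this hash."""
        now = time.time() if now is None else now
        q = self.misses[blob_hash]
        q.append(now)
        while q and q[0] <= now - self.local_window_s:
            q.popleft()
        return len(q) >= self.local_miss_threshold

    def record_popular(self, blob_hash, peer, now=None):
        """True when enough distinct peers listed the hash recently."""
        now = time.time() if now is None else now
        peers = self.peer_reports[blob_hash]
        peers[peer] = now
        recent = sum(1 for t in peers.values()
                     if t > now - self.popular_window_s)
        return recent >= self.popular_hash_peers
```

A trigger firing should still be weighed against the economics: only prefetch when expected demand covers the per-MB pull cost.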

Backups and crash recovery

Nothing in the cache needs backing up — cache is rebuildable by its nature. What does need protecting:
  • iroh keypair — losing this requires reclaimNodeId, which proves ownership of the bound Ethereum address.
  • Ethereum keypair — losing this loses access to stake withdrawal.
  • State directory — contains in-flight voucher state. On crash, the node resumes from the last flushed voucher state (64 MiB flush cadence for client-side state; node-side is similar).
Back up both keys off-host. Use hardware wallets or platform keychains where possible.

Graceful shutdown

  1. Stop accepting new streams.
  2. Drain active streams.
  3. Briefly deregister watchtower monitoring, or notify watchtowers of expected downtime.
  4. Flush voucher state.
  5. Publish a final NodeAnnounce with departing: true.
Avoid hard kills — in-flight vouchers are the main loss surface. Run under a supervisor with a 30–60 second stop timeout.
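The five steps map directly onto a shutdown handler. Every method name on `node` and `watchtower` below is a hypothetical stand-in for the real interfaces; the drain timeout matches the 30–60 second supervisor guidance.

```python
import time

def graceful_shutdown(node, watchtower, drain_timeout_s=45):
    """Sketch of the five-step shutdown sequence."""
    node.stop_accepting_streams()                  # 1. no new streams
    deadline = time.monotonic() + drain_timeout_s
    while node.active_streams() and time.monotonic() < deadline:
        time.sleep(0.1)                            # 2. drain active streams
    watchtower.notify_downtime()                   # 3. tell watchtowers
    node.flush_voucher_state()                     # 4. vouchers are the loss surface
    node.publish_announce(departing=True)          # 5. final NodeAnnounce
```

Wiring this to SIGTERM under a supervisor with a stop timeout slightly longer than drain_timeout_s gives the node room to finish step 4 before any hard kill.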