Rate Limiting Configuration

Protect your backend services from abuse with per-IP rate limiting.

Options

config:
  rateLimit:
    enabled: true
    requestsPerSec: 100
    burst: 200

Field	Type	Default	Description
`enabled`	bool	`false`	Enable rate limiting
`requestsPerSec`	int	`100`	Sustained request rate per IP
`burst`	int	`requestsPerSec * 2`	Bucket capacity: the maximum number of requests an IP can send at once

The rate limiter uses a token bucket algorithm per client IP:

requestsPerSec is the sustained rate — how many requests per second an IP can make continuously.

burst is the spike allowance — how many requests can come in at once before being throttled.

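Example: with requestsPerSec: 10 and burst: 50, an IP can send up to 50 requests at once, after which it is held to 10 requests per second while the bucket refills. A client that goes idle regains its full burst allowance after 5 seconds (50 tokens at 10 tokens per second).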
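To make the two parameters concrete, here is a minimal token bucket sketch in Python. It illustrates the general algorithm and is not the gateway's implementation; the `TokenBucket` class, its field names, and the assumption that a bucket starts full are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Illustrative token bucket for a single client IP (hypothetical)."""
    rate: float          # tokens added per second (plays the role of requestsPerSec)
    burst: float         # bucket capacity (plays the role of burst)
    tokens: float = 0.0  # tokens currently available
    last: float = 0.0    # timestamp of the previous call, in seconds

    def __post_init__(self) -> None:
        # Assumption: a new client starts with a full bucket.
        self.tokens = self.burst

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # each request costs one token
            return True
        return False


bucket = TokenBucket(rate=10, burst=50)
# 60 requests arriving at the same instant.
instant = [bucket.allow(now=0.0) for _ in range(60)]
# 15 more requests arriving one second later.
later = [bucket.allow(now=1.0) for _ in range(15)]
print(sum(instant), sum(later))
```

With these settings the script prints `50 10`: the full burst of 50 is accepted immediately, and one second later only the 10 refilled tokens are available.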

The rate limiter identifies clients by IP address. Behind load balancers and reverse proxies that set X-Forwarded-For, limits still apply to the original client rather than to the proxy's address.
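This page does not say which X-Forwarded-For entry the gateway uses or whether it restricts which proxies it trusts. As a sketch of one common approach, and not a description of the gateway's behavior, the Python below picks the rightmost address that is not a known proxy. `TRUSTED_PROXIES`, `is_trusted`, and `client_ip` are hypothetical names.

```python
import ipaddress
from typing import Optional

# Hypothetical list of proxy networks you operate; not a setting from this page.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]


def is_trusted(ip: str) -> bool:
    """Return True if ip parses and belongs to a trusted proxy network."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    return any(addr in net for net in TRUSTED_PROXIES)


def client_ip(remote_addr: str, x_forwarded_for: Optional[str]) -> str:
    """Choose the address to use as the rate limiting key."""
    # Honor the header only when the direct peer is a trusted proxy;
    # otherwise any client could forge it to evade its limit.
    if not x_forwarded_for or not is_trusted(remote_addr):
        return remote_addr
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    # Walk from the right (nearest proxy) to the first untrusted address.
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    return remote_addr


print(client_ip("10.0.0.5", "203.0.113.7, 10.0.0.9"))
print(client_ip("198.51.100.4", "1.2.3.4"))
```

The first call returns `203.0.113.7` (the client as seen by the trusted proxy chain); the second returns `198.51.100.4`, because the direct peer is not a trusted proxy and its header is ignored.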

When rate limited, the gateway returns:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 1

{"error": "rate limit exceeded", "success": false}

The Retry-After header gives the number of seconds the client should wait before retrying.
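On the client side, a 429 response can be handled by waiting for the indicated delay and retrying. The following Python sketch shows one way to do that. `retry_delay` and `call_with_retry` are hypothetical helpers, and `send` stands in for whatever function performs the HTTP request and returns `(status, headers, body)`.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def retry_delay(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Seconds to wait, read from Retry-After (delay-seconds or HTTP-date)."""
    value = headers.get("Retry-After")
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default  # unparseable header: fall back to the default delay
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), waiting as instructed whenever it returns 429."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        # Return on any non-429 response, or when attempts are exhausted.
        if status != 429 or attempt == max_attempts - 1:
            return status, headers, body
        sleep(retry_delay(headers))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays. In production code it is usually worth adding jitter so that many throttled clients do not all retry at the same moment.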

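With telemetry enabled, rate limit hits are tracked in the Prometheus counter http_server_rate_limit_hits_total. For example, this query returns the per-second rate of rate limit hits, averaged over the last 5 minutes: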

rate(http_server_rate_limit_hits_total[5m])
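The same query can be run programmatically through Prometheus's instant query endpoint, `/api/v1/query`. The Python sketch below assumes a Prometheus server reachable at a base URL you supply (for example `http://localhost:9090`, which is an assumption and not a value from this page); `query_url`, `parse_vector`, and `rate_limit_hit_rate` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

QUERY = "rate(http_server_rate_limit_hits_total[5m])"


def query_url(base_url: str, promql: str) -> str:
    """Build the instant query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload: dict) -> list:
    """Return (labels, value) pairs from an instant-vector response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    # Each sample's value is [timestamp, "<number as string>"].
    return [(s["metric"], float(s["value"][1])) for s in payload["data"]["result"]]


def rate_limit_hit_rate(base_url: str) -> list:
    """Fetch the current rate of rate limit hits, one entry per series."""
    with urlopen(query_url(base_url, QUERY), timeout=5) as resp:
        return parse_vector(json.load(resp))
```

A sustained non-zero value here suggests that clients are regularly hitting the limit, which is a useful signal when deciding whether to loosen or tighten `requestsPerSec` and `burst`.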

Start with generous limits and tighten based on real traffic
Set burst to at least 2x requestsPerSec to absorb natural traffic spikes
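The rate limiter state is per gateway instance and is not shared across instances, so a client whose requests are spread over N instances can reach up to N times the configured rate in aggregate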
Health check (/health) and metrics (/metrics) endpoints are not rate limited