Context
The server does not enforce per-client or per-tenant rate limits. Rate limiting is required for API hardening and is called out in the runtime's INFRASTRUCTURE.md.
Requirements
- Middleware-based rate limiting with configurable thresholds
- Per-client (API token or session) and per-workspace tracking
- Configurable limits per endpoint category:
  - Authentication endpoints: stricter limits to prevent brute-force attacks
  - File upload and job dispatch: throughput-based limits
  - Read endpoints: higher allowances
- Return 429 Too Many Requests with a Retry-After header
- Rate limit state storage (in-memory or Redis-backed for multi-instance deployments)
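
One possible shape for these requirements is sketched below as an in-memory token bucket. It is an illustration only, not the server's actual design: the category names (`auth`, `upload`, `read`), the thresholds, and the `Limit`, `TokenBucketLimiter`, and `handle` names are all assumptions. A Redis-backed store for multi-instance deployments would replace the `_buckets` dictionary.

```python
import math
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    capacity: int          # maximum burst size, in requests
    refill_per_sec: float  # sustained request rate


# Hypothetical thresholds; real values would come from configuration.
DEFAULT_LIMITS = {
    "auth": Limit(capacity=5, refill_per_sec=0.1),      # strict
    "upload": Limit(capacity=20, refill_per_sec=1.0),   # throughput-based
    "read": Limit(capacity=100, refill_per_sec=10.0),   # generous
}


class TokenBucketLimiter:
    """In-memory token bucket keyed by (endpoint category, caller key)."""

    def __init__(self, limits, clock=time.monotonic):
        self._limits = limits
        self._clock = clock
        self._buckets = {}  # (category, key) -> (tokens, last_timestamp)

    def check(self, category, key):
        """Return (allowed, retry_after_seconds) and consume a token if allowed."""
        limit = self._limits[category]
        now = self._clock()
        tokens, last = self._buckets.get(
            (category, key), (float(limit.capacity), now)
        )
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        tokens = min(limit.capacity, tokens + (now - last) * limit.refill_per_sec)
        if tokens >= 1.0:
            self._buckets[(category, key)] = (tokens - 1.0, now)
            return True, 0
        self._buckets[(category, key)] = (tokens, now)
        # Whole seconds until one full token is available again.
        return False, math.ceil((1.0 - tokens) / limit.refill_per_sec)


def handle(limiter, category, client_id, workspace_id):
    """Middleware-style check: return (status_code, headers).

    Simplification: a client token is consumed even when the workspace
    bucket then rejects the request.
    """
    for key in (f"client:{client_id}", f"workspace:{workspace_id}"):
        allowed, retry_after = limiter.check(category, key)
        if not allowed:
            return 429, {"Retry-After": str(retry_after)}
    return 200, {}
```

A middleware would call `handle(limiter, "auth", token, workspace)` before dispatching the request and short-circuit with the returned status and headers on 429. The injectable `clock` argument exists so the limiter can be tested without sleeping.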
References