A client's API was crashing under load. 2,000 requests per second? Down. 5,000? Down. They needed to handle 10,000+ requests per second with 99.99% uptime.
"We're losing customers every time we go down," the CTO said. "This needs to be bulletproof."
Six months later, their API handles 15,000 requests per second with 99.99% uptime. Here's how we built it.
The Architecture
We built a multi-layered defense system:
- Rate Limiting: Prevent abuse and overload
- Circuit Breakers: Fail fast when dependencies are down
- Graceful Degradation: Serve partial responses when possible
- Load Balancing: Distribute traffic across instances
- Monitoring: Real-time alerts and dashboards
Layer 1: Rate Limiting
We implemented three types of rate limiting:
1. Per-User Rate Limiting
Using a fixed-window counter in Redis:
import { Redis } from 'ioredis';

// Defaults to localhost:6379; point this at your Redis deployment in production.
const redis = new Redis();

async function checkRateLimit(userId: string, limit: number, windowSeconds: number) {
  const key = `rate_limit:${userId}`;

  // Count this request; the first hit in a window creates the key and sets its TTL.
  const current = await redis.incr(key);
  if (current === 1) {
    await redis.expire(key, windowSeconds);
  }

  if (current > limit) {
    return { allowed: false, remaining: 0 };
  }
  return { allowed: true, remaining: limit - current };
}

// Usage (inside a request handler)
const { allowed, remaining } = await checkRateLimit(userId, 100, 60); // 100 req/min
if (!allowed) {
  return res.status(429).json({ error: 'Rate limit exceeded' });
}
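A fixed window is simple and cheap, but it lets a client burst up to twice the limit across a window boundary. If you need smoother limiting, a sliding-window log over a Redis sorted set is a common alternative. Here is a minimal sketch reusing the same ioredis client; the key prefix and the fail-closed fallback are illustrative assumptions, not part of the original system:

async function checkRateLimitSliding(userId: string, limit: number, windowSeconds: number) {
  const key = `rate_limit:sliding:${userId}`;
  const now = Date.now();
  const windowStart = now - windowSeconds * 1000;

  // Each request is a timestamped member of a sorted set, scored by arrival time.
  const results = await redis
    .multi()
    .zremrangebyscore(key, 0, windowStart)      // drop entries that have left the window
    .zadd(key, now, `${now}-${Math.random()}`)  // record this request
    .zcard(key)                                 // count requests still inside the window
    .expire(key, windowSeconds)                 // clean up idle keys
    .exec();

  // exec() returns [error, result] pairs; index 2 is the zcard count. Fail closed if the transaction failed.
  const current = results ? Number(results[2][1]) : limit + 1;
  return current > limit
    ? { allowed: false, remaining: 0 }
    : { allowed: true, remaining: limit - current };
}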
2. Global Rate Limiting
Protect against DDoS and traffic spikes:
# At the load balancer (Nginx). The zone directive belongs in the http {} block.
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=100r/s;

server {
    location /api/ {
        # Allow bursts of up to 200 requests above the rate without delaying them
        limit_req zone=api_limit burst=200 nodelay;
        proxy_pass http://backend;
    }
}
3. Tiered Rate Limits
Different limits for different user tiers, wired into the limiter as sketched after this list:
- Free tier: 100 requests/minute
- Pro tier: 1,000 requests/minute
- Enterprise: 10,000 requests/minute
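Applying the tiers is just a lookup before the per-user check. A minimal sketch; the TIER_LIMITS map and the caller's `user.tier` field are illustrative assumptions:

// Requests per minute by plan (values from the tiers above)
const TIER_LIMITS: Record<string, number> = {
  free: 100,
  pro: 1_000,
  enterprise: 10_000,
};

async function checkTieredRateLimit(userId: string, tier: string) {
  const limit = TIER_LIMITS[tier] ?? TIER_LIMITS.free; // unknown tiers fall back to the free limit
  return checkRateLimit(userId, limit, 60);            // 60-second window
}

// Usage: const { allowed } = await checkTieredRateLimit(user.id, user.tier);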
Layer 2: Circuit Breakers
When external dependencies fail, fail fast—don't cascade:
class CircuitBreaker {
  private failures = 0;
  private state: 'closed' | 'open' | 'half-open' = 'closed';
  private nextAttempt = Date.now();

  constructor(
    private threshold: number = 5,   // consecutive failures before opening
    private timeout: number = 60000  // how long to stay open, in ms
  ) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() < this.nextAttempt) {
        throw new Error('Circuit breaker is OPEN');
      }
      // Timeout elapsed: let one trial request through
      this.state = 'half-open';
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess() {
    this.failures = 0;
    this.state = 'closed';
  }

  private onFailure() {
    this.failures++;
    if (this.failures >= this.threshold) {
      this.state = 'open';
      this.nextAttempt = Date.now() + this.timeout;
    }
  }
}

// Usage
const dbBreaker = new CircuitBreaker(5, 60000);

try {
  const data = await dbBreaker.execute(() => database.query(sql));
} catch (error) {
  // Return cached data or a default response instead of cascading the failure
  return getCachedData();
}
Layer 3: Graceful Degradation
When services are down, serve what you can:
async function getProductData(productId: string) {
  try {
    // Try the primary data sources
    const product = await db.query('SELECT * FROM products WHERE id = ?', [productId]);
    const reviews = await reviewService.getReviews(productId);
    const recommendations = await aiService.getRecommendations(productId);

    return {
      product,
      reviews,
      recommendations,
      source: 'full'
    };
  } catch (error) {
    // Fallback: serve from cache
    const cached = await cache.get(`product:${productId}`);
    if (cached) {
      return {
        ...cached,
        source: 'cache',
        note: 'Some features temporarily unavailable'
      };
    }

    // Last resort: minimal response
    return {
      product: { id: productId, name: 'Product unavailable' },
      source: 'minimal'
    };
  }
}
Layer 4: Load Balancing & Auto-Scaling
We use Kubernetes with horizontal pod autoscaling:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
When average CPU utilization across the pods exceeds 70%, Kubernetes adds replicas; when traffic drops, it scales back down, staying within the 3-to-50 replica bounds.
Layer 5: Monitoring & Alerting
We track the following signals, instrumented as sketched at the end of this layer:
- Request rate: Requests per second
- Error rate: 4xx and 5xx responses
- Latency: P50, P95, P99 response times
- Circuit breaker state: Open/closed status
- Rate limit hits: How many requests are throttled
- Dependency health: Database, cache, external APIs
Alert thresholds:
- Error rate > 1%: Warning alert
- Error rate > 5%: Critical alert
- P99 latency > 1s: Warning alert
- P99 latency > 3s: Critical alert
- Circuit breaker opens: Immediate alert
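The service-side metrics above can be exported with a Prometheus client library. Here is a minimal sketch using prom-client with an Express-style middleware; the metric names, labels, and bucket boundaries are illustrative choices, not the production configuration:

import client from 'prom-client';
import type { Request, Response, NextFunction } from 'express';

// Default process metrics (CPU, memory, event loop lag)
client.collectDefaultMetrics();

// Request rate and error rate both fall out of a single labelled counter
const httpRequests = new client.Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status'],
});

// Latency histogram: P50/P95/P99 are computed from these buckets in Prometheus
const httpDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route'],
  buckets: [0.05, 0.1, 0.25, 0.5, 1, 3, 5],
});

// Express-style middleware that records both metrics per request.
// In production, normalize req.path to the route template to bound label cardinality.
function metricsMiddleware(req: Request, res: Response, next: NextFunction) {
  const end = httpDuration.startTimer({ method: req.method, route: req.path });
  res.on('finish', () => {
    end();
    httpRequests.inc({ method: req.method, route: req.path, status: String(res.statusCode) });
  });
  next();
}

// Expose /metrics for Prometheus to scrape:
// app.get('/metrics', async (_req, res) => res.type('text/plain').send(await client.register.metrics()));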
Real-World Example: Handling a Traffic Spike
A client got featured on Product Hunt. Traffic spiked from 500 req/s to 8,000 req/s in 10 minutes.
Here's what happened:
- Rate limiting: Throttled abusive requests (saved 30% capacity)
- Auto-scaling: Kubernetes scaled from 3 to 25 pods (handled the load)
- Circuit breakers: Protected against database overload (prevented cascade failure)
- Graceful degradation: Served cached data when database was slow (maintained UX)
Result: API stayed up, 99.99% uptime maintained, zero data loss.
Common Mistakes to Avoid
❌ Don't Do This:
- Rate limit without proper error messages (users get confused)
- Set circuit breaker threshold too low (opens on normal failures)
- Forget to implement graceful degradation (everything breaks)
- Monitor only error rates (miss latency issues)
- Scale manually (too slow for traffic spikes)
The Results
- Peak throughput: 15,000 requests/second
- Uptime: 99.99% (roughly 4 minutes of downtime per month)
- P99 latency: 250ms (under 1s target)
- Error rate: 0.05% (under 0.1% target)
- Rate limit effectiveness: Blocked roughly 15% of incoming requests as abusive
Implementation Checklist
For your mission-critical API:
- Implement per-user rate limiting (Redis)
- Add global rate limiting (load balancer)
- Set up circuit breakers for all external dependencies
- Implement graceful degradation (cached fallbacks)
- Configure auto-scaling (Kubernetes HPA or similar)
- Set up monitoring (Prometheus + Grafana)
- Configure alerting (PagerDuty or similar)
- Load test your API (find breaking points)
- Document runbooks (what to do when alerts fire)
- Run chaos engineering tests (simulate failures)
The Bottom Line
99.99% uptime isn't achieved by hoping nothing breaks. It's achieved by building layers of defense: rate limiting, circuit breakers, graceful degradation, and monitoring.
At NetForceLabs, we don't build APIs that work when everything is perfect. We build APIs that work when everything is broken.