WordPress REST API Rate Limiting: Implementation Patterns

Introduction
Understanding Rate Limiting Algorithms
Token Bucket Implementation
Sliding Window Algorithm
Transient-Based Rate Limiting
Per-User vs Per-IP Rate Limiting
Response Headers and Client Communication
Frequently Asked Questions
Conclusion

Introduction

Protecting your WordPress REST API from abuse is critical for maintaining site stability and ensuring fair access for all users. WordPress REST API rate limiting is the practice of restricting the number of requests a client can make in a given time period. Without it, a single bad actor—or even a poorly configured application—could consume all your server resources and degrade performance for legitimate users.

Rate limiting serves multiple purposes. It blunts application-level denial-of-service attempts, slows brute force attacks, ensures fair resource distribution, and helps maintain predictable performance. Whether you're running a public API or a private plugin integration, some form of rate limiting is essential.

In this guide, we'll explore the most common rate limiting algorithms, examine how to implement them using WordPress's transient API, discuss the tradeoffs between per-user and per-IP rate limiting, and explain how to communicate rate limits to clients through HTTP response headers. WP HealthKit analyzes rate limiting implementations across your plugins to ensure they're adequate for your API's needs.

Understanding Rate Limiting Algorithms

Rate limiting algorithms determine how strictly requests are throttled and how the limit "resets." Different algorithms have different characteristics, making them suitable for different scenarios. The two most common approaches are token bucket and sliding window algorithms.

Token bucket algorithms work by maintaining a bucket that holds a certain number of tokens. Each request consumes one token, and tokens are refilled at a constant rate over time. If a client uses all their tokens, subsequent requests are rejected until more tokens are available. This algorithm allows for burst traffic while maintaining a long-term rate limit.

The sliding window algorithm tracks requests within a time window and rejects requests that would exceed the limit within that window. For example, a 10-requests-per-hour limit would track the timestamp of each request and reject any new request if the previous 10 requests occurred within the last hour. This algorithm is more precise but slightly more complex to implement efficiently.

Each algorithm has tradeoffs. Token bucket is more lenient with burst traffic and requires less memory since you only track bucket state, not individual requests. Sliding window is more accurate and harder to game, but requires tracking request history.

Understanding the appropriate use case for each algorithm is essential. Token bucket shines in scenarios where you expect users to make occasional bursts of requests followed by periods of inactivity. Imagine a mobile application that periodically syncs data with your API—the sync might send five requests in quick succession, then nothing for ten minutes. Token bucket allows this pattern while still enforcing a long-term rate limit. Conversely, sliding window excels when you need strict enforcement against abusive patterns. If you're concerned about attackers trying to slowly enumerate your API endpoints or perform credential-guessing attacks, sliding window's precision ensures they can't circumvent limits by spacing requests.

The choice between these algorithms depends on your threat model and user expectations. Public APIs typically prioritize user experience and choose token bucket. Security-sensitive endpoints like authentication or administrative APIs choose sliding window for stricter enforcement. Some sophisticated implementations use hybrid approaches, applying token bucket to normal users and sliding window to accounts that show suspicious behavior.

Token Bucket Implementation

The token bucket algorithm is popular because it's simple to implement and aligns with how many developers intuitively think about rate limiting. Let's build a practical implementation using WordPress transients:

class RateLimiter {
  private $bucket_size = 10;
  private $refill_rate = 2; // tokens added per refill interval
  private $refill_interval = MINUTE_IN_SECONDS;

  public function is_rate_limited( $identifier ) {
    $transient_key = 'token_bucket_' . $identifier;
    $bucket = get_transient( $transient_key );
    // time() is UTC; current_time( 'timestamp' ) shifts with the site timezone
    $now = time();

    if ( false === $bucket ) {
      // Initialize bucket with full capacity
      $bucket = array(
        'tokens'      => $this->bucket_size,
        'last_refill' => $now,
      );
    } else {
      // Count the whole refill intervals that have elapsed
      $intervals = (int) floor( ( $now - $bucket['last_refill'] ) / $this->refill_interval );

      if ( $intervals > 0 ) {
        $bucket['tokens'] = min(
          $this->bucket_size,
          $bucket['tokens'] + $intervals * $this->refill_rate
        );
        // Advance only by the intervals credited so partial progress is kept
        $bucket['last_refill'] += $intervals * $this->refill_interval;
      }
    }

    // Keep state for as long as an empty bucket needs to refill completely
    $ttl = (int) ceil( $this->bucket_size / $this->refill_rate ) * $this->refill_interval;

    // Check if request is allowed
    $allowed = $bucket['tokens'] >= 1;
    if ( $allowed ) {
      $bucket['tokens']--;
    }

    set_transient( $transient_key, $bucket, $ttl );
    return ! $allowed; // true means rate limited
  }
}

This implementation lets a client make up to 10 requests immediately and then earn 2 more tokens per minute. A client that makes 10 requests in one second is rate limited until tokens refill, while a client that spreads requests out can sustain 2 requests per minute indefinitely.

The token bucket approach elegantly handles bursty traffic. A client with an important batch operation can use their tokens quickly, while a client making steady requests below the refill rate never runs into the limit. This makes the algorithm user-friendly while still protecting your infrastructure.

The implementation stores token bucket state in WordPress transients, leveraging the transient API's automatic expiration. When a client first accesses the endpoint, the code initializes their bucket with full capacity and the current timestamp. On subsequent requests, it counts how many whole refill intervals have elapsed, adds the corresponding tokens, and advances last_refill by exactly those intervals. Advancing it to the current time instead would discard partial progress, and a client sending requests more often than once per interval would never earn a token back. The algorithm then checks whether a token is available and either grants the request or reports it as rate limited.

One important consideration is the transient TTL. The code sets it to the time an empty bucket needs to refill completely: bucket size divided by refill rate, times the refill interval, or five minutes with the values above. With a shorter TTL, the transient can expire before the bucket has legitimately refilled, and the client is handed a fresh, full bucket early. Conversely, excessively long TTLs waste storage on buckets for inactive clients. The balance depends on your application's traffic patterns and memory constraints.
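
The refill arithmetic is easier to check when the clock is passed in explicitly. The following is a minimal, storage-free sketch rather than the class itself: example_token_bucket_step() is a hypothetical helper, and its defaults simply mirror the 10-token bucket refilling at 2 tokens per minute. It credits only whole intervals and advances last_refill by the intervals it credited.

```php
<?php
/**
 * Pure token bucket step: returns array( allowed, updated bucket ).
 * Hypothetical helper for illustration; not part of WordPress.
 */
function example_token_bucket_step( array $bucket, int $now, int $size = 10, int $rate = 2, int $interval = 60 ): array {
  // Credit tokens for whole intervals elapsed since the last refill.
  $intervals = intdiv( max( 0, $now - $bucket['last_refill'] ), $interval );
  if ( $intervals > 0 ) {
    $bucket['tokens']       = min( $size, $bucket['tokens'] + $intervals * $rate );
    $bucket['last_refill'] += $intervals * $interval;
  }

  // Spend one token if any are left.
  $allowed = $bucket['tokens'] >= 1;
  if ( $allowed ) {
    $bucket['tokens']--;
  }

  return array( $allowed, $bucket );
}

// Trace a burst: 11 requests at t=0, then 3 more one minute later.
$bucket  = array( 'tokens' => 10, 'last_refill' => 0 );
$results = array();
foreach ( array( 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 60, 60, 60 ) as $t ) {
  list( $allowed, $bucket ) = example_token_bucket_step( $bucket, $t );
  $results[] = $allowed;
}
```

In this trace the first ten requests pass, the eleventh is rejected, and after one minute exactly two more pass before the bucket is empty again.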

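For production WordPress installations with significant API traffic, a persistent object cache such as Redis or Memcached is strongly recommended. Without one, transients are stored in the wp_options table, so every rate limit check costs at least one database read and one write. When a persistent object cache is installed, get_transient() and set_transient() use it automatically, so the code above needs no changes. Note that the read-then-write sequence is not atomic with either backend: two concurrent requests can read the same bucket and both spend the same token, so treat the limit as approximate under heavy concurrency.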

Sliding Window Algorithm

The sliding window algorithm provides more precise rate limiting by tracking request timestamps:

class SlidingWindowRateLimiter {
  private $limit = 100;
  private $window = HOUR_IN_SECONDS;

  public function is_rate_limited( $identifier ) {
    $transient_key = 'sliding_window_' . $identifier;
    $requests = get_transient( $transient_key );

    if ( false === $requests ) {
      $requests = array();
    }

    $now = time(); // UTC; current_time( 'timestamp' ) shifts with the site timezone
    $window_start = $now - $this->window;

    // Remove old requests outside the window
    $requests = array_filter(
      $requests,
      function( $timestamp ) use ( $window_start ) {
        return $timestamp > $window_start;
      }
    );

    // Check if adding another request would exceed the limit
    if ( count( $requests ) >= $this->limit ) {
      return true; // Rate limited
    }

    // Add current request
    $requests[] = $now;
    set_transient( $transient_key, $requests, $this->window + 60 );

    return false; // Not rate limited
  }
}

Sliding window provides precise rate limiting and is harder to game. However, it requires storing a timestamp for every accepted request still inside the window, so memory per client grows with the size of the limit. For endpoints with large limits or many distinct clients, this could become a concern.

A hybrid approach called "sliding window with counter" provides a good balance: use a counter with fixed time windows, and only track detailed timestamps when the counter approaches the limit. This gives you precision where it matters while maintaining memory efficiency.
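
One way to build the counter half of such a hybrid is the weighted two-window estimate often called a sliding window counter. This is a swapped-in, widely used approximation rather than the exact scheme described above, and the function names are hypothetical. It assumes you keep one request count per fixed window, for example in transients keyed by window number.

```php
<?php
/**
 * Estimate the request count inside the sliding window from two
 * fixed-window counters. Hypothetical helper for illustration.
 */
function example_sliding_window_estimate( int $previous_count, int $current_count, int $now, int $window ): float {
  // Fraction of the current fixed window that has already elapsed.
  $elapsed_fraction = ( $now % $window ) / $window;

  // The sliding window still overlaps the rest of the previous window.
  return $previous_count * ( 1 - $elapsed_fraction ) + $current_count;
}

/**
 * Allow the request while the estimate stays below the limit.
 */
function example_sliding_window_counter_allows( int $previous_count, int $current_count, int $now, int $window, int $limit ): bool {
  return example_sliding_window_estimate( $previous_count, $current_count, $now, $window ) < $limit;
}
```

A quarter of the way into an hour-long window, 80 requests in the previous window and 30 in the current one give an estimate of 80 × 0.75 + 30 = 90, which is under a limit of 100. The estimate assumes requests were spread evenly across the previous window, so it can be slightly off in either direction.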

The sliding window approach in the code example maintains an array of request timestamps in transients. When a new request arrives, the code first removes any timestamps outside the current window, then checks if adding another request would exceed the limit. This precise tracking makes it extremely difficult for attackers to circumvent the limit by carefully spacing requests. Unlike token bucket, which might allow an attacker to use all tokens in a burst then slowly accumulate more, sliding window enforces the absolute maximum of the limit within the time window.

The tradeoff is storage and computational overhead. Filtering the array on every request means iterating through every timestamp still in the window. Because rejected requests are not recorded, the array never holds more entries than the limit, so a limit of 100 requests per hour means at most 100 timestamps per client. The cost therefore grows with the size of the limit and the number of distinct clients rather than with raw traffic, which makes this approach less suitable for limits in the thousands per window. For most WordPress API endpoints, the overhead is negligible and the precision is worth the cost.

A more sophisticated variation, known as sliding window with logarithmic resolution, samples request density at logarithmic time intervals rather than storing every single timestamp. This reduces memory usage while maintaining accuracy, making it suitable for high-volume endpoints.

Transient-Based Rate Limiting

WordPress transients are an ideal mechanism for storing rate limit state. They provide simple key-value storage with automatic expiration, which is exactly what rate limiting needs. The transient API handles persistence (using either the database or external object caches) transparently.

When implementing rate limiting, choose transient expiration carefully. If you set the expiration too short, you might clean up state before the actual rate limit window expires. If you set it too long, you waste memory storing stale rate limit data. Generally, set the transient TTL to at least the rate limit window duration plus a buffer.

public function apply_rate_limit( $request, $identifier ) {
  $limiter = new RateLimiter();

  if ( $limiter->is_rate_limited( $identifier ) ) {
    return new WP_Error(
      'rest_rate_limited',
      'Too many requests. Please try again later.',
      array( 'status' => 429 )
    );
  }

  return true;
}
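
The method above still has to be called from somewhere. One place to do that is the rest_pre_dispatch filter, which runs before a REST route is dispatched and can short-circuit the request by returning a non-null value. The sketch below is one possible wiring, not the only one: example_make_rate_limit_filter() is a hypothetical helper that accepts the limiter check and the identifier lookup as callables, and the registration assumes the RateLimiter class from earlier is loaded.

```php
<?php
/**
 * Build a rest_pre_dispatch callback around any "is this identifier limited?" check.
 * Hypothetical helper; $is_limited and $get_identifier are supplied by you.
 */
function example_make_rate_limit_filter( callable $is_limited, callable $get_identifier ) {
  return function ( $result, $server = null, $request = null ) use ( $is_limited, $get_identifier ) {
    // Another filter already produced a response; leave it alone.
    if ( null !== $result ) {
      return $result;
    }

    if ( $is_limited( $get_identifier() ) ) {
      return new WP_Error(
        'rest_rate_limited',
        'Too many requests. Please try again later.',
        array( 'status' => 429 )
      );
    }

    return $result; // null lets normal dispatch continue
  };
}

// Registration only runs inside WordPress.
if ( function_exists( 'add_filter' ) ) {
  add_filter(
    'rest_pre_dispatch',
    example_make_rate_limit_filter(
      function ( $identifier ) {
        return ( new RateLimiter() )->is_rate_limited( $identifier );
      },
      function () {
        return 'ip_' . ( $_SERVER['REMOTE_ADDR'] ?? 'unknown' );
      }
    ),
    10,
    3
  );
}
```

Passing the check in as a callable keeps the filter independent of the algorithm, so you can swap the token bucket for the sliding window limiter without touching the hook.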

For high-performance systems, install a persistent object cache such as Redis or Memcached. WordPress then stores transients in that cache instead of the database, so the same transient calls become much faster. That matters for rate limiting, which is checked on every request.

WP HealthKit checks whether you're using appropriate storage for rate limiting and identifies performance concerns. If you're storing massive amounts of rate limit data in the database, the tool will recommend moving to external caching.

Per-User vs Per-IP Rate Limiting

The choice between per-user and per-IP rate limiting depends on your API's use case and threat model. Per-user rate limiting is more user-friendly, but it cannot cover unauthenticated traffic and can be sidestepped by anyone able to register multiple accounts. Per-IP rate limiting is stricter but can unfairly limit legitimate users who share an address, such as those behind corporate NAT or a mobile carrier gateway.

Per-user rate limiting works best when you have authenticated users and want to prevent individual users from consuming too many resources:

public function get_api_identifier() {
  if ( is_user_logged_in() ) {
    return 'user_' . get_current_user_id();
  }

  return 'ip_' . sanitize_text_field( $_SERVER['REMOTE_ADDR'] );
}

Per-IP rate limiting is stricter and works for public APIs where authentication isn't required:

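public function get_api_identifier() {
  $ip = sanitize_text_field( wp_unslash( $_SERVER['REMOTE_ADDR'] ) );

  // Only honor X-Forwarded-For when the request came from a reverse proxy
  // you control; otherwise any client can spoof the header.
  // $this->trusted_proxies is an assumed property listing your proxy IPs.
  if ( in_array( $ip, $this->trusted_proxies, true ) && ! empty( $_SERVER['HTTP_X_FORWARDED_FOR'] ) ) {
    $ips = explode( ',', sanitize_text_field( wp_unslash( $_SERVER['HTTP_X_FORWARDED_FOR'] ) ) );
    // The last entry is the address your proxy saw; earlier ones are client-supplied
    $forwarded = trim( array_pop( $ips ) );
    if ( filter_var( $forwarded, FILTER_VALIDATE_IP ) ) {
      $ip = $forwarded;
    }
  }

  return 'ip_' . $ip;
}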
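
When requests pass through more than one proxy, for example a CDN in front of a load balancer, a common approach is to walk X-Forwarded-For from the right and take the first address that is not one of your own proxies. The sketch below shows that idea as a pure function so it can be tested without a web server. example_resolve_client_ip() and the $trusted_proxies list are hypothetical, and the sketch matches exact IPs only, not CIDR ranges.

```php
<?php
/**
 * Resolve the client IP from REMOTE_ADDR and X-Forwarded-For.
 * $trusted_proxies is an assumed list of proxy IPs you operate.
 */
function example_resolve_client_ip( string $remote_addr, ?string $forwarded_for, array $trusted_proxies ): string {
  // A connection that is not from a trusted proxy cannot vouch for the header.
  if ( ! in_array( $remote_addr, $trusted_proxies, true ) || null === $forwarded_for || '' === trim( $forwarded_for ) ) {
    return $remote_addr;
  }

  // Walk the chain from the right: the rightmost entries were appended by
  // your own proxies, anything further left is client-controlled.
  $hops = array_reverse( array_map( 'trim', explode( ',', $forwarded_for ) ) );
  foreach ( $hops as $hop ) {
    if ( false === filter_var( $hop, FILTER_VALIDATE_IP ) ) {
      break; // Malformed entry: stop trusting the header.
    }
    if ( ! in_array( $hop, $trusted_proxies, true ) ) {
      return $hop; // First address outside your infrastructure.
    }
  }

  return $remote_addr;
}
```

With trusted proxies 10.0.0.1 and 10.0.0.2, a header of "6.6.6.6, 198.51.100.7, 10.0.0.2" resolves to 198.51.100.7. The leftmost value is ignored because a client can put anything there.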

Many applications use a hybrid approach: authenticated requests are rate limited per-user, while unauthenticated requests are rate limited per-IP. This provides security without penalizing legitimate users who authenticate.

For API keys, implement per-API-key rate limiting instead:

if ( $api_key_valid ) {
  return 'key_' . $api_key_id;
}

This allows different applications to have independent rate limit budgets, preventing one misbehaving application from affecting others.

Response Headers and Client Communication

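Standard HTTP response headers allow clients to understand rate limit status and adjust their behavior accordingly. The IETF HTTPAPI working group's draft on RateLimit header fields aims to standardize what many API providers already send as X-RateLimit-* headers. The draft has changed between revisions, so check the current version before relying on exact field names.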

Implement these headers in your responses:

public function add_rate_limit_headers( $response, $limiter, $identifier ) {
  // get_limit(), get_remaining(), and get_reset_time() are assumed to exist on
  // your limiter; get_reset_time() is taken to return a Unix timestamp.
  $limit = $limiter->get_limit( $identifier );
  $remaining = $limiter->get_remaining( $identifier );
  $reset_time = $limiter->get_reset_time( $identifier );

  $response->header( 'RateLimit-Limit', $limit );
  $response->header( 'RateLimit-Remaining', $remaining );
  // The draft's RateLimit-Reset is seconds until reset, not a timestamp
  $response->header( 'RateLimit-Reset', max( 0, $reset_time - time() ) );

  // Legacy headers for compatibility (reset as a Unix timestamp, a common convention)
  $response->header( 'X-RateLimit-Limit', $limit );
  $response->header( 'X-RateLimit-Remaining', $remaining );
  $response->header( 'X-RateLimit-Reset', $reset_time );

  return $response;
}

These headers tell clients how many requests they have left and when their limit resets. A well-designed client application will check these headers and throttle requests before hitting the limit, improving overall system stability.
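
The limiter classes shown earlier only answer whether a request is limited, so the values for these headers have to come from somewhere. For a sliding window limiter they can be derived from the stored timestamps. The following is a rough sketch under that assumption. example_sliding_window_status() is a hypothetical helper, and "reset" here means the number of seconds until the oldest request leaves the window and frees one slot.

```php
<?php
/**
 * Derive header values from the timestamps a sliding window limiter stores.
 * Hypothetical helper for illustration.
 */
function example_sliding_window_status( array $timestamps, int $now, int $limit = 100, int $window = 3600 ): array {
  // Keep only requests that are still inside the window.
  $active = array_values(
    array_filter(
      $timestamps,
      function ( $timestamp ) use ( $now, $window ) {
        return $timestamp > $now - $window;
      }
    )
  );

  $remaining = max( 0, $limit - count( $active ) );

  // The oldest active request leaves the window first, freeing one slot.
  $reset_in = empty( $active ) ? 0 : max( 0, min( $active ) + $window - $now );

  return array(
    'limit'     => $limit,
    'remaining' => $remaining,
    'reset_in'  => $reset_in, // delta seconds, not a Unix timestamp
  );
}
```

If you need a Unix timestamp for a legacy header, add the current time to reset_in.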

When a client exceeds the rate limit, return a 429 (Too Many Requests) status code with a Retry-After header indicating when they can retry:

if ( $is_rate_limited ) {
  $reset_time = $limiter->get_reset_time( $identifier );
  $retry_after = max( 1, $reset_time - time() );

  // Build the response directly: data added to a WP_Error ends up in the
  // JSON body, not in the HTTP headers.
  $response = new WP_REST_Response(
    array(
      'code'    => 'rest_rate_limited',
      'message' => 'Rate limit exceeded',
      'data'    => array( 'status' => 429 ),
    ),
    429
  );
  $response->header( 'Retry-After', $retry_after );

  return $response;
}
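
On the client side, the Retry-After value can arrive either as a number of seconds or as an HTTP date, so a consumer of your API should handle both. Here is a minimal sketch of that parsing. example_retry_delay() and its 60-second fallback are assumptions for illustration, and the current time is passed in so the function stays testable.

```php
<?php
/**
 * Work out how many seconds a client should wait after a 429 response.
 * Retry-After may be delta seconds or an HTTP date.
 */
function example_retry_delay( ?string $retry_after, int $now, int $fallback = 60 ): int {
  if ( null === $retry_after || '' === trim( $retry_after ) ) {
    return $fallback;
  }

  $value = trim( $retry_after );

  // Delta-seconds form, e.g. "120".
  if ( 1 === preg_match( '/^\d+$/', $value ) ) {
    return (int) $value;
  }

  // HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  $timestamp = strtotime( $value );
  if ( false === $timestamp ) {
    return $fallback;
  }

  return max( 0, $timestamp - $now );
}
```

A client would sleep for the returned number of seconds before retrying, ideally with a little random jitter so that many clients do not retry at the same instant.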

Additional Resources

For a comprehensive view of how WP HealthKit approaches plugin analysis, explore our 17 verification layers or browse the plugin directory to see real audit scores. Ready to check your own plugin? Run a free audit now.

Frequently Asked Questions

Should I rate limit authenticated requests differently than unauthenticated ones?

Yes. Authenticated requests typically have higher limits because you can identify the user and trust them to some degree. Unauthenticated requests should have stricter limits to prevent abuse by anonymous actors. As an illustrative starting point, you might allow 1,000 requests per day for unauthenticated clients and 10,000 per day for authenticated ones, though your limits should match your infrastructure's capacity and your API's purpose.

What's the best rate limiting strategy for public APIs?

For public APIs, use a combination of per-IP rate limiting for unauthenticated requests and per-user or per-API-key rate limiting for authenticated requests. Implement generous limits so legitimate applications aren't affected, but strict enough to prevent abuse. Monitor actual usage patterns and adjust limits based on observed behavior. WP HealthKit can help you understand your API usage patterns.

How do I prevent rate limiting from affecting legitimate users behind proxies?

When extracting the client IP for rate limiting, check the X-Forwarded-For header, which proxies use to pass along the original IP. Be careful, though: the header is ordinary client-supplied data, so if you trust it without a proxy in front of you, an attacker can send a different value on every request and get a fresh rate limit budget each time. Only trust the header when the request arrives from a reverse proxy you control and that proxy is configured to set it. Consider using a library that handles this correctly, or review your proxy setup carefully.

Can I offer different rate limits for different API keys?

Absolutely. This is called tiered rate limiting and allows you to offer different service levels. You might give free API keys a limit of 100 requests per day, basic paid keys 10,000 per day, and premium keys unlimited access. Implement this by checking the API key tier before determining the rate limit.
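
A tier lookup can be as small as an array. The sketch below uses the example figures from this answer. The tier names, the function names, and the choice to treat unknown tiers as free are all assumptions made for illustration.

```php
<?php
/**
 * Map an API key tier to a daily request limit; null means unlimited.
 * Tier names and numbers are illustrative.
 */
function example_daily_limit_for_tier( string $tier ): ?int {
  $limits = array(
    'free'    => 100,
    'basic'   => 10000,
    'premium' => null, // unlimited
  );

  // Unknown tiers fall back to the most restrictive limit.
  return array_key_exists( $tier, $limits ) ? $limits[ $tier ] : $limits['free'];
}

/**
 * Check a running daily count against the tier's limit.
 */
function example_is_over_daily_limit( string $tier, int $requests_today ): bool {
  $limit = example_daily_limit_for_tier( $tier );

  return null !== $limit && $requests_today >= $limit;
}
```

Falling back to the strictest tier for unrecognized values means a typo in a tier name fails safe instead of granting unlimited access.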

Should I use token bucket or sliding window?

Token bucket is simpler to implement and more user-friendly for bursty traffic. Sliding window is more precise and harder to game. For most WordPress applications, token bucket is the better choice due to its simplicity and lower memory overhead. Use sliding window if you have strict security requirements and the extra complexity is acceptable.

How do I handle rate limiting during traffic spikes?

Implement graceful degradation: when your infrastructure is under load, reduce rate limits slightly to prevent overload. You can also implement request queuing where requests exceeding current capacity are queued and processed when resources become available. WP HealthKit monitors your rate limiting and can alert you to potential overload situations.
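
As a rough sketch of the first idea, the limit can be scaled by a load figure before it is handed to the limiter. The thresholds and scaling factors below are arbitrary assumptions, and example_limit_under_load() is a hypothetical helper. On Linux you might feed it the one-minute load average from sys_getloadavg() divided by the number of CPU cores.

```php
<?php
/**
 * Scale a base limit down as load rises.
 * Thresholds and factors are illustrative assumptions.
 */
function example_limit_under_load( int $base_limit, float $load_per_core ): int {
  if ( $load_per_core < 0.7 ) {
    return $base_limit; // Normal operation.
  }

  if ( $load_per_core < 1.0 ) {
    return (int) floor( $base_limit * 0.75 ); // Getting busy: trim by a quarter.
  }

  return (int) floor( $base_limit * 0.5 ); // Saturated: halve the limit.
}
```

Keeping the adjustment in one small function makes it easy to tune the thresholds against your own monitoring data.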

Conclusion

WordPress REST API rate limiting is an essential security and stability measure for any WordPress site exposing an API. By choosing an appropriate algorithm, implementing it efficiently with transients or external caches, communicating limits clearly through HTTP headers, and adjusting limits based on your infrastructure and user base, you can provide a reliable API that serves legitimate users while protecting against abuse.

The key is striking the right balance—limits should be strict enough to prevent abuse but generous enough that legitimate applications don't encounter them during normal operation. Different endpoints may need different limits based on their resource consumption.

WP HealthKit analyzes your rate limiting implementation across all your custom endpoints, ensuring they're appropriate for your use case and correctly implemented. Start auditing your API's rate limiting today to ensure it's protecting your infrastructure effectively.
