Redis · Flask · Python · Backend · Performance

Redis Caching Strategies That Actually Work in Production Flask APIs

5 April 2025 · 8 min read · Harshit Gupta
TL;DR

Redis caching can reduce database load by 90% and API response times from 300ms to under 10ms. But a bad caching strategy causes stale-data bugs, cache stampedes, and memory bloat. This post covers five production patterns: cache-aside, write-through, TTL design, cache stampede prevention, and event-based cache invalidation.

Why Most Teams Cache Wrong

At CertifyMe, before we had proper caching, our credential verification endpoint hit MySQL on every single request. Verification is a public-facing endpoint — anyone scanning a QR code on a certificate triggers it. During large award ceremonies, we'd get 500+ concurrent scans in minutes. MySQL buckled. Response times climbed to 8 seconds. Users assumed their certificates were fake.

We added Redis. Response times dropped to 4ms. But naively slapping a cache on everything introduced new bugs — expired credentials still showing as valid, user profile changes not reflecting for hours, and a cache stampede during a high-traffic event that was worse than having no cache at all. We learned every lesson the hard way so you don't have to.

Pattern 1: Cache-Aside (Lazy Loading)

The most common pattern. The application manages the cache explicitly — check the cache first, fall back to the database on a miss, then populate the cache for next time:

import redis
import json
from functools import wraps

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

def cache_aside(key_fn, ttl=300):
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            key = key_fn(*args, **kwargs)
            cached = r.get(key)
            if cached is not None:  # explicit None check: a miss, not a falsy payload
                return json.loads(cached)
            result = f(*args, **kwargs)
            r.setex(key, ttl, json.dumps(result))
            return result
        return wrapper
    return decorator

@cache_aside(key_fn=lambda credential_id: f"credential:{credential_id}", ttl=600)
def get_credential(credential_id):
    return db.query("SELECT * FROM credentials WHERE id = %s", [credential_id])

Key naming convention

Use colon-separated hierarchical keys: entity:id:field. For example: credential:abc123, user:42:profile, org:7:settings. This makes pattern-based invalidation (SCAN + DEL) and Redis keyspace analysis much easier in production.
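Pattern-based invalidation with this naming scheme can be sketched as follows. The `FakeRedis` class below is a tiny in-memory stand-in so the example is self-contained; against a real server you'd pass the `redis.Redis` client instead, whose `scan_iter` and `delete` have the same shape. Note the use of SCAN rather than KEYS — KEYS blocks Redis while it walks the whole keyspace:

```python
import fnmatch

class FakeRedis:
    """Tiny in-memory stand-in for the redis-py client, just enough for this demo."""
    def __init__(self):
        self.store = {}

    def scan_iter(self, match="*"):
        # redis-py's scan_iter yields keys matching a glob pattern, non-blocking
        return (k for k in list(self.store) if fnmatch.fnmatch(k, match))

    def delete(self, *keys):
        return sum(1 for k in keys if self.store.pop(k, None) is not None)

def invalidate_pattern(r, pattern, batch=500):
    """Delete all keys matching a glob pattern, batching DELs to limit round trips."""
    deleted, pending = 0, []
    for key in r.scan_iter(match=pattern):
        pending.append(key)
        if len(pending) >= batch:
            deleted += r.delete(*pending)
            pending = []
    if pending:
        deleted += r.delete(*pending)
    return deleted

r = FakeRedis()
r.store.update({"user:42:profile": "{}", "user:42:settings": "{}", "org:7:settings": "{}"})
n = invalidate_pattern(r, "user:42:*")  # removes both user:42:* keys, leaves org:7:settings
```

`fnmatch` only approximates Redis glob semantics, but for colon-separated keys like these the behavior matches.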

Pattern 2: Write-Through Caching

Write-through keeps the cache synchronized on every write — you update both the database and the cache in the same operation. This guarantees the cache is never stale immediately after a write:

def update_credential_status(credential_id, new_status):
    # Update database
    db.execute(
        "UPDATE credentials SET status = %s WHERE id = %s",
        [new_status, credential_id]
    )
    # Immediately update cache — don't wait for TTL expiry
    key = f"credential:{credential_id}"
    cached = r.get(key)
    if cached:
        data = json.loads(cached)
        data['status'] = new_status
        r.setex(key, 600, json.dumps(data))

This pattern is critical for any field where stale data has real consequences. For us: credential status (valid/revoked). A revoked credential showing as valid — even for 5 minutes — was a compliance risk we couldn't accept.
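When the cached payload is awkward to patch field-by-field, a simpler variant is write-invalidate: update the database, then delete the key and let the next cache-aside read repopulate it. A minimal sketch — the dicts stand in for the database and the Redis client (on real Redis the delete is `r.delete(key)`):

```python
def update_credential_status(db, cache, credential_id, new_status):
    """Write-invalidate: update the DB, then drop the cache entry.
    The next cache-aside read repopulates with fresh data."""
    db[credential_id] = new_status                   # stands in for the UPDATE statement
    cache.pop(f"credential:{credential_id}", None)   # r.delete(key) in production

db = {"abc123": "valid"}
cache = {"credential:abc123": '{"status": "valid"}'}
update_credential_status(db, cache, "abc123", "revoked")
```

The trade-off: the next read pays a cache miss, but there is no risk of patching a stale or partially-populated cached object.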

Pattern 3: TTL Design Is a Product Decision

The most overlooked aspect of caching. TTL isn't a technical parameter — it's a product decision about acceptable staleness. Ask yourself: "If this data is stale for X seconds, what's the worst that happens?"

  • Credential verification data — 10 minutes. Revocations propagate via write-through anyway.
  • User profile data — 5 minutes. A stale display name is tolerable.
  • Org settings / feature flags — 60 seconds. Changes should take effect quickly.
  • Public stats / dashboard counts — 30 minutes. Nobody expects real-time accuracy.
  • Auth tokens — Never cache. Always verify against the source of truth.
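One way to keep these decisions reviewable is a single TTL policy table keyed by entity prefix, instead of magic numbers scattered through the codebase. A sketch under the naming convention above (the entity names and default are illustrative):

```python
# Per-entity TTLs in seconds; None means "never cache"
TTL_POLICY = {
    "credential": 600,   # 10 min — revocations propagate via write-through anyway
    "user": 300,         # 5 min — a stale display name is tolerable
    "org": 60,           # 60 s — settings/flags should take effect quickly
    "stats": 1800,       # 30 min — nobody expects real-time dashboards
    "auth": None,        # never cache — always verify against the source of truth
}

def ttl_for(key):
    """Look up the TTL from the key's entity prefix, e.g. 'user:42:profile' -> 300."""
    entity = key.split(":", 1)[0]
    return TTL_POLICY.get(entity, 300)  # conservative default for unknown entities
```

A caching helper can then refuse to cache anything whose policy is `None`, which turns "never cache auth tokens" from a convention into enforced behavior.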

TTL anti-pattern

Setting the same TTL (e.g., 5 minutes) for everything is a code smell. It means you haven't thought through the staleness tolerance for each data type. Over-caching mutable data causes subtle, hard-to-reproduce bugs that only appear in production under load.

Pattern 4: Preventing Cache Stampedes

A cache stampede happens when a hot key expires and hundreds of concurrent requests miss the cache at the same moment, then all hit the database at once. We hit this during a university award ceremony with 800 concurrent attendees scanning QR codes.

Two fixes work well: a mutex lock on cache population (the simpler option, shown here) and probabilistic early expiration:

import time

def get_with_mutex(key, db_fetch_fn, ttl=300):
    cached = r.get(key)
    if cached:
        return json.loads(cached)

    # Use SET NX (set if not exists) as a distributed lock
    lock_key = f"lock:{key}"
    got_lock = r.set(lock_key, "1", nx=True, ex=10)  # 10s lock timeout

    if got_lock:
        try:
            result = db_fetch_fn()
            r.setex(key, ttl, json.dumps(result))
            return result
        finally:
            r.delete(lock_key)
    else:
        # Another process is populating — poll briefly before falling back to the DB
        for _ in range(5):
            time.sleep(0.05)
            cached = r.get(key)
            if cached is not None:
                return json.loads(cached)
        return db_fetch_fn()
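The other fix mentioned above, probabilistic early expiration (often called XFetch), has each request occasionally refresh a hot key *before* it expires, with a probability that rises as expiry approaches — so expiry never hits everyone at once. A sketch of just the decision function; the `rnd` parameter is injected so the behavior is deterministic in tests:

```python
import math
import random

def should_refresh_early(ttl_remaining, recompute_time, beta=1.0, rnd=None):
    """XFetch-style decision: refresh before expiry with a probability that
    rises as expiry approaches and as recomputation gets more expensive.

    ttl_remaining  -- seconds until the key expires
    recompute_time -- observed seconds to rebuild the value
    beta           -- >1 favors earlier refresh, <1 later
    rnd            -- uniform (0,1] sample; injectable for deterministic tests
    """
    if rnd is None:
        rnd = random.random() or 1e-12  # guard against log(0)
    return recompute_time * beta * -math.log(rnd) >= ttl_remaining
```

On a cache hit you'd check the key's remaining TTL (`r.ttl(key)` in redis-py) plus a stored recompute-time measurement; when the function returns True, that one request rebuilds the value while everyone else keeps serving the still-valid copy.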

Pattern 5: Event-Based Cache Invalidation

Cache invalidation is famously one of the two hard things in computer science. The cleanest solution we found: invalidate via events published to Redis pub/sub. When any service changes data, it publishes an invalidation event. All services that cache that data listen and invalidate immediately:

# Publisher (in the service that modifies data)
def revoke_credential(credential_id):
    db.execute("UPDATE credentials SET status='revoked' WHERE id=%s", [credential_id])
    r.publish('cache:invalidate', json.dumps({
        'keys': [f"credential:{credential_id}"],
        'reason': 'revoked'
    }))

# Subscriber (listener thread in any service that caches credentials)
def invalidation_listener():
    pubsub = r.pubsub()
    pubsub.subscribe('cache:invalidate')
    for message in pubsub.listen():
        if message['type'] == 'message':
            data = json.loads(message['data'])
            for key in data['keys']:
                r.delete(key)
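The loop above is hard to unit-test because it blocks on `pubsub.listen()`. Factoring the per-message handling into a pure function keeps the listener thread thin and testable — a sketch, with `delete_fn` injected (it's `r.delete` in production, a dict-backed lambda in the test below):

```python
import json

def handle_invalidation(message, delete_fn):
    """Process one pub/sub message; returns the keys it invalidated."""
    if message.get("type") != "message":
        return []  # ignore subscribe/unsubscribe control messages
    data = json.loads(message["data"])
    for key in data["keys"]:
        delete_fn(key)
    return data["keys"]

# In the real listener thread the loop body becomes:
#   for message in pubsub.listen():
#       handle_invalidation(message, r.delete)

cache = {"credential:abc123": '{"status": "valid"}'}
msg = {"type": "message",
       "data": json.dumps({"keys": ["credential:abc123"], "reason": "revoked"})}
invalidated = handle_invalidation(msg, lambda k: cache.pop(k, None))
```

Start the listener once at app startup in a daemon thread, e.g. `threading.Thread(target=invalidation_listener, daemon=True).start()`, so it dies with the process.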

Key Takeaways

  • Cache-aside for reads; write-through for mutable data where staleness has consequences
  • TTL is a product decision — model acceptable staleness per data type, not a global setting
  • Use SET NX mutex locks to prevent cache stampedes on hot keys
  • Event-based invalidation via pub/sub is cleaner than TTL-only for consistency-critical data
  • Hierarchical key naming (entity:id:field) makes pattern-based operations manageable
  • Never cache auth tokens — always validate against the source of truth


© 2026 Harshit Gupta · New Delhi, India