Celery + Redis is the standard Python async task stack for a reason. Key patterns for production reliability: task idempotency (always design tasks to be safely retried), exponential backoff with jitter, task routing to dedicated queues, result backend for tracking, and Flower for real-time monitoring. This post covers all five with production examples.
Why Async Tasks Are Non-Negotiable at Scale
At CertifyMe, credential issuance involves: generating a PDF, anchoring a hash on blockchain, sending an email, and posting a webhook to the issuing organization's systems. Combined latency: 4-12 seconds depending on blockchain network congestion. Without async tasks, every credential issuance is a 12-second HTTP request — that's not a web app, that's a loading screen.
The moment any operation takes more than ~500ms, it belongs in a background task queue. The HTTP handler returns immediately with a task ID. The client polls or receives a webhook when the work is done. This pattern makes your API fast, makes retries automatic, and makes failures observable.
Basic Celery Setup with Redis
# celery_app.py
from celery import Celery

def make_celery(app_name: str) -> Celery:
    return Celery(
        app_name,
        broker='redis://localhost:6379/0',
        backend='redis://localhost:6379/1',  # separate DB for results
        include=['tasks.issuance', 'tasks.notifications', 'tasks.webhooks'],
    )

celery = make_celery('certifyme')

# Configuration
celery.conf.update(
    task_serializer='json',
    result_serializer='json',
    accept_content=['json'],
    timezone='UTC',
    enable_utc=True,
    task_track_started=True,       # STARTED state tracking
    task_acks_late=True,           # ACK only after task completes (safer)
    worker_prefetch_multiplier=1,  # one task at a time per worker (fair)
)
The broker queue and result backend have very different access patterns. Keeping them in separate Redis DBs (or separate Redis instances in production) prevents result backend reads from competing with queue operations, and makes it easy to flush results independently of the queue.
Pattern 1: Idempotent Tasks
The most critical production pattern. Any task that can be retried (all of them, eventually) must be idempotent — running it twice must produce the same result as running it once:
from celery import shared_task
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@shared_task(
    bind=True,
    max_retries=3,
    default_retry_delay=60,
    name='tasks.issuance.issue_credential',
)
def issue_credential(self, credential_id: str) -> dict:
    logger.info(f"Processing credential {credential_id}", extra={"task_id": self.request.id})

    # Idempotency check — if already issued, return early
    credential = Credential.get(credential_id)
    if credential.status == 'issued':
        logger.info(f"Credential {credential_id} already issued, skipping")
        return {"status": "already_issued", "credential_id": credential_id}

    try:
        pdf_url = generate_pdf(credential_id)
        blockchain_tx = anchor_on_blockchain(credential_id, pdf_url)
        send_issuance_email(credential_id, pdf_url)
        credential.mark_issued(pdf_url=pdf_url, blockchain_tx=blockchain_tx)
        return {"status": "issued", "credential_id": credential_id}
    except BlockchainTimeoutError as exc:
        # Retryable error — exponential backoff: 30s, 60s, 120s
        raise self.retry(exc=exc, countdown=2 ** self.request.retries * 30)
    except InvalidCredentialDataError as exc:
        # Non-retryable error — fail immediately, don't retry
        logger.error(f"Non-retryable failure for {credential_id}: {exc}")
        credential.mark_failed(reason=str(exc))
        raise
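The countdown above grows exponentially but deterministically, so a burst of simultaneous failures retries in lockstep and hammers the downstream service again. The intro's "exponential backoff with jitter" adds a random component to spread retries out. A minimal sketch — the function name, base, and cap are my own choices, not Celery API:

```python
import random

def backoff_with_jitter(retries: int, base: int = 30, cap: int = 600) -> int:
    """Exponential backoff (30s, 60s, 120s, ...) capped at `cap` seconds,
    with "full jitter": a uniform draw between 0 and the computed delay."""
    delay = min(cap, base * (2 ** retries))
    return random.randint(0, delay)

# Inside the task's retryable-error branch, in place of the fixed countdown:
#   raise self.retry(exc=exc, countdown=backoff_with_jitter(self.request.retries))
```

Full jitter trades predictable retry timing for maximum spread, which is usually the right trade when the failure cause is shared (e.g. blockchain network congestion hitting every in-flight task at once).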
Pattern 2: Task Routing and Priority Queues
Not all tasks should compete equally for workers. High-priority tasks (user-triggered actions) should not wait behind low-priority tasks (scheduled reports). Use dedicated queues with dedicated worker pools:
# Task routing configuration
celery.conf.task_routes = {
    'tasks.issuance.*': {'queue': 'high_priority'},
    'tasks.notifications.*': {'queue': 'default'},
    'tasks.reports.*': {'queue': 'low_priority'},
    'tasks.cleanup.*': {'queue': 'low_priority'},
}

# Start workers with dedicated queues
# High priority: more workers
#   celery -A celery_app worker -Q high_priority --concurrency=8 --hostname=hp@%h
# Low priority: fewer workers
#   celery -A celery_app worker -Q low_priority,default --concurrency=2 --hostname=lp@%h
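Entries like 'tasks.issuance.*' are glob patterns matched against the task name. To make the matching concrete, here is a small stand-alone resolver that mimics, in simplified form, how a task name maps to a queue under the config above — a sketch for illustration, not Celery's actual router:

```python
from fnmatch import fnmatch

# Mirrors the task_routes config above
TASK_ROUTES = {
    'tasks.issuance.*': {'queue': 'high_priority'},
    'tasks.notifications.*': {'queue': 'default'},
    'tasks.reports.*': {'queue': 'low_priority'},
    'tasks.cleanup.*': {'queue': 'low_priority'},
}

def queue_for(task_name: str, routes: dict = TASK_ROUTES, default: str = 'default') -> str:
    """Return the queue a task name routes to; first matching pattern wins."""
    for pattern, options in routes.items():
        if fnmatch(task_name, pattern):
            return options['queue']
    return default
```

A per-call override is also possible with `task.apply_async(queue='...')`, which takes precedence over `task_routes` for that invocation.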
Pattern 3: Task State Tracking via the API
For long-running tasks, expose the task state to the frontend so users get real-time progress feedback:
from flask import jsonify, request
from celery.result import AsyncResult

@app.route('/credentials/issue', methods=['POST'])
def trigger_issuance():
    credential_id = request.json['credential_id']
    task = issue_credential.delay(credential_id)
    return jsonify({"task_id": task.id, "status": "queued"}), 202

@app.route('/tasks/<task_id>/status')
def task_status(task_id):
    result = AsyncResult(task_id, app=celery)
    response = {
        "task_id": task_id,
        "state": result.state,  # PENDING / STARTED / SUCCESS / FAILURE / RETRY
    }
    if result.state == 'SUCCESS':
        response['result'] = result.result
    elif result.state == 'FAILURE':
        response['error'] = str(result.info)
    elif result.state == 'RETRY':
        # In the RETRY state, result.info holds the exception that caused the retry
        response['retry_reason'] = str(result.info)
    return jsonify(response)
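For a multi-step task like issue_credential, the built-in states are coarse: the frontend only sees STARTED until the whole task finishes. Bound tasks can report finer-grained progress with `self.update_state` and a custom state. The 'PROGRESS' state name and the helper below are conventions I'm assuming, not Celery built-ins:

```python
def progress_meta(step: int, total_steps: int, label: str) -> dict:
    """Metadata payload for self.update_state(state='PROGRESS', meta=...)."""
    return {
        "current": step,
        "total": total_steps,
        "percent": round(100 * step / total_steps),
        "label": label,
    }

# Inside a bound task, between steps:
#   self.update_state(state='PROGRESS', meta=progress_meta(2, 4, 'anchoring on blockchain'))
# The status endpoint above can then surface result.info whenever
# result.state == 'PROGRESS', giving the poller a live percentage.
```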
Pattern 4: Periodic Tasks with Celery Beat
from celery.schedules import crontab

celery.conf.beat_schedule = {
    'cleanup-expired-sessions': {
        'task': 'tasks.cleanup.purge_expired_sessions',
        'schedule': crontab(minute=0, hour=2),  # 2 AM daily
    },
    'generate-daily-report': {
        'task': 'tasks.reports.generate_daily_issuance_report',
        'schedule': crontab(minute=0, hour=6),  # 6 AM daily
    },
    'retry-stuck-credentials': {
        'task': 'tasks.issuance.retry_stuck',
        'schedule': crontab(minute='*/15'),  # every 15 minutes
    },
}
Celery Beat is a single-process scheduler: running more than one instance triggers duplicate executions of every scheduled task. In a multi-server deployment, run Beat on exactly one instance, or guard scheduled tasks with a distributed lock. This is one of the most common Celery production mistakes.
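One way to make scheduled tasks safe even if a second Beat instance slips in is to guard the task body with a short-lived Redis lock, so duplicate triggers become no-ops. A sketch using redis-py's atomic SET with NX and EX — the helper name, key, and TTL are illustrative, not a library API:

```python
def run_exclusively(redis_client, lock_key: str, ttl_seconds: int, fn):
    """Run fn() only if this caller wins the lock; duplicate triggers return None.
    SET NX EX is atomic, so exactly one caller acquires the lock per TTL window.
    The lock is left to expire via TTL rather than deleted, so a near-simultaneous
    duplicate trigger cannot re-acquire it right after fn() finishes."""
    if not redis_client.set(lock_key, "1", nx=True, ex=ttl_seconds):
        return None  # another instance already triggered this run
    return fn()

# Usage inside a beat-scheduled task, with redis_client = redis.Redis(...):
#   run_exclusively(redis_client, "lock:retry_stuck", 300, do_retry_stuck)
```

Size the TTL to comfortably cover the window in which duplicate triggers can arrive (for a */15 schedule, a few minutes is plenty), but keep it shorter than the schedule interval so the next legitimate run is not blocked.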
Key Takeaways
- Any operation over ~500ms belongs in a background task — keep HTTP handlers fast
- All tasks must be idempotent — always check "already done" before doing work
- Distinguish retryable errors (backoff and retry) from non-retryable (fail fast)
- Dedicated queues with dedicated worker pools prevent priority inversion
- Expose task state via API endpoints for real-time user feedback on async operations
- Only one Celery Beat instance per deployment — enforce with distributed locking