Are Event Delays and Retries Part of Your Design—or Just Runtime Surprises?

Type: Structure
Category: Async
Audience: Engineers building event-driven systems, workflows, or async pipelines


🔍 What This Perspective Covers

Async systems fail differently.
They don’t crash. They drift.

  • Retry loops that hide real failure
  • Invisible delays that break UX or violate SLAs
  • Side effects triggered multiple times without awareness

⚠️ Common Anti-Patterns

  • Retry forever on transient failures → permanent backlog
  • No delay feedback in the UX (no pending state, no progress indicator) → users spam reload
  • Events reordered → downstream consumers break silently
  • Delivery guarantee (at-least-once, in-order) assumed, but never validated in tests
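
One way to make reordering visible instead of silent is to have the consumer check a per-entity sequence number. The sketch below assumes the producer stamps each event with a monotonically increasing `seq` per entity key. That field, and the `OrderGuard` name, are illustrative and not part of any particular broker's API.

```python
from dataclasses import dataclass, field


@dataclass
class OrderGuard:
    """Tracks the last sequence number seen per entity key (in memory only)."""
    last_seen: dict = field(default_factory=dict)

    def check(self, key: str, seq: int) -> str:
        """Classify an event as 'ok', 'duplicate', 'stale', or 'gap'."""
        prev = self.last_seen.get(key)
        if prev is not None and seq == prev:
            return "duplicate"   # same event delivered again
        if prev is not None and seq < prev:
            return "stale"       # arrived after a newer event was already seen
        # Accept the event and advance the watermark, even across a gap.
        self.last_seen[key] = seq
        if prev is not None and seq > prev + 1:
            return "gap"         # something in between is missing or late
        return "ok"
```

What the consumer does with a `stale` or `gap` result (drop, buffer, alert) is a design decision. The point is that the condition is detected and counted rather than assumed away.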

✅ Stronger Event Flow Design

  • Define max retry windows and dead-letter paths
  • Design for retry “echoes”: side effects should be idempotent
  • Use correlation IDs to trace event chains
  • Monitor queue latency separately from success/error metrics
  • Document SLAs for delay-tolerant vs delay-critical events
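
The first three points can be sketched together in a few lines. This is a minimal in-memory illustration, not a production consumer: `Event`, `Consumer`, and `TransientError` are hypothetical names, the `processed` set stands in for a durable idempotency store, and the `dead_letters` list stands in for a real dead-letter queue.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    event_id: str         # unique per logical event; used as the idempotency key
    correlation_id: str   # shared by every event in the same chain, for tracing
    payload: dict


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


@dataclass
class Consumer:
    handler: Callable[[Event], None]
    max_attempts: int = 3          # retry budget by count
    max_window_s: float = 30.0     # retry budget by elapsed time
    base_delay_s: float = 0.5      # first backoff delay; doubles each attempt
    sleep: Callable[[float], None] = time.sleep
    clock: Callable[[], float] = time.monotonic
    processed: set = field(default_factory=set)
    dead_letters: list = field(default_factory=list)

    def consume(self, event: Event) -> str:
        # Idempotency: a redelivered event must not repeat the side effect.
        if event.event_id in self.processed:
            return "skipped"
        deadline = self.clock() + self.max_window_s
        attempt = 0
        while True:
            attempt += 1
            try:
                self.handler(event)
                self.processed.add(event.event_id)
                return "done"
            except TransientError as exc:
                delay = self.base_delay_s * 2 ** (attempt - 1)
                out_of_budget = (
                    attempt >= self.max_attempts
                    or self.clock() + delay > deadline
                )
                if out_of_budget:
                    # Dead-letter path: keep the event, the attempt count, and
                    # the error so the failure stays visible and replayable.
                    self.dead_letters.append((event, attempt, repr(exc)))
                    return "dead-lettered"
                self.sleep(delay)
            # Any other exception propagates: it is not treated as retryable.
```

A real system would need the idempotency record to be written atomically with the side effect, and would log `correlation_id` on every attempt so a chain of events can be traced end to end.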

🧠 Core Principle

Async systems don’t fail loud.
They fail later and invisibly—unless you design them not to.


❓ FAQ

  • Q: Isn’t retrying always better than failing?
    A: Only if the retry is harmless: the side effect is idempotent and the retry is bounded.

  • Q: How do we know which events are sensitive to delay?
    A: Define UX expectations first. Then encode them into the pipeline as explicit per-event delay budgets.
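
Encoding delay expectations into the pipeline can be as simple as a per-event-type budget that queue latency is checked against, independently of whether processing later succeeds. The sketch below is an assumption-laden illustration: the event type names, the budget values, and the `check_queue_latency` helper are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical budgets in seconds, derived from UX expectations.
DELAY_BUDGET_S = {
    "payment.confirmed": 2.0,    # delay-critical: the user is waiting on screen
    "report.generated": 300.0,   # delay-tolerant: the result arrives by email
}
DEFAULT_BUDGET_S = 60.0          # assumed fallback for unclassified events


@dataclass
class LatencyCheck:
    event_type: str
    latency_s: float
    budget_s: float

    @property
    def breached(self) -> bool:
        return self.latency_s > self.budget_s


def check_queue_latency(event_type: str, enqueued_at: float,
                        dequeued_at: float) -> LatencyCheck:
    """Compare time spent in the queue against the event type's delay budget."""
    budget = DELAY_BUDGET_S.get(event_type, DEFAULT_BUDGET_S)
    return LatencyCheck(event_type, dequeued_at - enqueued_at, budget)
```

Emitting `breached` as its own metric keeps queue latency separate from success and error counts, so an event that succeeds too late still shows up.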