What Happens When the External System Fails?
Type: Structure
Category: Async
Audience: Engineers integrating with third-party systems, APIs, or partner platforms
🔍 What This Perspective Asks
- What exactly happens if an external system goes down?
- Is the failure detected, logged, and degraded gracefully?
- Who gets hurt—and how much?
⚠️ Common Failures
- Partner API goes down → internal queues fill silently
- Retry storms → overload of your own systems
- UX degrades (empty dashboards, broken buttons) with no user messaging
- Errors silently swallowed because “async = eventually”
✅ Better Failure Handling
- Explicit fallbacks or UI states when partner APIs fail
- Timeout and retry policies per integration—not global
- Queue isolation for high-risk dependencies
- Alert on external system latency spikes, not just outright failures
- Document UX impact and expected behavior per integration
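A minimal sketch of per-integration timeout and retry policies with an explicit fallback. The integration names (`payments`, `analytics`), the numeric limits, and the helper `call_with_policy` are all hypothetical, chosen only for illustration. The retry loop is bounded and uses exponential backoff with full jitter, which is one common way to keep retries from turning into a storm.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class IntegrationPolicy:
    timeout_s: float       # per-call timeout handed to the integration client
    max_attempts: int      # hard upper bound on tries, including the first
    base_backoff_s: float  # backoff ceiling before the first retry
    max_backoff_s: float   # the backoff ceiling never grows past this


# Hypothetical integrations: each one gets its own limits, not a global default.
POLICIES = {
    "payments": IntegrationPolicy(2.0, 2, 0.2, 1.0),
    "analytics": IntegrationPolicy(10.0, 5, 1.0, 30.0),
}


@dataclass
class CallResult(Generic[T]):
    value: Optional[T]
    degraded: bool               # True when the fallback was used
    attempts: int
    error: Optional[str] = None  # last error, kept so it can be logged or shown


def call_with_policy(
    name: str,
    fn: Callable[[float], T],
    fallback: Optional[T],
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> CallResult[T]:
    """Call fn(timeout_s) under the named policy; return the fallback when it keeps failing."""
    policy = POLICIES[name]
    last_error: Optional[Exception] = None
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return CallResult(fn(policy.timeout_s), degraded=False, attempts=attempt)
        except Exception as exc:  # assumption: real code would catch only retryable errors
            last_error = exc
            if attempt < policy.max_attempts:
                ceiling = min(policy.max_backoff_s,
                              policy.base_backoff_s * 2 ** (attempt - 1))
                sleep(rng() * ceiling)  # full jitter spreads retries out over time
    return CallResult(fallback, degraded=True,
                      attempts=policy.max_attempts, error=repr(last_error))
```

The caller always receives a `CallResult`, so the `degraded` flag can drive a UI state or a log line instead of the failure disappearing inside the retry loop.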
🧠 Core Principle
Async doesn’t mean ignore the failure.
It means control how the failure manifests.
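One way to read this principle in code: a background job records its outcome somewhere visible rather than letting the exception vanish. The sketch below uses `asyncio`; `SyncStatus`, `run_sync`, and the logger name `integrations` are hypothetical names, not part of any existing system.

```python
import asyncio
import logging
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

logger = logging.getLogger("integrations")  # hypothetical logger name


@dataclass
class SyncStatus:
    state: str = "pending"        # "pending", "ok", or "failed"
    detail: Optional[str] = None  # human-readable reason when failed


async def run_sync(
    name: str,
    fetch: Callable[[], Awaitable[object]],
    status: SyncStatus,
    timeout_s: float,
) -> None:
    """Run one async sync and record the outcome instead of swallowing it."""
    try:
        await asyncio.wait_for(fetch(), timeout=timeout_s)
    except asyncio.TimeoutError:
        status.state = "failed"
        status.detail = f"{name} timed out after {timeout_s}s"
    except Exception as exc:  # assumption: a real job would narrow this
        status.state = "failed"
        status.detail = f"{name} error: {exc!r}"
    else:
        status.state = "ok"
        status.detail = None
        return
    # Failure path: make it visible to operators as well as to the caller.
    logger.warning("integration sync failed: %s", status.detail)
```

A dashboard could read `status.state` to show a "data may be stale" banner rather than an empty panel.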
❓ FAQ
- Q: We have retries. Isn’t that enough?
  A: Not if the user or system can’t tell what happened.
- Q: Should we surface all integration failures?
  A: No. But you should choose who needs to know what, and when.
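Choosing who needs to know can be written down as an explicit rule rather than left implicit. The sketch below is one possible shape: the `Audience` values, the `Failure` fields, and the `page_after` threshold of 3 are assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Set


class Audience(Enum):
    LOG_ONLY = "log_only"
    USER = "user"
    ON_CALL = "on_call"


@dataclass(frozen=True)
class Failure:
    integration: str
    user_facing: bool          # does the failure change what the user sees?
    consecutive_failures: int  # how long the integration has been failing


def audiences_for(failure: Failure, page_after: int = 3) -> Set[Audience]:
    """Decide who is told about a failure; every failure is at least logged."""
    notify = {Audience.LOG_ONLY}
    if failure.user_facing:
        notify.add(Audience.USER)     # show a UI state, not a silent gap
    if failure.consecutive_failures >= page_after:
        notify.add(Audience.ON_CALL)  # sustained failure, not a blip
    return notify
```

A single failed background sync stays in the logs, while a user-facing integration that has failed several times in a row reaches both the user and the on-call engineer.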