What Happens When the External System Fails?
Type: Structure
Category: Async
Audience: Engineers integrating with third-party systems, APIs, or partner platforms
🔍 What This Perspective Asks
- What exactly happens if an external system goes down?
- Is the failure detected and logged, and does the experience degrade gracefully?
- Who gets hurt, and how much?
⚠️ Common Failures
- Partner API goes down → internal queues fill up silently
- Retry storms → our own systems get overloaded
- UX degrades (empty dashboards, broken buttons) with no message to the user
- Errors are silently swallowed because "async = eventually"
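Why retries turn into storms comes down to multiplication. A minimal Python sketch, assuming a hypothetical chain of services in front of the partner where every hop independently retries a failing downstream call; `worst_case_attempts` and the numbers are illustrative only:

```python
def worst_case_attempts(layers: int, retries_per_layer: int) -> int:
    """Attempts that reach a failing dependency for ONE incoming request when each
    of `layers` services makes one call plus up to `retries_per_layer` retries."""
    return (retries_per_layer + 1) ** layers


# Hypothetical numbers: 3 retrying services, 3 retries each, 1,000 requests/s.
per_request = worst_case_attempts(layers=3, retries_per_layer=3)
print(per_request)          # 64
print(per_request * 1_000)  # 64000 attempts/s against a partner that is already down
```

If only one layer retries, each request produces at most `retries_per_layer + 1` attempts instead of a power of it.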
✅ Better Failure Handling
- Explicit fallbacks or UI states when partner APIs fail
- Timeout and retry policies per integration, not one global default
- Queue isolation for high-risk dependencies
- Alert on external system latency spikes, not just outright failures
- Document UX impact and expected behavior per integration
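A minimal Python sketch of the middle three items, assuming two hypothetical integrations named `payments` and `analytics` with made-up thresholds: each gets its own timeout, retry budget (capped exponential backoff with full jitter), bounded queue, and p95 latency alert. `IntegrationPolicy`, `call_with_policy`, `enqueue`, `latency_alert`, and all numbers are illustrative, not taken from any specific library.

```python
import random
import time
from dataclasses import dataclass
from queue import Full, Queue
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class IntegrationPolicy:
    """Per-integration settings; the numbers below are illustrative only."""
    timeout_s: float      # handed to the client call, which must enforce it
    max_retries: int      # retries after the first attempt
    base_delay_s: float   # first backoff step
    max_delay_s: float    # cap on any single backoff step
    queue_size: int       # bound on this integration's own work queue
    p95_alert_ms: float   # latency threshold that should alert before outright failure


# Hypothetical integrations, each with its own policy rather than one global default.
POLICIES = {
    "payments": IntegrationPolicy(timeout_s=2.0, max_retries=2, base_delay_s=0.2,
                                  max_delay_s=2.0, queue_size=100, p95_alert_ms=800.0),
    "analytics": IntegrationPolicy(timeout_s=10.0, max_retries=5, base_delay_s=1.0,
                                   max_delay_s=30.0, queue_size=10_000, p95_alert_ms=5_000.0),
}

# Queue isolation: one bounded queue per integration, so a slow partner
# fills only its own queue instead of a shared one.
QUEUES = {name: Queue(maxsize=p.queue_size) for name, p in POLICIES.items()}


def backoff_delay(policy: IntegrationPolicy, attempt: int) -> float:
    """Capped exponential backoff with full jitter; attempt counts from 0."""
    ceiling = min(policy.max_delay_s, policy.base_delay_s * 2 ** attempt)
    return random.uniform(0, ceiling)


def call_with_policy(name: str, fn: Callable[[float], T],
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(timeout_s) under the integration's retry policy.
    Re-raises the last error once retries are exhausted instead of swallowing it."""
    policy = POLICIES[name]
    attempt = 0
    while True:
        try:
            return fn(policy.timeout_s)
        except (TimeoutError, ConnectionError):
            if attempt >= policy.max_retries:
                raise
            sleep(backoff_delay(policy, attempt))
            attempt += 1


def enqueue(name: str, job: object) -> bool:
    """Reject visibly when this integration's queue is full instead of blocking or failing silently."""
    try:
        QUEUES[name].put_nowait(job)
        return True
    except Full:
        print(f"[{name}] queue full, job rejected")  # stand-in for a metric or alert
        return False


def latency_alert(name: str, samples_ms: list[float]) -> bool:
    """True when nearest-rank p95 latency exceeds the integration's threshold."""
    if not samples_ms:
        return False
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1] > POLICIES[name].p95_alert_ms
```

Passing `sleep` in keeps the retry loop testable without real delays, and the jitter spreads retries from many callers so they do not hit a recovering partner in lockstep.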
🧠 Core Principle
Async doesn't mean ignoring the failure.
It means controlling how the failure manifests.
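One way to express "control how the failure manifests" in code, as a minimal Python sketch: wrap each partner call so a failure is logged and turned into an explicit degraded result the UI can render, rather than an exception that gets swallowed or a widget that silently stays empty. `Outcome`, `with_fallback`, and `fetch_partner_stats` are hypothetical names.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("integrations")


@dataclass(frozen=True)
class Outcome(Generic[T]):
    """What the caller (and ultimately the UI) receives: never a silent blank."""
    value: T
    degraded: bool
    user_message: Optional[str] = None  # shown to the user only when degraded


def with_fallback(name: str, fetch: Callable[[], T], fallback: T,
                  user_message: str) -> Outcome[T]:
    """Run fetch(); on a partner failure, log it and return an explicit degraded state."""
    try:
        return Outcome(value=fetch(), degraded=False)
    except (TimeoutError, ConnectionError):
        # Logged with the traceback rather than swallowed, then turned into a UI state.
        log.exception("integration %s failed; serving fallback", name)
        return Outcome(value=fallback, degraded=True, user_message=user_message)


# Usage: a dashboard widget backed by a hypothetical partner call.
def fetch_partner_stats() -> list[int]:
    raise ConnectionError("partner unreachable")  # simulate the outage


result = with_fallback("analytics", fetch_partner_stats, fallback=[],
                       user_message="Partner data is temporarily unavailable.")
print(result.degraded, result.user_message)  # True Partner data is temporarily unavailable.
```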
❓ FAQ
- Q: We have retries. Isn't that enough?
  A: Not if the user or the system can't tell what happened.
- Q: Should we surface all integration failures?
  A: No. But you should choose who needs to know what, and when.
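The second answer can be made concrete with a per-integration routing table. A minimal Python sketch, assuming hypothetical integration names, failure kinds, and three audiences (the user, on-call, logs): a single timeout that will still be retried is only logged, while exhausted retries on a user-facing integration reach both the user and on-call. `Audience`, `FailureEvent`, `ROUTING`, and `audiences_for` are illustrative names.

```python
from dataclasses import dataclass
from enum import Enum


class Audience(Enum):
    USER = "user"        # surfaced in the UI
    ON_CALL = "on_call"  # pages a human
    LOG = "log"          # recorded for later investigation


@dataclass(frozen=True)
class FailureEvent:
    integration: str
    kind: str  # hypothetical kinds: "timeout", "retries_exhausted", "latency_spike"


# Hypothetical routing table, kept next to each integration's documented UX impact.
ROUTING: dict[tuple[str, str], frozenset[Audience]] = {
    # A single timeout will be retried, so nobody needs to hear about it yet.
    ("payments", "timeout"): frozenset({Audience.LOG}),
    # Retries are exhausted on a user-facing flow: tell the user and page on-call.
    ("payments", "retries_exhausted"): frozenset({Audience.USER, Audience.ON_CALL, Audience.LOG}),
    # Slow but still working: warn on-call before it becomes an outage.
    ("payments", "latency_spike"): frozenset({Audience.ON_CALL, Audience.LOG}),
    # Background work: record it, no user messaging.
    ("analytics", "retries_exhausted"): frozenset({Audience.LOG}),
}


def audiences_for(event: FailureEvent) -> frozenset[Audience]:
    """Who should hear about this failure; unlisted cases are still logged, never dropped."""
    return ROUTING.get((event.integration, event.kind), frozenset({Audience.LOG}))
```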