Skip to content

SaaS Architecture Review Navigator (English)

Handling External System Failures

What Happens When the External System Fails?¶

Type: Structure
Category: Async
Audience: Engineers integrating with third-party systems, APIs, or partner platforms

🔍 What This Perspective Asks¶

What exactly happens if an external system goes down?
Is the failure detected, logged, and degraded gracefully?
Who gets hurt—and how much?

⚠️ Common Failures¶

Partner API goes down → internal queues fill silently
Retry storms → overload own systems
UX degrades (empty dashboards, broken buttons) with no user messaging
Errors silently swallowed because “async = eventually”

✅ Better Failure Handling¶

Explicit fallbacks or UI states when partner APIs fail
Timeout and retry policies per integration—not global
Queue isolation for high-risk dependencies
Alert on external system latency/spike—not just failure
Document UX impact and expected behavior per integration

🧠 Core Principle¶

Async doesn’t mean ignore the failure.
It means control how the failure manifests.

❓ FAQ¶

Q: We have retries. Isn’t that enough?
A: Not if the user or system can’t tell what happened.
Q: Should we surface all integration failures?
A: No. But you should choose who needs to know what—and when.