
Is API Latency Acceptable—and Understood?

Type: DeepDive
Category: Performance
Audience: Engineers designing APIs, monitoring behavior, or debugging UX issues


🔍 What’s Actually Being Asked

Not “is it fast?”
But:

  • Is the latency acceptable under current conditions?
  • Is it predictable under load?
  • Do you know what contributes to the delay?

⚠️ Typical Issues

  • High latency only under load—but no alerts fire
  • Spikes caused by background tasks or queue congestion
  • DB roundtrips and N+1 queries hidden in controller logic (see the sketch after this list)
  • Cold caches after deploys or config changes
  • API clients adding retry loops, compounding the slowness
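
The N+1 item above is the usual suspect when a single endpoint quietly issues dozens of DB roundtrips. A minimal sketch of the pattern and the batched fix; `db.query`, the table, and the column names are hypothetical stand-ins, not any specific library:

```python
# N+1 pattern vs. a batched query. `db.query` and the schema are
# illustrative placeholders for whatever data-access layer is in use.

def get_orders_n_plus_one(db, user_ids):
    orders = []
    for uid in user_ids:  # one DB roundtrip per user
        orders += db.query("SELECT * FROM orders WHERE user_id = %s", (uid,))
    return orders

def get_orders_batched(db, user_ids):
    if not user_ids:
        return []
    placeholders = ", ".join(["%s"] * len(user_ids))
    # one roundtrip regardless of how many users are on the page
    return db.query(
        f"SELECT * FROM orders WHERE user_id IN ({placeholders})",
        tuple(user_ids),
    )
```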

✅ Healthier Latency Design

  • Define SLOs and plot the real latency distribution (p50/p95/p99), not just averages
  • Include a latency budget breakdown in API design docs
  • Use timeout budgeting to balance retries vs user experience (first sketch after this list)
  • Log per-request latency contributors: DB, cache, external API (second sketch after this list)
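
One way to budget timeouts is to give each inbound request a single overall deadline and let every retry spend only what is left of it, instead of stacking fixed per-try timeouts. A minimal sketch; `call_backend`, the budget numbers, and the backoff constants are illustrative assumptions, not a prescription:

```python
import random
import time

TOTAL_BUDGET_S = 2.0   # what the caller is willing to wait (illustrative)
PER_TRY_CAP_S = 0.8    # never let one attempt consume the whole budget

def call_with_budget(call_backend, request):
    """Retry within one overall deadline instead of a fixed retry count."""
    deadline = time.monotonic() + TOTAL_BUDGET_S
    attempt = 0
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("latency budget exhausted")
        timeout = min(PER_TRY_CAP_S, remaining)
        try:
            return call_backend(request, timeout=timeout)
        except TimeoutError:
            attempt += 1
            # Jittered backoff, but never sleep past the deadline.
            backoff = min(random.uniform(0, 0.1 * 2 ** attempt),
                          deadline - time.monotonic())
            if backoff <= 0:
                raise
            time.sleep(backoff)
```

The useful invariant: however many retries fire, the caller never waits longer than the budget.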

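Per-request attribution does not need heavy tooling. A timing helper that tags each downstream segment and emits one structured line per request is often enough to show where the milliseconds went. A minimal sketch; the handler calls in the usage comment are hypothetical:

```python
import json
import time
from contextlib import contextmanager

class RequestTimings:
    """Collects named latency segments (db, cache, external...) for one request."""
    def __init__(self):
        self.segments = {}
        self._start = time.monotonic()

    @contextmanager
    def track(self, name):
        t0 = time.monotonic()
        try:
            yield
        finally:
            self.segments[name] = self.segments.get(name, 0.0) + (time.monotonic() - t0)

    def log(self, route):
        total = time.monotonic() - self._start
        # One structured line per request: total plus per-contributor breakdown.
        print(json.dumps({
            "route": route,
            "total_ms": round(total * 1000, 1),
            **{f"{k}_ms": round(v * 1000, 1) for k, v in self.segments.items()},
        }))

# Usage inside a handler (fetch_orders / enrich are placeholders):
# timings = RequestTimings()
# with timings.track("db"):
#     rows = fetch_orders(...)
# with timings.track("external_api"):
#     enrich(rows)
# timings.log("/orders")
```
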
🧠 Design Framing

Latency is not a number.
It’s a conversation between client pain and backend design.


❓ FAQ

  • Q: Our p95 is fine. Is that good enough?
    A: Not if your tail spikes hurt the user more than your average helps (see the sketch below).

  • Q: Can we just throw more infra at it?
    A: You can. Until you can’t afford to.
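
To see why a single p95 number can mislead, look at the whole tail. A quick sketch over synthetic latencies; the distribution and the numbers are made up purely for illustration:

```python
import random
import statistics

# Synthetic latencies: 98% fast requests, 2% multi-second outliers.
random.seed(1)
samples_ms = (
    [random.gauss(80, 10) for _ in range(9_800)]
    + [random.gauss(2_500, 300) for _ in range(200)]
)

def percentile(data, p):
    """Simple nearest-rank percentile, good enough for illustration."""
    data = sorted(data)
    k = max(0, min(len(data) - 1, round(p / 100 * (len(data) - 1))))
    return data[k]

print("mean  :", round(statistics.mean(samples_ms)), "ms")
for p in (50, 95, 99, 99.9):
    print(f"p{p:<5}:", round(percentile(samples_ms, p)), "ms")
# Typical output: mean and p95 stay near ~100-130 ms, while p99 and p99.9
# land in the multi-second range: the 2% of requests that actually hurt.
```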