
Availability Is All You Need

Shawn Shi
4 min read · Dec 13, 2024


Availability is a critical consideration in every system design. It measures how often a service is running properly and is accessible to its users.

How to measure availability

A common formula used to measure availability is: Availability (%) = 100 × Uptime / (Uptime + Downtime).

Where:

  1. Uptime: The amount of time the service is available.
  2. Downtime: The amount of time the service is not available.
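
As a quick illustration, the formula above can be sketched in a few lines of Python. The function name `availability_percent` and the 30-day window with 43.2 minutes of downtime are assumptions chosen for the example, not figures from a real service.

```python
def availability_percent(uptime_seconds: float, downtime_seconds: float) -> float:
    """Availability (%) = 100 x Uptime / (Uptime + Downtime)."""
    total_seconds = uptime_seconds + downtime_seconds
    if total_seconds <= 0:
        raise ValueError("total observed time must be positive")
    return 100.0 * uptime_seconds / total_seconds


# Hypothetical numbers: a 30-day window with 43.2 minutes of downtime.
window_seconds = 30 * 24 * 60 * 60
downtime_seconds = 43.2 * 60
uptime_seconds = window_seconds - downtime_seconds

result = availability_percent(uptime_seconds, downtime_seconds)
print(f"{result:.3f}%")
```

With these made-up inputs, downtime is one thousandth of the window, so the result works out to roughly 99.9%.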

But how do we define “available”? For different services, being available might mean something different.

  • For a messaging service, it may mean that latency stays below a certain threshold. The service may still be running, but if a message takes a very long time to send, the service is effectively unavailable.
  • For a REST API, it may mean the HTTP request success rate. The API may still be processing requests and returning responses, but if many of those requests fail, the service has an availability issue.
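
The two definitions above can be sketched as interchangeable "is this request good?" checks applied to the same request log. This is a minimal sketch under stated assumptions: the `RequestRecord` type, the 500 ms latency threshold, and the choice to count only 5xx responses as failures are all hypothetical and would need to be adapted to a real service.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class RequestRecord:
    """One observed request (hypothetical shape, for illustration only)."""
    status_code: int
    latency_ms: float


def is_good_http(record: RequestRecord) -> bool:
    # Assumption: only server errors (5xx) count against availability;
    # 4xx responses are treated as client mistakes.
    return record.status_code < 500


def is_good_latency(record: RequestRecord, threshold_ms: float = 500.0) -> bool:
    # Assumption: 500 ms is an arbitrary example threshold.
    return record.latency_ms <= threshold_ms


def request_availability_percent(
    records: Iterable[RequestRecord],
    is_good: Callable[[RequestRecord], bool],
) -> float:
    """Percentage of requests that satisfy the given definition of 'good'."""
    records = list(records)
    if not records:
        raise ValueError("no requests observed")
    good = sum(1 for record in records if is_good(record))
    return 100.0 * good / len(records)


# Made-up sample data.
sample = [
    RequestRecord(200, 40.0),
    RequestRecord(200, 1200.0),
    RequestRecord(503, 35.0),
    RequestRecord(404, 20.0),
    RequestRecord(200, 900.0),
]

http_availability = request_availability_percent(sample, is_good_http)
latency_availability = request_availability_percent(sample, is_good_latency)
print(http_availability, latency_availability)
```

The same five requests give different availability numbers depending on which definition is used, which is why the definition has to be chosen before the metric means anything.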

For the rest of the article, we will use a REST API as an example and use the HTTP request success rate to quantify availability at the request level. Request-level measurement is more granular than time-based measurement and provides better insight into what is actually wrong with the system. For example…


Written by Shawn Shi

Senior Software Engineer at Microsoft. Ex-Machine Learning Engineer. When I am not building applications, I am playing with my kids or outside rock climbing!
