Why I prefer P99 to Apdex score for backend services

Iccha Sethi
3 min read · Nov 27, 2020

Let me start off by saying this is purely an opinion piece and a starting point for discussion, and I would love to hear any feedback.

Apdex is a helpful measure when you are trying to convey general sentiment, especially for a browser-based app, or to express a combination of metrics. In fact, even the Apdex specification defines it as “...a numerical measure of user satisfaction with the performance of enterprise applications, intended to reflect the effectiveness of IT investments in contributing to business objectives”.

But I sometimes see folks using Apdex, instead of percentiles, as the metric to measure and alert on for backend services. As someone who has worked on the backend/server side of the stack for several years, I believe Apdex is a convoluted metric for an on-call backend engineer to be paged on, and this post dives into why.

The formula to calculate Apdex is:

Apdex = (Satisfied Reqs + Tolerated Reqs/2 + 0*Frustrated Reqs)/Total Reqs

Your application’s performance is then rated based on its Apdex score.

Reference: https://www.apdex.org/apdexfaq.html

In this formula, it becomes critical to correctly identify the tolerated and frustrated thresholds for response time. The tolerated response time could range from T to 4T, or however you have defined it for your system. For the sake of this discussion, let us assume that a request counts as frustrated when its response time exceeds the service SLO.
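As a rough illustration of how the thresholds feed the formula, here is a minimal sketch. The function name `apdex` and the parameters `t_ms` and `slo_ms` are made up for this example, and it assumes a request is satisfied at or below T, tolerated between T and the SLO, and frustrated above the SLO.

```python
def apdex(latencies_ms, t_ms, slo_ms):
    """Apdex score for a list of response times (hypothetical helper).

    Assumed buckets:
      satisfied:  latency <= t_ms
      tolerated:  t_ms < latency <= slo_ms
      frustrated: latency > slo_ms (contributes 0 to the score)
    """
    if not latencies_ms:
        raise ValueError("need at least one request")
    satisfied = sum(1 for t in latencies_ms if t <= t_ms)
    tolerated = sum(1 for t in latencies_ms if t_ms < t <= slo_ms)
    return (satisfied + tolerated / 2) / len(latencies_ms)


# Two satisfied, one tolerated, one frustrated request.
print(apdex([100, 100, 300, 900], t_ms=200, slo_ms=800))  # 0.625
```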

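Now, let’s go back to the Apdex formula. If we were to list the different ways we could arrive at an Excellent score of 0.94, we have the following combinations:

If you set your alert to page on an Apdex score of 0.94, that same score could mean either that no requests violated the SLO (first row) or that 6 requests (last row) violated it! Assuming 100 total requests, for example, 88 satisfied and 12 tolerated requests give (88 + 12/2)/100 = 0.94 with nothing over the SLO, while 94 satisfied and 6 frustrated requests give exactly the same 0.94.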
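To make the ambiguity concrete, the sketch below enumerates every split of requests that produces a given Apdex score. It assumes 100 total requests, a number chosen here for illustration, and `combos_for_score` is a hypothetical helper name.

```python
def combos_for_score(total, target):
    """All (satisfied, tolerated, frustrated) splits of `total` requests
    whose Apdex score equals `target`, ordered by frustrated count."""
    # Work in integer "half requests" to avoid floating-point comparisons:
    # Apdex == target  <=>  2 * satisfied + tolerated == 2 * target * total
    wanted = round(2 * target * total)
    results = []
    for frustrated in range(total + 1):
        for tolerated in range(total - frustrated + 1):
            satisfied = total - frustrated - tolerated
            if 2 * satisfied + tolerated == wanted:
                results.append((satisfied, tolerated, frustrated))
    return results


for satisfied, tolerated, frustrated in combos_for_score(100, 0.94):
    print(satisfied, tolerated, frustrated)
```

Under that assumption there are seven such splits, running from (88, 12, 0) to (94, 0, 6), and an alert on the score alone cannot tell them apart.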

Instead, if you had sev 2s configured on a specific condition such as P95/P99 > SLO, plus a lower-threshold notification that does not page for P95/P99 > Tolerated Time, the customer impact would be much clearer to whoever is on call. I would also caveat this by saying that if you have invested in self-recovering and resilient systems, you may not even want the lower-threshold notifications, as they may be considered noise.
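The two-tier scheme described above could be sketched roughly as follows. The function names, the "page"/"notify"/"ok" labels, and the nearest-rank percentile definition are all assumptions made for this example, not the behavior of any particular monitoring tool.

```python
import math


def percentile(latencies_ms, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct% of all samples at or below it."""
    if not latencies_ms:
        raise ValueError("need at least one request")
    ordered = sorted(latencies_ms)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def alert_level(latencies_ms, t_ms, slo_ms, pct=99):
    """Hypothetical two-tier check: page above the SLO, notify above T."""
    p = percentile(latencies_ms, pct)
    if p > slo_ms:
        return "page"    # sev 2: the tail is violating the SLO
    if p > t_ms:
        return "notify"  # lower threshold: slow, but within the SLO
    return "ok"


# Two windows that both score an Apdex of 0.94 (T = 200 ms, SLO = 800 ms):
within_slo = [100] * 88 + [500] * 12  # 12 tolerated, none over the SLO
over_slo = [100] * 94 + [900] * 6     # 6 requests over the SLO
print(alert_level(within_slo, t_ms=200, slo_ms=800))  # notify
print(alert_level(over_slo, t_ms=200, slo_ms=800))    # page
```

The two windows are indistinguishable by Apdex, yet the percentile check pages only for the one that actually breaches the SLO.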

To summarize, I think Apdex and percentiles each have their own pros and cons. I would rather use percentiles for pages and alerts, since a percentile is a specific metric and can be actionable. I would use Apdex when I want to understand user sentiment, but it would not be the first measure I add for my backend services.
