Tying Engineering Metrics to Business Metrics
Most engineering organizations I’ve worked in or led have tracked some form of engineering metrics. These range from simple metrics like uptime and incident count to more complex frameworks like DORA. As an engineering leader, you’ve probably been asked, either by someone within or outside of engineering: Why do these metrics matter? or How do they align with our business goals?
This post is aimed at demystifying some of this. We will cover:
- Key Business Metrics
- Lagging and Leading engineering metrics and how they connect to the key business metrics
While this isn’t an exhaustive list of engineering metrics, the goal is to provide a practical framework that you can adapt to your context.
Key Business Metrics
Below are some key business metrics that most businesses use:
ARR (Annual Recurring Revenue): The total recurring revenue a company expects to receive annually from its customers. (Wall Street Prep)
NRR (Net Revenue Retention): A metric that measures the percentage of recurring revenue retained from existing customers over a specific period, accounting for expansions, contractions, and churn. (Planhat)
GRR (Gross Revenue Retention): The percentage of recurring revenue retained from existing customers over a specific period, excluding any revenue gained from expansions or upsells. (ChurnZero)
CAC (Customer Acquisition Cost): The total cost incurred by a company to acquire a new customer, including marketing and sales expenses. (Cast)
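To make the retention metrics concrete, here is a minimal Python sketch of the standard NRR and GRR formulas. The dollar figures are made-up illustrative numbers, not benchmarks:

```python
# Computing NRR and GRR for one cohort of existing customers.
# All revenue figures below are hypothetical illustrations.

def nrr(starting_arr, expansion, contraction, churn):
    """Net Revenue Retention: includes expansion revenue, so it can exceed 100%."""
    return (starting_arr + expansion - contraction - churn) / starting_arr

def grr(starting_arr, contraction, churn):
    """Gross Revenue Retention: excludes expansion, so it never exceeds 100%."""
    return (starting_arr - contraction - churn) / starting_arr

# A cohort that started the year at $1M ARR:
print(f"NRR: {nrr(1_000_000, 150_000, 30_000, 70_000):.0%}")  # NRR: 105%
print(f"GRR: {grr(1_000_000, 30_000, 70_000):.0%}")           # GRR: 90%
```

Note that NRR above 100% means expansion from existing customers outpaced churn and contraction, while GRR shows retention with that expansion stripped out.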
These metrics are lagging indicators, sometimes by as much as 12 months: for example, a customer may churn at the end of an annual contract, only then impacting GRR.
Let us look at some potential Intermediate Outcomes which may impact these key business metrics.
Intermediate Outcomes
High GRR and NRR reflect loyal, satisfied customers who find the product valuable, easy to use (user experience), and reliable (system reliability). These customers are more likely to expand their usage, purchase additional features, and remain long-term advocates for your platform.
Acquiring new customers is generally more expensive than retaining existing ones. Studies indicate that attracting a new customer can cost up to five times more than retaining an existing one. Additionally, the probability of selling to an existing customer ranges between 60–70%, whereas the probability of selling to a new prospect is only 5–20%. These statistics underscore the financial benefits of focusing on customer retention strategies.
To grow the business via ARR and reduce CAC simultaneously, we must prioritize shipping product features quickly (feature velocity) without compromising the factors that sustain GRR and NRR.
Engineering Metrics (Lagging)
There are a number of engineering metrics that are also lagging, but on a much shorter timescale than GRR/NRR/CAC/ARR. Metrics like uptime, time to detect and recover from incidents, performance, support tickets, bugs, and team velocity can be measured over shorter timeframes.
As an engineering leader, I have found that they're most insightful when reviewed monthly and analyzed for trends over 3–6 months. They can be earlier indicators of unhappy customers and can enable teams to take quick action before a customer becomes a churn risk. Some examples include:
- An uptick in support tickets that grows disproportionately to the customer base, or a team unable to keep up with support ticket SLAs, points to a potentially higher number of bugs in the product or an unintuitive user experience, both of which lead to unhappy customers.
- An increasing number of incidents, or high time to detect (TTD) and time to recover (TTR) alongside decreasing uptime, means there are periods when the product is unavailable or not working as expected, again eroding customer trust.
- Slow web app performance means tasks take longer to complete, degrading the user experience.
- Declining team velocity impacts the ability to ship customer-requested features.
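Several of these lagging metrics fall out of simple arithmetic over an incident log. As a sketch, here is one way to derive uptime and mean TTR for a month, using hypothetical incident timestamps:

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (detected, recovered) pairs for one month.
incidents = [
    (datetime(2024, 3, 2, 10, 0),  datetime(2024, 3, 2, 10, 45)),
    (datetime(2024, 3, 18, 22, 30), datetime(2024, 3, 18, 23, 0)),
]

period = timedelta(days=31)  # the month being measured

# Total downtime is the sum of each incident's detect-to-recover window.
downtime = sum((recovered - detected for detected, recovered in incidents),
               timedelta())

uptime_pct = 100 * (1 - downtime / period)
mean_ttr = downtime / len(incidents)

print(f"Uptime: {uptime_pct:.3f}%")  # Uptime: 99.832%
print(f"Mean TTR: {mean_ttr}")       # Mean TTR: 0:37:30
```

In practice, downtime usually starts when impact begins rather than when it is detected (a high TTD hides downtime), which is exactly why TTD is worth tracking alongside TTR.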
Engineering Metrics (Leading)
Sometimes even months might be too late to come back and fix something. Luckily, we have a number of best practices, and the metrics tied to those practices, when done right, correlate strongly with the lagging engineering indicators. Though imperfect in their own ways, these metrics are generally decent real-time indicators of potential impact on the lagging ones. Some of these leading indicators include test coverage, PR size, feature flag usage, deployment frequency, and lead time for change. Several can be reviewed on a per-pull-request basis, or even daily. Ideally, individual teams or engineers feel a high sense of ownership over them.
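Two of these leading indicators, lead time for change and deployment frequency, can be computed directly from deploy events. Here is a minimal sketch with hypothetical timestamps (in practice these would come from your CI/CD system):

```python
from datetime import datetime
from statistics import median

# Hypothetical deploy events: (merged_at, deployed_at) pairs.
deploys = [
    (datetime(2024, 3, 4, 9, 0),  datetime(2024, 3, 4, 11, 30)),
    (datetime(2024, 3, 4, 14, 0), datetime(2024, 3, 4, 14, 40)),
    (datetime(2024, 3, 5, 10, 0), datetime(2024, 3, 5, 16, 0)),
]

# Lead time for change: merge-to-production delay per deploy.
lead_times = [deployed - merged for merged, deployed in deploys]

print("Median lead time:", median(lead_times))  # Median lead time: 2:30:00
print("Deploys this period:", len(deploys))     # Deploys this period: 3
```

Using the median rather than the mean keeps one slow outlier deploy from masking an otherwise fast pipeline.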
Summary
Tying this all together: a short lead time for change means PRs get into production quickly. This is great for team and product velocity, because we ship changes quickly and validate them sooner in production, but it also lets us decrease our time to recover during incidents by applying a fix fast. Lower-impact incidents mean fewer unhappy customers, which helps us maintain our GRR. Similarly, getting features into production faster means our product delivers value sooner, making it easier to win new customers and grow our ARR.
I hope this post clarifies the connection between engineering and business metrics. The next time someone asks why code coverage or deployment frequency matters to the business, you’ll have the answer — and a framework to back it up! 😀