Most engineering leaders cannot tell you, with numbers, whether their organisation is actually healthy. They can quote headcount, story points, and how busy everyone looks, none of which answer the question.
There is a better answer, and it has been tested for over a decade across tens of thousands of professionals. DORA's research identifies four metrics that measure software delivery, and it has shown repeatedly that teams scoring well on them are more likely to meet their goals for profitability, time to market, and customer satisfaction (IBM, 2026). In fact, teams in the top delivery tier are roughly twice as likely to meet or exceed their organisational goals (Hyperdrive Agile, 2026). Four numbers, and they read the health of the whole organisation, not just the engineering team.
The four are deployment frequency, lead time for changes, change failure rate, and failed deployment recovery time. Two measure speed, two measure stability. This post is about what they actually tell you, why they are a read on organisational health rather than a vanity dashboard, and the traps in using them. No setup guide, just what the numbers mean.
Two numbers for speed, two for stability, and you need all four
The four split cleanly into two halves. Deployment frequency and lead time for changes measure throughput, which is how fast you ship. Change failure rate and recovery time measure stability, which is how safely you ship. DORA built them this way on purpose, to capture both dimensions at once rather than letting a team look good on one while quietly failing the other (IBM, 2026).
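To make the two halves concrete, here is a minimal sketch of how the four numbers could be computed from a list of deployment records. The `Deployment` record, its field names, and the `four_keys` function are assumptions made for illustration, not a DORA-specified schema, and using the median rather than the mean is a choice made here, not a rule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; field names are illustrative only.
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it need remediation?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def four_keys(deployments: list[Deployment], window_days: int) -> dict:
    """Summarise throughput and stability over a window of `window_days` days."""
    if not deployments:
        raise ValueError("need at least one deployment")
    failures = [d for d in deployments if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        # Throughput: how fast you ship.
        "deploys_per_day": len(deployments) / window_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deployments),
        # Stability: how safely you ship.
        "change_failure_rate": len(failures) / len(deployments),
        "median_recovery_time": median(recoveries) if recoveries else None,
    }


t0 = datetime(2025, 1, 6, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(hours=4)),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=6)),
    Deployment(t0 + timedelta(days=2), t0 + timedelta(days=2, hours=2), failed=True,
               restored_at=t0 + timedelta(days=2, hours=2, minutes=30)),
    Deployment(t0 + timedelta(days=3), t0 + timedelta(days=3, hours=8)),
]
summary = four_keys(sample, window_days=7)
```

The point of returning all four together is the one made above: a single function call gives you both halves, so nobody can report throughput without stability sitting next to it.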
The bar for the top tier is high. Elite teams deploy on demand, get a change to production in under a day, keep their change failure rate at 5 per cent or lower, and recover from a failed deployment in under an hour. Only about 19 per cent of teams manage all four (Kodus, 2024). That last point matters: this is a demanding standard, not a participation badge.
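As a rough sketch, the elite bar can be written as four checks that must all pass. The thresholds for lead time, change failure rate, and recovery time come from the figures above. "On demand" has no numeric definition in this post, so the `on_demand_floor` of one deployment per day is an assumption, as is the function name.

```python
from datetime import timedelta


def meets_elite_bar(deploys_per_day: float,
                    lead_time: timedelta,
                    change_failure_rate: float,
                    recovery_time: timedelta,
                    on_demand_floor: float = 1.0):
    """Return (per-metric results, overall verdict) against the elite thresholds.

    `on_demand_floor` is an assumed stand-in for "deploys on demand".
    """
    checks = {
        "deployment_frequency": deploys_per_day >= on_demand_floor,
        "lead_time": lead_time < timedelta(days=1),
        "change_failure_rate": change_failure_rate <= 0.05,
        "recovery_time": recovery_time < timedelta(hours=1),
    }
    # Elite means all four at once, not three out of four.
    return checks, all(checks.values())


# A fast team with a high failure rate does not clear the bar.
fast_but_fragile, is_elite = meets_elite_bar(
    deploys_per_day=6.0,
    lead_time=timedelta(hours=3),
    change_failure_rate=0.12,
    recovery_time=timedelta(minutes=40),
)
```

Here `is_elite` is `False` and the only failing check is `change_failure_rate`, which is exactly the kind of gap a single headline number would hide.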
The trap is optimising one number in isolation. Push deployment frequency on its own and your change failure rate spikes. Chase a spotless failure rate on its own and you end up deploying once a quarter. The four are a balanced set precisely so that gaming one shows up as damage in another. Reading any single number without the other three will mislead you every time.
Speed and stability are not a tradeoff, which is the whole point
The most common assumption about these metrics is also the most wrong: that going faster means breaking more. DORA's research has repeatedly shown the opposite. The metrics are correlated for most teams: top performers do well on all of them, and low performers do poorly on all of them (DORA, 2026).
Elite teams deploy more frequently and have better reliability at the same time, because both come from the same underlying engineering practices rather than from a trade between them (neubird, 2026). Small batches, automated testing, and fast feedback improve speed and stability together. You are not choosing between the two. You are building the capabilities that lift all four.
The nuance is that this works at the level of practices, not effort. You do not get both by telling people to try harder, and you do not get the stability gains by gaming the speed number. A team that splits one release into ten to inflate its deployment frequency has changed the metric without changing anything real, and the stability numbers will say so.
They measure outcomes, not activity, which is why they are hard to game
The reason these four beat the usual metrics is that they measure outcomes: did the change reach production, did it work, and how fast did you recover? They do not measure activity like lines of code, hours worked, or story points closed. DORA gives leaders evidence-based signals instead of subjective measures and vanity indicators such as lines written or hours logged (IBM, 2026). Outcomes are far harder to fake than activity.
And the outcomes they predict are the ones leaders care about. Elite performers on these metrics have been associated with markedly higher market capitalisation growth and roughly 2.5 times faster time to market (Glukhov, 2025). That is the link that lets an engineering leader justify investment in tooling and developer experience in language a board understands.
Hard to game is not impossible to game. You can move a number without improving anything real, usually by redefining what counts as a failure. DORA itself warns against blending metrics across different teams and stresses that context matters, because the right comparison is a team against its own past, not against another team in a different situation (DORA, 2026). Use them to ask better questions, not to rank people.
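One way to keep the comparison honest is to measure a team only against its own previous period. The sketch below assumes a hypothetical snapshot format (a plain dictionary per period) and hypothetical metric names. It labels each metric as improved, worsened, or unchanged, taking into account that for three of the four a lower value is better.

```python
# Metrics where a smaller number is the healthier one (names are illustrative).
LOWER_IS_BETTER = {"lead_time_hours", "change_failure_rate", "recovery_minutes"}


def trend_against_own_past(previous: dict, current: dict) -> dict:
    """Compare one team's current period with its own previous period."""
    result = {}
    for name, now in current.items():
        before = previous[name]
        if now == before:
            result[name] = "unchanged"
        elif (now < before) == (name in LOWER_IS_BETTER):
            result[name] = "improved"
        else:
            result[name] = "worsened"
    return result


last_quarter = {"deploys_per_week": 3, "lead_time_hours": 40,
                "change_failure_rate": 0.10, "recovery_minutes": 90}
this_quarter = {"deploys_per_week": 9, "lead_time_hours": 30,
                "change_failure_rate": 0.18, "recovery_minutes": 90}

trend = trend_against_own_past(last_quarter, this_quarter)
```

In this made-up example the team ships three times as often, but `trend["change_failure_rate"]` comes back as `"worsened"`. That is a question to take to the team, not a score to rank it by.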
They are a read on the people, not just the pipeline
The deepest reason to treat these four as a measure of organisational health is that they track with how the people are doing, not only how the software ships. Healthy delivery numbers and healthy teams come from the same conditions. DORA's research shows these metrics predict better organisational performance and better well-being for the people on the team (DORA, 2026).
The link to people is concrete. The 2024 report found that teams with stable priorities faced 40 per cent less burnout than teams whose priorities kept shifting (OpsLevel, 2024), and the 2025 report expanded its definition of performance to include the human and systemic conditions that sustain it (Axify, 2025).
The nuance is that the numbers are a symptom, not the cause. A team with bad metrics rarely has a discipline problem. It has a system problem: too much work in progress, a slow pipeline, unclear ownership, priorities that change every week. Improving the number means fixing the system around the people. Pushing the people harder usually makes every number worse.
What the newest research says, including about AI
The framework has not stood still. DORA has shifted from the original four keys to a five-metric model, adding a reliability dimension to round out the picture (DORA, 2026). The 2024 report also introduced rework rate, the unplanned work needed to fix user-facing bugs, as a complement to change failure rate, because teams with high failure rates spend much of their time on rework rather than new value (Kodus, 2024).
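For a sense of the arithmetic, rework rate can be sketched as the share of deployments that were unplanned fixes for user-facing bugs. This is one plausible reading of the definition above, not a formula quoted from the report, and the `unplanned_fix` flag is a hypothetical field that your own tooling would need to supply.

```python
def rework_rate(deployments: list[dict]) -> float:
    """Share of deployments flagged as unplanned fixes (assumed definition)."""
    if not deployments:
        raise ValueError("need at least one deployment")
    unplanned = sum(1 for d in deployments if d["unplanned_fix"])
    return unplanned / len(deployments)


# 20 deployments in the period, 3 of them unplanned bug fixes.
period = [{"unplanned_fix": i < 3} for i in range(20)]
rate = rework_rate(period)
```

With 3 unplanned fixes out of 20 deployments, `rate` is 0.15. Read next to change failure rate, it shows how much of the team's shipping capacity is going into repair rather than new value.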
The most pressing current question is what AI does to these numbers, and the answer is sobering. AI speeds up low-level coding tasks, but it has not yet produced meaningful gains in lead time or change failure rate, which points to an uncomfortable truth: writing code was never the real bottleneck (RedMonk, 2024). The effect also depends on what surrounds the code: teams with mature pipelines see AI improve both speed and stability, while teams without them see more rework and more incidents (Axify, 2025).
The nuance is that this is not an argument against AI. It is a reminder of where the constraint actually lives, in review, testing, integration, and deployment, not in typing. AI raises the volume of code arriving at your pipeline. The four numbers are what tell you whether the rest of your system can absorb it or is about to buckle under it.
The part worth sitting with
So the next time someone asks whether your engineering organisation is healthy, notice what you reach for. Headcount, story points, how late people stayed. None of those answer the question, and all of them are easy to look good at while the organisation quietly rots. Four numbers do answer it, with a decade of evidence and tens of thousands of professionals behind them. They are hard to fake, they balance speed against stability so you cannot win on one by losing the other, and they track with the things leaders actually care about: profit, time to market, and how burnt out the people are. You do not need a bigger dashboard. You need these four, read honestly, as a question rather than a target. The organisations that treat them that way already know whether they are healthy. The ones still counting lines of code are guessing.
Author note
I am Mohan Gopi, an Associate DevOps Engineer at Frigga Cloud Labs, working across AWS, GCP, and Azure with GitHub Actions as my deployment backbone. I wrote this because these four numbers are the most misunderstood metrics in engineering. The pattern I keep seeing is teams either ignoring them entirely and flying on vibes, or weaponising them to rank individuals, which is the surest way to make them useless. They are a read on a system, not a scoreboard for people. Used honestly, they will tell you where your delivery actually hurts, long before a missed quarter does. Used as a stick, they tell you nothing except how good your team has become at gaming a number.
