The only accurate description of your production system is the production system itself.



Ask most teams to show you exactly how their production environment is configured, and the honest answer is that they cannot, because the only complete record of it is the running system itself. The real state lives in the cluster and in the memories of the few people who made the changes, and none of that can be reviewed, audited, or rebuilt.

This is the problem GitOps was built to remove. As one description puts it, GitOps exists to end the familiar chaos where someone changed production by hand, the change was never reviewed, the rollback path was fuzzy, and the incident channel filled up with guesswork (fivenines, 2026). The idea has gone from a name coined in 2017 to standard practice quickly: in one large survey of organisations, 93 per cent said they plan to continue or expand their use of it (fivenines, 2026), and by some 2025 estimates the great majority of Kubernetes deployments are now run this way (scalr, 2026).

The core idea is simple. You describe in a Git repository what should be running, and an automated agent continuously makes the live system match it. This post is about why that one change, moving the source of truth out of the cluster and into a repository, fixes so many things at once, and what it does not fix. The focus is the model and the trade-offs, not any particular tool.

A blueprint, and a builder who never stops checking

GitOps has two halves. The first is that the desired state of everything is written down declaratively in Git, the blueprint. Declarative means you state the end result you want rather than the steps to reach it, the way you tell a GPS your destination instead of giving it turn-by-turn directions, and the system works out how to get there (prepare.sh, 2026).

The second half is the builder: an automated agent running in the cluster that continuously pulls the desired state from Git and reconciles the live environment to match it (madrigan, 2026). The blueprint says what should be true, and the builder keeps making it true.
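The loop described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the desired and live states can each be modelled as a dictionary mapping a resource name to its spec. The names `reconcile`, `desired`, and `live`, and the example services, are hypothetical and not taken from any real controller; a real agent reads from Git and talks to the cluster API rather than to in-memory dictionaries.

```python
from typing import Dict, List, Tuple

Spec = Dict[str, object]
State = Dict[str, Spec]


def reconcile(desired: State, live: State) -> List[Tuple[str, str]]:
    """Mutate `live` until it matches `desired`; return the actions taken."""
    actions: List[Tuple[str, str]] = []
    # Create anything declared but missing, and update anything that differs.
    for name, spec in desired.items():
        if name not in live:
            live[name] = dict(spec)
            actions.append(("create", name))
        elif live[name] != spec:
            live[name] = dict(spec)
            actions.append(("update", name))
    # Remove anything that is running but not declared.
    for name in list(live):
        if name not in desired:
            del live[name]
            actions.append(("delete", name))
    return actions


# Hypothetical example: Git declares two services; the cluster has drifted.
desired = {
    "web": {"image": "web:1.4", "replicas": 3},
    "api": {"image": "api:2.0", "replicas": 2},
}
live = {
    "web": {"image": "web:1.4", "replicas": 1},        # scaled down by hand
    "debug-pod": {"image": "busybox", "replicas": 1},  # created by hand
}

first_pass = reconcile(desired, live)
second_pass = reconcile(desired, live)  # nothing left to do
```

Running the function twice is the point of the sketch: the second pass finds nothing to change, which is what makes it safe for an agent to run the same check continuously.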

The nuance is that the agent is only one half, and teams that reduce GitOps to "we use a particular tool" have it backwards. The discipline is that Git is the only authoritative path, with every change flowing through it as a reviewed commit. The controller that happens to apply those commits is secondary (fivenines, 2026).

The problem it replaces: a system only it can describe

The old way is to change the live system directly, by console clicks or manual commands. The trouble is that this leaves no source of truth outside the system itself. Manual changes to the live environment cause configuration drift, where the running system quietly diverges from whatever anyone intended (prepare.sh, 2026). Do this often enough and you end up with a system that nobody can reproduce and no record of who changed what, when, or why.
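Drift is easiest to see as a difference between two descriptions. The sketch below is illustrative only: it assumes both the declared configuration and the running configuration are available as nested dictionaries, and the function name `find_drift` and the example keys are made up for the purpose.

```python
from typing import Dict, List


def find_drift(declared: Dict[str, object],
               running: Dict[str, object],
               path: str = "") -> List[str]:
    """Return human-readable differences between declared and running config."""
    drift: List[str] = []
    for key in sorted(set(declared) | set(running)):
        here = f"{path}.{key}" if path else key
        if key not in running:
            drift.append(f"{here}: declared but missing from the live system")
        elif key not in declared:
            drift.append(f"{here}: present in the live system but never declared")
        elif isinstance(declared[key], dict) and isinstance(running[key], dict):
            # Recurse into nested sections so the report points at exact fields.
            drift.extend(find_drift(declared[key], running[key], here))
        elif declared[key] != running[key]:
            drift.append(
                f"{here}: declared {declared[key]!r}, running {running[key]!r}"
            )
    return drift


# Hypothetical example: two manual changes were made to the live system.
declared = {"replicas": 3, "env": {"LOG_LEVEL": "info"}}
running = {"replicas": 5, "env": {"LOG_LEVEL": "debug", "FEATURE_X": "on"}}
report = find_drift(declared, running)
```

A report like this is only possible when a declared state exists somewhere outside the system. With manual changes and no repository, there is nothing to compare the running system against.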

This is not really a Kubernetes problem; it is a record-keeping problem, even though Kubernetes is where it bites hardest now that around 85 per cent of organisations run it in production (DevOps.com, 2024). Any system changed by hand, whether a cloud console, a server, or a set of network rules, drifts the same way, which is exactly why GitOps principles are increasingly applied well beyond Kubernetes (scalr, 2026).

The nuance is that the pain is invisible until it is acute. Everything works fine right up to the day you need to recreate an environment, or explain an outage, or prove to an auditor what was running, and discover the only place that answer exists is inside the system you are trying to ask about.

Every change becomes a reviewed, reversible commit

When the only way to change production is to change the repository, every change automatically inherits everything Git already gives software. Because each change is a commit, you get a chronological audit trail of who changed what, when, and why, which is a significant advantage for security and compliance (prepare.sh, 2026; Wiz, 2025).

You also get review before a change lands, a one-step undo by reverting the commit, and consistency across dev, test, and production for free, because all three environments read from the same source (madrigan, 2026). None of these are new tools; they are just Git, applied to operations.
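The audit trail and the one-step undo can be illustrated with a toy model. This sketch assumes a simplified history in which each commit stores the full declared state. Real Git stores and reverts changes differently, but the property shown is the same: an undo is itself a new recorded commit, so nothing is erased. `ConfigRepo`, `Commit`, and the author names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Commit:
    author: str
    message: str
    state: Dict[str, str]  # full declared state after this commit


class ConfigRepo:
    """A toy, append-only history of declared state."""

    def __init__(self, initial: Dict[str, str]) -> None:
        self.history: List[Commit] = [
            Commit("bootstrap", "initial state", dict(initial))
        ]

    def head(self) -> Dict[str, str]:
        """Return a copy of the currently declared state."""
        return dict(self.history[-1].state)

    def commit(self, author: str, message: str, changes: Dict[str, str]) -> None:
        new_state = self.head()
        new_state.update(changes)
        self.history.append(Commit(author, message, new_state))

    def revert_last(self, author: str) -> None:
        """Undo the latest commit by adding a new one; history is preserved."""
        if len(self.history) < 2:
            raise ValueError("nothing to revert")
        undone = self.history[-1]
        previous = self.history[-2]
        self.history.append(
            Commit(author, f"Revert: {undone.message}", dict(previous.state))
        )


# Hypothetical example: a version bump is made, then rolled back.
repo = ConfigRepo({"web.image": "web:1.4"})
repo.commit("asha", "bump web to 1.5", {"web.image": "web:1.5"})
repo.revert_last("ravi")
audit_trail = [(c.author, c.message) for c in repo.history]
```

After the revert, the declared state is back to `web:1.4`, and the trail still shows three entries: the original state, the bump, and the revert, each with a name attached.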

The nuance is that this only holds if the team genuinely respects Git as the single path. The most common way GitOps fails is leaving a shadow route open: the moment engineers can still hotfix live resources by hand, Git stops being the source of truth and drift creeps straight back in (fivenines, 2026).

The reconciliation loop constantly checks the running system against what is declared in Git and brings it back into line whenever it drifts, like a guard that never sleeps (Wiz, 2025). A manual change does not last. It is corrected on the next sync.

The cluster heals itself, and recovery becomes trivial

Two consequences fall straight out of that reconciliation loop. The first is self-healing: the agent constantly compares the live state to the Git state and reconciles any difference, so a crashed component or a corrupted config is pulled back to the declared state automatically (prepare.sh, 2026). The second is recovery: because the entire system is described in the repository, rebuilding it is largely a matter of pointing a fresh cluster at that repo and letting the agent reconstruct it.

There is a security dividend too. Because the pull model has the cluster reach out to Git rather than exposing the cluster to outside systems pushing into it, the attack surface is smaller than a pipeline that needs direct production access (madrigan, 2026; Wiz, 2025).

The nuance is that self-healing has a sharp edge. If the declared state in Git is wrong, the agent will faithfully and repeatedly enforce the wrong thing, and it will quietly fight your manual attempts to fix it. The reconciliation loop is only ever as good as what you have told it to want.
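That sharp edge is easy to reproduce in miniature. The sketch below assumes a deliberately crude agent that overwrites the live state with the declared state on every tick. The `sync` function, the `memory_limit_mb` setting, and the values are all invented for illustration.

```python
from typing import Dict, List


def sync(declared: Dict[str, int], live: Dict[str, int]) -> None:
    """One reconciliation tick: make the live state equal the declared state."""
    live.clear()
    live.update(declared)


# Hypothetical mistake: someone committed a memory limit that is far too low.
git: Dict[str, int] = {"memory_limit_mb": 64}
cluster: Dict[str, int] = {}
observed: List[int] = []

sync(git, cluster)
observed.append(cluster["memory_limit_mb"])  # the bad value is enforced

cluster["memory_limit_mb"] = 512             # manual hotfix on the live system
sync(git, cluster)
observed.append(cluster["memory_limit_mb"])  # the hotfix is undone on the next tick

git["memory_limit_mb"] = 512                 # the fix lands as a commit instead
sync(git, cluster)
observed.append(cluster["memory_limit_mb"])  # now the agent enforces the right value
```

The only fix that sticks is the one made in the repository, which is the same property that makes the loop useful in the first place.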

What GitOps does not fix, and what it costs

GitOps is a discipline about where truth lives, not a guarantee of good outcomes. It does not stop you shipping a bad change, it only makes that change visible and reversible. And it has real costs: a steep learning curve for teams new to declarative infrastructure, a fragmented tooling landscape, and above all a genuine cultural shift. Teams that treat GitOps as a process change rather than just a set of tools are the ones that avoid the common mistakes (shadecoder, 2025).

There is also a pointed new responsibility. Putting the source of truth in Git makes credential hygiene critical, because anything sensitive committed to a repository is now part of your operational record. This is not hypothetical, given how routinely secrets end up in the wrong place.

If Git becomes the source of truth, then Git's security becomes your security, and that bar is higher than most teams meet today. One 2025 report found 61 per cent of organisations had secrets sitting in public repositories (Wiz, 2025). The model is only as trustworthy as the repository it trusts.
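One practical consequence is checking what is about to be committed. The sketch below is a rough illustration of such a check, assuming it would run before a commit or in CI. The three patterns are simplified examples rather than a complete rule set, the function name `scan` is hypothetical, and the file contents are fabricated placeholders, not real credentials.

```python
import re
from typing import Dict, List, Tuple

# Illustrative patterns only; dedicated secret scanners ship far larger rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "inline_password": re.compile(r"(?i)\bpassword\s*[:=]\s*['\"]?[^\s'\"]{8,}"),
}


def scan(files: Dict[str, str]) -> List[Tuple[str, int, str]]:
    """Return (path, line number, rule name) for every suspicious line."""
    findings: List[Tuple[str, int, str]] = []
    for path, text in sorted(files.items()):
        for line_no, line in enumerate(text.splitlines(), start=1):
            for rule, pattern in PATTERNS.items():
                if pattern.search(line):
                    findings.append((path, line_no, rule))
    return findings


# Hypothetical staged files; the "secrets" are placeholders made up for the example.
staged = {
    "deploy/web.yaml": "image: web:1.4\nreplicas: 3\n",
    "deploy/db.yaml": "user: app\npassword: hunter2-hunter2\n",
    "notes.txt": "key id AKIAABCDEFGHIJKLMNOP for testing\n",
}
findings = scan(staged)
```

A check like this catches only the obvious cases. It is a floor for credential hygiene, not a substitute for keeping secrets out of the repository by design.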

The nuance is that none of these are reasons to avoid GitOps, only reasons to adopt it deliberately. A bad commit still reaches production; the difference is that you can see exactly what it was, who approved it, and revert it in one step, which is a far better place to be than guessing in an incident channel.

The part worth sitting with

So come back to the question at the start: if someone asked you to reproduce your production environment exactly, somewhere else, today, could you, without relying on memory or a long afternoon of archaeology? If the answer is no, it is because the truth about your system lives inside the system, where it cannot be reviewed before it changes or restored after it breaks. GitOps is not really about Kubernetes, or any particular controller. It is the decision to move that truth into a repository, so that changing production means proposing a change someone reviews, the system keeps itself in line with what was agreed, and rebuilding from nothing is a matter of pointing at a repo rather than reconstructing from people's recollections. It will not save you from a bad decision. It will make sure the bad decision was written down, approved, and is one revert away from gone, which is a very different place to stand than in front of a system only it can describe.

Author note

I am Mohan Gopi, an Associate DevOps Engineer at Frigga Cloud Labs. I work across AWS, GCP, and Azure, with GitHub Actions as the deployment backbone for everything I ship. The pattern I keep seeing is teams that can deploy in minutes but could not tell you, with confidence, exactly what is running in production right now or how it got that way, because the answer lives in the cluster instead of in a repo. Moving that source of truth into Git was the single change that did the most for my own peace of mind, not because the tooling is clever, but because it turned "someone changed something" into a commit with a name and a diff on it. I run everything through pull requests now, even the boring changes, and I let the reconciliation loop be the thing that keeps production honest while I sleep. If you describe it, review it, and let an agent enforce it, most of the 3am mysteries simply stop happening. Happy to compare repo structures and rollout setups on LinkedIn → Mohan Gopi.
