Your code lives in Git. Your infrastructure changes live in a Slack message from three months ago. That asymmetry has a cost.






When you manage infrastructure across AWS, GCP, and Azure through GitHub Actions, one of the first things you notice is how differently code changes and infrastructure changes are treated. Code gets a pull request, a review, a commit history, and a rollback path. Infrastructure gets an SSH session, a few commands, and a prayer that whoever made the change remembered to mention it in a standup. The environments drift apart quietly. By the time you notice, the gap between what your manifests say and what is actually running is wide enough to cause real problems.

GitOps is the operational model that closes this gap. The core principle is straightforward: Git is the single source of truth for both application configuration and infrastructure state. A reconciliation agent running inside the cluster continuously compares what Git says the system should look like against what the system actually looks like. When they diverge, the agent corrects the drift. Every change goes through a commit. Every rollback is a revert. The audit trail is the commit history.

As of 2026, more than 64% of enterprises reportedly use GitOps as their primary delivery mechanism, with measurable improvements in infrastructure reliability and rollback velocity. The 2025 CNCF End User Survey found ArgoCD running in nearly 60% of Kubernetes clusters globally, with 97% of respondents using it in production. GitOps is no longer a trend to watch from a distance. It is the current default for serious Kubernetes operations.


How the reconciliation loop actually works

In a traditional push-based CI/CD pipeline, the pipeline connects to the cluster and applies changes. The cluster is a passive target. In a GitOps pull-based model, an agent running inside the cluster watches the Git repository. The cluster pulls its own desired state rather than having it pushed in. This distinction matters for two reasons.

First, drift detection is continuous. The agent does not just check state at deployment time. It checks continuously. If someone runs a kubectl apply directly against a production cluster, the agent detects the divergence and corrects it back to the state declared in Git. Manual changes do not stick. This is the self-healing property of GitOps, and it is what makes it significantly more reliable than pipelines that only apply state at deploy time.

Second, your CI system does not need access to the cluster. The agent inside the cluster reaches out to Git, not the other way around. In a push model, your pipeline has to hold credentials that can modify the cluster. In a pull model, the cluster reads from Git with read-only credentials. The attack surface is smaller and credential management is simpler.


ArgoCD and Flux: what each one actually is

ArgoCD

ArgoCD was built by Intuit to manage their own Kubernetes deployments at scale and open-sourced in 2018. It is a CNCF graduated project. The architecture is a centralised control plane: one ArgoCD instance manages applications across multiple clusters. It ships with a web UI that shows sync status, resource health, dependency trees, and a visual diff between desired and live state. For teams managing multiple environments across multiple cloud providers, that single-pane visibility has real operational value.

ArgoCD 3.3, released in early 2026, introduced PreDelete hooks, which solve a longstanding problem with stateful application deletion leaving orphaned resources. It also added Server-Side Apply as the default reconciliation mechanism, which improves drift detection significantly by letting the Kubernetes API server own field merging rather than handling it client-side. This eliminates the "conflict storms" that used to occur when ArgoCD and an HPA tried to manage the same resource fields simultaneously.

Installation is a single manifest apply into a dedicated namespace:

kubectl create namespace argocd
kubectl apply -n argocd \
  -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

An Application resource tells ArgoCD what to watch and where to apply it:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/your-repo.git
    targetRevision: HEAD
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

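selfHeal: true is what activates continuous drift correction. Without it, ArgoCD still syncs new commits automatically, but a manual change to the live cluster is only flagged as OutOfSync until someone triggers a sync. With it, the cluster corrects itself automatically whenever the live state diverges from Git. prune: true extends this to deletions: resources removed from Git are also removed from the cluster.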
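
Self-healing can work against an autoscaler that legitimately changes a field such as the replica count. The following is a minimal sketch of one way to handle that, assuming a Deployment named my-service whose replicas are managed by an HPA. The Deployment name is hypothetical, and whether you need the explicit ServerSideApply option depends on your ArgoCD version and defaults:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/your-repo.git
    targetRevision: HEAD
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  # Do not report the HPA-managed replica count as drift
  ignoreDifferences:
    - group: apps
      kind: Deployment
      name: my-service            # hypothetical Deployment name
      jsonPointers:
        - /spec/replicas
  syncPolicy:
    automated:
      prune: true                 # delete resources that were removed from Git
      selfHeal: true              # revert manual changes to the live cluster
    syncOptions:
      - ServerSideApply=true            # let the API server merge fields
      - RespectIgnoreDifferences=true   # leave ignored fields alone during sync
```

With this in place, the HPA can scale the Deployment without ArgoCD marking the application OutOfSync or resetting the replica count on the next sync.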

Flux

Flux was created by Weaveworks, the company that coined the term GitOps in 2017. Weaveworks shut down in February 2024 after a failed acquisition. The core maintainers were picked up by ControlPlane, Microsoft Azure, and GitLab, and Flux remains an actively maintained CNCF graduated project with a 2026 roadmap. AWS backed ArgoCD at re:Invent 2025 with a managed EKS capability. Microsoft backed Flux. The cloud provider split is informative but not prescriptive: both tools run on any cloud.

Flux's architecture is fundamentally different from ArgoCD's. Instead of a centralised control plane, Flux is a set of modular Kubernetes controllers, each responsible for a specific function. The Source Controller fetches manifests from Git, Helm repositories, or OCI registries. The Kustomize Controller applies Kustomize overlays. The Helm Controller manages Helm releases. The Notification Controller routes alerts. You install only what you need.

Flux 2.8 introduced CEL-based health check expressions for HelmRelease objects and a mechanism to cancel ongoing health checks and immediately trigger a new reconciliation when a fix lands in Git. That last improvement addresses one of the more frustrating failure experiences in production: waiting for a health check timeout to expire before a fix gets applied.

Bootstrapping Flux into a cluster and connecting it to a Git repository:

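# Expects a GitHub token with repo permissions in the GITHUB_TOKEN environment variable.
# Add --personal only when --owner is a user account rather than an organisation.
flux bootstrap github \
  --owner=your-org \
  --repository=your-infra-repo \
  --branch=main \
  --path=clusters/production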

This creates the Flux controllers in the cluster and pushes the initial manifests to the repository. From that point, a git push to the configured path is a deployment.
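
After bootstrap, a workload is typically declared as a source plus a Kustomization that points at a path inside it. A minimal sketch, assuming a hypothetical application repository and an apps/overlays/production path; the resource names, intervals, and target namespace are illustrative:

```yaml
# Source Controller: fetch the repository on a fixed interval
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: my-service
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/your-org/your-repo.git
  ref:
    branch: main
---
# Kustomize Controller: build and apply one path from that source
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: my-service
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: my-service
  path: ./apps/overlays/production
  prune: true          # remove resources that disappear from Git
  wait: true           # wait for applied resources to become ready
  timeout: 2m
  targetNamespace: production
```

Committing these two objects under the bootstrapped path is enough for Flux to start reconciling the workload.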

Which one to use

For teams new to GitOps, ArgoCD is the lower-friction starting point. The UI makes it significantly easier to understand what is happening during the initial weeks when the reconciliation model is still unfamiliar. The visual diff view, showing exactly what will change before a sync is applied, catches configuration mistakes that are easy to miss in a YAML review.

For teams already deep in Kubernetes-native tooling and comfortable with CLI-driven workflows, Flux is the more modular and lightweight option. Many mature platform teams run both: ArgoCD for application deployments, Flux for cluster infrastructure components. The tools are not mutually exclusive.


Repository structure: the decision that compounds over time

The repository layout for a GitOps setup determines how manageable the system stays as it scales. The pattern that works well across environments is a clear separation between application manifests and infrastructure components, with Kustomize overlays for environment-specific configuration:

gitops-repo/
├── apps/
│   ├── base/          # Shared base configs
│   └── overlays/
│       ├── staging/
│       └── production/
├── infrastructure/
│   ├── cert-manager/
│   ├── ingress-nginx/
│   └── monitoring/
└── argocd/
    ├── projects.yaml
    └── applicationsets/
        ├── apps.yaml
        └── infrastructure.yaml
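
Within a layout like this, each overlay is a small kustomization.yaml that references the shared base and patches only what differs per environment. A sketch, assuming a Deployment called my-service in the base (a hypothetical name) and a production replica count of 3:

```yaml
# apps/overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: production
resources:
  - ../../base                # shared manifests
patches:
  - target:
      kind: Deployment
      name: my-service        # hypothetical Deployment defined in base
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 3
```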

ApplicationSets in ArgoCD let a single Application template generate deployments across multiple clusters or environments by templating from a list or from directory structure. For a multi-cloud setup spanning AWS, GCP, and Azure, this means managing one ApplicationSet definition rather than maintaining separate Application resources per cluster.
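
A minimal sketch of a list-generator ApplicationSet for that three-cloud case. The cluster names and API server URLs are placeholders, and each cluster is assumed to be registered with ArgoCD already:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: apps
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - cluster: aws-prod
            url: https://aws-prod.example.com
          - cluster: gcp-prod
            url: https://gcp-prod.example.com
          - cluster: azure-prod
            url: https://azure-prod.example.com
  template:
    metadata:
      name: '{{cluster}}-my-service'   # one Application per list element
    spec:
      project: default
      source:
        repoURL: https://github.com/your-org/your-repo.git
        targetRevision: HEAD
        path: apps/overlays/production
      destination:
        server: '{{url}}'
        namespace: production
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

Adding a fourth cluster is then a three-line change to the elements list rather than a new Application manifest.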


Secrets: the part that breaks GitOps setups most often

Plaintext credentials do not go in Git repositories. This is not a preference; it is a requirement. The 2025 Verizon Data Breach Investigations Report documented that ten million credentials leaked from GitHub in 2025 alone, with a significant portion tied to infrastructure configuration. A GitOps setup without proper secret management is a Git repository that is one accidental public push away from credential exposure.

Sealed Secrets is the simplest approach for teams starting out. A controller running in the cluster holds the private key. You encrypt secrets locally using the cluster's public key and commit the encrypted SealedSecret resource to Git. Only the cluster can decrypt it:

# Create and seal a secret
kubectl create secret generic db-credentials \
  --from-literal=password=your-password \
  --dry-run=client -o yaml | \
  kubeseal --format yaml > sealed-db-credentials.yaml

# Safe to commit
git add sealed-db-credentials.yaml && git commit -m "add db credentials"
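
The committed file has roughly the following shape. This is an illustrative sketch: the ciphertext is a placeholder, and the namespace is whichever one was active when the secret was sealed (assumed here to be default):

```yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: db-credentials
  namespace: default
spec:
  encryptedData:
    password: <ciphertext produced by kubeseal>   # placeholder, not real output
  template:
    metadata:
      name: db-credentials
      namespace: default
```

By default the ciphertext is bound to this name and namespace, so a sealed file copied into another namespace will not decrypt there.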

For teams already using AWS Secrets Manager, GCP Secret Manager, or Azure Key Vault, the External Secrets Operator is the better fit. It syncs secrets from external stores into Kubernetes-native Secret objects without storing sensitive values in Git at all. The GitOps repository contains the ExternalSecret resource definition, which references the secret path in the external store, not the value itself.
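
A minimal sketch of such an ExternalSecret, assuming a ClusterSecretStore named aws-secrets-manager has already been configured and that the secret lives at prod/my-service/db (both hypothetical). Older operator releases serve this resource as external-secrets.io/v1beta1 instead of v1:

```yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: production
spec:
  refreshInterval: 1h                 # how often to re-read the external store
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager         # hypothetical store name
  target:
    name: db-credentials              # Kubernetes Secret to create
    creationPolicy: Owner
  data:
    - secretKey: password             # key inside the Kubernetes Secret
      remoteRef:
        key: prod/my-service/db       # hypothetical path in the external store
        property: password
```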

SOPS (originally a Mozilla project, now maintained under the CNCF) with age encryption is the third option, suited for teams that want to keep whole configuration files in Git rather than managing individual secret objects. SOPS encrypts the values in a YAML file while leaving the keys readable, so the structure of your configuration remains visible in version control without exposing the values.
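
Which files and fields get encrypted is usually controlled by a .sops.yaml file at the repository root. A sketch, assuming secrets are kept under directories named secrets/ (an illustrative convention) and using a placeholder where your age public key goes:

```yaml
# .sops.yaml
creation_rules:
  - path_regex: .*/secrets/.*\.yaml$          # assumed location of secret files
    encrypted_regex: ^(data|stringData)$      # encrypt only Secret payload fields
    age: <your-age-public-key>                # placeholder recipient
```

Restricting encryption to data and stringData keeps metadata such as name and namespace readable in diffs.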


Observability for the GitOps layer itself

A sync failure in ArgoCD does not automatically surface in your existing monitoring. A deployment can appear to succeed from the CI pipeline while the cluster silently ignores the change due to a webhook validation rejection, a resource quota limit, or a CRD version mismatch. Without instrumentation specifically for the GitOps layer, you are running blind on the deployment pipeline itself.

ArgoCD exposes Prometheus metrics from three endpoints: the application controller on port 8082 for sync status and reconciliation performance, the API server on port 8083 for request metrics, and the repo server on port 8084 for Git operation metrics. With the Prometheus Operator, ServiceMonitor resources connect these to your existing scrape configuration:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-metrics
  namespace: argocd
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-metrics
  endpoints:
    - port: metrics
      interval: 30s
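
The manifest above covers only the application controller. A sketch of the other two ServiceMonitors, assuming the Service labels from the standard install manifests and the same release: prometheus label used above:

```yaml
# API server metrics
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-server-metrics
  namespace: argocd
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-server-metrics
  endpoints:
    - port: metrics
      interval: 30s
---
# Repo server metrics
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-repo-server-metrics
  namespace: argocd
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-repo-server
  endpoints:
    - port: metrics
      interval: 30s
```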

The key metrics to alert on are argocd_app_info for sync and health status, argocd_app_sync_total for sync history and failures, and argocd_app_reconcile for reconciliation latency. A Grafana alert on applications with sync_status="OutOfSync" or health_status="Degraded" for more than five minutes is the minimum viable GitOps monitoring setup. Without it, the reconciliation loop can fail silently while your CI pipeline shows green.
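
The same conditions can also be expressed as Prometheus alerting rules instead of Grafana-managed alerts. A minimal sketch using a Prometheus Operator PrometheusRule; the rule names and severities are illustrative:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-app-alerts
  namespace: argocd
  labels:
    release: prometheus
spec:
  groups:
    - name: argocd-applications
      rules:
        - alert: ArgoCDAppOutOfSync
          expr: argocd_app_info{sync_status="OutOfSync"} == 1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "{{ $labels.name }} has been OutOfSync for more than 5 minutes"
        - alert: ArgoCDAppDegraded
          expr: argocd_app_info{health_status="Degraded"} == 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "{{ $labels.name }} has been Degraded for more than 5 minutes"
```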

Flux exposes equivalent metrics per controller. The controller_runtime_reconcile_errors_total counter increments on every reconciliation failure, and gotk_reconcile_duration_seconds tracks how long each reconciliation takes. Alerting on a non-zero rate of the error counter over a five-minute window catches sync failures before they compound.
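
A sketch of that alert as a PrometheusRule, assuming your Flux controllers expose the controller-runtime counter controller_runtime_reconcile_errors_total and are already being scraped. Metric names have changed between Flux versions, so check a controller's /metrics output before relying on this:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: flux-reconcile-alerts
  namespace: flux-system
  labels:
    release: prometheus
spec:
  groups:
    - name: flux-reconciliation
      rules:
        - alert: FluxReconcileErrors
          # Any reconciliation errors in the last five minutes, per controller
          expr: sum by (controller) (rate(controller_runtime_reconcile_errors_total[5m])) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Flux controller {{ $labels.controller }} is reporting reconciliation errors"
```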


Where teams get into trouble during adoption

The teams that struggle with GitOps adoption try to migrate everything at once. The ones that succeed start with one low-risk service, get the reconciliation loop working, verify that a commit causes the cluster to update without manual steps, and then expand incrementally. The migration scope should be controlled at every stage: one service, verify, then the next.

The second common failure is letting the repository structure grow ad hoc before the team has enough experience with how the reconciliation model behaves in practice. A flat repository that works for three services starts causing problems at thirty. The overlay structure described above is worth setting up correctly from the beginning rather than retrofitting it later, when there are dependencies between applications that need to be untangled.

The third is deferring the secret management decision. Teams that start with plaintext secrets in private repositories for convenience tend to keep them there until a security review forces the issue. The migration from plaintext to Sealed Secrets or External Secrets Operator mid-flight, with live services depending on the existing secrets, is a more disruptive operation than setting it up correctly from the start.

From an operational standpoint, the value compounds over time. The audit trail, the self-healing, the rollback path, and the consistent environment promotion through pull requests are individually useful. Together, they change the confidence level at which the team operates the infrastructure. The knowledge stops living in terminal history and starts living in the repository where anyone on the team can read it.

Author note

Ayesha Siddiqua & Mohan Gopi

Mohan is an Associate DevOps Engineer at Frigga Cloud Labs. He manages infrastructure across AWS, GCP, and Azure, deploys through GitHub Actions, and focuses on what happens after deployment: the feedback loops, the observability, and keeping infrastructure stable and improvable over time. 

I work with founding teams and CTOs through Frigga Cloud Labs, and the technical perspective in this blog belongs to Mohan, written from his hands-on experience inside these systems.


