If your rollback rebuilds from the old commit, it is not a rollback. It is a new, untested release shipped mid-incident.



Ask a team how they roll back and many will say they re-run the pipeline on the previous commit. That is not a rollback. Even with identical source code, a rebuild can produce a different artefact, because dependencies, base images, and build tools resolve differently over time, so what you ship is a new, unverified build wearing an old version number.

The reliable alternative is an old rule: build once, deploy many. You build each version exactly once, package it with its dependencies into an immutable artefact, store it with a unique identifier, and then promote that same artefact through every environment without modification (Home Office, 2026; MinimumCD). Rollback then becomes trivial, because the previous artefacts are sitting unchanged in the repository, ready to be redeployed (MinimumCD). The key move is to treat the image digest, not the source branch, as the thing you deploy (Microsoft, 2026).

This post is the practical version: build once and capture the digest, deploy by digest rather than a mutable tag, keep old versions so they still exist, and roll back by re-pointing to the previous known-good artefact. It comes with real configuration, and with the one thing immutable artefacts still cannot roll back for you.

Build once, and deploy that exact digest forever after

Build the artefact a single time and capture its content-addressed digest, then have every downstream step, and every future rollback, reference that digest rather than a rebuild. A rebuild is not guaranteed to be deterministic: even identical source can yield different outputs, so rebuilding per environment quietly breaks parity and makes rollback a gamble (SwayamOps, 2026). The digest is a hash that uniquely identifies the exact image, and the whole point of build-once is that the artefact never changes after creation (MOSS, 2025).

# Build ONCE. Everything downstream deploys this exact digest.
- id: build
  uses: docker/build-push-action@v6
  with:
    push: true
    tags: ghcr.io/${{ github.repository }}:${{ github.sha }}

- name: Record the immutable digest
  env:
    DIGEST: ${{ steps.build.outputs.digest }}
  run: echo "IMAGE=ghcr.io/${{ github.repository }}@${DIGEST}" >> "$GITHUB_ENV"

The build step pushes once and outputs a digest, and you carry that digest forward as the thing you deploy, so staging, production, and any later rollback all point at the identical bytes.

The trade-off is that you give up the convenience of "just rebuild it". In return you get an artefact you can trace back to a specific commit and redeploy with confidence, which is the trade every reliable pipeline makes (Home Office, 2026).
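One way to keep that traceability is a small release ledger that maps each commit to the digest built from it. The sketch below is illustrative only: the file name releases.tsv and the helper names record_release and digest_for_commit are hypothetical, and a real pipeline might just as well keep the same mapping in image labels or a deployment database.

```shell
# Hypothetical release ledger: one "commit<TAB>digest" line per build.
LEDGER="${LEDGER:-releases.tsv}"

# Append the commit and the digest produced by the single build of it.
# Usage: record_release <commit> <digest>
record_release() {
  printf '%s\t%s\n' "$1" "$2" >> "$LEDGER"
}

# Print the digest recorded for a commit; exit non-zero if none was recorded.
# Usage: digest_for_commit <commit>
digest_for_commit() {
  awk -F '\t' -v c="$1" '
    $1 == c { print $2; found = 1; exit }
    END     { exit !found }
  ' "$LEDGER"
}
```

In a GitHub Actions job this could be called as record_release "$GITHUB_SHA" "$DIGEST" right after the build step, so that a later rollback looks the digest up instead of rebuilding the commit.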

Redeploying the previous git commit does not give you the previous artefact. Even with identical source, a rebuild can resolve different dependencies and base images (SwayamOps, 2026), so what you ship in the middle of an incident is a new, unverified build, which is the opposite of what a rollback is for.

Deploy the digest, never a mutable tag

Reference artefacts by their immutable digest, not by a moving tag like latest or even a version tag that can be repointed. A mutable tag means the thing running can change under you, and it means telemetry cannot reliably tell you which build caused an error. A common failure mode is a manifest that deploys latest, so behaviour varies from one rollout to the next as the tag moves, and the fix is to use digests and immutable tags (NoOps School, 2026). The guidance is explicit: avoid mutable tags in production and promote by digest (MOSS, 2025).

# Deploy the exact artefact by digest, and record why (for rollout history)
kubectl set image deployment/web-api \
  api=ghcr.io/org/myapp@sha256:9f2c... -n production

kubectl annotate deployment/web-api -n production --overwrite \
  kubernetes.io/change-cause="Deploy build 1487 (sha256:9f2c...)"

You set the image to a full digest and record a change-cause, so the running version is unambiguous and the rollout history reads like a ledger of exactly which artefact ran, and when.

The trade-off is that digests are unreadable, so keep a human-friendly version number alongside them for reference, but always deploy and record the digest itself (MOSS, 2025).
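A cheap way to enforce this is a guard that refuses any image reference that is not pinned by digest. This is a minimal sketch, not a complete reference parser: the function name is_digest_ref is hypothetical, and it assumes sha256 digests written as 64 lowercase hex characters.

```shell
# Succeeds only for references of the form <name>@sha256:<64 hex chars>.
# Tags such as :latest or :v1.4.2, and truncated digests, are rejected.
is_digest_ref() {
  printf '%s\n' "$1" |
    grep -Eq '^[^@[:space:]]+@sha256:[0-9a-f]{64}$'
}

# Possible use before a deploy step (IMAGE is whatever the pipeline passes in):
#   is_digest_ref "$IMAGE" || { echo "refusing mutable reference: $IMAGE" >&2; exit 1; }
```

Run before kubectl set image, a check like this turns "we agreed to deploy by digest" into something the pipeline actually enforces.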

Keep the old versions, or there is nothing to roll back to

The previous artefact only helps if it still exists. That means retention in the registry: it must keep old images, and their tags must be immutable so the version you built stays exactly as built. The value of build-once shows up precisely here, because the previous artefacts remain unchanged in the repository, ready to redeploy (MinimumCD), as long as your retention policy has not deleted them.

# Tags can never be overwritten: the old version stays exactly as built
aws ecr put-image-tag-mutability \
  --repository-name myapp --image-tag-mutability IMMUTABLE

# Keep the last 30 images so a rollback target still exists
aws ecr put-lifecycle-policy --repository-name myapp \
  --lifecycle-policy-text '{"rules":[{"rulePriority":1,
    "selection":{"tagStatus":"any","countType":"imageCountMoreThan",
    "countNumber":30},"action":{"type":"expire"}}]}'

Immutable tags stop a version from being quietly overwritten, and the lifecycle policy keeps a fixed number of recent images so there is always something to return to (NoOps School, 2026).

The trade-off is storage against safety. Keeping many images costs money, and keeping too few means your rollback target may already be gone, so size the retention to your deploy cadence, enough to cover every version you might realistically need to return to.
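As a rough way to size that number, multiply how often you deploy by how far back you might need to go. The sketch below is a rule of thumb rather than anything from the cited sources; the function name retention_count and the default margin of 5 images are assumptions to tune for your own team.

```shell
# Images to retain = deploys per day x days you may need to roll back across,
# plus a safety margin (default 5).
# Usage: retention_count <deploys_per_day> <window_days> [margin]
retention_count() {
  echo $(( $1 * $2 + ${3:-5} ))
}

retention_count 3 14    # 3 deploys a day, 14-day rollback window: prints 47
```

At three deploys a day, a 14-day window needs 47 images, so the countNumber of 30 in the lifecycle policy would cover only about ten days of history.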

Roll back by re-pointing, not rebuilding

With immutable artefacts in place, rollback is just switching back to the previous one, which is why a proper rollback avoids rebuilding or redeploying artefacts at all (Microsoft, 2026). In Kubernetes, each revision already stores the exact pod template, including its image reference, and when that reference is a digest the image behind it cannot change, so rolling back is a single command against history you already have (Kubernetes, 2026).

# See the history, with the change-cause you recorded at deploy time
kubectl rollout history deployment/web-api -n production

# Roll back to the immediately previous, known-good revision
kubectl rollout undo deployment/web-api -n production

# Or, if the previous one was also bad, target a specific revision
kubectl rollout undo deployment/web-api -n production --to-revision=42
kubectl rollout status deployment/web-api -n production

A rollback is an incident-response primitive for restoring a known-good revision fast, with the aim of minimising time to recover, and it works best when your images are immutable so the reversion is predictable (plural, 2026).

The trade-off is that this is not automatic. Kubernetes does not roll back a failed deployment for you; it marks the rollout failed and leaves the broken version in place until you act, so the trigger has to be a deliberate, signal-driven decision or a higher-level tool that watches your metrics (plural, 2026).
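If you want the pipeline itself to act on that signal, one option is to wrap the deploy in a check that undoes the rollout when it does not complete in time. This is a sketch under assumptions: the function name deploy_or_roll_back and the 120s default are hypothetical, and rollout status is the only signal it looks at, whereas a real trigger would usually also look at error rates or latency. The KUBECTL variable exists only so the logic can be exercised against a stub.

```shell
# Command used to talk to the cluster; overridable for testing.
KUBECTL="${KUBECTL:-kubectl}"

# Wait for the rollout. If it does not complete within the timeout, undo it
# and return non-zero so the pipeline still reports the failed deploy.
# Usage: deploy_or_roll_back <deployment> <namespace> [timeout]
deploy_or_roll_back() {
  deployment="$1"; namespace="$2"; wait_for="${3:-120s}"
  if "$KUBECTL" rollout status "deployment/$deployment" \
       -n "$namespace" --timeout="$wait_for"; then
    echo "rollout of $deployment succeeded"
    return 0
  fi
  echo "rollout of $deployment failed; rolling back" >&2
  "$KUBECTL" rollout undo "deployment/$deployment" -n "$namespace"
  return 1
}
```

Called as deploy_or_roll_back web-api production right after kubectl set image, it goes back to the previous revision without anyone having to notice the stalled rollout first. It still returns a failure, because the deploy did fail.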

Retain enough revisions, and know what a rollback will not fix

The cluster side has its own retention limit. A Deployment's revisionHistoryLimit controls how many old ReplicaSets are kept, defaulting to 10, after which the oldest are garbage-collected and you can no longer roll back to them (jorijn, 2026).

# Keep enough revisions that a rollback target always exists
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api
spec:
  revisionHistoryLimit: 20      # old ReplicaSets kept for rollback
  progressDeadlineSeconds: 600  # mark the rollout failed if it stalls
  selector:
    matchLabels:
      app: web-api              # illustrative label; must match the template labels
  template:
    metadata:
      labels:
        app: web-api
    spec:
      containers:
        - name: api
          image: ghcr.io/org/myapp@sha256:9f2c...   # pinned by digest

The harder truth is that a rollback restores less than people assume. Switching back the pod template does not restore a Service rewrite, a ConfigMap edit, or a database migration that already ran, so the real question in an incident is always "back to what state?" (jorijn, 2026). The database is the sharpest case: if a migration was backward-incompatible, rolling back the code will not save you, which is exactly why schema changes should be backward-compatible and gated behind flags (NoOps School, 2026). And if you run GitOps, an imperative rollback drifts the cluster from Git, so commit the revert instead of only patching the live state (plural, 2026).

A rollback switches back to the previous pod template and its immutable image, but it does not restore a Service rewrite, a ConfigMap edit, or a migration that already ran (jorijn, 2026). Immutable artefacts roll back your code. They do not roll back your data.

The part worth sitting with

So the next time you plan a rollback, ask the only question that matters: does the exact version you want to return to still exist, byte for byte, ready to redeploy without a rebuild? If the honest answer is that you would re-run the pipeline on the old commit, then you do not have a rollback. You have a fresh build you are hoping behaves like the old one, right at the moment you can least afford a surprise. The fix is disciplined rather than clever: build each version once, deploy it by its digest, keep the old ones in the registry and the cluster, and roll back by pointing at the previous known-good artefact instead of remaking it. Do that, and a rollback stops being a tense, uncertain rebuild and becomes what it should be: a fast switch back to something you already know works. Just stay honest about what it cannot undo. Your code goes back in seconds. Your database does not, and no immutable artefact will change that.

Author note

I am Mohan Gopi, an Associate DevOps Engineer at Frigga Cloud Labs. I work across AWS, GCP, and Azure, with GitHub Actions as the deployment backbone for everything I ship. The pattern I keep seeing is teams whose rollback plan is really a rebuild plan, and they only find out the difference during an incident, when the artefact they get back is subtly not the one they lost. I build once and deploy the digest now, everywhere, because the first time a rollback failed on me with an image that no longer existed, I understood that a rollback you have not kept the bytes for is not a rollback at all. My rule is that the deploy artefact is a digest, never a branch, and that every environment and every rollback points at bytes I have already tested. It turns the scariest button in the pipeline into a boring one. The only thing I stay honest about is the database, because that is the one part this does not save. If you want to compare how you pin and retain artefacts, I am on LinkedIn as Mohan Gopi.
