Digest-keyed migrations with Argo CD (and a migrations-operator fork)

This post is about one narrow problem: running database migrations once per real application image in a GitOps flow, even when development uses mutable tags like latest. The tag in git is a request. The digest on the node is the answer.

migrations-operator (Noah Kantrowitz / coderanger) implements that idea: watch pods that match a Migrator selector, clone their container spec into a Job, and add a waiter init container so new pods do not go Ready until migrations for that rollout have succeeded.

The bug you get without digest awareness

Imagine CI keeps pushing to registry/app:latest. In git, Argo still shows image: registry/app:latest — the string never changes.

Upstream answers “did we already migrate for this version?” from the declared image string and the identity of the existing Job. After a successful migration for :latest, the next CI build can repoint the tag in the registry at a new manifest digest while the YAML stays identical. Git diff shows nothing new, and the tag string is unchanged.

The failure mode: new bytes, same tag, and the control plane thinks migrations already ran. The fix is to compare image manifest digests (what actually landed on the node), not the sticker next to image: in the YAML.

After the fork, the controller reads status.containerStatuses[].imageID from the template pod and normalises it to a digest. It stores that value in Migrator.status.lastSuccessfulMigration. It also annotates the migration Job with the expected digest. If the digest moves, the old Job is stale and is removed so a new migration can run. “Same tag, new digest” becomes a first-class case.
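To make the normalisation step concrete, here is a minimal Python sketch. It is not the fork's code; the function name normalise_image_id and the accepted input shapes (an optional runtime scheme such as docker-pullable://, an optional repo@ prefix, then a sha256 digest) are my assumptions about what imageID values look like in practice.

```python
import re
from typing import Optional

# A sha256 digest: the literal prefix plus exactly 64 lowercase hex characters.
_DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def normalise_image_id(image_id: str) -> Optional[str]:
    """Return the bare 'sha256:<hex>' digest from an imageID, or None.

    Hypothetical helper for illustration; the fork's own parsing may differ.
    """
    if not image_id:
        return None
    # Drop an optional runtime scheme, e.g. 'docker-pullable://repo/app@sha256:...'.
    rest = image_id.rpartition("://")[2]
    # Keep whatever follows the last '@'; a bare 'sha256:...' has no '@' at all.
    candidate = rest.rpartition("@")[2]
    # Anything that is not a well-formed digest (for example a tag) is rejected.
    return candidate if _DIGEST_RE.fullmatch(candidate) else None
```

Returning None for a tag-only string is deliberate: a value like registry.example/app:latest carries no digest, so it should never be stored as lastSuccessfulMigration.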

What changed in the fork versus upstream

Upstream already had the right architecture (watch pods, Job from template, webhook injector, Argo Rollouts in the owner chain). The gap was version identity: mutable tags need to be compared using the resolved image ID, not only the tag string on the Job spec.

In this fork, reconciliation waits until imageID is present. It then validates that the resolved image ID contains a real digest. It compares that to the last successful run recorded on the Migrator. If an existing Job’s digest does not match, that Job is replaced.

That is the behavioural change: idempotency keyed by digest, plus stale-job cleanup when a mutable tag now points to different bytes.
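The decision order described above can be sketched as a small pure function. This is an illustration in Python rather than the controller's actual code; the Action names and the decide signature are hypothetical, and the inputs are assumed to be digests that were already extracted (None meaning "not available").

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    REQUEUE = "requeue"            # imageID not reported or not a digest yet
    SKIP = "skip"                  # this digest has already been migrated
    CREATE_JOB = "create_job"      # new digest and no Job exists
    REPLACE_JOB = "replace_job"    # existing Job targets a different digest
    WAIT_FOR_JOB = "wait_for_job"  # a Job for this digest is already in flight


def decide(pod_digest: Optional[str],
           last_successful: Optional[str],
           job_digest: Optional[str]) -> Action:
    """Pick the next reconcile step from three digests (None = absent)."""
    if pod_digest is None:
        # Nothing honest to key on until the kubelet reports a digest.
        return Action.REQUEUE
    if pod_digest == last_successful:
        # Idempotency: one migration round per digest.
        return Action.SKIP
    if job_digest is None:
        return Action.CREATE_JOB
    if job_digest != pod_digest:
        # The mutable tag moved; the old Job is stale.
        return Action.REPLACE_JOB
    return Action.WAIT_FOR_JOB
```

With a previous success recorded for digest A and a Job annotated with A, a pod reporting digest B yields REPLACE_JOB, which is the "same tag, new digest" case.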

Why not “just use a Helm `pre-upgrade` or Argo CD `PreSync` hook”

Hooks work until the migration needs the same Secrets, ConfigMaps, and service identity as the app, or until chart ordering makes you hoist half the release into the hook. The operator’s approach is to clone the live pod template instead of re-building that world inside a hook.

Argo CD’s job is unchanged: git remains desired state; sync applies Deployment or Rollout; the cluster does the rest. Argo does not need SQL awareness — only workloads and Migrator CRs.

Rollout sequence (compressed but ordered)

1. Argo sync updates the workload spec (the image field still says whatever git says, e.g. app:latest).
2. The scheduler assigns new pods; the kubelet pulls the image.
3. Containers start; the kubelet records the resolved image in status.containerStatuses[].imageID (often repo/app@sha256:…).
4. The operator’s selected “template” pod now has a stable digest in status even if the spec still shows a tag.

Until step 3 completes, there is nothing honest to key migrations on; the fork requeues until imageID exists.
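The requeue guard boils down to "is imageID populated for the container I care about?". Here is a minimal Python sketch over a pod object shaped like the output of kubectl get pod -o json; the helper name image_id_for is hypothetical and the real controller works with typed objects instead of dicts.

```python
from typing import Any, Dict, Optional


def image_id_for(pod: Dict[str, Any], container: str) -> Optional[str]:
    """Return status.containerStatuses[].imageID for `container`, or None.

    None means the caller should requeue and look again later.
    """
    statuses = pod.get("status", {}).get("containerStatuses") or []
    for status in statuses:
        if status.get("name") == container:
            # imageID stays empty until the image has been resolved.
            return status.get("imageID") or None
    return None


# A pod that has been scheduled but has not started its containers yet.
pending_pod = {"status": {"phase": "Pending"}}

# The same pod once the container is running (digest shortened to a pattern).
running_pod = {
    "status": {
        "containerStatuses": [
            {"name": "app", "imageID": "registry.example/app@sha256:" + "a" * 64}
        ]
    }
}
```

Here image_id_for(pending_pod, "app") is None, so reconciliation would back off, while the running pod yields a string that can be validated as a digest.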

Dev versus prod

In production I still prefer immutable tags or image@sha256:… in git — audits, rollbacks, and fewer arguments.

In development, latest is a trade-off, not a sin: CI pushes often; you may not want a manifest edit per build. Git then no longer uniquely identifies the artefact; the cluster does, via digest. The operator closes the loop: new push → new digest on the node → one migration round for that digest — without editing YAML for every pipeline run.

Argo Rollouts

If you use Argo Rollouts, upstream already walks argoproj.io Rollout owners when resolving pod templates. The same digest logic applies: the Job tracks what landed in pod status, not only what the Rollout object claims in spec.

Mental model

Layer	Responsibility
Argo CD	Reconcile git → cluster; policies optional
Workload	Roll pods; kubelet writes `imageID`
Migrator	New digest → one migration Job; record success in the Migrator status
Waiter init	Block app pods until migrations for that digest succeed

Backups, idempotent SQL, and review of destructive changes are still on you; this only aligns orchestration with how the node identifies images.

A tiny end-to-end example

Deployment — labels must match what the Migrator selects; image uses latest on purpose for dev:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: apps
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: registry.example/app:latest

Migrator — same labels; command is whatever your stack uses to migrate:

apiVersion: migrations.coderanger.net/v1beta1
kind: Migrator
metadata:
  name: myapp-migrations
  namespace: apps
spec:
  selector:
    matchLabels:
      app: myapp
  command:
    - python
    - manage.py
    - migrate

After the first successful migration Job, status on the Migrator holds a digest string, not a tag:

status:
  lastSuccessfulMigration: sha256:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Next CI push overwrites registry.example/app:latest with a new build. Git is unchanged; Argo may show no diff. New pods get a different imageID. The operator sees a new digest, replaces the old migration Job, and runs migrations again. lastSuccessfulMigration updates to the new sha256:….

Appendix: install with Naphtha charts

If you want to try this fork rather than only the pattern, it is packaged in naphtha-charts (charts/migrations-operator). Defaults use the published controller image and install into migrations-system with CRDs and webhooks:

helm repo add naphtha https://charts.naphtha.dev
helm repo update
helm install migrations-operator naphtha/migrations-operator

If the canonical URL is not reachable yet, the same index is available from GitHub raw:

helm repo add naphtha 'https://raw.githubusercontent.com/AndreaTrendafilov/naphtha-charts/main/helm-index'
helm repo update
helm install migrations-operator naphtha/migrations-operator

Override image.repository / image.tag in values if you build the controller yourself. Wire Argo CD applications as you already do.

Checklist

Confirm workload pods expose the container you migrate against; label selectors on the Migrator match.
After a rollout, run kubectl get pod -n apps -l app=myapp -o jsonpath='{.items[0].status.containerStatuses[0].imageID}' (namespace and label taken from the example above); the imageID should include a digest before you trust migration status.
Prefer immutable tags or digests in git for production; use latest only where you accept git not being the single source of artefact truth.
Reach for hooks when they are truly simpler; reach for the operator when hooks force you to duplicate pod context or ordering becomes brittle.
Treat Migrator.status.lastSuccessfulMigration as a digest, not as a tag label.

If this fits your cluster, you keep GitOps without treating the tag string in git as the release identity. A mutable tag is not a version; the digest is.