Rollback strategy.
Every Axiom execution ships with a pre-verified rollback path and a measured time-to-restore. Rollback is not aspirational — it's tested before approval, never inferred at incident time.
01 · Philosophy
The principle
Three rules that make rollback real instead of aspirational:
- Pre-flight capture — Axiom always captures sufficient state to restore before any modification begins
- Measured RTO — rollback time is measured in advance, not estimated at incident time
- Block if unverified — plan items without a verified rollback path are blocked at the governance gate
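The three rules can be read as a single gate check. The sketch below is a minimal illustration, not Axiom's implementation: PlanItem, its field names, and gate_reasons are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PlanItem:
    """Hypothetical plan item; field names are illustrative, not Axiom's schema."""
    item_id: str
    preflight_state_ref: Optional[str]     # e.g. a snapshot ID captured before the change
    measured_rto_seconds: Optional[float]  # measured in a rehearsal, not estimated
    rollback_verified: bool                # True only if the rollback path was tested


def gate_reasons(item: PlanItem) -> List[str]:
    """Return the reasons an item must be blocked; an empty list means it may proceed."""
    reasons = []
    if not item.preflight_state_ref:
        reasons.append("no pre-flight state captured")
    if item.measured_rto_seconds is None:
        reasons.append("RTO not measured")
    if not item.rollback_verified:
        reasons.append("rollback path not verified")
    return reasons


# Placeholder identifiers, for illustration only.
ready = PlanItem("resize-db", "snapshot-placeholder", 240.0, True)
risky = PlanItem("edit-sg", "sg-state-placeholder", None, False)

print(gate_reasons(ready))  # empty list: nothing blocks this item
print(gate_reasons(risky))  # two reasons: RTO not measured, rollback not verified
```

Returning the list of reasons, rather than a bare boolean, keeps the block decision explainable in an audit trail.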
02 · Capture
Pre-flight state capture
Capture mechanism varies by resource type:
- EC2 — AMI snapshot or volume snapshot before any instance modification
- RDS — manual snapshot before any parameter group, instance-type, or replica change
- S3 — bucket configuration JSON export before ACL/lifecycle/encryption change
- IAM — policy version preserved by AWS automatically; Axiom captures the policy-version ID
- Security groups — full rule set captured as Terraform state before any modification
- Lambda — function configuration + alias version captured
Capture is audited in the AxiomAuditEvent.beforeState field — immutable and queryable.
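One way to picture the per-resource capture and the immutable beforeState record is sketched below. It is an assumption-laden illustration: CAPTURE_KIND, AuditEvent, and capture are hypothetical names, the real AxiomAuditEvent schema is not described here beyond beforeState, and no AWS calls are made.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from types import MappingProxyType
from typing import Mapping

# What to capture per resource type, paraphrased from the list above.
CAPTURE_KIND = {
    "ec2": "ami_or_volume_snapshot",
    "rds": "manual_snapshot",
    "s3": "bucket_config_json",
    "iam": "policy_version_id",
    "security_group": "rule_set_terraform_state",
    "lambda": "function_config_and_alias_version",
}


@dataclass(frozen=True)
class AuditEvent:
    """Stand-in for AxiomAuditEvent; only beforeState is named in the text."""
    resource_id: str
    before_state: Mapping[str, str]
    captured_at: datetime


def capture(resource_type: str, resource_id: str, reference: str) -> AuditEvent:
    """Record a pre-flight capture; refuse resource types with no known mechanism."""
    kind = CAPTURE_KIND.get(resource_type)
    if kind is None:
        raise ValueError(f"no capture mechanism for {resource_type!r}")
    # MappingProxyType gives a read-only view, so the recorded state cannot be edited.
    state = MappingProxyType({"kind": kind, "reference": reference})
    return AuditEvent(resource_id, state, datetime.now(timezone.utc))


event = capture("rds", "orders-db", "snapshot-placeholder")
print(event.before_state["kind"])  # manual_snapshot
```

Raising on an unknown resource type mirrors the "block if unverified" rule: a change with no capture mechanism never reaches execution.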
03
The rollback plan
Every plan item has an attached rollbackPlan document containing:
- Exact restore commands (Terraform or CLI)
- Pre-flight state reference (snapshot ID, AMI ID, policy version)
- Measured RTO from prior rehearsals on similar resources
- Health verification criteria for confirming restore succeeded
- Escalation path if rollback itself fails
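The five sections above map naturally onto a small record with a completeness check. This is a sketch under stated assumptions: the class and field names are hypothetical, and the actual rollbackPlan document format is not specified in this page.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class RollbackPlan:
    """Hypothetical shape for a rollbackPlan document; one field per listed section."""
    restore_commands: List[str] = field(default_factory=list)  # Terraform or CLI
    preflight_state_ref: str = ""                              # snapshot ID, AMI ID, policy version
    measured_rto_seconds: float = 0.0                          # from prior rehearsals
    health_criteria: List[str] = field(default_factory=list)
    escalation_path: str = ""

    def missing_fields(self) -> List[str]:
        """Names of sections that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Placeholder values, for illustration only.
complete = RollbackPlan(
    restore_commands=["<restore command>"],
    preflight_state_ref="snapshot-placeholder",
    measured_rto_seconds=180.0,
    health_criteria=["ALB targets healthy"],
    escalation_path="page the on-call operator",
)
draft = RollbackPlan(restore_commands=["<restore command>"])

print(complete.missing_fields())  # empty list: every section is filled in
print(draft.missing_fields())     # the four sections the draft still lacks
```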
04 · Triggers
When rollback fires
Automatic — health verification failure
If post-apply health checks fail (ALB target unhealthy, SLO breach, CloudWatch alarm trip), rollback fires automatically without further approval.
Automatic — drift detected post-apply
If the change produced unexpected drift in dependent resources (cascading impact), rollback fires automatically.
Manual — operator-triggered
From the audit log or dashboard, an authorized user can trigger rollback. It follows the same path as automatic rollback: the pre-flight state is restored using the stored rollback plan.
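The three triggers can be summarized as one decision function. The sketch below is illustrative only: the names are hypothetical, and the order in which the conditions are checked is an assumption of this example rather than documented behavior.

```python
from enum import Enum
from typing import Optional


class Trigger(Enum):
    HEALTH_FAILURE = "automatic: health verification failure"
    DRIFT = "automatic: drift detected post-apply"
    MANUAL = "manual: operator-triggered"


def rollback_trigger(health_ok: bool, unexpected_drift: bool,
                     operator_requested: bool,
                     operator_authorized: bool) -> Optional[Trigger]:
    """Return the trigger that fires, or None if no rollback is warranted."""
    if not health_ok:
        return Trigger.HEALTH_FAILURE   # no further approval needed
    if unexpected_drift:
        return Trigger.DRIFT            # cascading impact on dependent resources
    if operator_requested and operator_authorized:
        return Trigger.MANUAL           # only authorized users may trigger
    return None


print(rollback_trigger(False, False, False, False))  # Trigger.HEALTH_FAILURE
print(rollback_trigger(True, False, True, False))    # None: requester not authorized
```

Whichever trigger fires, the restore itself is the same: it uses the stored rollback plan and the captured pre-flight state.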
Multi-phase
Rollback respects phase boundaries. If phase 3 of 4 fails, only phases 1–3 are rolled back. Phase 4 was never started.
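The phase rule is simple enough to state as code. In this sketch the function name is hypothetical, and undoing phases in reverse order (most recent first) is an assumption of the example, not something the text specifies.

```python
from typing import List


def phases_to_roll_back(total_phases: int, failed_phase: int) -> List[int]:
    """Phases that were started, including the failed one, newest first.

    Phases after the failed one never started, so they are not included.
    """
    if not 1 <= failed_phase <= total_phases:
        raise ValueError("failed_phase must be between 1 and total_phases")
    return list(range(failed_phase, 0, -1))


print(phases_to_roll_back(4, 3))  # [3, 2, 1]
```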
05
Verification after rollback
After rollback completes, Axiom verifies the original state was actually restored:
- Resource configuration matches the captured beforeState
- Health checks return to baseline
- Dependent resources show no leftover drift
- Cost shift reverts (no orphaned charges)
If verification fails, the rollback is escalated. This typically means the rollback itself encountered an unexpected condition (rare, but possible), and manual investigation begins from the audit log.
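The four post-rollback checks and the escalation rule can be sketched as follows. The check names and the verify_restore function are hypothetical; treating a missing check result as a failure is a conservative assumption made for this example.

```python
from typing import Dict, List

# One entry per verification criterion listed above (names are illustrative).
REQUIRED_CHECKS = (
    "config_matches_before_state",
    "health_at_baseline",
    "no_dependent_drift",
    "cost_reverted",
)


def verify_restore(results: Dict[str, bool]) -> List[str]:
    """Return the checks that failed; a check with no result counts as failed."""
    return [name for name in REQUIRED_CHECKS if not results.get(name, False)]


def needs_escalation(results: Dict[str, bool]) -> bool:
    """Escalate whenever any required check did not pass."""
    return bool(verify_restore(results))


clean = {name: True for name in REQUIRED_CHECKS}
partial = {"config_matches_before_state": True, "health_at_baseline": False}

print(verify_restore(clean))      # empty list: restore confirmed
print(needs_escalation(partial))  # True
```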
06 · Honest limits
What rollback cannot do
- Restore deleted data that wasn't snapshotted by AWS (some configurations have no native snapshot mechanism)
- Undo write operations into databases that bypass RDS snapshots (manual schema migrations, for example)
- Undo a Lambda execution that already produced external side effects (emails sent, payments processed)
- Reverse a security group rule change that allowed brief intrusion (the change is reverted; the intrusion still happened — incident response is a separate process)
For these cases, Axiom blocks the plan item at the governance gate. The plan can't proceed without compensating safety measures.
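As a rough illustration of how these limits could feed the governance gate: each limit that applies to a plan item must be paired with at least one compensating measure before the item proceeds. The enum, the function, and the example measure are all hypothetical.

```python
from enum import Enum
from typing import Dict, List, Set


class Limit(Enum):
    UNSNAPSHOTTED_DATA = "deleted data with no snapshot"
    DB_WRITES_OUTSIDE_SNAPSHOT = "database writes that bypass RDS snapshots"
    EXTERNAL_SIDE_EFFECTS = "Lambda side effects already emitted"
    EXPOSURE_WINDOW = "security group change that briefly allowed access"


def may_proceed(applicable: Set[Limit], measures: Dict[Limit, List[str]]) -> bool:
    """True only if every applicable limit has at least one compensating measure."""
    return all(measures.get(limit) for limit in applicable)


limits = {Limit.DB_WRITES_OUTSIDE_SNAPSHOT}
# Example measure is made up for illustration.
covered = {Limit.DB_WRITES_OUTSIDE_SNAPSHOT: ["logical backup taken before migration"]}

print(may_proceed(limits, {}))       # False: blocked at the gate
print(may_proceed(limits, covered))  # True
```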
Trust questions
What is rollback?
A pre-verified path to restore the exact state before an Axiom-executed change, with a measured RTO.
Why verified in advance?
Rollback verified at incident time is unreliable. Axiom proves it works before approval.
Is rollback safe?
Yes — it uses the same pre-flight state captured at plan time. No inference, no guessing.
What happens during rollback?
Pre-flight state is restored. Health checks verify restore succeeded. Audit log records both events.
Can rollback fail?
Rarely, but it is possible, for example if AWS APIs themselves are degraded. A failed rollback is recorded in the audit log and escalated to the operator immediately.
What if rollback isn't possible?
Plan item is blocked at the governance gate. Compensating safety measures must be in place before the change can proceed.