Runbook — Disaster Recovery
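Disaster recovery restores the platform in a separate region or cluster after a regional or cluster-level incident. It is broader than an application rollback: it covers infrastructure, data restore, and traffic routing, not only the deployed application version.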
Preconditions
- DR region or cluster is provisioned.
- Backup artifacts are reachable from the DR environment.
- DNS, certificate, and customer connectivity changes are approved.
- Incident commander owns the cutover decision.
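The preconditions above work as a go/no-go gate before any recovery step starts. A minimal sketch of such a gate follows; the `Preconditions` type, its field names, and `unmet_preconditions` are hypothetical and only mirror the list above. How each value is actually determined (cloud API, backup tooling, change ticket) is not specified in this runbook and is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preconditions:
    """Hypothetical checklist mirroring the four preconditions above."""
    dr_environment_provisioned: bool
    backups_reachable_from_dr: bool
    connectivity_changes_approved: bool  # DNS, certificates, customer connectivity
    incident_commander: str              # person who owns the cutover decision


def unmet_preconditions(p: Preconditions) -> list[str]:
    """Return the unmet preconditions; an empty list means recovery may start."""
    unmet = []
    if not p.dr_environment_provisioned:
        unmet.append("DR region or cluster is not provisioned")
    if not p.backups_reachable_from_dr:
        unmet.append("backup artifacts are not reachable from the DR environment")
    if not p.connectivity_changes_approved:
        unmet.append("DNS, certificate, and customer connectivity changes are not approved")
    if not p.incident_commander.strip():
        unmet.append("no incident commander is assigned to own the cutover decision")
    return unmet
```

Returning the full list of gaps, rather than stopping at the first one, lets the incident channel see everything that blocks recovery in one pass.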
Steps
- Provision or validate the DR namespace and infrastructure dependencies.
- Restore PostgreSQL from the selected recovery point.
- Deploy Finnest Power with ingress disabled.
- Run internal smoke tests and tenant sanity checks.
- Repoint DNS or gateway routing after the incident commander approves the cutover.
- Monitor traffic, errors, consent flows, and payment readiness.
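The ordering of the steps above matters: everything up to the smoke tests runs with ingress disabled, and routing changes only after approval. The sketch below illustrates that ordering with an explicit approval gate. It is an assumption about how the steps could be orchestrated, not the platform's actual tooling; `run_recovery`, `CutoverNotApproved`, and the step actions are hypothetical placeholders.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair. The actions are hypothetical placeholders
# for whatever tooling actually performs each step.
Step = Tuple[str, Callable[[], None]]


class CutoverNotApproved(Exception):
    """Raised when traffic would be repointed without approval."""


def run_recovery(
    pre_cutover: List[Step],
    cutover: Step,
    post_cutover: List[Step],
    approved: Callable[[], bool],
) -> List[str]:
    """Run steps in order and return the names of the completed steps.

    Any exception raised by an action stops the run, so a failed restore or
    smoke test never reaches the cutover step.
    """
    completed: List[str] = []
    for name, action in pre_cutover:
        action()
        completed.append(name)

    # Approval is checked only after every pre-cutover step has succeeded.
    if not approved():
        raise CutoverNotApproved(f"completed {completed}, cutover not approved")

    for name, action in [cutover, *post_cutover]:
        action()
        completed.append(name)
    return completed
```

In this layout, `pre_cutover` would hold the validate, restore, deploy, and smoke-test steps, `cutover` the DNS or gateway change, and `post_cutover` the monitoring step. Keeping the approval check as a separate callable means the decision stays with a person, not with the script.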
Verification
- Achieved recovery point and recovery time are recorded against the recovery point objective (RPO) and recovery time objective (RTO).
- Core APIs, consent lifecycle, and payment initiation readiness checks pass.
- Observability backend receives logs/traces from the DR deployment.
- A follow-up task captures reconciliation work for NATS/outbox events produced after the recovery point.
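Recording the recovery point and recovery time comes down to simple timestamp arithmetic. The sketch below uses the common definitions: the data loss window is the gap between the restored recovery point and the start of the incident, and the downtime is the gap between the start of the incident and service restoration. The function name, field names, and targets are hypothetical; the real objectives come from the platform's own DR policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryMetrics:
    data_loss_window: timedelta  # achieved recovery point
    downtime: timedelta          # achieved recovery time
    rpo_met: bool
    rto_met: bool


def achieved_recovery_metrics(
    incident_start: datetime,
    recovery_point: datetime,
    service_restored: datetime,
    rpo_target: timedelta,
    rto_target: timedelta,
) -> RecoveryMetrics:
    """Compare achieved recovery point and time with their targets.

    Assumes all timestamps share one timezone convention and that
    recovery_point <= incident_start <= service_restored.
    """
    if not (recovery_point <= incident_start <= service_restored):
        raise ValueError("expected recovery_point <= incident_start <= service_restored")
    data_loss_window = incident_start - recovery_point
    downtime = service_restored - incident_start
    return RecoveryMetrics(
        data_loss_window=data_loss_window,
        downtime=downtime,
        rpo_met=data_loss_window <= rpo_target,
        rto_met=downtime <= rto_target,
    )
```

The data loss window also bounds the reconciliation task: events produced inside that window are the ones the restored database may not know about.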