ADR-014-F: Infrastructure Reuse Strategy — Co-Deploy Finnest on Existing AgenticAI-app Hosts¶
Status: Accepted
Date: 2026-04-17
Decision Makers: Gautham Chellappa
Depends on: ADR-001-F (Elixir), ADR-010-F (Strangler Fig migration), ADR-0011 (inherited — Elixir migration)
Supersedes: — (clarifies the deployment intent sketched in ADR-0011)
Context¶
AgenticAI-app has four mature AWS ap-southeast-2 EC2 environments (local, integration, staging, production) plus a CI bastion. Each host runs Docker + Kamal, Caddy + Let's Encrypt, MySQL and Redis as Kamal accessories, with Bitwarden Secrets Manager for secret retrieval. Combined operational investment is significant — terraform modules, deploy scripts, GitHub Actions runners on bastion, SSL automation, per-env .env patterns.
ADR-0011 sketched a transition intent: "repurpose integration + staging for AgenticAI-finnest, production stays Laravel until cutover." This ADR formalises the concrete reuse strategy across all four envs and commits to a co-deploy pattern (Laravel + Finnest on the same hosts during transition), with a deliberate plan to manage resource contention, security separation, and cutover risk.
Three host strategies were considered:
- Option A — co-deploy all envs (this ADR) — both apps on each existing host during transition, Laravel decommissioned post-cutover by stopping its services
- Option B — new EC2 instances for every Finnest env (3 extra instances during transition)
- Option C — hybrid (reuse integration + staging; new production for Finnest)
Option A was selected for cost-minimisation and operational consistency (one host pattern to manage across envs). The concerns Option A raises — resource contention, security scoping, cutover risk — are explicitly addressed in §Consequences and §Mitigations below.
Decision¶
Finnest deploys alongside Laravel AgenticAI-app on the existing four EC2 hosts during the transition window. Laravel services remain active throughout go-live and migration phases; Finnest services are added incrementally. Post-cutover (per ADR-010-F Strangler Fig, Migration Phase X decommission), Laravel services are stopped on each host and its accessories (MySQL, Redis, Horizon) retired. Finnest then has the hosts to itself.
Reuse inventory (what's shared)¶
| Component | Reuse |
|---|---|
| 4 × EC2 hosts in ap-southeast-2 | Same hosts; both apps deploy via Kamal |
| AWS accounts (separate per env) | Same accounts; new IAM roles for Finnest |
| CI bastion with 3 GitHub Actions runners | Same runners serve both ci.yml (Laravel) and ci-finnest.yml (Elixir) |
| Bitwarden Secrets Manager | Same org; new finnest project alongside agenticai project |
| `bws` CLI + `scripts/deploy.sh` pattern | Ported pattern: `scripts/deploy-finnest.sh` with identical shape |
| Terraform modules (`app-host`, `ci-bastion`) | Reused as-is; parametrised per-env with Finnest additions |
| Caddy + Let's Encrypt | Same Caddy instance; new virtual hosts per Finnest subdomain |
| Docker runtime | Same; Finnest uses Debian-slim Elixir images |
| Kamal orchestration | Same; new config/deploy.finnest.*.yml files |
| Route 53 zone `agentic-ai.au` | Same zone; new Finnest subdomains |
| Sentry + Grafana + Prometheus stack | Same self-hosted observability pattern per ADR-010 Part 10 |
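The per-subdomain Caddy virtual hosts can be sketched as Caddyfile blocks. This is a minimal sketch, assuming Caddy v2 (where `reverse_proxy` forwards WebSocket upgrade requests transparently, so the LiveView paths mostly need correct routing to the Finnest upstream) and assuming upstream ports — 4000 is Phoenix's conventional default, and the Laravel port is purely illustrative:

```
# Hedged Caddyfile sketch — upstream ports are assumptions, not confirmed values.
integration-finnest.agentic-ai.au {
    reverse_proxy 127.0.0.1:4000   # Finnest container; Let's Encrypt cert automatic
}

integration.agentic-ai.au {
    reverse_proxy 127.0.0.1:8080   # existing Laravel container (port assumed)
}
```

Both blocks live in the same Caddy instance, which is what makes side-by-side URL testing possible without any DNS change.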
New provisioning (what's added per host)¶
| Component | Added |
|---|---|
| PostgreSQL 17 | New Kamal accessory per host (alongside existing MySQL) |
| Finnest app container (Elixir release) | New service |
| Oban-owned background processing | No separate accessory (Postgres-backed) |
| S3 buckets: `finnest-{integration,staging,production}-storage` | New buckets in same AWS accounts |
| Caddy config blocks for WebSocket upgrade (`/live/websocket`, `/socket/*`) | New config |
| Route 53 records: `integration-finnest.agentic-ai.au`, `staging-finnest.agentic-ai.au`, `app-finnest.agentic-ai.au` | New A/CNAME records |
| Postgres read replica on staging + production | New (post-Phase 1) |
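The Postgres 17 accessory above can be sketched as a Kamal accessory entry. This is a hedged fragment of a hypothetical `config/deploy.finnest.production.yml` — the host IP (a TEST-NET placeholder), database/user names, and volume path are assumptions, not confirmed values:

```yaml
# Hedged sketch of the Postgres 17 Kamal accessory; names and host are assumed.
accessories:
  postgres:
    image: postgres:17
    host: 192.0.2.10                 # placeholder — real host comes from Terraform
    port: "127.0.0.1:5432:5432"      # bind loopback only; co-located app connects locally
    env:
      clear:
        POSTGRES_DB: finnest_production
        POSTGRES_USER: finnest
      secret:
        - POSTGRES_PASSWORD          # resolved from Bitwarden at deploy time
    directories:
      - data:/var/lib/postgresql/data
```

Binding the published port to loopback keeps the accessory reachable from the co-deployed Finnest container without exposing Postgres on the host's public interface.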
What's NOT reused¶
| Component | Replacement |
|---|---|
| MySQL | Postgres (AR-13). MySQL remains active on each host for Laravel + Finnest V2Repo read-only access during Strangler Fig; retired when Laravel is retired. |
| Redis + Horizon | Not needed. Oban is Postgres-backed (AR-11); ETS covers cache (B02). Redis retired when Laravel is retired. |
| Horizon dashboard | Oban Web |
| `/up` healthcheck path | `/health` (shallow) + `/ready` (deep) per architecture Part 9 |
| `.env.docker` per-env file pattern | `config/runtime.exs` env branching (Elixir convention) |
| `php artisan schedule:work` | Oban cron |
Domain naming during transition¶
Finnest subdomains in the existing agentic-ai.au zone:
| Env | Laravel URL (unchanged) | Finnest URL |
|---|---|---|
| Integration | `integration.agentic-ai.au` | `integration-finnest.agentic-ai.au` |
| Staging | `staging.agentic-ai.au` | `staging-finnest.agentic-ai.au` |
| Production | `app.agentic-ai.au` | `app-finnest.agentic-ai.au` |
At cutover (per ADR-010-F Migration Phase X): app.agentic-ai.au CNAME-swaps to Finnest host; Laravel services stopped but image retained on host for 30-day rollback window; finally, zone-level migration to hexis.au (commercial name per brainstorm-11 Topic 4) is executed separately when commercial launch is ready.
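The cutover CNAME swap above can be expressed as a Route 53 change batch (applied with `aws route53 change-resource-record-sets`). A sketch, in which the TTL value is an assumption — lowering it ahead of cutover day shortens DNS propagation for both the swap and any rollback:

```json
{
  "Comment": "ADR-014-F cutover: point app.agentic-ai.au at Finnest",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.agentic-ai.au",
        "Type": "CNAME",
        "TTL": 60,
        "ResourceRecords": [{ "Value": "app-finnest.agentic-ai.au" }]
      }
    }
  ]
}
```

Reversing the swap during the 72h parallel run is the same `UPSERT` with the old target, which is what makes DNS-level rollback cheap while Laravel is still running.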
Capacity planning¶
Current host utilisation (Laravel only):
| Host | Instance | Approx. utilisation |
|---|---|---|
| Integration (t3.medium) | 2 vCPU / 4 GB | ~35% CPU, 55% RAM |
| Staging (t3.medium) | 2 vCPU / 4 GB | ~40% CPU, 60% RAM |
| Production (t3.large) | 2 vCPU / 8 GB | ~55% CPU, 70% RAM |
| CI Bastion (t3.large) | 2 vCPU / 8 GB | ~40% CPU, 50% RAM |
Projected co-deploy utilisation (Laravel + Finnest, Phase 1):
| Host | Target RAM budget |
|---|---|
| Integration + Staging (t3.medium, 4 GB) | Laravel ~1.5 GB + Postgres ~1 GB + Finnest BEAM ~1 GB + OS ~500 MB = ~4 GB — TIGHT. Upgrade trigger: sustained >85% RAM or OOM events. Upgrade target: t3.large (8 GB) before Phase 1 end. |
| Production (t3.large, 8 GB) | Laravel ~2 GB + Postgres ~3 GB + Finnest BEAM ~2 GB + OS ~500 MB = ~7.5 GB. Some headroom; upgrade to t3.xlarge if Phase 2 load exceeds. |
Upgrade budget: ~$50/month per env if t3.medium → t3.large, ~$100/month per env if t3.large → t3.xlarge. Absorbed into Phase 1 infra costs; re-evaluated at each phase boundary.
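The RAM budgets above can be recomputed as a back-of-envelope check. All figures are the ADR's own estimates (in GB), not measurements:

```python
# Back-of-envelope recomputation of the Phase 1 RAM budgets above.
# Component figures (GB) are the ADR's own estimates, not measurements.
HOSTS = {
    "integration/staging (t3.medium)": {
        "total_gb": 4.0,
        "budget_gb": {"laravel": 1.5, "postgres": 1.0, "finnest_beam": 1.0, "os": 0.5},
    },
    "production (t3.large)": {
        "total_gb": 8.0,
        "budget_gb": {"laravel": 2.0, "postgres": 3.0, "finnest_beam": 2.0, "os": 0.5},
    },
}

def ram_budget(host: dict) -> tuple[float, float]:
    """Return (budgeted GB, remaining headroom GB) for a host entry."""
    used = sum(host["budget_gb"].values())
    return used, host["total_gb"] - used

for name, host in HOSTS.items():
    used, headroom = ram_budget(host)
    print(f"{name}: {used:.1f} GB budgeted, {headroom:.1f} GB headroom")
```

The t3.medium budget sums to the full 4 GB with zero headroom — the arithmetic behind the "TIGHT" label and the pre-emptive t3.large upgrade trigger.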
Production promotion discipline¶
Added 2026-04-18 via Sprint 4 solutioning gate check (D3). Same pattern as AgenticAI-app production deploy.
Production deploys (both Laravel and Finnest) require human gating. Auto-on-main applies only to integration + staging. Production promotion enforces four controls:
- Manual trigger only — production deploys fire from a dedicated `workflow_dispatch` workflow (`.github/workflows/deploy-finnest-production.yml`), never on push or merge. The workflow accepts a required `confirmation` input (e.g. matching the target host name) to prevent accidental triggers.
- GitHub environment protection with 2 reviewers — the `production-finnest` GitHub environment (and the equivalent `production` environment for Laravel) requires two distinct reviewers with write access to approve the job before the workflow proceeds past the protection gate. The reviewer list is maintained in GitHub org settings; rotations are tracked in the ops runbook.
- Pre-deploy smoke gate — the workflow fails fast if the staging smoke suite has not run clean within the prior 24 hours. `scripts/deploy-finnest-smoke.sh staging` must pass; the timestamp is checked against `git log --first-parent staging...main`.
- Post-deploy smoke gate — the workflow runs `scripts/deploy-finnest-smoke.sh production` immediately after the `kamal deploy` step; a failing post-deploy smoke auto-triggers `scripts/deploy-finnest.sh --rollback production` and raises an alert in the ops channel. No silent regressions.
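The four controls above can be sketched as a workflow skeleton. This is a hedged sketch, not the actual file — the expected confirmation string and step contents are assumptions for illustration:

```yaml
# Hedged sketch of .github/workflows/deploy-finnest-production.yml.
# The expected confirmation value and step bodies are assumptions.
name: deploy-finnest-production
on:
  workflow_dispatch:                  # control 1: manual trigger only
    inputs:
      confirmation:
        description: "Type the production host name to confirm"
        required: true
jobs:
  deploy:
    runs-on: self-hosted              # existing bastion runners
    environment: production-finnest   # control 2: 2-reviewer protection gate
    steps:
      - uses: actions/checkout@v4
      - name: Guard against accidental triggers
        run: |
          [ "${{ inputs.confirmation }}" = "app-finnest.agentic-ai.au" ] \
            || { echo "confirmation mismatch"; exit 1; }
      - name: Deploy                  # pre-deploy smoke freshness check (control 3)
        run: scripts/deploy-finnest.sh production   # assumed to enforce the staging smoke gate
      - name: Post-deploy smoke with auto-rollback   # control 4
        run: |
          scripts/deploy-finnest-smoke.sh production || {
            scripts/deploy-finnest.sh --rollback production
            exit 1
          }
```

The `environment:` key is what attaches GitHub's reviewer-approval gate; the job pauses there until two reviewers approve, before any step runs.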
Rationale: production is live customer workload for Laravel and will be live customer workload for Finnest post-cutover. The cost of an accidental bad deploy — data loss, Laravel regression on the shared host, customer-facing downtime — is asymmetrically higher than the ~5 min of human latency that two-reviewer gating adds. Matches the AgenticAI-app production policy adopted at Laravel launch.
Alternatives Considered¶
| Alternative | Rejected because |
|---|---|
| Option B — new EC2 instances for every Finnest env | +3 instances during transition (~$150/mo); operational overhead of managing two parallel infrastructure footprints; Laravel decommission becomes an instance termination exercise per env rather than a service-level stop |
| Option C — hybrid (reuse integration+staging, new production) | Reasonable but still 1 extra instance; operational inconsistency (production differs from lower envs in transition strategy); minor cost saving not worth the extra complexity |
| Option D — reuse all + cutover in place on same host at cutover day | Highest cutover risk (if Finnest boot fails on cutover day, Laravel is already stopped on same host); rollback requires re-deploying Laravel from image in emergency conditions |
| New Route 53 zone `hexis.au` from Phase 0 | Commercial Hexis branding isn't ready at Phase 0 start; purchasing + SSL provisioning + zone migration adds unnecessary Phase 0 work. Zone migration is an independent late-phase task |
| Different secrets manager per app (Laravel on Bitwarden, Finnest on SSM) | Bitwarden carry-forward decided in architecture Part 8; one tool simpler operationally |
Consequences¶
Positive:
- Zero new EC2 instances during transition — direct cost saving ~$70-150/month vs Options B/C
- One set of hosts, one set of SSL configs, one set of Terraform modules, one CI bastion to know
- Laravel decommission = stop services on each host (no instance termination / data migration)
- Postgres co-located with MySQL lets `V2Repo` MyXQL queries traverse loopback (low latency) during Strangler Fig
- Team learns one host topology; on-call procedures apply to both apps
- Caddy routing enables immediate side-by-side URL testing without DNS cutover
Negative:
- Resource contention risk on t3.medium hosts (integration + staging). Peak load could cause either app to impact the other. Mitigation: upgrade to t3.large before any real traffic on Finnest beyond smoke tests. Monitor via CloudWatch CPUUtilization + MemoryUtilization.
- Security blast radius is larger — a compromise of the host compromises both apps' data. Mitigation: separate DB users with minimum privileges (the Finnest Postgres user cannot read MySQL; the Finnest `V2Repo` MyXQL connection uses a read-only user with explicit `SELECT` grants only); secrets scoped per project in Bitwarden.
- Operational coordination during transition — a Kamal deploy of one app requires care not to disturb the other's accessories. Mitigation: Kamal's per-app namespacing (`agenticai-app-*` containers vs `finnest-*` containers) is already isolation-friendly; documented playbook for each host's expected services.
- Cutover rollback window — if Finnest prod goes wrong post-cutover, rollback requires re-enabling Laravel on the same host. Mitigation: Laravel image + DB state retained for 30 days post-cutover; scripted re-enable (`scripts/restore-laravel.sh`) prepared in advance and tested in staging; cutover checklist requires a successful staging cutover rehearsal first.
- Dependency drift risk — upgrades to shared components (Docker, Caddy, OS packages) now affect both apps. Mitigation: lower-env upgrades always run a day ahead of production; dependency upgrades tracked in a shared changelog per host.
Mitigations summary¶
| Risk | Mitigation |
|---|---|
| t3.medium RAM exhaustion | Monitor; upgrade to t3.large before Phase 1 end |
| Finnest Postgres accidentally writes to Laravel MySQL | Separate DB users; V2Repo connection string uses a role with SELECT grants only (DA-01); architecture test asserts no INSERT/UPDATE/DELETE statements against V2Repo |
| Cross-app secret leakage via Bitwarden | Separate agenticai and finnest Bitwarden projects; machine account tokens per project; neither app's deploy script reads the other's tokens |
| Caddy config breaking WebSocket for LiveView | Staged Caddy config change with rollback; synthetic WebSocket test from CI before merge |
| Production cutover failure | Parallel run pattern: on cutover day, DNS switches app.agentic-ai.au to Finnest but Laravel services remain running for 72h; Laravel only stopped after 72h of clean Finnest operation |
| Kamal deploy of one app disturbs the other | Per-app Docker network namespacing; accessory containers prefixed per app (agenticai-app-mysql vs finnest-postgres); shutdown-order documented in runbooks |
| IRAP deployment bleeds into commercial hosts | IRAP is entirely separate (ADR-007-F). None of this ADR applies to IRAP. IRAP hosts are provisioned independently in a dedicated VPC with no peering to these commercial hosts. |
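The read-only `V2Repo` mitigation (DA-01) can be sketched as MySQL DDL. A hedged sketch — the user name `finnest_ro`, the `agenticai` schema name, and the Bitwarden placeholder are all hypothetical:

```sql
-- Hedged sketch: read-only MySQL user for Finnest's V2Repo (DA-01).
-- 'finnest_ro', the 'agenticai' schema, and the password placeholder are hypothetical.
CREATE USER 'finnest_ro'@'127.0.0.1' IDENTIFIED BY '<password-from-bitwarden>';
GRANT SELECT ON agenticai.* TO 'finnest_ro'@'127.0.0.1';
-- No INSERT/UPDATE/DELETE grants: writes fail at the DB layer,
-- which backs the architecture test asserting read-only V2Repo usage.
FLUSH PRIVILEGES;
```

Restricting the user to `127.0.0.1` also enforces the loopback-only access path noted under Consequences — the grant is unusable from off-host.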
Review triggers¶
Re-evaluate this ADR if any of:
- Finnest production load causes sustained >75% RAM on production host even after instance upgrade
- Security incident traces to cross-app state leakage
- Cutover timeline materially changes such that long-term co-deploy (>12 months) becomes expected
- Hexis commercial launch requires production environment rename before Finnest is ready for cutover
Implementation Plan¶
Phase 0 (Weeks 1–4):
- Add `finnest` project to Bitwarden Secrets Manager; generate machine account token
- Extend Terraform `app-host` module to accept a `finnest_enabled: true` flag; provision Postgres Kamal accessory when set
- Apply Terraform to integration host first — provisions Postgres + Finnest S3 bucket + Route 53 `integration-finnest.agentic-ai.au`
- Create `config/deploy.finnest.integration.yml` Kamal config
- Create `scripts/deploy-finnest.sh` (mirror of the `scripts/deploy.sh` pattern)
- Create `.github/workflows/ci-finnest.yml` using existing bastion runners
- Add Caddy virtual host + WebSocket upgrade config
- First Finnest deploy to integration — smoke test
- Repeat for staging and production hosts
Phase 1+ (per ADR-010-F):
- Strangler Fig migration phases run; Finnest takes over v2 capabilities domain-by-domain
- Capacity monitoring runs continuously; upgrade triggers acted on within 1 week
- Cutover rehearsal in staging before production cutover
Migration Phase X (post-cutover):
- Laravel services stopped on each host; images retained 30 days
- MySQL and Redis Kamal accessories stopped; volumes retained 30 days
- Post-30-day: MySQL + Redis volumes purged; Laravel images removed; host renamed in Terraform (optional)
- Zone migration `agentic-ai.au` → `hexis.au` as a separate commercial workstream
Relationship to Guardrails¶
Enforces / is enforced by: ADR-001-F (Elixir), ADR-010-F (Strangler Fig phases), ADR-007-F (IRAP is separate — this ADR excluded from IRAP scope), DA-01 (V2Repo read-only), IN-01 through IN-04 (Docker + Elixir releases), IN-08 (health check), SE-11/12 (CI security scanning), SE-15 (Postgres TLS).