Skip to content

STORY-F-021: agents schema migration + real persistence (sessions, messages, budget, audit)

Epic: Agent Infrastructure Priority: Must Have Story Points: 2 Status: Not Started Assigned To: Unassigned Created: 2026-04-18 Sprint: 4


User Story

As a developer building on top of the Phase 0 agent stack, I want the agents Postgres schema materialised with real tables for sessions, messages, budget limits, and tool audit — and F-012/F-013/F-014's ETS / stub persistence layers swapped for real reads + writes, so that F-017 Agent Chat survives BEAM restarts, budgets persist across deploys, and the AW-14 audit trail is durable from day one of Phase 1.


Description

Background

F-012 (finnest_agents Orchestrator + BudgetGuard), F-013 (AiProvider + PromptCache), and F-014 (MCP ToolRegistry + sample tool) all deliberately shipped with stub / ETS persistence because the agents Postgres schema wasn't yet scheduled. The Sprint 3 solutioning gate check (2026-04-18) surfaced this as tech-debt item TD1 — without a real schema landing in Phase 0, F-017's Agent Chat LiveView would ship against volatile memory and the AW-14 audit trail wouldn't survive a pod restart.

This story lands the schema and swaps the stubs. Sprint 4 has 2pt headroom (13/15 committed before this story); F-021 fits without reducing the rest of Sprint 4's scope.

Scope

In scope:

  • Migration (apps/finnest_core/priv/repo/migrations/20260508000001_create_agents_schema.exs) — reversible (CQ-11). Creates agents schema + 4 tables:
  • agents.sessions — id UUID, org_id UUID NOT NULL, user_id UUID NOT NULL, started_at TIMESTAMPTZ, ended_at TIMESTAMPTZ, last_active_at TIMESTAMPTZ, metadata JSONB DEFAULT '{}', inserted_at, updated_at, deleted_at. Indexes on (org_id, started_at DESC) and (user_id, last_active_at DESC).
  • agents.messages — id UUID, session_id UUID FK → agents.sessions(id) ON DELETE CASCADE, org_id UUID NOT NULL, role VARCHAR (user | assistant | system | tool), content TEXT, tool_calls JSONB DEFAULT '[]', tokens_in INT, tokens_out INT, cache_stats JSONB DEFAULT '{}', cost_aud DECIMAL(12,4), correlation_id UUID, inserted_at. Indexes on (session_id, inserted_at) and (org_id, inserted_at DESC).
  • agents.budget_limits — id UUID, org_id UUID UNIQUE NOT NULL, daily_aud DECIMAL(12,4), weekly_aud DECIMAL(12,4), monthly_aud DECIMAL(12,4), daily_spent_aud DECIMAL(12,4), weekly_spent_aud DECIMAL(12,4), monthly_spent_aud DECIMAL(12,4), daily_reset_at TIMESTAMPTZ, weekly_reset_at TIMESTAMPTZ, monthly_reset_at TIMESTAMPTZ, inserted_at, updated_at. One row per org.
  • agents.tool_audit — id UUID, session_id UUID, org_id UUID NOT NULL, tool_name VARCHAR, category VARCHAR, input_hash BYTEA (BLAKE2b), output_hash BYTEA, duration_ms INT, cost_aud DECIMAL(12,4), correlation_id UUID, causation_id UUID, model VARCHAR, tokens_in INT, tokens_out INT, cache_read_tokens INT, cache_write_tokens INT, error_reason VARCHAR, inserted_at. Indexes on (org_id, inserted_at DESC), (tool_name, inserted_at DESC), (correlation_id).
  • Ecto schemasFinnestAgents.Session, FinnestAgents.Message, FinnestAgents.BudgetLimit, FinnestAgents.ToolAuditEntry. All inherit the F-008 tenant-scoping hook automatically via :org_id in __schema__(:fields) — no explicit allowlisting.
  • Context modulesFinnestAgents.Sessions, FinnestAgents.Messages, FinnestAgents.Budgets, FinnestAgents.Audit. Standard CRUD + tenant-scoped queries. Budgets.record_spend/3 becomes an atomic UPDATE on agents.budget_limits with daily_spent_aud = daily_spent_aud + amount semantics (no read-modify-write race).
  • Swap stubs in F-012/F-013/F-014 code:
  • FinnestAgents.Session GenServer — replace in-memory history flush stub with FinnestAgents.Messages.persist_async/1 (Oban job queued via FinnestCore.ObanWorker from F-008; wraps with with_tenant/2).
  • FinnestAgents.BudgetGuard — replace ETS-only accumulator with ETS hot path + periodic flush to agents.budget_limits (every 60s or on clean shutdown). On boot, rehydrate ETS from agents.budget_limits.
  • FinnestAgents.MCP.ToolRegistry — replace ETS-stub tool_audit with real INSERT to agents.tool_audit (still via cast/async so hot path stays non-blocking — acceptable drop-on-crash because the message itself ends up in agents.messages).
  • FinnestAgents.AiProvider.AnthropicDirect — cost/cache stats now persist through agents.tool_audit rather than PromptCache GenServer memory.
  • PromptCache rework — stays as an in-memory hit-rate aggregator (OP-03 telemetry); stops pretending to be durable. Persistence moved to agents.tool_audit. Removes ~60 LOC of "will persist later" comments.
  • Boot-time rehydrateFinnestAgents.Supervisor.init/1 order becomes: Repo → Vault → BudgetGuard (rehydrates from DB) → ToolRegistry → Orchestrator → AgentSupervisor. PromptCache stays late (no durable state).
  • Reset jobs — Oban cron for daily/weekly/monthly budget resets (FinnestAgents.Workers.ResetBudgets). Idempotent: checks reset_at timestamp; no-ops if already reset this window.

Out of scope:

  • Full retention policy (90-day commercial / 7-year IRAP purge) — F-020 ships the cron; this story just creates the tables.
  • Event-store tie-in (writing agent.* domain events) — F-016 + F-017 wire this.
  • agents.memories L2/L3 memory tables — agents.md §Memory System defers L2/L3 to post-Phase-0; not blocking any Phase 0 story.
  • Retroactive backfill — no production data exists yet; the stub window (F-012 → F-021) is pre-launch, nothing to migrate.

Technical Notes

  • Pattern match with F-007: same tenant-rooted schema pattern — org_id NOT NULL, deleted_at where soft-delete makes sense (Sessions), hard-delete where it doesn't (Messages cascade from Sessions; BudgetLimit single-row-per-org).
  • Hash-not-plaintext for audit (Commandment #24 / agents.md §Audit Trail): input_hash + output_hash are BLAKE2b(16) of the serialised input/output maps. Plaintext lives briefly in agents.messages.content but is redacted/aged-off per retention policy.
  • DECIMAL precision for cost_aud: DA-15 mandates DECIMAL(12,4) for money; ex_money library already in the dep set from F-007 context; re-use here.
  • BudgetGuard ETS + flush pattern preserves the non-blocking hot path from F-012 — the DB is the authoritative store on reboot, ETS is the per-request cache. On crash-recovery the worst case is 60s of in-flight spend not yet persisted, which is acceptable (the LLM call itself logged to agents.tool_audit on completion, so cost reconciliation is still auditable).
  • Keeps F-008 tenant hook honest — every table has org_id in __schema__(:fields), so FinnestCore.Repo.prepare_query/3 injects WHERE org_id = ? automatically. No skip_org_scope: true calls required for any of this story's context functions. A stray cross-tenant query raises RuntimeError loud and early.
  • Module namespace (symmetric with F-016/F-017/F-019): modules live under FinnestAgents.* / FinnestCore.* (flat). This story was authored post-Sprint-3-D4-resolution and is mostly already consistent; one slip (Finnest.Agents.Workers.ResetBudgetsFinnestAgents.Workers.ResetBudgets) fixed 2026-04-18 alongside the Sprint 4 gate-check resolution PR.

Dependencies

  • Blocked by: STORY-F-012 (finnest_agents scaffold — schema modules reference FinnestAgents.Session etc.), STORY-F-007 (Cloak vault + schema patterns), STORY-F-008 (tenant enforcement — story relies on prepare_query injecting org_id).
  • Blocks: STORY-F-017 (Agent Chat LiveView — requires durable agents.messages), STORY-F-020 (production smoke — asserts restart-survival of session state).

Acceptance Criteria

  • Migration creates agents schema + 4 tables with field shapes per §Scope; mix ecto.migrate + mix ecto.rollback both succeed (CQ-11)
  • Ecto schemas compile; Repo.insert/1 + Repo.all/1 + Repo.delete/1 round-trip on each of Session, Message, BudgetLimit, ToolAuditEntry
  • Every context function passes through FinnestCore.Repo.prepare_query/3 with no skip_org_scope: true bypass (grep-guarded by an architecture test)
  • FinnestAgents.Messages.persist_async/1 queues an Oban job via FinnestCore.ObanWorker; job executes in tenant scope and inserts the row; retry-on-failure with max_attempts: 3
  • FinnestAgents.Budgets.record_spend/3 performs an atomic UPDATE (single round-trip, no read-modify-write)
  • BudgetGuard rehydrates from agents.budget_limits on boot; in-memory ETS reflects DB state after startup
  • BudgetGuard flushes in-memory spend to DB every 60s (configurable) + on clean supervisor shutdown
  • FinnestAgents.Workers.ResetBudgets idempotent — running twice in the same window produces the same final state (QJ-01)
  • ToolRegistry.invoke/3 writes a row to agents.tool_audit for every successful call; error path writes a row with error_reason populated
  • AiProvider.AnthropicDirect.generate/3 writes prompt + response hashes (not plaintext) + token counts + cost to agents.tool_audit
  • PromptCache GenServer no longer claims persistence in its module doc; responsibilities narrowed to in-memory hit-rate aggregation
  • F-012/F-013/F-014 stub comments (@TODO: persistence when schema lands) are all removed
  • Architecture test: every agents.* schema has org_id; tenant enforcement fires; no allowlist entries added (Sessions/Messages/BudgetLimit/ToolAudit are not exceptions — they all carry tenant)
  • mix format, credo --strict, dialyzer, mix boundary all green

Testing Requirements

  • Unit: each context module (CRUD + error paths)
  • Unit: BudgetGuard boot rehydrate; flush; reset window; atomic record_spend
  • Unit: ResetBudgets worker idempotency
  • Unit: Messages.persist_async enqueues correct Oban job with tenant args
  • Integration: full session lifecycle — start → 3 user messages → MCP tool calls → end → simulate crash mid-session → rehydrate new GenServer → assert prior messages retrievable
  • Integration: cross-tenant attempt — tenant A creates a session; tenant B tries to load it; expect RuntimeError from prepare_query / empty result set
  • Property: random message sequences always produce consistent token + cost totals between session and tool_audit sum

References