
Finnest Agents Architecture

Date: 2026-04-16
Status: Draft
Scope: AI agent infrastructure — how finnest_agents works, how the three tiers interact with 19 domains via MCP, how cost and governance are enforced.

Related documents: architecture.md (main), ../brainstorms/brainstorm-03-ai-agent-design.md, ../10-GUARDRAILS.md §11 (AI-01–AI-09).


Purpose

Finnest is AI-native — agents aren't a feature, they're the architecture's central nervous system. This document details the infrastructure the main architecture summarises. If the main doc describes what agents are, this doc describes how they work.


The Three Tiers

Each tier solves a different problem; each has different durability, cost, and oversight characteristics (B03 Insight 1).

Tier 1 — Conversational Agents (GenServer, short-lived)

Shape: One GenServer process per user session. Holds conversation state in process memory. Dies when session ends.

Durability: Process lifetime only. If the user disconnects for >10 min, session memory is flushed to agents.sessions / agents.messages in Postgres and the GenServer terminates. On reconnect, a fresh GenServer rehydrates context from L2 memory.

Cost profile: Variable — pattern-match tier is $0; LLM-classified intent ~$0.01; rich generation ~$0.02–0.05.

Types:

| Agent | Role |
|---|---|
| User Chat Agent | General-purpose; routes to domain specialists via MCP. The user-facing entrypoint that responds to Cmd+K and the mobile chat. |
| Reach Agent | Outbound + inbound SMS / voice / chat / email orchestration for workers, candidates, and clients. Decides templates, extracts intent from replies, hands off to humans when confidence drops. |
| Admin Assistant | Natural-language queries over dashboards ("show me this week's unfilled shifts"). Answers through MCP read-only tools. |

Lifecycle:

User opens Cmd+K or mobile agent
  └─ finnest_agents.Orchestrator.start_session/2
      └─ AgentSupervisor.start_child({UserChatAgent, %{session_id, org_id, user_id}})
          └─ GenServer started, restart: :temporary
              └─ session state kept in GenServer state (process memory)
              └─ messages persisted to agents.messages async (for audit)

User sends message
  └─ send(session_pid, {:user_message, text, correlation_id})
      └─ handle_cast: two-tier classify → route to MCP tool(s)
      └─ stream response back via Phoenix Channel (agent:<session_id> topic)

User idle 10+ min OR disconnect
  └─ :timeout triggers graceful shutdown
      └─ persist remaining state to agents.sessions
      └─ GenServer exits

User returns
  └─ Orchestrator rehydrates from agents.sessions → new GenServer
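The idle-timeout half of this lifecycle is plain OTP. A minimal sketch, assuming hypothetical stand-in modules (`SessionStore` here substitutes for the agents.sessions Postgres table; the real agent also streams replies over Phoenix Channels):

```elixir
defmodule SessionStore do
  # Hypothetical stand-in for agents.sessions — the real system writes to Postgres.
  def save(session_id, state), do: :persistent_term.put({:session, session_id}, state)
  def load(session_id), do: :persistent_term.get({:session, session_id}, %{history: []})
end

defmodule UserChatAgentSketch do
  use GenServer

  @idle_ms 10 * 60 * 1000  # 10-minute idle timeout

  def start_link(args), do: GenServer.start_link(__MODULE__, args)

  @impl true
  def init(%{session_id: sid} = args) do
    # Rehydrate prior context (L2 memory) on reconnect
    {:ok, Map.merge(args, SessionStore.load(sid)), @idle_ms}
  end

  @impl true
  def handle_cast({:user_message, text}, state) do
    # Real agent: two-tier classify → MCP tool call → stream via Phoenix Channel
    {:noreply, Map.update!(state, :history, &[text | &1]), @idle_ms}
  end

  @impl true
  def handle_info(:timeout, %{session_id: sid} = state) do
    # Idle 10+ min: flush working state, then exit (restart: :temporary)
    SessionStore.save(sid, Map.take(state, [:history]))
    {:stop, :normal, state}
  end
end
```

The `@idle_ms` timeout returned from `init/1` and `handle_cast/2` is what delivers the `:timeout` message after ten quiet minutes.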

Tier 2 — Workflow Agents (Oban, medium-lived)

Shape: An Oban job (or chain of jobs) that orchestrates a multi-step business process. Checkpoints state to Postgres between steps. Survives node restarts. Distributes across cluster.

Durability: Hours to weeks. Pay run orchestration might live a few hours; onboarding pipelines run for days to weeks.

Cost profile: Higher per-invocation than Tier 1 (multi-step reasoning), but far lower per-business-outcome because each step is bounded.

Types:

| Agent | Owner OTP app | Workflow |
|---|---|---|
| Onboarding Pipeline | finnest_onboard | start → DVS verification → credential check → Fair Work forms → super onboarding → induction → placement |
| Super Onboarding Wizard | finnest_onboard (crosses to finnest_payroll + finnest_people) | TFN declaration → super fund choice (USI lookup + stapled super + SMSF option) → bank details → FWIS acknowledgement → contract signing. Addresses B12 C2 gap |
| Scoring & Matching | finnest_recruit | batch scoring (deterministic) → AI reasoning only for edge cases (AMBER band) → PROPOSE to human |
| Pay Run Processing | finnest_payroll | collect timecards → apply awards via AwardInterpreter → compliance check → PROPOSE pay run → human approve → submit STP → generate invoices |
| Incident Response | finnest_safety | report → classify severity → notify → investigate → corrective action → close |

Checkpoint & resume:

defmodule Finnest.Onboard.Workers.OnboardingPipeline do
  use Oban.Worker, queue: :onboard_queue, max_attempts: 5, unique: [fields: [:args], period: 60]

  def perform(%Oban.Job{args: %{"pipeline_id" => id, "step" => step}}) do
    pipeline = Onboard.PipelineOrchestrator.load(id)

    case step do
      "dvs_verification" ->
        with {:ok, result} <- DocumentVerification.verify(pipeline.documents),
             :ok <- Onboard.PipelineOrchestrator.record_result(pipeline, :dvs, result) do
          enqueue_next(pipeline, "credential_check")
        end

      "credential_check" -> ...
      "super_onboarding" -> ...
      # ... etc
    end
  end
end

Each step persists its output to onboard.pipeline_steps before enqueueing the next. A node crash mid-step means the job retries (QJ-01 idempotency — same input, same output).
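The idempotency contract can be illustrated in miniature. A sketch, with a plain map standing in for the onboard.pipeline_steps table (which in the real system carries a unique index on the pipeline/step pair):

```elixir
defmodule PipelineCheckpoints do
  # Sketch of the QJ-01 contract: recording the same {pipeline_id, step}
  # twice is a no-op that keeps the first result, so an Oban retry after a
  # mid-step crash cannot duplicate a step's output. A map stands in for
  # the Postgres table here.
  def record(steps, pipeline_id, step, output) do
    key = {pipeline_id, step}

    case Map.fetch(steps, key) do
      {:ok, _existing} -> {:already_recorded, steps}   # retry hit — keep first result
      :error -> {:recorded, Map.put(steps, key, output)}
    end
  end
end
```

A retried `perform/1` re-runs the step, hits `:already_recorded`, and proceeds straight to enqueueing the next step without re-applying side effects.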

Tier 3 — Autonomous Agents (Oban cron, long-lived)

Shape: Always-on cron-triggered jobs. Scan state, detect patterns, PROPOSE actions. Never execute destructive operations autonomously.

Constraint: PROPOSE-only (AI-06). Allowed operations: READ, NOTIFY, REPORT. Forbidden: WRITE, UPDATE, DELETE.

Types:

| Agent | Owner | Schedule | Output |
|---|---|---|---|
| Compliance Monitor | finnest_pulse (operates on finnest_compliance data) | Nightly 02:00 AEST | Notifications for credential expiries within 30/14/7 days; regulatory change alerts |
| Anomaly Detector | finnest_pulse | Daily + on pay run finalise | Flags timesheet fraud signals, payroll discrepancies, roster conflicts for human review |
| Roster Optimiser | finnest_roster | Overnight per org (staggered) | Proposes optimised shift assignments considering availability, skills, compliance, fatigue, cost. Human approves before changes apply |
| Data Quality Agent | finnest_pulse | Weekly | Incomplete records, stale credentials, data inconsistencies. Creates tasks for humans |

Why PROPOSE-only: An autonomous agent that can mutate state is an autonomous agent that can silently corrupt state. The value is in pattern detection + surfacing; execution stays human.
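The AI-06 constraint is worth enforcing at dispatch rather than by convention. A minimal sketch (module and operation names are illustrative):

```elixir
defmodule Tier3Guard do
  # Sketch of AI-06: Tier 3 agents may READ / NOTIFY / REPORT, never
  # WRITE / UPDATE / DELETE. A dispatch-time check, not a code review rule.
  @allowed [:read, :notify, :report]

  def authorise(:tier3, operation) when operation in @allowed, do: :ok
  def authorise(:tier3, operation), do: {:error, {:forbidden_for_autonomous_agent, operation}}
  # Tier 1/2 agents are gated by MCP tool categories instead
  def authorise(_tier, _operation), do: :ok
end
```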


Infrastructure

finnest_agents supervision tree

FinnestAgents.Supervisor  (one_for_one)
├── FinnestAgents.Orchestrator      (singleton GenServer — intent routing, session creation)
├── FinnestAgents.ToolRegistry      (GenServer — discovers MCP servers from every domain at boot)
├── FinnestAgents.AgentSupervisor   (DynamicSupervisor, restart: :temporary — Tier 1 agents per session)
├── FinnestAgents.ClaudeClient      (GenServer — Finch HTTP pool + cost tracking)
├── FinnestAgents.BudgetGuard       (GenServer — per-org spend circuit breaker)
├── FinnestAgents.MemoryCoordinator (GenServer — L1/L2/L3 memory read/write routing)
└── FinnestAgents.PromptCache       (GenServer — tracks Anthropic prompt-cache metrics)

The Orchestrator — two-tier intent routing

User query arrives at Orchestrator
  ├── TIER A: Pattern match (cost $0, latency <5ms)
  │   Patterns compiled at boot from finnest_agents/patterns/*.ex
  │   Examples:
  │     "show (me)? roster" → roster_mcp.list_shifts
  │     "leave balance" → people_mcp.get_leave_balance
  │     "[first_name] [last_name]" → people_mcp.get_employee
  │     "clock (me )?in" → timekeep flow (mobile)
  │   Covers ~70% of production queries (B03 cost model)
  ├── TIER B: Deterministic computation (cost $0, latency <50ms)
  │   Intent is clear but needs cross-domain composition:
  │     "Is John eligible for mining site?" → compliance_mcp.check_credentials
  │     "Score candidates for this job order" → recruit_mcp.score_candidates
  │   Covers ~15% of production queries
  └── TIER C: Claude (cost $0.01–0.05, latency 500–2500ms)
      Genuine reasoning or generation needed:
        Ambiguous intent classification
        Natural language generation
        Edge-case composition
      Covers ~15% of production queries

Routing decision flow:

defmodule FinnestAgents.Orchestrator do
  def route(intent_text, %{org_id: org_id} = ctx) do
    # Tier A
    with {:no_match, _} <- FinnestAgents.PatternMatcher.match(intent_text, ctx),
         # Tier B
         {:no_match, _} <- FinnestAgents.DeterministicResolver.resolve(intent_text, ctx),
         # Tier C — a budget warning (80% threshold) still allows the call
         {:ok, _budget} <- budget_status(org_id),
         {:ok, classified} <- FinnestAgents.ClaudeClient.classify(intent_text, ctx) do
      dispatch(classified, ctx)
    else
      {:match, tool_call} -> execute_mcp(tool_call, ctx)
      {:error, :budget_exceeded} -> fallback_local_only(intent_text, ctx)
      {:error, reason} -> {:error, reason}
    end
  end

  defp budget_status(org_id) do
    case FinnestAgents.BudgetGuard.check(org_id) do
      :ok -> {:ok, :within_budget}
      {:ok, {:warning, _pct}} = warn -> warn
      {:error, :budget_exceeded} = err -> err
    end
  end
end
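The Tier A stage can be as simple as a list of regexes compiled into the module. An illustrative sketch — the patterns and tool names come from the routing diagram above, not from the production finnest_agents/patterns/*.ex files:

```elixir
defmodule PatternMatcherSketch do
  # Illustrative Tier A matcher: first regex to match wins, cost $0.
  # Real patterns are compiled at boot from finnest_agents/patterns/*.ex.
  @patterns [
    {~r/show (me )?roster/i, {:roster_mcp, :list_shifts}},
    {~r/leave balance/i, {:people_mcp, :get_leave_balance}},
    {~r/clock (me )?in/i, {:timekeep, :clock_in_flow}}
  ]

  def match(text, _ctx) do
    Enum.find_value(@patterns, {:no_match, text}, fn {regex, tool} ->
      if Regex.match?(regex, text), do: {:match, tool}
    end)
  end
end
```

Returning `{:no_match, text}` rather than raising is what lets the `with` chain in `route/2` fall through cleanly to Tier B.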

Tool Registry

Discovers and catalogues every MCP server at boot:

defmodule FinnestAgents.ToolRegistry do
  use GenServer

  def start_link(_) do
    GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  @impl true
  def init(:ok) do
    # Discover every domain that exposes an MCP server
    tools =
      Application.loaded_applications()
      |> Enum.filter(fn {app, _desc, _vsn} -> app_exposes_mcp?(app) end)
      |> Enum.flat_map(fn {app, _desc, _vsn} -> discover_tools(app) end)

    {:ok, %{tools: tools, by_category: index_by_category(tools)}}
  end
end

Tool metadata indexed:

  • Name (e.g. roster_list_shifts)
  • Domain (:roster)
  • Category (:read | :propose | :execute | :restricted)
  • Input schema (typed fields, required/optional)
  • Output schema
  • Permission matrix — which roles can invoke; IRAP restrictions
  • MCP server pid (if dynamic discovery) or module (if static)

ClaudeClient (hexagonal AiProvider port)

defmodule FinnestAgents.AiProvider do
  @type classified_intent :: map()
  @type response :: map()
  @type reason :: term()

  @callback classify(intent :: String.t(), context :: map()) ::
              {:ok, classified_intent()} | {:error, reason()}
  @callback generate(messages :: [map()], tools :: [map()], opts :: keyword()) ::
              {:ok, response()} | {:error, reason()}
  @callback stream(messages :: [map()], tools :: [map()], opts :: keyword()) ::
              Enumerable.t() | {:error, reason()}
end

# Adapters (5 total — main doc):
#   FinnestAgents.AiProvider.AnthropicDirect     (commercial primary)
#   FinnestAgents.AiProvider.BedrockSydney        (IRAP primary)
#   FinnestAgents.AiProvider.VertexAU             (Verify fallback only)
#   FinnestAgents.AiProvider.MockProvider         (tests)
#   FinnestAgents.AiProvider.LocalLLMProvider     (Phase 3+ — vLLM container)

Cross-cutting concerns in all adapters:

  • Finch HTTP pool (connection reuse — saves TLS handshake)
  • Per-org rate limit (via FinnestAgents.BudgetGuard)
  • Cost tracking per request → agents.tool_audit
  • Failover chain per AI-07: primary → fallback → manual review
  • Response cache per intent signature (ETS, 5-min TTL) for repeated queries
  • Structured logging with correlation ID (AW-14)
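The response-cache bullet can be sketched with a plain ETS table. The table name, API shape, and intent-signature key are assumptions for illustration:

```elixir
defmodule ResponseCacheSketch do
  # Sketch of the per-intent response cache: ETS keyed by intent signature,
  # 5-minute TTL checked on read, stale entries evicted lazily.
  @table :ai_response_cache
  @ttl_ms 5 * 60 * 1000

  def init, do: :ets.new(@table, [:named_table, :public, :set])

  def put(signature, response) do
    :ets.insert(@table, {signature, response, System.monotonic_time(:millisecond)})
    :ok
  end

  # now_ms is injectable so expiry is testable without sleeping
  def get(signature, now_ms \\ System.monotonic_time(:millisecond)) do
    case :ets.lookup(@table, signature) do
      [{^signature, response, at}] when now_ms - at < @ttl_ms ->
        {:hit, response}

      [{^signature, _stale, _at}] ->
        :ets.delete(@table, signature)
        :miss

      [] ->
        :miss
    end
  end
end
```

The real cache is also invalidated on relevant domain events (see Failure Prevention), not only by TTL.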

Prompt Caching (AI-09 guardrail, Part 3 decision)

Anthropic prompt cache delivers 90% discount on cached tokens. Finnest's system prompts + MCP tool schemas are large (~10K tokens) and stable — ideal cache candidates.

Required prompt structure:

[CACHE_BREAKPOINT: PERMANENT]
System prompt (role, principles, formatting rules)       ← cached across all sessions
MCP tool schemas (full typed definitions)                 ← cached across all sessions

[CACHE_BREAKPOINT: PER_ORG]
Org context (industry profiles, terminology, flags)       ← cached per-org

[CACHE_BREAKPOINT: NONE]
Session history (last N messages)                          ← not cached (changes every turn)
User query                                                 ← not cached

Claude client enforces this ordering:

defmodule FinnestAgents.AiProvider.AnthropicDirect do
  @behaviour FinnestAgents.AiProvider

  def build_request(session, user_message, tools) do
    %{
      model: model_for(session),
      system: [
        %{type: "text", text: Prompts.base_system(), cache_control: %{type: "ephemeral"}},
        %{type: "text", text: Prompts.tool_schemas(tools), cache_control: %{type: "ephemeral"}},
        %{type: "text", text: Prompts.org_context(session.org_id), cache_control: %{type: "ephemeral"}}
      ],
      messages: session.history ++ [%{role: "user", content: user_message}]
    }
  end
end

Observability:

  • FinnestAgents.PromptCache GenServer aggregates cache_creation_input_tokens, cache_read_input_tokens, input_tokens, output_tokens per request
  • Target: ≥70% cache hit rate measured as cache_read / (cache_read + cache_creation + input) on cacheable content
  • Dashboard panel; alert if hit rate drops below 50% for 1 hour (likely means someone broke the prompt structure)
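The target hit-rate formula, as a small helper. Field names here are assumptions layered over Anthropic's per-request token counters:

```elixir
defmodule CacheMetricsSketch do
  # Hit rate as defined above: cache_read / (cache_read + cache_creation + input),
  # computed over the cacheable-content token counters per request.
  def hit_rate(%{cache_read: r, cache_creation: c, input: i}) when r + c + i > 0 do
    r / (r + c + i)
  end

  def hit_rate(_), do: 0.0

  # The ≥70% target from the observability bullet above
  def healthy?(stats), do: hit_rate(stats) >= 0.70
end
```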

Memory System

Three levels (B03 infrastructure section), each solving a different problem.

| Level | Storage | Scope | Lifetime | Purpose |
|---|---|---|---|---|
| L1: Session | GenServer state (+ agents.messages async) | Single conversation | Process lifetime (hydrated on reconnect) | Working context — what the user and agent are discussing right now |
| L2: Tenant | agents.memories (Postgres) | Per-org | Permanent | Org-specific patterns and preferences — "this client always wants weekend-only workers"; "this org prefers SMS over email for shift confirmations" |
| L3: Domain | agents.memories + event store aggregations | Platform-wide (anonymised) | Permanent | Industry intelligence — "across all construction orgs, forklift certifications expire X days before detection"; competitive moat data |

L2 memory writes:

  • Explicit: agent proposes "remember this preference" → human confirms → agents.memories row
  • Implicit: pattern detected across N repeated similar interactions → Tier-3 agent proposes addition

L3 memory writes:

  • Aggregations run on events.domain_events partitions nightly
  • PII stripped; only org_id-free patterns retained
  • Used during Tier C response generation so every org benefits from cross-org learning

Memory retrieval during Tier C:

User query → ClaudeClient.generate(...)
  └─ system prompt includes:
      - base system (cached)
      - tool schemas (cached)
      - org_context (cached per-org)
      - relevant L2 memories (max 5, retrieved by embedding similarity — future phase)
      - relevant L3 patterns (max 3)
      - session history (uncached)

Governance

Tenant Isolation (AI-03, AI-04)

See main architecture.md Part 8 and data.md. Key points for agents:

  • org_id is injected by MCP framework from session.metadata — agents cannot set it
  • Every agent GenServer is spawned with {session_id, org_id, user_id} in initial state
  • FinnestAgents.AgentSupervisor uses restart: :temporary — a terminated agent doesn't auto-restart with stale tenant context
  • Session end = process termination — no reuse across tenants (AI-04)

Action Classification (MCP tool categories)

| Category | Effect | Human required | Agent autonomy | Example |
|---|---|---|---|---|
| READ | Query / list / get | No | Full | roster_list_shifts, compliance_check_worker |
| PROPOSE | Agent suggests; human confirms before execute | Yes (before) | Proposal only | roster_propose_assignment, recruit_propose_candidates |
| EXECUTE | Agent acts; human notified after | No (notified) | Full with notification | reach_send_message, timekeep_record_clock_event |
| RESTRICTED | Human must initiate | Yes (must initiate) | None (agent cannot trigger) | payroll_finalise_run, people_terminate_employee |

Per-org defaults: EXECUTE becomes PROPOSE in conservative orgs via feature flag agent_action_mode=strict.
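The downgrade is a one-line mapping. A sketch (function and mode names are illustrative):

```elixir
defmodule ActionModeSketch do
  # Sketch of the per-org downgrade: under agent_action_mode=strict,
  # EXECUTE tools are treated as PROPOSE. All other categories are
  # unchanged — RESTRICTED in particular never loosens.
  def effective_category(:execute, :strict), do: :propose
  def effective_category(category, _mode), do: category
end
```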

Confidence Framework (AW-12)

Every agent response carries a confidence band:

| Band | Threshold (general) | Threshold (compliance) | Action |
|---|---|---|---|
| GREEN | >90% | >95% | Act autonomously (within category allowance); notify after |
| AMBER | 70–90% | 80–95% | Propose action; await human approval |
| RED | <70% | <80% | Flag for human handling; log to review queue |

Compliance-affecting actions have stricter thresholds because error cost is legal.
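The band assignment is a pair of guard clauses. A sketch using the documented thresholds (0.90/0.70 general, 0.95/0.80 compliance; the domain atom is an assumption):

```elixir
defmodule ConfidenceBandsSketch do
  # Bands from the table above; compliance-affecting actions get the
  # stricter 0.95/0.80 cutoffs.
  def band(score, :compliance) when score > 0.95, do: :green
  def band(score, :compliance) when score >= 0.80, do: :amber
  def band(_score, :compliance), do: :red

  def band(score, _domain) when score > 0.90, do: :green
  def band(score, _domain) when score >= 0.70, do: :amber
  def band(_score, _domain), do: :red
end
```

Note how the same 0.93 score acts autonomously in a general domain but only proposes in a compliance-affecting one.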

Budget Management (AI-08)

FinnestAgents.BudgetGuard tracks per-org spend in real time:

defmodule FinnestAgents.BudgetGuard do
  use GenServer

  # State: %{org_id => %{daily: Decimal, weekly: Decimal, monthly: Decimal,
  #                       limits: %{daily: Decimal, weekly: Decimal, monthly: Decimal}}}

  def check(org_id) do
    case GenServer.call(__MODULE__, {:check, org_id}) do
      :ok -> :ok
      {:warning, pct} -> {:ok, {:warning, pct}}     # 80% threshold
      :circuit_breaker_open -> {:error, :budget_exceeded}
    end
  end

  def record_spend(org_id, amount_aud, category) do
    GenServer.cast(__MODULE__, {:record, org_id, amount_aud, category})
  end
end

Behaviour:

  • Accumulated via cast (non-blocking hot path)
  • 80% threshold → warning logged; admin notified; UI banner shown to org users
  • 100% threshold → circuit breaker opens; subsequent requests fall back to Tier A/B only; admin alerted
  • Resets per period (daily/weekly/monthly)
  • State persisted to agents.budget_limits every 5 min + on clean shutdown

Audit Trail (AW-14)

Every agent action produces:

| Event | Destination | Retention |
|---|---|---|
| User message | agents.messages (full text) | 90 days commercial / 7 years IRAP |
| Agent response | agents.messages (full text) | Same |
| Tool invocation | agents.tool_audit (tool name, input, output, ms, cost, correlation_id) | 90 days commercial / 7 years IRAP |
| LLM API call | agents.tool_audit (prompt_hash, response_hash, model, tokens, cost, cache_stats, correlation_id) — hash, not plaintext, to avoid PII leak | Same |
| Business-event emission | events.domain_events (as always) | 7 years both |

Hash, not plaintext, for LLM logs (Commandment #24): prompt content may contain PII. We log a BLAKE2b hash so we can prove the call happened and deduplicate cache hits, without creating a PII exposure surface.
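The hashing itself is one call to OTP's crypto module (BLAKE2b is available from OTP 22; hex encoding is an illustrative choice):

```elixir
defmodule PromptHashSketch do
  # Hash-not-plaintext logging: the audit row stores a BLAKE2b digest, so
  # the call is provable and deduplicable without retaining prompt PII.
  def prompt_hash(prompt) when is_binary(prompt) do
    :crypto.hash(:blake2b, prompt) |> Base.encode16(case: :lower)
  end
end
```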

Correlation IDs link the full chain: user message → orchestrator route → MCP tool calls → LLM API call → business events → reactions → further events (AI-05 bounds a chain at 10 events).
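The AI-05 bound can be enforced at the point of emission. A sketch, with a list standing in for the persisted chain:

```elixir
defmodule ChainBudgetSketch do
  # AI-05: at most 10 events per correlation chain. The orchestrator
  # refuses to emit an event that would push a chain past the cap.
  @max_events 10

  def emit(chain, event) when length(chain) < @max_events, do: {:ok, chain ++ [event]}
  def emit(chain, event), do: {:error, {:chain_budget_exceeded, event, length(chain)}}
end
```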


Cross-Domain Agent Coordination

Example: "Onboard John Smith for Woolworths Minchinbury, mining project"

1. Orchestrator classifies: multi-domain onboarding → Tier 2
2. Creates OnboardingPipelineAgent (Oban workflow)

3. Step 1: recruit_mcp.get_candidate("John Smith") [READ]
   → returns candidate data, correlation_id bound

4. Step 2: compliance_mcp.check_credentials(candidate, industry: "mining") [READ]
   → white_card ✓, first_aid ✓, confined_space ✗ (expired)
   → AMBER confidence on "expired" → pipeline pauses for decision

5. Step 3: safety_mcp.check_site_inductions(candidate, "Minchinbury") [READ]
   → site induction not completed

6. Step 4: fatigue_mcp.check_fitness(candidate) [READ]
   → fit for duty ✓

7. Step 5: roster_mcp.check_availability(candidate, date) [READ]
   → available ✓

8. [HUMAN GATE] Present findings via LiveView or mobile notification:
   "John Smith available & fit. Needs: confined space recert + site induction.
    Approve onboarding with these conditions?"

9. [On approval] Step 6: onboard_mcp.start_onboarding(candidate, conditions) [EXECUTE]
10. Step 7: reach_mcp.send_confirmation(candidate, details) [EXECUTE]

Each step → event → correlation_id links entire chain.
Max 10 events per chain (AI-05) — this chain uses 7, well within budget.

Failure Prevention

| Failure mode | Prevention mechanism |
|---|---|
| Every query hits Claude | Two-tier routing (Tier A pattern + Tier B deterministic) catches ~85% before Tier C |
| Agent hallucinates pay rate / award / credential | AwardInterpreter + Compliance.check/2 are deterministic rules engines; agents query and present, never reason over the values (AI-02, B03 Insight 4) |
| Cross-domain infinite loop | Correlation + causation IDs on every event; max 10 events per chain (AI-05); Orchestrator refuses to fire event that would exceed |
| Tenant data leakage | org_id injected at MCP framework layer; per-tenant agent processes; tests verify isolation (AI-03, AI-04) |
| Autonomous destructive action | Tier 3 agents PROPOSE-only; MCP category RESTRICTED requires human initiation (AI-06) |
| Undebuggable behaviour | Every action logged with correlation_id to agents.tool_audit; session messages in agents.messages; replay tool reconstructs decision chains |
| Cost explosion | BudgetGuard per-org circuit breaker with 80% warning and 100% hard stop (AI-08) |
| Prompt injection via user input | System prompts frame user content as untrusted; tool inputs validated at MCP layer via typed schemas (not just free text) |
| Stale cached responses | TTL 5 min; invalidated on relevant domain event (e.g. roster response cache invalidated on shift_updated) |
| Provider outage | Failover chain: primary → fallback → manual review (AI-07). Cost tracked per-provider for visibility |

MCP Tool Definition (gold standard)

Every tool follows this shape:

defmodule Finnest.Roster.MCP.Tools.ListShifts do
  use FinnestAgents.MCP.Tool,
    name: "roster_list_shifts",
    domain: :roster,
    category: :read,
    description: "List shifts for an org within a date range, optionally filtered by site."

  input :date_from, :date, required: true, description: "Start of range, inclusive."
  input :date_to,   :date, required: true, description: "End of range, inclusive."
  input :site_id,   :uuid, required: false, description: "Optional site filter."
  input :status,    {:enum, [:scheduled, :in_progress, :completed]}, required: false

  output_schema %{
    shifts: [%{
      id: :uuid,
      start_at: :datetime,
      end_at: :datetime,
      site_id: :uuid,
      site_name: :string,
      worker_id: {:optional, :uuid},
      worker_name: {:optional, :string},
      status: :string
    }]
  }

  # org_id injected by MCP framework from session.metadata — NEVER from agent
  def call(%{date_from: from, date_to: to} = params, %{org_id: org_id} = _ctx) do
    shifts = Finnest.Roster.Queries.list_shifts(org_id,
      from: from,
      to: to,
      site_id: params[:site_id],
      status: params[:status]
    )

    {:ok, %{shifts: Enum.map(shifts, &format_shift/1)}}
  end
end

Gold-standard invariants:

  1. Name is <domain>_<verb>_<noun> (e.g. roster_list_shifts)
  2. Category is explicit (:read | :propose | :execute | :restricted)
  3. Input fields are typed with required/optional markers
  4. Output schema is declared (agent can introspect before calling)
  5. org_id extracted from context, not params — agents can't override
  6. Body delegates to domain Queries/Commands module — no business logic in the MCP tool
  7. PROPOSE tools return a proposal struct, not the side-effect
  8. EXECUTE tools call the gated context function (which triggers Compliance.check/2 if applicable)
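Invariant 7 deserves a concrete shape. A sketch of what a PROPOSE tool might return — field names are illustrative, not the production struct:

```elixir
defmodule ProposalSketch do
  # Invariant 7 above: a PROPOSE tool returns a description of the intended
  # change, never the side-effect itself. A human approval step executes it.
  defstruct [:tool, :action, :params, :confidence, status: :awaiting_approval]

  def propose(tool, action, params, confidence) do
    %__MODULE__{tool: tool, action: action, params: params, confidence: confidence}
  end
end
```

Because the struct carries the confidence score, the approval UI can render the AMBER/GREEN band alongside the proposed change.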

Observability (D19)

Per-session telemetry:

  • Time to first token (TTFT)
  • Total response time
  • Tokens (prompt, completion, cached)
  • Cost AUD
  • MCP tool calls count
  • Confidence distribution

Per-org dashboard (Grafana):

  • AI spend current day / week / month (budget traffic lights)
  • Prompt cache hit rate
  • Top intents routed (A/B/C distribution)
  • Agent error rate
  • Active sessions over time

Alert thresholds:

  • Budget 80% of monthly cap → warning
  • Budget 100% of monthly cap → critical (circuit breaker armed)
  • Cache hit rate <50% for 1 hour → warning (likely prompt restructuring broke caching)
  • Tier C rate >25% of total → warning (pattern coverage is slipping)
  • TTFT p95 >2s for 5 min → warning (provider issue)

Future Considerations

LocalLLMProvider (planned adapter #5, Phase 3+):

Trigger conditions (main arch doc OI-11):

  • IRAP Phase 3 deployment wants stronger sovereignty posture (local LLM handles intent classification + PII scrubbing within IRAP VPC)
  • Scale exceeds 25K employees or 3 paying clients (GPU amortises)

Provisional plan:

  • 7B–14B model (Llama 3.1 8B / Phi-4 14B / Qwen 2.5 7B candidates)
  • vLLM in separate container
  • Role: Tier-1.5 — catches ambiguous intents that Tier A pattern-match misses, before falling through to Tier C Claude
  • Adapter slots into existing AiProvider port — no agent code changes required

Federated MCP:

When external agents (customer-built agents, partner agents) need access, the MCP servers become externally callable over JSON-RPC. Current in-process behaviour stays; JSON-RPC becomes a second transport on the same tool definitions.

On-device inference (mobile edge AI):

For privacy-sensitive flows (document classification on the device before upload), explore Core ML / TFLite models shipped with the Flutter app. Deferred — not scoped in 44-week roadmap.