
Finnest Agents Architecture

Date: 2026-04-16
Status: Draft
Scope: AI agent infrastructure — how finnest_agents works, how the three tiers interact with 19 domains via MCP, how cost and governance are enforced.

Related documents: architecture.md (main), ../brainstorms/brainstorm-03-ai-agent-design.md, ../10-GUARDRAILS.md §11 (AI-01–AI-09).


Purpose

Finnest is AI-native — agents aren't a feature, they're the architecture's central nervous system. This document details the infrastructure the main architecture summarises. If the main doc describes what agents are, this doc describes how they work.


The Three Tiers

Each tier solves a different problem; each has different durability, cost, and oversight characteristics (B03 Insight 1).

Tier 1 — Conversational Agents (GenServer, short-lived)

Shape: One GenServer process per user session. Holds conversation state in process memory. Dies when session ends.

Durability: Process lifetime only. If the user disconnects for >10 min, session memory is flushed to agents.sessions / agents.messages in Postgres and the GenServer terminates. On reconnect, a fresh GenServer rehydrates context from L2 memory.

Cost profile: Variable — pattern-match tier is $0; LLM-classified intent ~$0.01; rich generation ~$0.02–0.05.

Types:

| Agent | Role |
|---|---|
| User Chat Agent | General-purpose; routes to domain specialists via MCP. The user-facing entrypoint that responds to Cmd+K and the mobile chat. |
| Reach Agent | Outbound + inbound SMS / voice / chat / email orchestration for workers, candidates, and clients. Decides templates, extracts intent from replies, hands off to humans when confidence drops. |
| Admin Assistant | Natural-language queries over dashboards ("show me this week's unfilled shifts"). Answers through MCP read-only tools. |

Lifecycle:

User opens Cmd+K or mobile agent
  └─ finnest_agents.Orchestrator.start_session/2
      └─ AgentSupervisor.start_child({UserChatAgent, %{session_id, org_id, user_id}})
          └─ GenServer started, restart: :temporary
              └─ session state kept in GenServer state (process memory)
              └─ messages persisted to agents.messages async (for audit)

User sends message
  └─ send(session_pid, {:user_message, text, correlation_id})
      └─ handle_cast: two-tier classify → route to MCP tool(s)
      └─ stream response back via Phoenix Channel (agent:<session_id> topic)

User idle 10+ min OR disconnect
  └─ :timeout triggers graceful shutdown
      └─ persist remaining state to agents.sessions
      └─ GenServer exits

User returns
  └─ Orchestrator rehydrates from agents.sessions → new GenServer
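The idle-timeout half of this lifecycle is plain OTP. A minimal sketch, assuming hypothetical stand-in modules (`SessionStore` here substitutes for the agents.sessions Postgres table; the real agent also streams replies over Phoenix Channels):

```elixir
defmodule SessionStore do
  # Hypothetical stand-in for agents.sessions — the real system writes to Postgres.
  def save(session_id, state), do: :persistent_term.put({:session, session_id}, state)
  def load(session_id), do: :persistent_term.get({:session, session_id}, %{history: []})
end

defmodule UserChatAgentSketch do
  use GenServer

  @idle_ms 10 * 60 * 1000  # 10-minute idle timeout

  def start_link(args), do: GenServer.start_link(__MODULE__, args)

  @impl true
  def init(%{session_id: sid} = args) do
    # Rehydrate prior context (L2 memory) on reconnect
    {:ok, Map.merge(args, SessionStore.load(sid)), @idle_ms}
  end

  @impl true
  def handle_cast({:user_message, text}, state) do
    # Real agent: two-tier classify → MCP tool call → stream via Phoenix Channel
    {:noreply, Map.update!(state, :history, &[text | &1]), @idle_ms}
  end

  @impl true
  def handle_info(:timeout, %{session_id: sid} = state) do
    # Idle 10+ min: flush working state, then exit (restart: :temporary)
    SessionStore.save(sid, Map.take(state, [:history]))
    {:stop, :normal, state}
  end
end
```

The `@idle_ms` timeout returned from `init/1` and `handle_cast/2` is what delivers the `:timeout` message after ten quiet minutes.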

Tier 2 — Workflow Agents (Oban, medium-lived)

Shape: An Oban job (or chain of jobs) that orchestrates a multi-step business process. Checkpoints state to Postgres between steps. Survives node restarts. Distributes across cluster.

Durability: Hours to weeks. Pay run orchestration might live a few hours; onboarding pipelines run for days to weeks.

Cost profile: Higher per-invocation than Tier 1 (multi-step reasoning), but far lower per-business-outcome because each step is bounded.

Types:

| Agent | Owner OTP app | Workflow |
|---|---|---|
| Onboarding Pipeline | finnest_onboard | start → DVS verification → credential check → Fair Work forms → super onboarding → induction → placement |
| Super Onboarding Wizard | finnest_onboard (crosses to finnest_payroll + finnest_people) | TFN declaration → super fund choice (USI lookup + stapled super + SMSF option) → bank details → FWIS acknowledgement → contract signing. Addresses B12 C2 gap |
| Scoring & Matching | finnest_recruit | batch scoring (deterministic) → AI reasoning only for edge cases (AMBER band) → PROPOSE to human |
| Pay Run Processing | finnest_payroll | collect timecards → apply awards via AwardInterpreter → compliance check → PROPOSE pay run → human approve → submit STP → generate invoices |
| Incident Response | finnest_safety | report → classify severity → notify → investigate → corrective action → close |

Checkpoint & resume:

defmodule Finnest.Onboard.Workers.OnboardingPipeline do
  use Oban.Worker, queue: :onboard_queue, max_attempts: 5, unique: [fields: [:args], period: 60]

  def perform(%Oban.Job{args: %{"pipeline_id" => id, "step" => step}}) do
    pipeline = Onboard.PipelineOrchestrator.load(id)

    case step do
      "dvs_verification" ->
        with {:ok, result} <- DocumentVerification.verify(pipeline.documents),
             :ok <- Onboard.PipelineOrchestrator.record_result(pipeline, :dvs, result) do
          enqueue_next(pipeline, "credential_check")
        end

      "credential_check" -> ...
      "super_onboarding" -> ...
      # ... etc
    end
  end
end

Each step persists its output to onboard.pipeline_steps before enqueueing the next. A node crash mid-step means the job retries (QJ-01 idempotency — same input, same output).
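The idempotency contract can be illustrated in miniature. A sketch, with a plain map standing in for the onboard.pipeline_steps table (which in the real system carries a unique index on the pipeline/step pair):

```elixir
defmodule PipelineCheckpoints do
  # Sketch of the QJ-01 contract: recording the same {pipeline_id, step}
  # twice is a no-op that keeps the first result, so an Oban retry after a
  # mid-step crash cannot duplicate a step's output. A map stands in for
  # the Postgres table here.
  def record(steps, pipeline_id, step, output) do
    key = {pipeline_id, step}

    case Map.fetch(steps, key) do
      {:ok, _existing} -> {:already_recorded, steps}   # retry hit — keep first result
      :error -> {:recorded, Map.put(steps, key, output)}
    end
  end
end
```

A retried `perform/1` re-runs the step, hits `:already_recorded`, and proceeds straight to enqueueing the next step without re-applying side effects.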

Tier 3 — Autonomous Agents (Oban cron, long-lived)

Shape: Always-on cron-triggered jobs. Scan state, detect patterns, PROPOSE actions. Never execute destructive operations autonomously.

Constraint: PROPOSE-only (AI-06). Allowed operations: READ, NOTIFY, REPORT. Forbidden: WRITE, UPDATE, DELETE.

Types:

| Agent | Owner | Schedule | Output |
|---|---|---|---|
| Compliance Monitor | finnest_pulse (operates on finnest_compliance data) | Nightly 02:00 AEST | Notifications for credential expiries within 30/14/7 days; regulatory change alerts |
| Anomaly Detector | finnest_pulse | Daily + on pay run finalise | Flags timesheet fraud signals, payroll discrepancies, roster conflicts for human review |
| Roster Optimiser | finnest_roster | Overnight per org (staggered) | Proposes optimised shift assignments considering availability, skills, compliance, fatigue, cost. Human approves before changes apply |
| Data Quality Agent | finnest_pulse | Weekly | Incomplete records, stale credentials, data inconsistencies. Creates tasks for humans |

Why PROPOSE-only: An autonomous agent that can mutate state is an autonomous agent that can silently corrupt state. The value is in pattern detection + surfacing; execution stays human.
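The AI-06 constraint is worth enforcing at dispatch rather than by convention. A minimal sketch (module and operation names are illustrative):

```elixir
defmodule Tier3Guard do
  # Sketch of AI-06: Tier 3 agents may READ / NOTIFY / REPORT, never
  # WRITE / UPDATE / DELETE. A dispatch-time check, not a code review rule.
  @allowed [:read, :notify, :report]

  def authorise(:tier3, operation) when operation in @allowed, do: :ok
  def authorise(:tier3, operation), do: {:error, {:forbidden_for_autonomous_agent, operation}}
  # Tier 1/2 agents are gated by MCP tool categories instead
  def authorise(_tier, _operation), do: :ok
end
```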


Infrastructure

finnest_agents supervision tree

FinnestAgents.Supervisor  (one_for_one)
├── FinnestAgents.Orchestrator      (singleton GenServer — intent routing, session creation)
├── FinnestAgents.ToolRegistry      (GenServer — discovers MCP servers from every domain at boot)
├── FinnestAgents.AgentSupervisor   (DynamicSupervisor, restart: :temporary — Tier 1 agents per session)
├── FinnestAgents.ClaudeClient      (GenServer — Finch HTTP pool + cost tracking)
├── FinnestAgents.BudgetGuard       (GenServer — per-org spend circuit breaker)
├── FinnestAgents.MemoryCoordinator (GenServer — L1/L2/L3 memory read/write routing)
└── FinnestAgents.PromptCache       (GenServer — tracks Anthropic prompt-cache metrics)

The Orchestrator — two-tier intent routing

User query arrives at Orchestrator
  ├── TIER A: Pattern match (cost $0, latency <5ms)
  │   Patterns compiled at boot from finnest_agents/patterns/*.ex
  │   Examples:
  │     "show (me)? roster" → roster_mcp.list_shifts
  │     "leave balance" → people_mcp.get_leave_balance
  │     "[first_name] [last_name]" → people_mcp.get_employee
  │     "clock (me )?in" → timekeep flow (mobile)
  │   Covers ~70% of production queries (B03 cost model)
  ├── TIER B: Deterministic computation (cost $0, latency <50ms)
  │   Intent is clear but needs cross-domain composition:
  │     "Is John eligible for mining site?" → compliance_mcp.check_credentials
  │     "Score candidates for this job order" → recruit_mcp.score_candidates
  │   Covers ~15% of production queries
  └── TIER C: Claude (cost $0.01–0.05, latency 500–2500ms)
      Genuine reasoning or generation needed:
        Ambiguous intent classification
        Natural language generation
        Edge-case composition
      Covers ~15% of production queries

Routing decision flow:

defmodule FinnestAgents.Orchestrator do
  def route(intent_text, %{org_id: org_id} = ctx) do
    # Tier A
    with {:no_match, _} <- FinnestAgents.PatternMatcher.match(intent_text, ctx),
         # Tier B
         {:no_match, _} <- FinnestAgents.DeterministicResolver.resolve(intent_text, ctx),
         # Tier C — a budget warning (80% threshold) still allows the call
         {:ok, _budget} <- budget_status(org_id),
         {:ok, classified} <- FinnestAgents.ClaudeClient.classify(intent_text, ctx) do
      dispatch(classified, ctx)
    else
      {:match, tool_call} -> execute_mcp(tool_call, ctx)
      {:error, :budget_exceeded} -> fallback_local_only(intent_text, ctx)
      {:error, reason} -> {:error, reason}
    end
  end

  defp budget_status(org_id) do
    case FinnestAgents.BudgetGuard.check(org_id) do
      :ok -> {:ok, :within_budget}
      {:ok, {:warning, _pct}} = warn -> warn
      {:error, :budget_exceeded} = err -> err
    end
  end
end
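The Tier A stage can be as simple as a list of regexes compiled into the module. An illustrative sketch — the patterns and tool names come from the routing diagram above, not from the production finnest_agents/patterns/*.ex files:

```elixir
defmodule PatternMatcherSketch do
  # Illustrative Tier A matcher: first regex to match wins, cost $0.
  # Real patterns are compiled at boot from finnest_agents/patterns/*.ex.
  @patterns [
    {~r/show (me )?roster/i, {:roster_mcp, :list_shifts}},
    {~r/leave balance/i, {:people_mcp, :get_leave_balance}},
    {~r/clock (me )?in/i, {:timekeep, :clock_in_flow}}
  ]

  def match(text, _ctx) do
    Enum.find_value(@patterns, {:no_match, text}, fn {regex, tool} ->
      if Regex.match?(regex, text), do: {:match, tool}
    end)
  end
end
```

Returning `{:no_match, text}` rather than raising is what lets the `with` chain in `route/2` fall through cleanly to Tier B.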

Tool Registry

Discovers and catalogues every MCP server at boot:

defmodule FinnestAgents.ToolRegistry do
  use GenServer

  def start_link(_) do
    GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  @impl true
  def init(:ok) do
    # Discover every domain that exposes an MCP server
    tools =
      Application.loaded_applications()
      |> Enum.filter(fn {app, _desc, _vsn} -> app_exposes_mcp?(app) end)
      |> Enum.flat_map(fn {app, _desc, _vsn} -> discover_tools(app) end)

    {:ok, %{tools: tools, by_category: index_by_category(tools)}}
  end
end

Tool metadata indexed:

  • Name (e.g. roster_list_shifts)
  • Domain (:roster)
  • Category (:read | :propose | :execute | :restricted)
  • Input schema (typed fields, required/optional)
  • Output schema
  • Permission matrix — which roles can invoke; IRAP restrictions
  • MCP server pid (if dynamic discovery) or module (if static)

ClaudeClient (hexagonal AiProvider port)

defmodule FinnestAgents.AiProvider do
  @type classified_intent :: map()
  @type response :: map()
  @type reason :: term()

  @callback classify(intent :: String.t(), context :: map()) ::
              {:ok, classified_intent()} | {:error, reason()}
  @callback generate(messages :: [map()], tools :: [map()], opts :: keyword()) ::
              {:ok, response()} | {:error, reason()}
  @callback stream(messages :: [map()], tools :: [map()], opts :: keyword()) ::
              Enumerable.t() | {:error, reason()}
end

# Adapters (5 total — main doc):
#   FinnestAgents.AiProvider.AnthropicDirect     (commercial primary)
#   FinnestAgents.AiProvider.BedrockSydney        (IRAP primary)
#   FinnestAgents.AiProvider.VertexAU             (Verify fallback only)
#   FinnestAgents.AiProvider.MockProvider         (tests)
#   FinnestAgents.AiProvider.LocalLLMProvider     (Phase 3+ — vLLM container)

Cross-cutting concerns in all adapters:

  • Finch HTTP pool (connection reuse — saves TLS handshake)
  • Per-org rate limit (via FinnestAgents.BudgetGuard)
  • Cost tracking per request → agents.tool_audit
  • Failover chain per AI-07: primary → fallback → manual review
  • Response cache per intent signature (ETS, 5-min TTL) for repeated queries
  • Structured logging with correlation ID (AW-14)
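The response-cache bullet can be sketched with a plain ETS table. The table name, API shape, and intent-signature key are assumptions for illustration:

```elixir
defmodule ResponseCacheSketch do
  # Sketch of the per-intent response cache: ETS keyed by intent signature,
  # 5-minute TTL checked on read, stale entries evicted lazily.
  @table :ai_response_cache
  @ttl_ms 5 * 60 * 1000

  def init, do: :ets.new(@table, [:named_table, :public, :set])

  def put(signature, response) do
    :ets.insert(@table, {signature, response, System.monotonic_time(:millisecond)})
    :ok
  end

  # now_ms is injectable so expiry is testable without sleeping
  def get(signature, now_ms \\ System.monotonic_time(:millisecond)) do
    case :ets.lookup(@table, signature) do
      [{^signature, response, at}] when now_ms - at < @ttl_ms ->
        {:hit, response}

      [{^signature, _stale, _at}] ->
        :ets.delete(@table, signature)
        :miss

      [] ->
        :miss
    end
  end
end
```

The real cache is also invalidated on relevant domain events (see Failure Prevention), not only by TTL.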

Prompt Caching (AI-09 guardrail, Part 3 decision)

Anthropic prompt cache delivers 90% discount on cached tokens. Finnest's system prompts + MCP tool schemas are large (~10K tokens) and stable — ideal cache candidates.

Required prompt structure:

[CACHE_BREAKPOINT: PERMANENT]
System prompt (role, principles, formatting rules)       ← cached across all sessions
MCP tool schemas (full typed definitions)                 ← cached across all sessions

[CACHE_BREAKPOINT: PER_ORG]
Org context (industry profiles, terminology, flags)       ← cached per-org

[CACHE_BREAKPOINT: NONE]
Session history (last N messages)                          ← not cached (changes every turn)
User query                                                 ← not cached

Claude client enforces this ordering:

defmodule FinnestAgents.AiProvider.AnthropicDirect do
  @behaviour FinnestAgents.AiProvider

  def build_request(session, user_message, tools) do
    %{
      model: model_for(session),
      system: [
        %{type: "text", text: Prompts.base_system(), cache_control: %{type: "ephemeral"}},
        %{type: "text", text: Prompts.tool_schemas(tools), cache_control: %{type: "ephemeral"}},
        %{type: "text", text: Prompts.org_context(session.org_id), cache_control: %{type: "ephemeral"}}
      ],
      messages: session.history ++ [%{role: "user", content: user_message}]
    }
  end
end

Observability:

  • FinnestAgents.PromptCache GenServer aggregates cache_creation_input_tokens, cache_read_input_tokens, input_tokens, output_tokens per request
  • Target: ≥70% cache hit rate measured as cache_read / (cache_read + cache_creation + input) on cacheable content
  • Dashboard panel; alert if hit rate drops below 50% for 1 hour (likely means someone broke the prompt structure)
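The target hit-rate formula, as a small helper. Field names here are assumptions layered over Anthropic's per-request token counters:

```elixir
defmodule CacheMetricsSketch do
  # Hit rate as defined above: cache_read / (cache_read + cache_creation + input),
  # computed over the cacheable-content token counters per request.
  def hit_rate(%{cache_read: r, cache_creation: c, input: i}) when r + c + i > 0 do
    r / (r + c + i)
  end

  def hit_rate(_), do: 0.0

  # The ≥70% target from the observability bullet above
  def healthy?(stats), do: hit_rate(stats) >= 0.70
end
```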

Memory System

Three levels (B03 infrastructure section), each solving a different problem.

| Level | Storage | Scope | Lifetime | Purpose |
|---|---|---|---|---|
| L1: Session | GenServer state (+ agents.messages async) | Single conversation | Process lifetime (hydrated on reconnect) | Working context — what the user and agent are discussing right now |
| L2: Tenant | agents.memories (Postgres) | Per-org | Permanent | Org-specific patterns and preferences — "this client always wants weekend-only workers"; "this org prefers SMS over email for shift confirmations" |
| L3: Domain | agents.memories + event store aggregations | Platform-wide (anonymised) | Permanent | Industry intelligence — "across all construction orgs, forklift certifications expire X days before detection"; competitive moat data |

L2 memory writes:

  • Explicit: agent proposes "remember this preference" → human confirms → agents.memories row
  • Implicit: pattern detected across N repeated similar interactions → Tier-3 agent proposes addition

L3 memory writes:

  • Aggregations run on events.domain_events partitions nightly
  • PII stripped; only org_id-free patterns retained
  • Used during Tier C response generation so every org benefits from cross-org learning

Memory retrieval during Tier C:

User query → ClaudeClient.generate(...)
  └─ system prompt includes:
      - base system (cached)
      - tool schemas (cached)
      - org_context (cached per-org)
      - relevant L2 memories (max 5, retrieved by embedding similarity — future phase)
      - relevant L3 patterns (max 3)
      - session history (uncached)

Governance

Tenant Isolation (AI-03, AI-04)

See main architecture.md Part 8 and data.md. Key points for agents:

  • org_id is injected by MCP framework from session.metadata — agents cannot set it
  • Every agent GenServer is spawned with {session_id, org_id, user_id} in initial state
  • FinnestAgents.AgentSupervisor uses restart: :temporary — a terminated agent doesn't auto-restart with stale tenant context
  • Session end = process termination — no reuse across tenants (AI-04)

Action Classification (MCP tool categories)

| Category | Effect | Human required | Agent autonomy | Example |
|---|---|---|---|---|
| READ | Query / list / get | No | Full | roster_list_shifts, compliance_check_worker |
| PROPOSE | Agent suggests; human confirms before execute | Yes (before) | Proposal only | roster_propose_assignment, recruit_propose_candidates |
| EXECUTE | Agent acts; human notified after | No (notified) | Full with notification | reach_send_message, timekeep_record_clock_event |
| RESTRICTED | Human must initiate | Yes (must initiate) | None (agent cannot trigger) | payroll_finalise_run, people_terminate_employee |

Per-org defaults: EXECUTE becomes PROPOSE in conservative orgs via feature flag agent_action_mode=strict.
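The downgrade is a one-line mapping. A sketch (function and mode names are illustrative):

```elixir
defmodule ActionModeSketch do
  # Sketch of the per-org downgrade: under agent_action_mode=strict,
  # EXECUTE tools are treated as PROPOSE. All other categories are
  # unchanged — RESTRICTED in particular never loosens.
  def effective_category(:execute, :strict), do: :propose
  def effective_category(category, _mode), do: category
end
```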

Confidence Framework (AW-12)

Every agent response carries a confidence band:

| Band | Threshold (general) | Threshold (compliance) | Action |
|---|---|---|---|
| GREEN | >90% | >95% | Act autonomously (within category allowance); notify after |
| AMBER | 70–90% | 80–95% | Propose action; await human approval |
| RED | <70% | <80% | Flag for human handling; log to review queue |

Compliance-affecting actions have stricter thresholds because error cost is legal.
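The band assignment is a pair of guard clauses. A sketch using the documented thresholds (0.90/0.70 general, 0.95/0.80 compliance; the domain atom is an assumption):

```elixir
defmodule ConfidenceBandsSketch do
  # Bands from the table above; compliance-affecting actions get the
  # stricter 0.95/0.80 cutoffs.
  def band(score, :compliance) when score > 0.95, do: :green
  def band(score, :compliance) when score >= 0.80, do: :amber
  def band(_score, :compliance), do: :red

  def band(score, _domain) when score > 0.90, do: :green
  def band(score, _domain) when score >= 0.70, do: :amber
  def band(_score, _domain), do: :red
end
```

Note how the same 0.93 score acts autonomously in a general domain but only proposes in a compliance-affecting one.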

Budget Management (AI-08)

FinnestAgents.BudgetGuard tracks per-org spend in real time:

defmodule FinnestAgents.BudgetGuard do
  use GenServer

  # State: %{org_id => %{daily: Decimal, weekly: Decimal, monthly: Decimal,
  #                       limits: %{daily: Decimal, weekly: Decimal, monthly: Decimal}}}

  def check(org_id) do
    case GenServer.call(__MODULE__, {:check, org_id}) do
      :ok -> :ok
      {:warning, pct} -> {:ok, {:warning, pct}}     # 80% threshold
      :circuit_breaker_open -> {:error, :budget_exceeded}
    end
  end

  def record_spend(org_id, amount_aud, category) do
    GenServer.cast(__MODULE__, {:record, org_id, amount_aud, category})
  end
end

Behaviour:

  • Accumulated via cast (non-blocking hot path)
  • 80% threshold → warning logged; admin notified; UI banner shown to org users
  • 100% threshold → circuit breaker opens; subsequent requests fall back to Tier A/B only; admin alerted
  • Resets per period (daily/weekly/monthly)
  • State persisted to agents.budget_limits every 5 min + on clean shutdown

Audit Trail (AW-14)

Every agent action produces:

| Event | Destination | Retention |
|---|---|---|
| User message | agents.messages (full text) | 90 days commercial / 7 years IRAP |
| Agent response | agents.messages (full text) | Same |
| Tool invocation | agents.tool_audit (tool name, input, output, ms, cost, correlation_id) | 90 days commercial / 7 years IRAP |
| LLM API call | agents.tool_audit (prompt_hash, response_hash, model, tokens, cost, cache_stats, correlation_id) — hash, not plaintext, to avoid PII leak | Same |
| Business-event emission | events.domain_events (as always) | 7 years both |

Hash, not plaintext, for LLM logs (Commandment #24): prompt content may contain PII. We log a BLAKE2b hash so we can prove the call happened and deduplicate cache hits, without creating a PII exposure surface.
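The hashing itself is one call to OTP's crypto module (BLAKE2b is available from OTP 22; hex encoding is an illustrative choice):

```elixir
defmodule PromptHashSketch do
  # Hash-not-plaintext logging: the audit row stores a BLAKE2b digest, so
  # the call is provable and deduplicable without retaining prompt PII.
  def prompt_hash(prompt) when is_binary(prompt) do
    :crypto.hash(:blake2b, prompt) |> Base.encode16(case: :lower)
  end
end
```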

Correlation IDs link the full chain: user message → orchestrator route → MCP tool calls → LLM API call → business events → reactions → further events (AI-05 bounds a chain at 10 events).
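The AI-05 bound can be enforced at the point of emission. A sketch, with a list standing in for the persisted chain:

```elixir
defmodule ChainBudgetSketch do
  # AI-05: at most 10 events per correlation chain. The orchestrator
  # refuses to emit an event that would push a chain past the cap.
  @max_events 10

  def emit(chain, event) when length(chain) < @max_events, do: {:ok, chain ++ [event]}
  def emit(chain, event), do: {:error, {:chain_budget_exceeded, event, length(chain)}}
end
```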


Cross-Domain Agent Coordination

Example: "Onboard John Smith for Woolworths Minchinbury, mining project"

1. Orchestrator classifies: multi-domain onboarding → Tier 2
2. Creates OnboardingPipelineAgent (Oban workflow)

3. Step 1: recruit_mcp.get_candidate("John Smith") [READ]
   → returns candidate data, correlation_id bound

4. Step 2: compliance_mcp.check_credentials(candidate, industry: "mining") [READ]
   → white_card ✓, first_aid ✓, confined_space ✗ (expired)
   → AMBER confidence on "expired" → pipeline pauses for decision

5. Step 3: safety_mcp.check_site_inductions(candidate, "Minchinbury") [READ]
   → site induction not completed

6. Step 4: fatigue_mcp.check_fitness(candidate) [READ]
   → fit for duty ✓

7. Step 5: roster_mcp.check_availability(candidate, date) [READ]
   → available ✓

8. [HUMAN GATE] Present findings via LiveView or mobile notification:
   "John Smith available & fit. Needs: confined space recert + site induction.
    Approve onboarding with these conditions?"

9. [On approval] Step 6: onboard_mcp.start_onboarding(candidate, conditions) [EXECUTE]
10. Step 7: reach_mcp.send_confirmation(candidate, details) [EXECUTE]

Each step → event → correlation_id links entire chain.
Max 10 events per chain (AI-05) — this chain uses 7, well within budget.

Failure Prevention

| Failure mode | Prevention mechanism |
|---|---|
| Every query hits Claude | Two-tier routing (Tier A pattern + Tier B deterministic) catches ~85% before Tier C |
| Agent hallucinates pay rate / award / credential | AwardInterpreter + Compliance.check/2 are deterministic rules engines; agents query and present, never reason over the values (AI-02, B03 Insight 4) |
| Cross-domain infinite loop | Correlation + causation IDs on every event; max 10 events per chain (AI-05); Orchestrator refuses to fire event that would exceed |
| Tenant data leakage | org_id injected at MCP framework layer; per-tenant agent processes; tests verify isolation (AI-03, AI-04) |
| Autonomous destructive action | Tier 3 agents PROPOSE-only; MCP category RESTRICTED requires human initiation (AI-06) |
| Undebuggable behaviour | Every action logged with correlation_id to agents.tool_audit; session messages in agents.messages; replay tool reconstructs decision chains |
| Cost explosion | BudgetGuard per-org circuit breaker with 80% warning and 100% hard stop (AI-08) |
| Prompt injection via user input | System prompts frame user content as untrusted; tool inputs validated at MCP layer via typed schemas (not just free text) |
| Stale cached responses | TTL 5 min; invalidated on relevant domain event (e.g. roster response cache invalidated on shift_updated) |
| Provider outage | Failover chain: primary → fallback → manual review (AI-07). Cost tracked per-provider for visibility |

MCP Tool Definition (gold standard)

Every tool follows this shape:

defmodule Finnest.Roster.MCP.Tools.ListShifts do
  use FinnestAgents.MCP.Tool,
    name: "roster_list_shifts",
    domain: :roster,
    category: :read,
    description: "List shifts for an org within a date range, optionally filtered by site."

  input :date_from, :date, required: true, description: "Start of range, inclusive."
  input :date_to,   :date, required: true, description: "End of range, inclusive."
  input :site_id,   :uuid, required: false, description: "Optional site filter."
  input :status,    {:enum, [:scheduled, :in_progress, :completed]}, required: false

  output_schema %{
    shifts: [%{
      id: :uuid,
      start_at: :datetime,
      end_at: :datetime,
      site_id: :uuid,
      site_name: :string,
      worker_id: {:optional, :uuid},
      worker_name: {:optional, :string},
      status: :string
    }]
  }

  # org_id injected by MCP framework from session.metadata — NEVER from agent
  def call(%{date_from: from, date_to: to} = params, %{org_id: org_id} = _ctx) do
    shifts = Finnest.Roster.Queries.list_shifts(org_id,
      from: from,
      to: to,
      site_id: params[:site_id],
      status: params[:status]
    )

    {:ok, %{shifts: Enum.map(shifts, &format_shift/1)}}
  end
end

Gold-standard invariants:

  1. Name is <domain>_<verb>_<noun> (e.g. roster_list_shifts)
  2. Category is explicit (:read | :propose | :execute | :restricted)
  3. Input fields are typed with required/optional markers
  4. Output schema is declared (agent can introspect before calling)
  5. org_id extracted from context, not params — agents can't override
  6. Body delegates to domain Queries/Commands module — no business logic in the MCP tool
  7. PROPOSE tools return a proposal struct, not the side-effect
  8. EXECUTE tools call the gated context function (which triggers Compliance.check/2 if applicable)
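Invariant 7 deserves a concrete shape. A sketch of what a PROPOSE tool might return — field names are illustrative, not the production struct:

```elixir
defmodule ProposalSketch do
  # Invariant 7 above: a PROPOSE tool returns a description of the intended
  # change, never the side-effect itself. A human approval step executes it.
  defstruct [:tool, :action, :params, :confidence, status: :awaiting_approval]

  def propose(tool, action, params, confidence) do
    %__MODULE__{tool: tool, action: action, params: params, confidence: confidence}
  end
end
```

Because the struct carries the confidence score, the approval UI can render the AMBER/GREEN band alongside the proposed change.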

Observability (D19)

Per-session telemetry:

  • Time to first token (TTFT)
  • Total response time
  • Tokens (prompt, completion, cached)
  • Cost AUD
  • MCP tool calls count
  • Confidence distribution

Per-org dashboard (Grafana):

  • AI spend current day / week / month (budget traffic lights)
  • Prompt cache hit rate
  • Top intents routed (A/B/C distribution)
  • Agent error rate
  • Active sessions over time

Alert thresholds:

  • Budget 80% of monthly cap → warning
  • Budget 100% of monthly cap → critical (circuit breaker armed)
  • Cache hit rate <50% for 1 hour → warning (likely prompt restructuring broke caching)
  • Tier C rate >25% of total → warning (pattern coverage is slipping)
  • TTFT p95 >2s for 5 min → warning (provider issue)

Future Considerations

LocalLLMProvider (planned adapter #5, Phase 3+):

Trigger conditions (main arch doc OI-11):

  • IRAP Phase 3 deployment wants stronger sovereignty posture (local LLM handles intent classification + PII scrubbing within IRAP VPC)
  • Scale exceeds 25K employees or 3 paying clients (GPU amortises)

Provisional plan:

  • 7B–14B model (Llama 3.1 8B / Phi-4 14B / Qwen 2.5 7B candidates)
  • vLLM in separate container
  • Role: Tier-1.5 — catches ambiguous intents that Tier A pattern-match misses, before falling through to Tier C Claude
  • Adapter slots into existing AiProvider port — no agent code changes required

Federated MCP:

When external agents (customer-built agents, partner agents) need access, the MCP servers become externally callable over JSON-RPC. Current in-process behaviour stays; JSON-RPC becomes a second transport on the same tool definitions.

On-device inference (mobile edge AI):

For privacy-sensitive flows (document classification on the device before upload), explore Core ML / TFLite models shipped with the Flutter app. Deferred — not scoped in 44-week roadmap.