# Finnest Agents Architecture
Date: 2026-04-16
Status: Draft
Scope: AI agent infrastructure — how finnest_agents works, how the three tiers interact with 19 domains via MCP, how cost and governance are enforced.
Related documents: architecture.md (main), ../brainstorms/brainstorm-03-ai-agent-design.md, ../10-GUARDRAILS.md §11 (AI-01–AI-09).
## Purpose
Finnest is AI-native — agents aren't a feature, they're the architecture's central nervous system. This document details the infrastructure the main architecture summarises. If the main doc describes what agents are, this doc describes how they work.
## The Three Tiers
Each tier solves a different problem; each has different durability, cost, and oversight characteristics (B03 Insight 1).
### Tier 1 — Conversational Agents (GenServer, short-lived)
Shape: One GenServer process per user session. Holds conversation state in process memory. Dies when session ends.
Durability: Process lifetime only. If the user disconnects for >10 min, session memory is flushed to agents.sessions / agents.messages in Postgres and the GenServer terminates. On reconnect, a fresh GenServer rehydrates context from L2 memory.
Cost profile: Variable — pattern-match tier is $0; LLM-classified intent ~$0.01; rich generation ~$0.02–0.05.
Types:
| Agent | Role |
|---|---|
| User Chat Agent | General-purpose; routes to domain specialists via MCP. The user-facing entrypoint that responds to Cmd+K and the mobile chat. |
| Reach Agent | Outbound + inbound SMS / voice / chat / email orchestration for workers, candidates, and clients. Decides templates, extracts intent from replies, hands off to humans when confidence drops. |
| Admin Assistant | Natural-language queries over dashboards ("show me this week's unfilled shifts"). Answers through MCP read-only tools. |
Lifecycle:
```
User opens Cmd+K or mobile agent
└─ finnest_agents.Orchestrator.start_session/2
   └─ AgentSupervisor.start_child({UserChatAgent, %{session_id, org_id, user_id}})
      └─ GenServer started, restart: :temporary
      └─ session state kept in process dictionary
      └─ messages persisted to agents.messages async (for audit)

User sends message
└─ send(session_pid, {:user_message, text, correlation_id})
   └─ handle_info: two-tier classify → route to MCP tool(s)
   └─ stream response back via Phoenix Channel (agent:<session_id> topic)

User idle 10+ min OR disconnect
└─ :timeout triggers graceful shutdown
   └─ persist remaining state to agents.sessions
   └─ GenServer exits

User returns
└─ Orchestrator rehydrates from agents.sessions → new GenServer
```
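The lifecycle above maps directly onto GenServer timeout semantics. A minimal sketch of the Tier 1 process, assuming illustrative helper names (`handle_user_message/3` and `persist_session/1` are not defined in this doc):

```elixir
defmodule FinnestAgents.UserChatAgent do
  use GenServer, restart: :temporary

  # 10 minutes of inactivity before state is flushed to Postgres
  @idle_timeout :timer.minutes(10)

  def start_link(args), do: GenServer.start_link(__MODULE__, args)

  @impl true
  def init(%{session_id: _, org_id: _, user_id: _} = state) do
    # Third tuple element arms the idle timer
    {:ok, state, @idle_timeout}
  end

  @impl true
  def handle_info({:user_message, text, correlation_id}, state) do
    # classify → route to MCP tool(s) → stream reply over the Phoenix Channel
    state = handle_user_message(text, correlation_id, state)
    # Returning the timeout re-arms the idle timer on every message
    {:noreply, state, @idle_timeout}
  end

  def handle_info(:timeout, state) do
    # Idle 10+ min: persist remaining state to agents.sessions, then exit.
    # restart: :temporary means the supervisor will not restart this process.
    persist_session(state)
    {:stop, :normal, state}
  end
end
```

Because `init/1` and every `handle_info/2` return re-arm the same timeout, the idle timer resets on activity and fires only after a genuinely quiet 10 minutes.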
### Tier 2 — Workflow Agents (Oban, medium-lived)
Shape: An Oban job (or chain of jobs) that orchestrates a multi-step business process. Checkpoints state to Postgres between steps. Survives node restarts. Distributes across cluster.
Durability: Hours to weeks. Pay run orchestration might live a few hours; onboarding pipelines run for days to weeks.
Cost profile: Higher per-invocation than Tier 1 (multi-step reasoning), but far lower per-business-outcome because each step is bounded.
Types:
| Agent | Owner OTP app | Workflow |
|---|---|---|
| Onboarding Pipeline | finnest_onboard | start → DVS verification → credential check → Fair Work forms → super onboarding → induction → placement |
| Super Onboarding Wizard | finnest_onboard (crosses to finnest_payroll + finnest_people) | TFN declaration → super fund choice (USI lookup + stapled super + SMSF option) → bank details → FWIS acknowledgement → contract signing. Addresses B12 C2 gap |
| Scoring & Matching | finnest_recruit | batch scoring (deterministic) → AI reasoning only for edge cases (AMBER band) → PROPOSE to human |
| Pay Run Processing | finnest_payroll | collect timecards → apply awards via AwardInterpreter → compliance check → PROPOSE pay run → human approve → submit STP → generate invoices |
| Incident Response | finnest_safety | report → classify severity → notify → investigate → corrective action → close |
Checkpoint & resume:
```elixir
defmodule Finnest.Onboard.Workers.OnboardingPipeline do
  use Oban.Worker, queue: :onboard_queue, max_attempts: 5, unique: [fields: [:args], period: 60]

  def perform(%Oban.Job{args: %{"pipeline_id" => id, "step" => step}}) do
    pipeline = Onboard.PipelineOrchestrator.load(id)

    case step do
      "dvs_verification" ->
        with {:ok, result} <- DocumentVerification.verify(pipeline.documents),
             :ok <- Onboard.PipelineOrchestrator.record_result(pipeline, :dvs, result) do
          enqueue_next(pipeline, "credential_check")
        end

      "credential_check" -> ...
      "super_onboarding" -> ...
      # ... etc
    end
  end
end
```
Each step persists its output to onboard.pipeline_steps before enqueueing the next. A node crash mid-step means the job retries (QJ-01 idempotency — same input, same output).
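A plausible `enqueue_next/2` under that contract — a sketch, since the real function isn't shown in this doc:

```elixir
# Sketch of enqueue_next/2 — schedules the next step as its own Oban job.
# The worker's `unique` option makes the insert idempotent if a retried
# job races the original (QJ-01: same input, same output).
defp enqueue_next(pipeline, next_step) do
  %{"pipeline_id" => pipeline.id, "step" => next_step}
  |> Finnest.Onboard.Workers.OnboardingPipeline.new()
  |> Oban.insert()
end
```

Each step being its own job is what makes the pipeline survive node restarts: Oban holds the pending step in Postgres, not in any process.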
### Tier 3 — Autonomous Agents (Oban cron, long-lived)
Shape: Always-on cron-triggered jobs. Scan state, detect patterns, PROPOSE actions. Never execute destructive operations autonomously.
Constraint: PROPOSE-only (AI-06). Allowed operations: READ, NOTIFY, REPORT. Forbidden: WRITE, UPDATE, DELETE.
Types:
| Agent | Owner | Schedule | Output |
|---|---|---|---|
| Compliance Monitor | finnest_pulse (operates on finnest_compliance data) | Nightly 02:00 AEST | Notifications for credential expiries within 30/14/7 days; regulatory change alerts |
| Anomaly Detector | finnest_pulse | Daily + on pay run finalise | Flags timesheet fraud signals, payroll discrepancies, roster conflicts for human review |
| Roster Optimiser | finnest_roster | Overnight per org (staggered) | Proposes optimised shift assignments considering availability, skills, compliance, fatigue, cost. Human approves before changes apply |
| Data Quality Agent | finnest_pulse | Weekly | Incomplete records, stale credentials, data inconsistencies. Creates tasks for humans |
Why PROPOSE-only: An autonomous agent that can mutate state is an autonomous agent that can silently corrupt state. The value is in pattern detection + surfacing; execution stays human.
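Under this constraint a Tier 3 agent reduces to a cron-scheduled Oban worker whose only side effects are notifications — a sketch, with queue, schedule, and query/notification function names illustrative (only the 30/14/7-day windows come from the table above):

```elixir
defmodule Finnest.Pulse.Workers.ComplianceMonitor do
  # Scheduled via Oban.Plugins.Cron, e.g. {"0 2 * * *", __MODULE__},
  # with the plugin's :timezone option handling AEST.
  use Oban.Worker, queue: :pulse_queue, max_attempts: 3

  @impl Oban.Worker
  def perform(%Oban.Job{}) do
    # READ: scan for credentials expiring within each alert window
    for window <- [30, 14, 7],
        credential <- Finnest.Compliance.Queries.expiring_within(days: window) do
      # NOTIFY: surface the finding — never mutate compliance state
      Finnest.Pulse.Notifications.credential_expiry(credential, window)
    end

    :ok
  end
end
```

Nothing in the worker calls a write path, so the PROPOSE-only guarantee is structural, not just conventional.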
## Infrastructure

### finnest_agents supervision tree
```
FinnestAgents.Supervisor (one_for_one)
├── FinnestAgents.Orchestrator (singleton GenServer — intent routing, session creation)
├── FinnestAgents.ToolRegistry (GenServer — discovers MCP servers from every domain at boot)
├── FinnestAgents.AgentSupervisor (DynamicSupervisor, restart: :temporary — Tier 1 agents per session)
├── FinnestAgents.ClaudeClient (GenServer — Finch HTTP pool + cost tracking)
├── FinnestAgents.BudgetGuard (GenServer — per-org spend circuit breaker)
├── FinnestAgents.MemoryCoordinator (GenServer — L1/L2/L3 memory read/write routing)
└── FinnestAgents.PromptCache (GenServer — tracks Anthropic prompt-cache metrics)
```
### The Orchestrator — two-tier intent routing
```
User query arrives at Orchestrator
│
├── TIER A: Pattern match (cost $0, latency <5ms)
│   Patterns compiled at boot from finnest_agents/patterns/*.ex
│   Examples:
│     "show (me)? roster" → roster_mcp.list_shifts
│     "leave balance" → people_mcp.get_leave_balance
│     "[first_name] [last_name]" → people_mcp.get_employee
│     "clock (me )?in" → timekeep flow (mobile)
│   Covers ~70% of production queries (B03 cost model)
│
├── TIER B: Deterministic computation (cost $0, latency <50ms)
│   Intent is clear but needs cross-domain composition:
│     "Is John eligible for mining site?" → compliance_mcp.check_credentials
│     "Score candidates for this job order" → recruit_mcp.score_candidates
│   Covers ~15% of production queries
│
└── TIER C: Claude (cost $0.01–0.05, latency 500–2500ms)
    Genuine reasoning or generation needed:
      Ambiguous intent classification
      Natural language generation
      Edge-case composition
    Covers ~15% of production queries
```
Routing decision flow:
```elixir
defmodule FinnestAgents.Orchestrator do
  def route(intent_text, %{org_id: org_id, session_id: _session_id} = ctx) do
    # Tier A, then Tier B — both free and local
    with {:no_match, _} <- FinnestAgents.PatternMatcher.match(intent_text, ctx),
         {:no_match, _} <- FinnestAgents.DeterministicResolver.resolve(intent_text, ctx) do
      tier_c(intent_text, org_id, ctx)
    else
      {:match, tool_call} -> execute_mcp(tool_call, ctx)
    end
  end

  # Tier C — gated by the per-org budget circuit breaker.
  # BudgetGuard.check/1 returns :ok, {:ok, {:warning, pct}}, or
  # {:error, :budget_exceeded} once the breaker is open.
  defp tier_c(intent_text, org_id, ctx) do
    case FinnestAgents.BudgetGuard.check(org_id) do
      {:error, :budget_exceeded} ->
        fallback_local_only(intent_text, ctx)

      _ok_or_warning ->
        case FinnestAgents.ClaudeClient.classify(intent_text, ctx) do
          {:ok, classified} -> dispatch(classified, ctx)
          {:error, _reason} -> fallback_local_only(intent_text, ctx)
        end
    end
  end
end
```
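Tier A's pattern table can be as simple as compiled regexes mapped to MCP tools — a sketch, where the regexes and module internals are illustrative and only the tool names come from the examples above:

```elixir
defmodule FinnestAgents.PatternMatcher do
  # Compiled once at boot from finnest_agents/patterns/*.ex
  @patterns [
    {~r/^show (me )?roster/i, {:roster_mcp, :list_shifts}},
    {~r/leave balance/i, {:people_mcp, :get_leave_balance}},
    {~r/^clock (me )?in/i, {:timekeep, :clock_in_flow}}
  ]

  def match(intent_text, _ctx) do
    case Enum.find(@patterns, fn {regex, _tool} -> Regex.match?(regex, intent_text) end) do
      {_regex, tool} -> {:match, tool}
      nil -> {:no_match, intent_text}
    end
  end
end
```

The return shapes (`{:match, tool}` / `{:no_match, _}`) are what let the Orchestrator's `with` chain fall through tier by tier.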
### Tool Registry
Discovers and catalogues every MCP server at boot:
```elixir
defmodule FinnestAgents.ToolRegistry do
  use GenServer

  def start_link(_) do
    GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  @impl true
  def init(:ok) do
    # Discover every domain with an MCP server
    tools =
      Application.loaded_applications()
      |> Enum.filter(fn {app, _, _} -> app_exposes_mcp?(app) end)
      |> Enum.flat_map(&discover_tools/1)

    {:ok, %{tools: tools, by_category: index_by_category(tools)}}
  end
end
```
Tool metadata indexed:
- Name (e.g. `roster_list_shifts`)
- Domain (`:roster`)
- Category (`:read | :propose | :execute | :restricted`)
- Input schema (typed fields, required/optional)
- Output schema
- Permission matrix — which roles can invoke; IRAP restrictions
- MCP server pid (if dynamic discovery) or module (if static)
### ClaudeClient (hexagonal AiProvider port)
```elixir
defmodule FinnestAgents.AiProvider do
  @callback classify(intent :: String.t(), context :: map()) ::
              {:ok, classified_intent} | {:error, reason}
  @callback generate(messages :: [map()], tools :: [map()], opts :: keyword()) ::
              {:ok, response} | {:error, reason}
  @callback stream(messages :: [map()], tools :: [map()], opts :: keyword()) ::
              Enumerable.t() | {:error, reason}
end

# Adapters (5 total — main doc):
#   FinnestAgents.AiProvider.AnthropicDirect   (commercial primary)
#   FinnestAgents.AiProvider.BedrockSydney     (IRAP primary)
#   FinnestAgents.AiProvider.VertexAU          (Verify fallback only)
#   FinnestAgents.AiProvider.MockProvider      (tests)
#   FinnestAgents.AiProvider.LocalLLMProvider  (Phase 3+ — vLLM container)
```
Cross-cutting concerns in all adapters:
- Finch HTTP pool (connection reuse — saves TLS handshake)
- Per-org rate limit (via `FinnestAgents.BudgetGuard`)
- Cost tracking per request → `agents.tool_audit`
- Failover chain per AI-07: primary → fallback → manual review
- Response cache per intent signature (ETS, 5-min TTL) for repeated queries
- Structured logging with correlation ID (AW-14)
### Prompt Caching (AI-09 guardrail, Part 3 decision)
Anthropic prompt cache delivers 90% discount on cached tokens. Finnest's system prompts + MCP tool schemas are large (~10K tokens) and stable — ideal cache candidates.
Required prompt structure:
```
[CACHE_BREAKPOINT: PERMANENT]
  System prompt (role, principles, formatting rules)  ← cached across all sessions
  MCP tool schemas (full typed definitions)           ← cached across all sessions
[CACHE_BREAKPOINT: PER_ORG]
  Org context (industry profiles, terminology, flags) ← cached per-org
[CACHE_BREAKPOINT: NONE]
  Session history (last N messages)                   ← not cached (changes every turn)
  User query                                          ← not cached
```
Claude client enforces this ordering:
```elixir
defmodule FinnestAgents.AiProvider.AnthropicDirect do
  @behaviour FinnestAgents.AiProvider

  def build_request(session, user_message, tools) do
    %{
      model: model_for(session),
      system: [
        %{type: "text", text: Prompts.base_system(), cache_control: %{type: "ephemeral"}},
        %{type: "text", text: Prompts.tool_schemas(tools), cache_control: %{type: "ephemeral"}},
        %{type: "text", text: Prompts.org_context(session.org_id), cache_control: %{type: "ephemeral"}}
      ],
      messages: session.history ++ [%{role: "user", content: user_message}]
    }
  end
end
```
Observability:
- `FinnestAgents.PromptCache` GenServer aggregates `cache_creation_input_tokens`, `cache_read_input_tokens`, `input_tokens`, `output_tokens` per request
- Target: ≥70% cache hit rate measured as `cache_read / (cache_read + cache_creation + input)` on cacheable content
- Dashboard panel; alert if hit rate drops below 50% for 1 hour (likely means someone broke the prompt structure)
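The target ratio can be computed directly from the per-request usage counters Anthropic returns — a sketch (module name illustrative; the field names mirror the Anthropic usage block):

```elixir
defmodule FinnestAgents.PromptCache.Stats do
  # Cacheable hit rate over aggregated token counters.
  # Returns 0.0 when no input tokens have been observed yet.
  def hit_rate(%{
        cache_read_input_tokens: read,
        cache_creation_input_tokens: created,
        input_tokens: uncached
      }) do
    denominator = read + created + uncached

    if denominator == 0, do: 0.0, else: read / denominator
  end
end
```

Output tokens are deliberately excluded: caching only discounts input, so including completions would flatter the ratio.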
## Memory System
Three levels (B03 infrastructure section), each solving a different problem.
| Level | Storage | Scope | Lifetime | Purpose |
|---|---|---|---|---|
| L1: Session | GenServer state (+ agents.messages async) | Single conversation | Process lifetime (hydrated on reconnect) | Working context — what the user and agent are discussing right now |
| L2: Tenant | agents.memories (Postgres) | Per-org | Permanent | Org-specific patterns and preferences — "this client always wants weekend-only workers"; "this org prefers SMS over email for shift confirmations" |
| L3: Domain | agents.memories + event store aggregations | Platform-wide (anonymised) | Permanent | Industry intelligence — "across all construction orgs, forklift certifications expire X days before detection"; competitive moat data |
L2 memory writes:
- Explicit: agent proposes "remember this preference" → human confirms → `agents.memories` row
- Implicit: pattern detected across N repeated similar interactions → Tier-3 agent proposes addition

L3 memory writes:
- Aggregations run on `events.domain_events` partitions nightly
- PII stripped; only org_id-free patterns retained
- Used for Tier 1/2 response generation to benefit from cross-org learning
Memory retrieval during Tier C:
```
User query → ClaudeClient.generate(...)
└─ system prompt includes:
   - base system (cached)
   - tool schemas (cached)
   - org_context (cached per-org)
   - relevant L2 memories (max 5, retrieved by embedding similarity — future phase)
   - relevant L3 patterns (max 3)
   - session history (uncached)
```
## Governance

### Tenant Isolation (AI-03, AI-04)
See main architecture.md Part 8 and data.md. Key points for agents:
- `org_id` is injected by the MCP framework from `session.metadata` — agents cannot set it
- Every agent GenServer is spawned with `{session_id, org_id, user_id}` in initial state
- `FinnestAgents.AgentSupervisor` uses `restart: :temporary` — a terminated agent doesn't auto-restart with stale tenant context
- Session end = process termination — no reuse across tenants (AI-04)
### Action Classification (MCP tool categories)
| Category | Effect | Human required | Agent autonomy | Example |
|---|---|---|---|---|
| READ | Query / list / get | No | Full | roster_list_shifts, compliance_check_worker |
| PROPOSE | Agent suggests; human confirms before execute | Yes (before) | Proposal only | roster_propose_assignment, recruit_propose_candidates |
| EXECUTE | Agent acts; human notified after | No (notified) | Full with notification | reach_send_message, timekeep_record_clock_event |
| RESTRICTED | Human must initiate | Yes (must initiate) | None (agent cannot trigger) | payroll_finalise_run, people_terminate_employee |
Per-org defaults: EXECUTE becomes PROPOSE in conservative orgs via feature flag agent_action_mode=strict.
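That downgrade can be applied as a pure function at dispatch time — a sketch in which everything except the `agent_action_mode=strict` flag is illustrative:

```elixir
defmodule FinnestAgents.MCP.ActionPolicy do
  # Hypothetical helper: resolves the effective category for a tool call.
  # The real flag lookup lives wherever feature flags are stored; only
  # agent_action_mode=strict comes from the doc.

  @type category :: :read | :propose | :execute | :restricted

  @spec effective_category(category, map()) :: category
  def effective_category(:execute, %{agent_action_mode: "strict"}) do
    # Conservative orgs: the agent may only PROPOSE where it would normally EXECUTE
    :propose
  end

  def effective_category(category, _org_flags), do: category
end
```

Keeping the policy a pure function makes it trivially testable per org configuration, independent of any running agent.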
### Confidence Framework (AW-12)
Every agent response carries a confidence band:
| Band | Threshold (general) | Threshold (compliance) | Action |
|---|---|---|---|
| GREEN | >90% | >95% | Act autonomously (within category allowance); notify after |
| AMBER | 70–90% | 80–95% | Propose action; await human approval |
| RED | <70% | <80% | Flag for human handling; log to review queue |
Compliance-affecting actions have stricter thresholds because error cost is legal.
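Band selection is then a pure function of score and domain — a sketch using the thresholds from the table (module and function names illustrative):

```elixir
defmodule FinnestAgents.Confidence do
  # Thresholds from the table above; compliance-affecting actions are stricter.
  def band(score, :compliance) when score > 0.95, do: :green
  def band(score, :compliance) when score >= 0.80, do: :amber
  def band(_score, :compliance), do: :red

  def band(score, _domain) when score > 0.90, do: :green
  def band(score, _domain) when score >= 0.70, do: :amber
  def band(_score, _domain), do: :red
end
```

Guard-clause ordering does the work here: the compliance clauses match first, so a 0.92 score is GREEN in general domains but only AMBER for compliance.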
### Budget Management (AI-08)
FinnestAgents.BudgetGuard tracks per-org spend in real time:
```elixir
defmodule FinnestAgents.BudgetGuard do
  use GenServer

  # State: %{org_id => %{daily: Decimal, weekly: Decimal, monthly: Decimal,
  #          limits: %{daily: Decimal, weekly: Decimal, monthly: Decimal}}}

  def check(org_id) do
    case GenServer.call(__MODULE__, {:check, org_id}) do
      :ok -> :ok
      {:warning, pct} -> {:ok, {:warning, pct}} # 80% threshold
      :circuit_breaker_open -> {:error, :budget_exceeded}
    end
  end

  def record_spend(org_id, amount_aud, category) do
    GenServer.cast(__MODULE__, {:record, org_id, amount_aud, category})
  end
end
```
Behaviour:
- Accumulated via `cast` (non-blocking hot path)
- 80% threshold → warning logged; admin notified; UI banner shown to org users
- 100% threshold → circuit breaker opens; subsequent requests fall back to Tier A/B only; admin alerted
- Resets per period (daily/weekly/monthly)
- State persisted to `agents.budget_limits` every 5 min + on clean shutdown
### Audit Trail (AW-14)
Every agent action produces:
| Event | Destination | Retention |
|---|---|---|
| User message | agents.messages (full text) | 90 days commercial / 7 years IRAP |
| Agent response | agents.messages (full text) | Same |
| Tool invocation | agents.tool_audit (tool name, input, output, ms, cost, correlation_id) | 90 days commercial / 7 years IRAP |
| LLM API call | agents.tool_audit (prompt_hash, response_hash, model, tokens, cost, cache_stats, correlation_id) — hash, not plaintext, to avoid PII leak | Same |
| Business-event emission | events.domain_events (as always) | 7 years both |
Hash, not plaintext, for LLM logs (Commandment #24): prompt content may contain PII. We log a BLAKE2b hash so we can prove the call happened and deduplicate cache hits, without creating a PII exposure surface.
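In Elixir the digest is one `:crypto` call — a sketch of what lands in the audit row (module name illustrative; BLAKE2b comes from the text above):

```elixir
defmodule FinnestAgents.Audit do
  # Hash prompt/response content before it reaches agents.tool_audit.
  # :crypto.hash/2 supports :blake2b (64-byte digest) on modern OTP.
  def content_hash(content) when is_binary(content) do
    :crypto.hash(:blake2b, content)
    |> Base.encode16(case: :lower)
  end
end
```

The same input always yields the same digest, which is what makes cache-hit deduplication work without ever storing the prompt itself.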
Correlation IDs link: user message → orchestrator route → MCP tool calls → LLM API call → business events → reactions → further events (AI-05 bounds this at 10).
## Cross-Domain Agent Coordination
Example: "Onboard John Smith for Woolworths Minchinbury, mining project"
```
1. Orchestrator classifies: multi-domain onboarding → Tier 2
2. Creates OnboardingPipelineAgent (Oban workflow)
3. Step 1: recruit_mcp.get_candidate("John Smith") [READ]
   → returns candidate data, correlation_id bound
4. Step 2: compliance_mcp.check_credentials(candidate, industry: "mining") [READ]
   → white_card ✓, first_aid ✓, confined_space ✗ (expired)
   → AMBER confidence on "expired" → pipeline pauses for decision
5. Step 3: safety_mcp.check_site_inductions(candidate, "Minchinbury") [READ]
   → site induction not completed
6. Step 4: fatigue_mcp.check_fitness(candidate) [READ]
   → fit for duty ✓
7. Step 5: roster_mcp.check_availability(candidate, date) [READ]
   → available ✓
8. [HUMAN GATE] Present findings via LiveView or mobile notification:
   "John Smith available & fit. Needs: confined space recert + site induction.
    Approve onboarding with these conditions?"
9. [On approval] Step 6: onboard_mcp.start_onboarding(candidate, conditions) [EXECUTE]
10. Step 7: reach_mcp.send_confirmation(candidate, details) [EXECUTE]
```
Each step → event → correlation_id links entire chain.
Max 10 events per chain (AI-05) — this chain uses 7, well within budget.
## Failure Prevention
| Failure mode | Prevention mechanism |
|---|---|
| Every query hits Claude | Two-tier routing (Tier A pattern + Tier B deterministic) catches ~85% before Tier C |
| Agent hallucinates pay rate / award / credential | AwardInterpreter + Compliance.check/2 are deterministic rules engines; agents query and present, never reason over the values (AI-02, B03 Insight 4) |
| Cross-domain infinite loop | Correlation + causation IDs on every event; max 10 events per chain (AI-05); Orchestrator refuses to fire event that would exceed |
| Tenant data leakage | org_id injected at MCP framework layer; per-tenant agent processes; tests verify isolation (AI-03, AI-04) |
| Autonomous destructive action | Tier 3 agents PROPOSE-only; MCP category RESTRICTED requires human initiation (AI-06) |
| Undebuggable behaviour | Every action logged with correlation_id to agents.tool_audit; session messages in agents.messages; replay tool reconstructs decision chains |
| Cost explosion | BudgetGuard per-org circuit breaker with 80% warning and 100% hard stop (AI-08) |
| Prompt injection via user input | System prompts frame user content as untrusted; tool inputs validated at MCP layer via typed schemas (not just free text) |
| Stale cached responses | TTL 5 min; invalidated on relevant domain event (e.g. roster response cache invalidated on shift_updated) |
| Provider outage | Failover chain: primary → fallback → manual review (AI-07). Cost tracked per-provider for visibility |
## MCP Tool Definition (gold standard)
Every tool follows this shape:
```elixir
defmodule Finnest.Roster.MCP.Tools.ListShifts do
  use FinnestAgents.MCP.Tool,
    name: "roster_list_shifts",
    domain: :roster,
    category: :read,
    description: "List shifts for an org within a date range, optionally filtered by site."

  input :date_from, :date, required: true, description: "Start of range, inclusive."
  input :date_to, :date, required: true, description: "End of range, inclusive."
  input :site_id, :uuid, required: false, description: "Optional site filter."
  input :status, {:enum, [:scheduled, :in_progress, :completed]}, required: false

  output_schema %{
    shifts: [
      %{
        id: :uuid,
        start_at: :datetime,
        end_at: :datetime,
        site_id: :uuid,
        site_name: :string,
        worker_id: {:optional, :uuid},
        worker_name: {:optional, :string},
        status: :string
      }
    ]
  }

  # org_id injected by MCP framework from session.metadata — NEVER from the agent
  def call(%{date_from: from, date_to: to} = params, %{org_id: org_id} = _ctx) do
    shifts =
      Finnest.Roster.Queries.list_shifts(org_id,
        from: from,
        to: to,
        site_id: params[:site_id],
        status: params[:status]
      )

    {:ok, %{shifts: Enum.map(shifts, &format_shift/1)}}
  end
end
```
Gold-standard invariants:
- Name is `<domain>_<verb>_<noun>` (e.g. `roster_list_shifts`)
- Category is explicit (`:read | :propose | :execute | :restricted`)
- Input fields are typed with required/optional markers
- Output schema is declared (agent can introspect before calling)
- `org_id` extracted from context, not params — agents can't override
- Body delegates to domain `Queries`/`Commands` module — no business logic in the MCP tool
- `PROPOSE` tools return a proposal struct, not the side-effect
- `EXECUTE` tools call the gated context function (which triggers `Compliance.check/2` if applicable)
## Observability (D19)
Per-session telemetry:
- Time to first token (TTFT)
- Total response time
- Tokens (prompt, completion, cached)
- Cost AUD
- MCP tool calls count
- Confidence distribution
Per-org dashboard (Grafana):
- AI spend current day / week / month (budget traffic lights)
- Prompt cache hit rate
- Top intents routed (A/B/C distribution)
- Agent error rate
- Active sessions over time
Alert thresholds:
- Budget 80% of monthly cap → warning
- Budget 100% of monthly cap → critical (circuit breaker armed)
- Cache hit rate <50% for 1 hour → warning (likely prompt restructuring broke caching)
- Tier C rate >25% of total → warning (pattern coverage is slipping)
- TTFT p95 >2s for 5 min → warning (provider issue)
## Future Considerations
LocalLLMProvider (planned adapter #5, Phase 3+):
Trigger conditions (main arch doc OI-11):
- IRAP Phase 3 deployment wants stronger sovereignty posture (local LLM handles intent classification + PII scrubbing within IRAP VPC)
- Scale exceeds 25K employees or 3 paying clients (GPU amortises)
Provisional plan:
- 8B or 14B model (Llama 3.3 8B / Phi-4 14B / Qwen 2.5 7B candidates)
- vLLM in separate container
- Role: Tier-1.5 — catches ambiguous intents that Tier A pattern-match misses, before falling through to Tier C Claude
- Adapter slots into existing `AiProvider` port — no agent code changes required
Federated MCP:
When external agents (customer-built agents, partner agents) need access, the MCP servers become externally callable over JSON-RPC. Current in-process behaviour stays; JSON-RPC becomes a second transport on the same tool definitions.
On-device inference (mobile edge AI):
For privacy-sensitive flows (document classification on the device before upload), explore Core ML / TFLite models shipped with the Flutter app. Deferred — not scoped in 44-week roadmap.