
Brainstorm Session 3: AI Agent Design

Date: 2026-04-15
Objective: Design how AI agents work within the Supervised Modular Monolith
Depends on: Session 1 (Elixir/Phoenix), Session 2 (20 OTP apps, MCP boundaries)

Techniques Used

  1. Six Thinking Hats — Examine agent architecture from all perspectives
  2. Mind Mapping — Hierarchical exploration of the agent system
  3. Reverse Brainstorming — "How would our agent architecture guarantee failure?"

Core Design: Three-Tier Agent Architecture

Tier 1: Conversational Agents (GenServer, short-lived)

  • User-facing chat agents per domain
  • Scoped to a session, hold conversation state in GenServer
  • Die when user disconnects, restart if they return
  • Intent classification: pattern match (free) → Claude fallback (~$0.01)
  • Tool calling via MCP servers
  • UI rendering (tables, stat cards, charts via RenderUI tool)

Agent Types:

  • User Chat Agent — general-purpose, routes to domain agents
  • Reach Agent — SMS/voice/chat/email for workers and clients
  • Admin Assistant — natural language queries over dashboards and reports
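A Tier 1 agent maps naturally onto a GenServer: one process per session, conversation state held in memory, everything gone when the process dies. The sketch below shows that shape in plain Elixir; the module and function names (`ChatAgent`, `ask/2`) are illustrative, not from the actual codebase, and the reply logic is a stub where intent classification and MCP tool calls would plug in.

```elixir
defmodule ChatAgent do
  @moduledoc """
  Sketch of a Tier 1 conversational agent: one GenServer per user
  session, conversation history kept in process state. Process death
  (user disconnect) is the cleanup mechanism.
  """
  use GenServer

  # One agent per session; no registered name, so each session gets
  # its own isolated process.
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  def ask(pid, message), do: GenServer.call(pid, {:ask, message})
  def history(pid), do: GenServer.call(pid, :history)

  @impl true
  def init(opts) do
    {:ok, %{org_id: Keyword.fetch!(opts, :org_id), history: []}}
  end

  @impl true
  def handle_call({:ask, message}, _from, state) do
    # Real implementation: classify intent, call MCP tools, maybe
    # fall back to Claude. Here we echo to show where state accrues.
    reply = "ack: " <> message
    {:reply, reply, %{state | history: [{message, reply} | state.history]}}
  end

  def handle_call(:history, _from, state) do
    {:reply, Enum.reverse(state.history), state}
  end
end
```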

Tier 2: Workflow Agents (Oban, medium-lived)

  • Multi-step business processes (hours to weeks)
  • Checkpoint state to PostgreSQL via Oban workers
  • Survive node restarts, distribute across cluster
  • Human-in-the-loop approval gates
  • Escalation on failure/timeout

Agent Types:

  • Onboarding Pipeline — start → docs → verify → induct → place (days/weeks)
  • Scoring & Matching — batch scoring + AI reasoning for edge cases
  • Payroll Processing — calculate → validate → review → approve → submit
  • Incident Response — report → classify → notify → investigate → resolve
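The defining property of Tier 2 is checkpoint-and-resume: each step's completion is persisted, so a restart replays nothing. In production that persistence would be Oban workers writing to PostgreSQL; the sketch below models just the resumable shape in plain Elixir, with a map standing in for the checkpoint store. Step and module names are illustrative.

```elixir
defmodule WorkflowRunner do
  @moduledoc """
  Minimal checkpoint-and-resume sketch for a Tier 2 workflow. A step
  is {name, fun}; completed step names are recorded so a resumed run
  skips them. Production would persist the checkpoint via Oban.
  """

  def run(steps, checkpoint \\ %{done: []}) do
    steps
    |> Enum.reject(fn {name, _fun} -> name in checkpoint.done end)
    |> Enum.reduce_while(checkpoint, fn {name, fun}, acc ->
      case fun.() do
        # Step succeeded: checkpoint it and continue.
        :ok -> {:cont, %{acc | done: acc.done ++ [name]}}
        # Step failed: halt; the checkpoint records where to resume.
        {:error, reason} -> {:halt, Map.put(acc, :failed, {name, reason})}
      end
    end)
  end
end
```

Resuming with the saved checkpoint re-runs only the steps that never completed, which is what makes node restarts and human-gate delays safe.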

Tier 3: Autonomous Agents (Oban Cron, long-lived)

  • Always-on monitoring and analysis
  • Scheduled or event-triggered execution
  • PROPOSE-only — never take destructive action without human approval
  • Only allowed: READ, NOTIFY, REPORT

Agent Types:

  • Compliance Monitor — credential expiry, award rate changes, regulatory updates
  • Anomaly Detector — timesheet fraud signals, payroll discrepancies, roster conflicts
  • Roster Optimiser — overnight optimisation considering availability, skills, compliance, fatigue, cost
  • Data Quality Agent — incomplete records, stale credentials, data inconsistencies
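The PROPOSE-only restriction is small enough to enforce as a single pattern match at the action boundary, so no Tier 3 code path can reach a mutating tool even by accident. A minimal sketch (module name illustrative):

```elixir
defmodule AutonomousGuard do
  @moduledoc """
  Tier 3 restriction sketch: autonomous agents may only READ, NOTIFY,
  or REPORT. Anything else is downgraded to a proposal for a human.
  """
  @allowed [:read, :notify, :report]

  def authorize(action) when action in @allowed, do: {:ok, action}
  # Not on the allow-list: never executed, only surfaced for approval.
  def authorize(action), do: {:propose, action}
end
```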


Infrastructure

Orchestrator

  • Receives all user/system intents
  • Two-tier classification:
      • Tier 1: Pattern matching on keywords → domain. Cost: $0
      • Tier 2: Claude classification for ambiguous intents. Cost: ~$0.01
  • Creates/retrieves appropriate agent for the domain
  • Manages agent lifecycle (create, monitor, terminate)
  • Cross-domain coordination for multi-step intents
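The two-tier classification step can be sketched as a keyword table tried first, with the LLM invoked only on a miss. The regex table and domain names below are illustrative placeholders, and the LLM call is passed in as a function so the sketch stays self-contained.

```elixir
defmodule IntentClassifier do
  @moduledoc """
  Two-tier classifier sketch: free pattern matching first (Tier 1),
  caller-supplied LLM fallback (~$0.01) only for queries no pattern
  recognises (Tier 2).
  """

  @patterns [
    {~r/roster|shift/i, :roster},
    {~r/leave balance|annual leave/i, :people},
    {~r/payslip|pay rate/i, :payroll}
  ]

  def classify(query, llm_fallback) do
    case Enum.find(@patterns, fn {re, _domain} -> Regex.match?(re, query) end) do
      # Tier 1 hit: domain resolved at $0.
      {_re, domain} -> {:pattern, domain}
      # Tier 2: only ambiguous queries pay for a Claude call.
      nil -> {:llm, llm_fallback.(query)}
    end
  end
end
```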

MCP Server Registry

  • Catalog of all 18 domain MCP servers
  • Typed tool schemas (inputs/outputs/descriptions)
  • Permission matrix (which agents can call which tools)
  • IRAP restrictions (defence tenants: restricted tool set)
  • Dynamic registration (new domain → auto-registered)
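The permission matrix and the IRAP restriction compose as two checks on every tool call: the agent must be granted the tool, and for defence tenants the tool must also be on the restricted set. The grants and tool names below are invented for illustration.

```elixir
defmodule McpRegistry do
  @moduledoc """
  Permission-matrix sketch: a call is allowed only if the agent is
  granted the tool, and IRAP (defence) tenants are further limited to
  a restricted tool set.
  """

  @grants %{
    chat_agent: MapSet.new([:list_shifts, :get_leave_balance]),
    compliance_monitor: MapSet.new([:check_credentials])
  }
  @irap_allowed MapSet.new([:list_shifts, :check_credentials])

  def allowed?(agent, tool, irap? \\ false) do
    granted? = MapSet.member?(Map.get(@grants, agent, MapSet.new()), tool)
    # IRAP tenants get the intersection of the grant and the
    # restricted set; everyone else just needs the grant.
    granted? and (not irap? or MapSet.member?(@irap_allowed, tool))
  end
end
```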

Claude Client (Provider Abstraction — Hexagonal)

Port: AiProvider behaviour
├── AnthropicDirect adapter (commercial deployment)
├── BedrockSydney adapter (IRAP deployment — Claude in ap-southeast-2)
└── MockProvider adapter (testing — no API calls)
  • Connection pooling (Finch HTTP client)
  • Rate limiting (per-org, per-minute)
  • Cost tracking (per-request, per-org, per-agent)
  • Failover chain (primary → fallback → manual escalation)
  • Response caching for repeated queries
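The hexagonal port is just an Elixir behaviour: one callback, three adapters. Only the mock is fleshed out below; the Anthropic and Bedrock adapters would implement the same callback over HTTP (via Finch), which is what makes deployment target a configuration choice rather than a code change. Callback shape and names are assumptions for illustration.

```elixir
defmodule AiProvider do
  @moduledoc """
  Port sketch for the hexagonal provider abstraction. AnthropicDirect
  and BedrockSydney would implement the same callback; swapping
  adapters changes deployment, not agent code.
  """
  @callback complete(prompt :: String.t(), opts :: keyword()) ::
              {:ok, String.t()} | {:error, term()}
end

defmodule MockProvider do
  @behaviour AiProvider

  # Test adapter: deterministic reply, no API call, no cost.
  @impl true
  def complete(prompt, _opts \\ []), do: {:ok, "mock reply to: " <> prompt}
end
```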

Memory System

| Level | Storage | Scope | Duration | Purpose |
| --- | --- | --- | --- | --- |
| L1: Session | GenServer state | Single conversation | Process lifetime | Chat context |
| L2: Tenant | PostgreSQL | Per-organisation | Permanent | Org-specific patterns and preferences |
| L3: Domain | Event store | Platform-wide (anonymised) | Permanent | Industry intelligence, competitive moat |

Governance

Tenant Isolation

  • Every agent query scoped to org_id
  • org_id injected at the MCP tool level (by infrastructure, not the agent)
  • Agent processes NEVER reused across tenants
  • Session end = process termination = clean context
  • Architecture tests verify isolation
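"Injected at the tool level" means the agent never supplies org_id at all: a thin infrastructure wrapper forces it into every call, so a confused or compromised agent cannot cross tenants. A minimal sketch, with `TenantScope` and the echo tool invented for illustration:

```elixir
defmodule TenantScope do
  @moduledoc """
  Sketch of org_id injection at the tool layer. The agent passes only
  domain arguments; the tenant scope is merged in by infrastructure.
  """

  def call(tool_fun, args, org_id) do
    # Map.put overwrites any org_id the agent tried to supply, so the
    # infrastructure-held value always wins.
    tool_fun.(Map.put(args, :org_id, org_id))
  end
end
```

Even an agent that tries to smuggle in another tenant's id gets it overwritten, which is exactly the property the architecture tests would assert.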

Action Classification

| Class | Description | Human Required | Example |
| --- | --- | --- | --- |
| READ | Query, search, report | No | "Show today's roster" |
| PROPOSE | Agent suggests, human confirms | Yes (before) | "Create this job order?" |
| EXECUTE | Agent acts with notification | No (notified after) | Send shift confirmation SMS |
| RESTRICTED | Human must initiate | Yes (must initiate) | Delete record, financial transaction, compliance decision |

Per-domain defaults, per-org overrides via feature flags.

Confidence Framework

| Level | Threshold | Action | Compliance Domains |
| --- | --- | --- | --- |
| GREEN | >90% confidence | Act autonomously, notify after | >95% |
| AMBER | 70–90% confidence | Propose action, await approval | 80–95% |
| RED | <70% confidence | Flag for human handling | <80% |
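The band thresholds from the table reduce to a small pure function, with compliance domains using the stricter cutoffs (95/80 instead of 90/70). Module and function names are illustrative.

```elixir
defmodule Confidence do
  @moduledoc """
  Confidence-band sketch from the table: GREEN acts autonomously,
  AMBER proposes and awaits approval, RED goes to a human. Compliance
  domains use the stricter thresholds.
  """

  def band(confidence, domain \\ :standard) do
    {green, amber} = thresholds(domain)

    cond do
      confidence > green -> :green
      confidence >= amber -> :amber
      true -> :red
    end
  end

  defp thresholds(:compliance), do: {0.95, 0.80}
  defp thresholds(_), do: {0.90, 0.70}
end
```

Note that the same 0.95 score is GREEN in a standard domain but only AMBER in a compliance domain, which is the point of the per-domain column.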

Budget Management

  • Per-org daily/weekly/monthly AI spend limits
  • 80% threshold: warning log
  • 100% threshold: circuit breaker (blocks AI calls, local-only mode)
  • Local-first routing reduces cost by ~70%
  • Response caching for repeated queries
  • Cost attribution: per-agent, per-domain, per-org
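The 80%/100% thresholds are a three-state check that the supervisor (or a middleware in front of the AI client) evaluates before every spend. A minimal sketch with an invented module name:

```elixir
defmodule Budget do
  @moduledoc """
  Spend-threshold sketch: warn at 80% of the per-org limit, trip the
  circuit breaker at 100% (AI calls blocked, local-only mode).
  """

  def check(spent, limit) do
    cond do
      spent >= limit -> :circuit_open
      spent >= 0.8 * limit -> :warn
      true -> :ok
    end
  end
end
```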

Audit Trail

  • Every agent action → event in event store
  • Every LLM call logged (prompt hash, response, tokens, cost, latency, model)
  • Every tool call logged (tool name, input, output, org_id, correlation_id)
  • Every human override logged (agent proposal vs human decision)
  • Immutable, tamper-evident (IRAP PROTECTED requirement)
  • 7-year retention
  • Correlation IDs trace entire decision chains

Local-First Routing (Cost Optimization)

User query arrives at Orchestrator
  ├── Pattern match against known intents
  │   ├── "show roster" → roster_mcp.list_shifts() → $0
  │   ├── "leave balance" → people_mcp.get_leave_balance() → $0
  │   ├── "John Smith details" → people_mcp.get_employee() → $0
  │   └── [70%+ of queries handled here]
  ├── Deterministic computation
  │   ├── "Is John eligible?" → compliance_mcp.check_credentials() → $0
  │   ├── "Score candidates" → recruit_mcp.score_candidates() → $0
  │   └── [15%+ of queries handled here]
  └── Claude API (genuine reasoning needed)
      ├── Ambiguous intent classification → ~$0.01
      ├── Natural language generation → ~$0.02
      ├── Edge-case reasoning → ~$0.03
      └── [~15% of queries land here]

Estimated cost at 5,000 employees, 200 queries/day:
  Without local-first: ~$15,000/month
  With local-first:    ~$4,000/month (73% reduction)

Cross-Domain Agent Coordination

Example: "Onboard John Smith for Woolworths Minchinbury, mining project"

Orchestrator receives intent
  ├── Classifies: multi-domain onboarding workflow → Tier 2
  ├── Creates Onboarding Pipeline Agent (Oban workflow)
  ├── Step 1: recruit_mcp.get_candidate("John Smith")
  │   └── Returns candidate data
  ├── Step 2: compliance_mcp.check_credentials(candidate, industry: "mining")
  │   └── Returns: white_card ✓, first_aid ✓, confined_space ✗ (expired)
  │   └── AMBER: Proposes "Schedule confined space recertification"
  ├── Step 3: safety_mcp.check_site_inductions(candidate, "Minchinbury")
  │   └── Returns: site induction not completed
  │   └── PROPOSE: "Schedule site induction for Minchinbury"
  ├── Step 4: fatigue_mcp.check_fitness(candidate)
  │   └── Returns: fit for duty ✓
  ├── Step 5: roster_mcp.check_availability(candidate, date)
  │   └── Returns: available ✓
  ├── [HUMAN GATE]: Present findings for approval
  │   └── "John Smith is available and fit. Needs: confined space recert + site induction.
  │        Approve onboarding with these conditions?"
  ├── Step 6 (after approval): onboard_mcp.start_onboarding(candidate, conditions)
  └── Step 7: reach_mcp.send_confirmation(candidate, details)

Each step: logged as event, correlation_id links entire chain, checkpoint after each step, resumable on failure.


Failure Prevention

| Failure Mode | Prevention |
| --- | --- |
| Every query hits Claude | Local-first routing: 70%+ handled at $0 |
| Agent hallucinates pay rate | Deterministic award engine. Agents query but never override. |
| Cross-domain infinite loop | Correlation IDs + causation chains. Max 10 events per chain. |
| Tenant data leakage | Per-tenant agent processes. org_id at infrastructure level. |
| Autonomous destructive action | Tier 3 agents are PROPOSE-only. No WRITE/UPDATE/DELETE. |
| Undebuggable agent behavior | Every action in event store with correlation_id. Replay tool. |
| Cost explosion | Per-org budget limits with circuit breaker. Local-first routing. |
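The infinite-loop guard is worth making concrete: every event carries its causation chain, and emitting past the cap is refused rather than queued. A minimal sketch (names and the list representation of the chain are assumptions):

```elixir
defmodule ChainGuard do
  @moduledoc """
  Loop-prevention sketch: a causation chain may grow to at most 10
  events; an eleventh emit is rejected, breaking any cross-domain
  agent-triggers-agent cycle.
  """
  @max_chain 10

  def emit(chain, event) when length(chain) < @max_chain,
    do: {:ok, chain ++ [event]}

  def emit(_chain, _event),
    do: {:error, :chain_limit_exceeded}
end
```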

Key Insights

Insight 1: Three Tiers Solve Three Different Problems

Conversational (GenServer) for user interaction. Workflow (Oban) for multi-step processes. Autonomous (Oban cron) for monitoring. Different durability, cost, and oversight per tier. Impact: High | Effort: Medium

Insight 2: Local-First Routing Cuts AI Cost by 70%+

Pattern matching + database queries handle the majority at $0. Claude reserved for reasoning and generation. ~$4K vs ~$15K/month at 5,000 employees. Impact: High | Effort: Low

Insight 3: MCP Tool Layer Is the Agent-Domain Contract

Every domain exposes typed MCP tools. Agents consume through uniform interface. New domains auto-accessible. Domains rewritable without touching agents. Impact: High | Effort: Medium

Insight 4: Compliance Domains Need Deterministic Guardrails, Not AI Judgment

Pay rates, awards, credentials, WHS = deterministic rules engines. Agents query and present, never override. Legal liability prevention. Impact: High | Effort: Low

Insight 5: Data Flywheel Starts with Agent Audit Logs

Every interaction logged: queries, actions, corrections. Over time reveals patterns that become competitive moat. Natural byproduct of IRAP audit trail. Impact: High | Effort: Low

Insight 6: Agent Guardrails Are OTP Supervisors

Guardrails as supervisors that own agent processes. Budget exceeded → supervisor kills agent, restarts in restricted mode. Fail-safe by default. Impact: Medium | Effort: Low


Statistics

  • Total ideas: 35+
  • Categories: 5 (Agent Types, Infrastructure, Governance, Cost Optimization, Failure Prevention)
  • Key insights: 6
  • Techniques applied: 3

Next Steps

→ Session 4: Data Model & Migration (entity design informed by 18 modules + agent data needs)
→ Prototype: Build MCP servers for 3 domains (recruit, roster, people) in Elixir PoC
→ Cost model: Calculate AI costs per tier at 5K, 50K, 500K employees


Generated by BMAD Method v6 - Creative Intelligence