Brainstorm Session 3: AI Agent Design
- Date: 2026-04-15
- Objective: Design how AI agents work within the Supervised Modular Monolith
- Depends on: Session 1 (Elixir/Phoenix), Session 2 (20 OTP apps, MCP boundaries)
Techniques Used
- Six Thinking Hats — Examine agent architecture from all perspectives
- Mind Mapping — Hierarchical exploration of the agent system
- Reverse Brainstorming — "How would our agent architecture guarantee failure?"
Core Design: Three-Tier Agent Architecture
Tier 1: Conversational Agents (GenServer, short-lived)
- User-facing chat agents per domain
- Scoped to a session, hold conversation state in GenServer
- Die when user disconnects, restart if they return
- Intent classification: pattern match (free) → Claude fallback (~$0.01)
- Tool calling via MCP servers
- UI rendering (tables, stat cards, charts via RenderUI tool)
Agent Types:
- User Chat Agent: general-purpose, routes to domain agents
- Reach Agent: SMS/voice/chat/email for workers and clients
- Admin Assistant: natural language queries over dashboards and reports
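The Tier 1 lifecycle can be illustrated with a small Python sketch; the real agents would be Elixir GenServers under a supervisor. `ConversationAgent` and `SessionRegistry` are hypothetical names: one agent exists per (org_id, session_id), disconnecting discards it along with its conversation state, and a returning user gets a fresh agent.

```python
# Hypothetical sketch; in production each agent is a GenServer, not a Python object.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConversationAgent:
    """Holds one session's chat context (stand-in for GenServer state)."""
    org_id: str
    session_id: str
    history: list[tuple[str, str]] = field(default_factory=list)

    def handle(self, message: str) -> str:
        self.history.append(("user", message))
        reply = f"received: {message}"  # placeholder for classification + MCP tool calls
        self.history.append(("agent", reply))
        return reply


class SessionRegistry:
    """One agent per (org_id, session_id); never shared across tenants."""

    def __init__(self) -> None:
        self._agents: dict[tuple[str, str], ConversationAgent] = {}

    def connect(self, org_id: str, session_id: str) -> ConversationAgent:
        key = (org_id, session_id)
        if key not in self._agents:
            # New or returning user: start a fresh agent with empty context.
            self._agents[key] = ConversationAgent(org_id, session_id)
        return self._agents[key]

    def disconnect(self, org_id: str, session_id: str) -> None:
        # Dropping the agent discards its conversation state, like process termination.
        self._agents.pop((org_id, session_id), None)
```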
Tier 2: Workflow Agents (Oban, medium-lived)
- Multi-step business processes (hours to weeks)
- Checkpoint state to PostgreSQL via Oban workers
- Survive node restarts, distribute across cluster
- Human-in-the-loop approval gates
- Escalation on failure/timeout
Agent Types:
- Onboarding Pipeline: start → docs → verify → induct → place (days/weeks)
- Scoring & Matching: batch scoring + AI reasoning for edge cases
- Payroll Processing: calculate → validate → review → approve → submit
- Incident Response: report → classify → notify → investigate → resolve
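The approval gate and timeout escalation described above can be sketched as a pure function that a scheduled Oban job might call on each run. This Python version is illustrative only: `GateState`, `check_gate`, and the 48-hour timeout are assumptions, not values from the session.

```python
# Hypothetical sketch of a Tier 2 approval gate; names and timeout are assumptions.
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum

APPROVAL_TIMEOUT = timedelta(hours=48)  # assumed, not from the session notes


class GateStatus(Enum):
    AWAITING = "awaiting_approval"
    APPROVED = "approved"
    REJECTED = "rejected"
    ESCALATED = "escalated"


@dataclass(frozen=True)
class GateState:
    step: str                    # e.g. "review" in the payroll pipeline
    opened_at: datetime          # when the workflow paused for a human
    decision: str | None = None  # "approve" or "reject" once someone acts


def check_gate(gate: GateState, now: datetime) -> GateStatus:
    """Decide whether the workflow may continue past a human approval gate."""
    if gate.decision == "approve":
        return GateStatus.APPROVED
    if gate.decision == "reject":
        return GateStatus.REJECTED
    if now - gate.opened_at > APPROVAL_TIMEOUT:
        return GateStatus.ESCALATED  # nobody acted in time: escalate
    return GateStatus.AWAITING
```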
Tier 3: Autonomous Agents (Oban Cron, long-lived)
- Always-on monitoring and analysis
- Scheduled or event-triggered execution
- PROPOSE-only: never take destructive action without human approval
- Permitted direct actions: READ, NOTIFY, REPORT; anything else is raised as a proposal for a human
Agent Types:
- Compliance Monitor: credential expiry, award rate changes, regulatory updates
- Anomaly Detector: timesheet fraud signals, payroll discrepancies, roster conflicts
- Roster Optimiser: overnight optimisation considering availability, skills, compliance, fatigue, and cost
- Data Quality Agent: incomplete records, stale credentials, data inconsistencies
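The Tier 3 restriction could be enforced by a guard that passes READ, NOTIFY, and REPORT actions through and converts anything else into a proposal for human review. The Python below is a hypothetical sketch (`AgentAction`, `Proposal`, and `guard_autonomous` are invented names); in the platform this check would live in the Elixir code that dispatches MCP tool calls.

```python
# Hypothetical guard for Tier 3 autonomous agents; names are illustrative.
from __future__ import annotations

from dataclasses import dataclass, field

TIER3_ALLOWED = frozenset({"READ", "NOTIFY", "REPORT"})


@dataclass
class AgentAction:
    kind: str                 # e.g. "READ", "NOTIFY", "UPDATE", "DELETE"
    tool: str                 # MCP tool the agent wants to call
    args: dict = field(default_factory=dict)


@dataclass
class Proposal:
    action: AgentAction
    reason: str


def guard_autonomous(action: AgentAction) -> AgentAction | Proposal:
    """Let permitted actions through; turn anything else into a proposal for a human."""
    if action.kind in TIER3_ALLOWED:
        return action
    return Proposal(action, reason=f"{action.kind} requires human approval")
```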
Infrastructure
Orchestrator
- Receives all user/system intents
- Two-tier classification:
  - Tier 1: Pattern matching on keywords → domain. Cost: $0
  - Tier 2: Claude classification for ambiguous intents. Cost: ~$0.01
- Creates/retrieves appropriate agent for the domain
- Manages agent lifecycle (create, monitor, terminate)
- Cross-domain coordination for multi-step intents
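The Orchestrator's two-tier classification can be sketched in Python as a keyword pass that costs nothing, with an LLM classifier called only when no pattern matches. The keyword table, the domain names, and the `llm_classify` callable are placeholders; in production the fallback would go through the Claude client.

```python
# Hypothetical two-tier intent classifier; patterns and domains are placeholders.
from __future__ import annotations

import re
from typing import Callable

# Keyword → domain table (illustrative; a real one would be maintained per domain).
PATTERNS: list[tuple[re.Pattern[str], str]] = [
    (re.compile(r"\b(roster|shifts?)\b", re.I), "roster"),
    (re.compile(r"\bleave\b", re.I), "people"),
    (re.compile(r"\b(payroll|payslip|pay run)\b", re.I), "payroll"),
]


def classify(text: str, llm_classify: Callable[[str], str]) -> tuple[str, str]:
    """Return (domain, method). Pattern matching is free; the LLM is the fallback."""
    for pattern, domain in PATTERNS:
        if pattern.search(text):
            return domain, "pattern"
    return llm_classify(text), "llm"  # only ambiguous intents reach this paid call
```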
MCP Server Registry
- Catalog of all 18 domain MCP servers
- Typed tool schemas (inputs/outputs/descriptions)
- Permission matrix (which agents can call which tools)
- IRAP restrictions (defence tenants: restricted tool set)
- Dynamic registration (new domain → auto-registered)
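A registry combining typed tool specs, a permission matrix, and an IRAP filter might look like this Python sketch. `ToolSpec`, `McpRegistry`, the `irap_allowed` flag, and the example tool names are hypothetical; a real MCP tool listing carries a JSON Schema for its inputs, represented here as a plain dict.

```python
# Hypothetical MCP registry: tool catalog + permission matrix + IRAP restriction.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    name: str                  # e.g. "roster.list_shifts" (illustrative)
    description: str
    input_schema: dict         # JSON-Schema-like description of inputs
    irap_allowed: bool = True  # False: hidden from defence (IRAP) tenants


@dataclass
class McpRegistry:
    tools: dict[str, ToolSpec] = field(default_factory=dict)
    # agent type → names of tools it may call
    permissions: dict[str, set[str]] = field(default_factory=dict)

    def register(self, spec: ToolSpec) -> None:
        """Dynamic registration: a new domain calls this at startup."""
        self.tools[spec.name] = spec

    def grant(self, agent_type: str, tool_name: str) -> None:
        self.permissions.setdefault(agent_type, set()).add(tool_name)

    def visible_tools(self, agent_type: str, irap_tenant: bool) -> list[str]:
        """Tools this agent may call, minus IRAP-restricted ones for defence tenants."""
        allowed = self.permissions.get(agent_type, set())
        return sorted(
            name for name in allowed
            if name in self.tools and (self.tools[name].irap_allowed or not irap_tenant)
        )
```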
Claude Client (Hexagonal Provider Abstraction)
Port: AiProvider behaviour
├── AnthropicDirect adapter (commercial deployment)
├── BedrockSydney adapter (IRAP deployment — Claude in ap-southeast-2)
└── MockProvider adapter (testing — no API calls)
- Connection pooling (Finch HTTP client)
- Rate limiting (per-org, per-minute)
- Cost tracking (per-request, per-org, per-agent)
- Failover chain (primary → fallback → manual escalation)
- Response caching for repeated queries
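The hexagonal port and the failover chain can be illustrated in Python with a `typing.Protocol` standing in for the Elixir `AiProvider` behaviour. `complete`, `ProviderError`, and `complete_with_failover` are hypothetical names; `MockProvider` mirrors the test adapter listed above, and raising after the last provider fails models the manual-escalation step.

```python
# Hypothetical port/adapter sketch; the real port is an Elixir behaviour.
from __future__ import annotations

from typing import Protocol


class AiProvider(Protocol):
    """Port: AnthropicDirect, BedrockSydney and MockProvider would all satisfy this."""
    name: str

    def complete(self, prompt: str) -> str: ...


class ProviderError(Exception):
    pass


class MockProvider:
    """Test adapter: returns a canned reply, or fails when reply is None."""

    def __init__(self, name: str, reply: str | None = None) -> None:
        self.name = name
        self.reply = reply

    def complete(self, prompt: str) -> str:
        if self.reply is None:
            raise ProviderError(f"{self.name} unavailable")
        return self.reply


def complete_with_failover(chain: list[AiProvider], prompt: str) -> tuple[str, str]:
    """Try providers in order (primary → fallback); if all fail, escalate to a human."""
    errors: list[str] = []
    for provider in chain:
        try:
            return provider.name, provider.complete(prompt)
        except ProviderError as exc:
            errors.append(str(exc))
    raise ProviderError("escalate to human: " + "; ".join(errors))
```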
Memory System
| Level | Storage | Scope | Duration | Purpose |
|---|---|---|---|---|
| L1: Session | GenServer state | Single conversation | Process lifetime | Chat context |
| L2: Tenant | PostgreSQL | Per-organisation | Permanent | Org-specific patterns and preferences |
| L3: Domain | Event store | Platform-wide (anonymised) | Permanent | Industry intelligence, competitive moat |
Governance
Tenant Isolation
- Every agent query scoped to org_id
- org_id injected at the MCP tool level by infrastructure, never supplied by the agent
- Agent processes NEVER reused across tenants
- Session end = process termination = clean context
- Architecture tests verify isolation
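Injecting org_id at the tool level can be sketched as a wrapper that binds the tenant from the authenticated session before the agent ever sees the tool. In this hypothetical Python version (`TenantContext`, `bind_tool`), an agent that tries to pass its own org_id is rejected outright; silently overwriting it would be an equally valid design choice.

```python
# Hypothetical sketch: org_id comes from the session, never from agent-supplied arguments.
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class TenantContext:
    org_id: str  # resolved from the authenticated session by infrastructure


def bind_tool(tool: Callable[..., Any], ctx: TenantContext) -> Callable[[dict], Any]:
    """Wrap an MCP tool handler so every call is scoped to the caller's tenant."""

    def call(agent_args: dict) -> Any:
        if "org_id" in agent_args:
            raise PermissionError("agents may not supply org_id")
        return tool(org_id=ctx.org_id, **agent_args)

    return call
```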
Action Classification
| Class | Description | Human Required | Example |
|---|---|---|---|
| READ | Query, search, report | No | "Show today's roster" |
| PROPOSE | Agent suggests, human confirms | Yes (before) | "Create this job order?" |
| EXECUTE | Agent acts with notification | No (notified after) | Send shift confirmation SMS |
| RESTRICTED | Human must initiate | Yes (must initiate) | Delete record, financial transaction, compliance decision |
Each domain sets default classes; individual organisations can override them via feature flags.
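Resolving an action's class from a per-domain default plus an optional per-org override could look like the following Python sketch. The action names, `resolve_class`, and the choice to treat unknown actions as RESTRICTED (fail-safe) are assumptions, not part of the session notes.

```python
# Hypothetical resolver: per-org feature-flag override, else per-domain default.
from __future__ import annotations

from enum import Enum


class ActionClass(Enum):
    READ = "read"
    PROPOSE = "propose"
    EXECUTE = "execute"
    RESTRICTED = "restricted"


# Human involvement per class, following the Action Classification table.
HUMAN_GATE = {
    ActionClass.READ: "none",
    ActionClass.PROPOSE: "approve_before",
    ActionClass.EXECUTE: "notify_after",
    ActionClass.RESTRICTED: "human_initiates",
}


def resolve_class(
    action: str,
    domain_defaults: dict[str, ActionClass],
    org_overrides: dict[str, ActionClass],
) -> ActionClass:
    """Org override wins, then the domain default; unknown actions fail safe to RESTRICTED."""
    if action in org_overrides:
        return org_overrides[action]
    return domain_defaults.get(action, ActionClass.RESTRICTED)
```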
Confidence Framework
| Level | Default threshold | Action | Compliance-domain threshold |
|---|---|---|---|
| GREEN | >90% confidence | Act autonomously, notify after | >95% |
| AMBER | 70–90% confidence | Propose action, await approval | 80–95% |
| RED | <70% confidence | Flag for human handling | <80% |
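The threshold table translates directly into a small function. In this Python sketch the boundary handling is an assumption: a score of exactly 90% (or 95% in compliance domains) counts as AMBER, and exactly 70% (or 80%) also counts as AMBER.

```python
# Hypothetical mapping from a confidence score (0.0 to 1.0) to GREEN/AMBER/RED.
THRESHOLDS = {
    # domain kind: (GREEN if strictly above, AMBER if at or above)
    "default": (0.90, 0.70),
    "compliance": (0.95, 0.80),
}


def confidence_level(confidence: float, compliance_domain: bool = False) -> str:
    green_above, amber_from = THRESHOLDS["compliance" if compliance_domain else "default"]
    if confidence > green_above:
        return "GREEN"  # act autonomously, notify after
    if confidence >= amber_from:
        return "AMBER"  # propose action, await approval
    return "RED"  # flag for human handling
```

The same 93% score is GREEN in an ordinary domain but only AMBER in a compliance domain, which is the point of the stricter column.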
Budget Management
- Per-org daily/weekly/monthly AI spend limits
- 80% threshold: warning log
- 100% threshold: circuit breaker (blocks AI calls, local-only mode)
- Local-first routing reduces cost by ~70%
- Response caching for repeated queries
- Cost attribution: per-agent, per-domain, per-org
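The 80% warning and 100% circuit breaker can be sketched as a per-org budget object. This hypothetical Python version (`OrgBudget`, `allow`, `record`) tracks a single budget period; resetting it daily, weekly, or monthly, and persisting spend, are left out.

```python
# Hypothetical per-org budget for one period; period resets and persistence omitted.
import logging
from dataclasses import dataclass

log = logging.getLogger("ai_budget")
WARN_RATIO = 0.8  # 80% threshold from the session notes


@dataclass
class OrgBudget:
    limit_usd: float
    spent_usd: float = 0.0
    warned: bool = False

    def allow(self) -> bool:
        """False once spend reaches 100% of the limit: circuit open, local-only mode."""
        return self.spent_usd < self.limit_usd

    def record(self, cost_usd: float) -> None:
        self.spent_usd += cost_usd
        if not self.warned and self.spent_usd >= WARN_RATIO * self.limit_usd:
            self.warned = True  # log the warning once per period
            log.warning("AI spend at %.0f%% of limit", 100 * self.spent_usd / self.limit_usd)
```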
Audit Trail
- Every agent action → event in event store
- Every LLM call logged (prompt hash, response, tokens, cost, latency, model)
- Every tool call logged (tool name, input, output, org_id, correlation_id)
- Every human override logged (agent proposal vs human decision)
- Immutable, tamper-evident (IRAP PROTECTED requirement)
- 7-year retention
- Correlation IDs trace entire decision chains
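Tamper evidence is commonly achieved by hash-chaining log entries, so that each entry's hash covers the previous one. The Python below sketches that technique with SHA-256; the session does not specify a mechanism, and `append_event`/`verify_chain` are hypothetical names. A hash chain alone does not detect truncation of the newest entries; that needs the latest hash anchored somewhere outside the log.

```python
# Hypothetical hash-chained audit log; one possible way to make it tamper-evident.
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes the same way.
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> dict:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; an edited, reordered, or deleted (non-final) entry fails."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```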
Local-First Routing (Cost Optimization)
User query arrives at Orchestrator
│
├── Pattern match against known intents
│ ├── "show roster" → roster_mcp.list_shifts() → $0
│ ├── "leave balance" → people_mcp.get_leave_balance() → $0
│ ├── "John Smith details" → people_mcp.get_employee() → $0
│ └── [70%+ of queries handled here]
│
├── Deterministic computation
│ ├── "Is John eligible?" → compliance_mcp.check_credentials() → $0
│ ├── "Score candidates" → recruit_mcp.score_candidates() → $0
│ └── [15%+ of queries handled here]
│
└── Claude API (genuine reasoning needed)
├── Ambiguous intent classification → ~$0.01
├── Natural language generation → ~$0.02
├── Edge-case reasoning → ~$0.03
└── [~15% of queries land here]
Estimated cost at 5,000 employees, 200 queries/day:
Without local-first: ~$15,000/month
With local-first: ~$4,000/month (73% reduction)
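A simple linear cost model makes the estimate easy to re-run for other volumes, as planned in the cost-model next step. In this Python sketch the inputs are placeholders, not the session's assumptions; the only figure taken from the text is the reduction check, (15,000 − 4,000) / 15,000 ≈ 73.3%.

```python
# Hypothetical linear cost model; inputs are placeholders, not the session's figures.
def monthly_llm_cost(queries_per_day: float, llm_share: float,
                     avg_llm_cost_usd: float, days: int = 30) -> float:
    """Claude spend when only `llm_share` of queries need it (local routes cost $0)."""
    return queries_per_day * days * llm_share * avg_llm_cost_usd


def reduction_pct(before_usd: float, after_usd: float) -> float:
    """Percentage saved going from `before_usd` to `after_usd`."""
    return 100 * (before_usd - after_usd) / before_usd


# Check of the stated estimate: ~$15,000 → ~$4,000/month.
print(f"{reduction_pct(15_000, 4_000):.1f}%")  # prints 73.3%
```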
Cross-Domain Agent Coordination
Example: "Onboard John Smith for Woolworths Minchinbury, mining project"
Orchestrator receives intent
│
├── Classifies: multi-domain onboarding workflow → Tier 2
│
├── Creates Onboarding Pipeline Agent (Oban workflow)
│
├── Step 1: recruit_mcp.get_candidate("John Smith")
│ └── Returns candidate data
│
├── Step 2: compliance_mcp.check_credentials(candidate, industry: "mining")
│ └── Returns: white_card ✓, first_aid ✓, confined_space ✗ (expired)
│ └── AMBER: Proposes "Schedule confined space recertification"
│
├── Step 3: safety_mcp.check_site_inductions(candidate, "Minchinbury")
│ └── Returns: site induction not completed
│ └── PROPOSE: "Schedule site induction for Minchinbury"
│
├── Step 4: fatigue_mcp.check_fitness(candidate)
│ └── Returns: fit for duty ✓
│
├── Step 5: roster_mcp.check_availability(candidate, date)
│ └── Returns: available ✓
│
├── [HUMAN GATE]: Present findings for approval
│ └── "John Smith is available and fit. Needs: confined space recert + site induction.
│ Approve onboarding with these conditions?"
│
├── Step 6 (after approval): onboard_mcp.start_onboarding(candidate, conditions)
│
└── Step 7: reach_mcp.send_confirmation(candidate, details)
Each step is logged as an event, the correlation_id links the entire chain, and state is checkpointed after each step so the workflow can resume on failure.
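Checkpoint-and-resume can be sketched in Python with a dict standing in for the state an Oban job would persist to PostgreSQL. `run_workflow` and the step functions are hypothetical, and the human gate is omitted: re-running after a crash skips completed steps and reuses the same correlation_id.

```python
# Hypothetical resumable workflow runner; a dict stands in for persisted job state.
from __future__ import annotations

import uuid
from typing import Callable

Step = tuple[str, Callable[[dict], dict]]


def run_workflow(steps: list[Step], checkpoint: dict, events: list[dict]) -> dict:
    """Run steps in order, skipping any already checkpointed; safe to call again after a crash."""
    checkpoint.setdefault("correlation_id", str(uuid.uuid4()))
    checkpoint.setdefault("done", [])
    checkpoint.setdefault("state", {})
    for name, fn in steps:
        if name in checkpoint["done"]:
            continue  # completed before the crash/restart
        checkpoint["state"] = fn(checkpoint["state"])
        checkpoint["done"].append(name)  # with Oban this would be a database write
        events.append({"step": name, "correlation_id": checkpoint["correlation_id"]})
    return checkpoint["state"]
```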
Failure Prevention
| Failure Mode | Prevention |
|---|---|
| Every query hits Claude | Local-first routing: 70%+ handled at $0 |
| Agent hallucinates pay rate | Deterministic award engine. Agents query but never override. |
| Cross-domain infinite loop | Correlation IDs + causation chains. Max 10 events per chain. |
| Tenant data leakage | Per-tenant agent processes. Org_id at infrastructure level. |
| Autonomous destructive action | Tier 3 agents are PROPOSE-only. No WRITE/UPDATE/DELETE. |
| Undebuggable agent behavior | Every action in event store with correlation_id. Replay tool. |
| Cost explosion | Per-org budget limits with circuit breaker. Local-first routing. |
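The 10-event cap on causation chains can be sketched as follows. In this hypothetical Python version (`Event`, `root_event`, `caused_by`), each event records the correlation_id of its chain and the id of the event that caused it; counting depth along the causation path is an assumption about how "events per chain" is measured.

```python
# Hypothetical causation-chain guard against cross-domain infinite loops.
from __future__ import annotations

import uuid
from dataclasses import dataclass

MAX_CHAIN_LENGTH = 10  # "Max 10 events per chain"


class ChainLimitExceeded(Exception):
    pass


@dataclass(frozen=True)
class Event:
    name: str
    correlation_id: str       # shared by every event in one decision chain
    causation_id: str | None  # event_id of the event that triggered this one
    depth: int                # position along the causation path, starting at 1
    event_id: str = ""


def root_event(name: str) -> Event:
    eid = str(uuid.uuid4())
    return Event(name, correlation_id=eid, causation_id=None, depth=1, event_id=eid)


def caused_by(parent: Event, name: str) -> Event:
    """Create a follow-up event; refuse once the chain reaches the limit."""
    if parent.depth >= MAX_CHAIN_LENGTH:
        raise ChainLimitExceeded(f"chain {parent.correlation_id} hit {MAX_CHAIN_LENGTH} events")
    return Event(name, parent.correlation_id, parent.event_id, parent.depth + 1, str(uuid.uuid4()))
```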
Key Insights
Insight 1: Three Tiers Solve Three Different Problems
Conversational (GenServer) for user interaction. Workflow (Oban) for multi-step processes. Autonomous (Oban cron) for monitoring. Different durability, cost, and oversight per tier.
Impact: High | Effort: Medium
Insight 2: Local-First Routing Cuts AI Cost by 70%+
Pattern matching and database queries handle the majority of queries at $0; Claude is reserved for reasoning and generation. Estimated ~$4K vs ~$15K/month at 5,000 employees.
Impact: High | Effort: Low
Insight 3: MCP Tool Layer Is the Agent-Domain Contract
Every domain exposes typed MCP tools, and agents consume them through a uniform interface. New domains become accessible to agents automatically, and domains can be rewritten without touching agents.
Impact: High | Effort: Medium
Insight 4: Compliance Domains Need Deterministic Guardrails, Not AI Judgment
Pay rates, awards, credentials, and WHS are handled by deterministic rules engines. Agents query and present their results but never override them, which limits legal liability.
Impact: High | Effort: Low
Insight 5: Data Flywheel Starts with Agent Audit Logs
Every interaction is logged: queries, actions, corrections. Over time, these logs reveal patterns that become a competitive moat, and they are a natural byproduct of the IRAP audit trail.
Impact: High | Effort: Low
Insight 6: Agent Guardrails Are OTP Supervisors
Guardrails are implemented as supervisors that own agent processes. If a budget is exceeded, the supervisor kills the agent and restarts it in restricted mode. Fail-safe by default.
Impact: Medium | Effort: Low
Statistics
- Total ideas: 35+
- Categories: 5 (Agent Types, Infrastructure, Governance, Cost Optimization, Failure Prevention)
- Key insights: 6
- Techniques applied: 3
Recommended Next Steps
→ Session 4: Data Model & Migration (entity design informed by 18 modules + agent data needs)
→ Prototype: Build MCP servers for 3 domains (recruit, roster, people) in Elixir PoC
→ Cost model: Calculate AI costs per tier at 5K, 50K, 500K employees
Generated by BMAD Method v6 - Creative Intelligence