STORY-F-013: AiProvider behaviour + AnthropicDirect + MockProvider adapters

Epic: Agent Infrastructure | Priority: Must Have | Story Points: 3 | Status: Not Started | Assigned To: Unassigned | Created: 2026-04-17 | Sprint: 3


User Story

As a developer, I want a hexagonal AiProvider behaviour with working AnthropicDirect and MockProvider adapters, structured for Anthropic prompt caching (AI-09), so that F-017 Agent Chat can make real Claude calls, tests can stay deterministic, and the 90% prompt-cache discount is captured from day one.


Description

Background

ADR-006-F + ADR-010 commit to a hexagonal AiProvider port with five planned adapters (AnthropicDirect, BedrockSydney, VertexAU, MockProvider, LocalLLMProvider). This story lands the two needed for Phase 0: AnthropicDirect (commercial default) and MockProvider (tests). BedrockSydney lands with IRAP work (Phase 3); VertexAU with Verify fallback work (Scout+Verify sprints); LocalLLMProvider is Phase 3+.

Prompt caching (AI-09 guardrail, architectural invariant #15) must be structured correctly from the first call — system prompt + tool schemas are permanent-cache candidates; org context is per-org cache; session history + user query are uncached. Getting this right now captures the 90% discount; bolting it on later means drift.

Scope

In scope:

  • FinnestAgents.AiProvider behaviour with callbacks:
    • @callback classify(intent, context) :: {:ok, classified_intent} | {:error, reason}
    • @callback generate(messages, tools, opts) :: {:ok, response} | {:error, reason}
    • @callback stream(messages, tools, opts) :: Enumerable.t() | {:error, reason}
  • FinnestAgents.AiProvider.AnthropicDirect adapter:
    • Finch HTTP pool; requests to https://api.anthropic.com/v1/messages
    • Auth via ANTHROPIC_API_KEY (Bitwarden)
    • Prompt caching structure (AI-09):
      • System prompt part 1: static system prompt (cache_control: {type: "ephemeral"})
      • System prompt part 2: MCP tool schemas JSON (cache_control: {type: "ephemeral"})
      • System prompt part 3: per-org context (cache_control: {type: "ephemeral"})
      • Messages: user message + session history (NO cache_control)
    • Claude model: claude-sonnet-4-6 default; configurable per task
    • generate/3 returns a structured %FinnestAgents.Response{content, tool_calls, tokens, cache_stats, cost_aud, model}
    • stream/3 returns an Enumerable yielding {:token, text}, {:tool_call, %{...}}, and {:complete, response} events
    • Cost calculation: input tokens × model_input_rate + output tokens × model_output_rate; cached tokens billed at 10% of the normal rate
  • FinnestAgents.AiProvider.MockProvider adapter:
    • Deterministic responses keyed by input hash
    • Cost = 0; tokens synthesised
    • Used in all tests not explicitly integration-gated
  • FinnestAgents.PromptBuilder helper module: build/3 assembles system prompt parts + org context + history + user message in the correct cache-breakpoint order
  • FinnestAgents.PromptCache GenServer: aggregates cache_creation_input_tokens + cache_read_input_tokens per request; exposes a hit-rate metric (target ≥70%)
  • Replace the ClaudeClient placeholder in the Agents Supervisor (F-012) with the real AnthropicDirect adapter
  • config/runtime.exs: config :finnest, :ai_provider, FinnestAgents.AiProvider.AnthropicDirect (commercial); MockProvider (test)
  • Architecture test ai_provider_port_test.exs: both adapters implement the behaviour contract
  • Architecture test prompt_cache_structure_test.exs: PromptBuilder.build/3 output has cache_control on system parts and none on user messages (regression protection for AI-09)
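
The three callbacks could be declared roughly as follows. This is a sketch, not the final contract: the message/tool type shapes and the `impl/0` lookup helper are assumptions, and `FinnestAgents.Response.t()` is the struct this story introduces.

```elixir
defmodule FinnestAgents.AiProvider do
  @moduledoc """
  Hexagonal port for AI providers (ADR-006-F / ADR-010).
  Phase 0 adapters: AnthropicDirect (commercial), MockProvider (test).
  """

  # Assumed shapes — refine once the real typespecs land.
  @type message :: %{role: String.t(), content: term()}
  @type tool :: map()
  @type opts :: keyword()

  @callback classify(intent :: term(), context :: map()) ::
              {:ok, classified_intent :: term()} | {:error, term()}

  @callback generate(messages :: [message()], tools :: [tool()], opts()) ::
              {:ok, FinnestAgents.Response.t()} | {:error, term()}

  @callback stream(messages :: [message()], tools :: [tool()], opts()) ::
              Enumerable.t() | {:error, term()}

  @doc "Resolve the configured adapter (AnthropicDirect in prod, MockProvider in test)."
  def impl, do: Application.fetch_env!(:finnest, :ai_provider)
end
```

Callers go through `FinnestAgents.AiProvider.impl().generate(...)` so the adapter swap stays a one-line config change, which is the point of the port.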

Out of scope:

  • BedrockSydney adapter (Phase 3 IRAP)
  • VertexAU adapter (Scout+Verify sprints)
  • LocalLLMProvider (Phase 3+)
  • Failover chain logic (add when BedrockSydney + VertexAU exist)
  • Full cost dashboard (F-020 or later)
  • Response caching in ETS (Scout+Verify sprints)

Technical Notes

  • Module namespace: FinnestAgents.* (flat), not Finnest.Agents.* (dotted) — see F-012 Technical Notes for the Boundary rationale. All AiProvider, AiProvider.AnthropicDirect, AiProvider.MockProvider, PromptBuilder, PromptCache, Response modules nest under FinnestAgents.
  • Default model: claude-sonnet-4-6 as the standing default for Tier C traffic (classification + routine generation — good speed/cost). Per-task override to claude-opus-4-7 (or later) for high-stakes reasoning: ambiguous intent classification, edge-case composition, compliance-sensitive generation. Caller selects via opts[:model] in generate/3 and stream/3. Haiku intentionally not wired in F-013 — revisit if Tier B fallback needs an even-cheaper router tier later.
  • Finch config: pool size 5 for the AnthropicDirect pool; receive timeout 30s (generous, to accommodate streaming); keep-alive 60s
  • Claude message format: see Anthropic API docs for cache_control beta header
  • Required header for caching: anthropic-beta: prompt-caching-2024-07-31 (check current spec at story pickup — may have changed)
  • Token counting for cost: Anthropic returns usage.input_tokens, usage.output_tokens, usage.cache_creation_input_tokens, usage.cache_read_input_tokens — sum into cost_aud via rate table (store in config)
  • Streaming via Server-Sent Events; Finch supports streaming bodies; parse incrementally
  • PromptBuilder must produce identical system prompt bytes across calls for cache hits (same ordering, same tools, same context). Stable ordering is the crucial detail.
  • MockProvider should have a hook for tests to inject specific responses: MockProvider.expect(input_pattern, response)
  • Never log prompt content in plaintext (PII risk); log the prompt_hash only (Commandment #24)
  • AI-03/AI-04 tenant injection: context.org_id comes from session — never from user input. PromptBuilder asserts org_id presence.

Dependencies

  • Blocked by: STORY-F-012 (Agents Supervisor + BudgetGuard), STORY-F-005 (ANTHROPIC_API_KEY in Bitwarden)

Acceptance Criteria

  • FinnestAgents.AiProvider behaviour defined with 3 callbacks; documented with typespecs
  • AnthropicDirect adapter makes a real round-trip call to Anthropic's /v1/messages and returns parsed response
  • Prompt cache integration: system prompt + tool schemas appear in the Anthropic response's cache_creation_input_tokens on the first call and in cache_read_input_tokens on subsequent calls within the TTL
  • Cache hit rate >50% by the 3rd call in a session (given stable system+tools+org context)
  • MockProvider deterministic; MockProvider.expect/2 + reset/0 work for test setup/teardown
  • PromptBuilder.build/3 places static parts FIRST with cache_control, dynamic parts LAST without — asserted by a unit test
  • PromptCache GenServer reports hit-rate metric; exposed via get_stats/0
  • generate/3 returns %Response{} with populated tokens, cache_stats, cost_aud
  • stream/3 returns working Enumerable yielding {:token, _} events
  • BudgetGuard (F-012) integration: AnthropicDirect calls BudgetGuard.check(org_id) before API call; rejects with :budget_exceeded if over limit
  • BudgetGuard.record_spend called after each successful call
  • Architecture tests pass (adapter contract + prompt structure)
  • Prompt content never appears in logs — only hash
  • mix format, credo --strict, dialyzer, mix boundary all green
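
The BudgetGuard criteria above imply a guard-then-call-then-record flow inside the adapter, roughly as sketched below. `BudgetGuard.check/1`, `record_spend/2`, and `do_api_call/3` are assumed names and arities pending the F-012 API.

```elixir
# Sketch: inside FinnestAgents.AiProvider.AnthropicDirect.
def generate(messages, tools, opts) do
  org_id = Keyword.fetch!(opts, :org_id)

  with :ok <- FinnestAgents.BudgetGuard.check(org_id),
       {:ok, response} <- do_api_call(messages, tools, opts) do
    # Spend is recorded only after a successful call.
    FinnestAgents.BudgetGuard.record_spend(org_id, response.cost_aud)
    {:ok, response}
  else
    # Budget check failed before any network I/O — no API call was made.
    {:error, :budget_exceeded} = err -> err
    {:error, reason} -> {:error, reason}
  end
end
```

The ordering matters for the budget test below: a zero budget must short-circuit before `do_api_call/3` is ever reached.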

Testing Requirements

  • Unit: PromptBuilder structure; MockProvider determinism; PromptCache aggregation
  • Integration (gated by ANTHROPIC_LIVE_TEST=true): real call to Anthropic returns expected shape; cache behaviour observable
  • Property: random inputs → MockProvider always returns {:ok, _} or {:error, _}; no raises
  • Budget test: set org budget to 0 → AnthropicDirect refuses call with budget error; no network call made
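
A sketch of how the MockProvider unit tests might use `expect/2` and `reset/0` from the acceptance criteria. How `expect/2` matches an injected response against incoming messages (here, by user-message content) is an assumption.

```elixir
defmodule FinnestAgents.MockProviderTest do
  use ExUnit.Case, async: true

  alias FinnestAgents.AiProvider.MockProvider

  setup do
    # Clear injected expectations between tests.
    on_exit(fn -> MockProvider.reset() end)
    :ok
  end

  test "injected response is returned for a matching input" do
    MockProvider.expect("what is my balance?", {:ok, %{content: "stubbed"}})

    assert {:ok, %{content: "stubbed"}} =
             MockProvider.generate(
               [%{role: "user", content: "what is my balance?"}],
               [],
               []
             )
  end

  test "same input always yields the same response (determinism)" do
    msgs = [%{role: "user", content: "hello"}]
    assert MockProvider.generate(msgs, [], []) == MockProvider.generate(msgs, [], [])
  end
end
```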

References