STORY-F-013: AiProvider behaviour + AnthropicDirect + MockProvider adapters

Epic: Agent Infrastructure | Priority: Must Have | Story Points: 3 | Status: Not Started | Assigned To: Unassigned | Created: 2026-04-17 | Sprint: 3
User Story
As a developer,
I want a hexagonal AiProvider behaviour with working AnthropicDirect and MockProvider adapters, structured for Anthropic prompt caching (AI-09),
so that F-017 Agent Chat can make real Claude calls, tests can stay deterministic, and the 90% prompt-cache discount is captured from day one.
Description

Background
ADR-006-F + ADR-010 commit to a hexagonal AiProvider port with five planned adapters (AnthropicDirect, BedrockSydney, VertexAU, MockProvider, LocalLLMProvider). This story lands the two needed for Phase 0: AnthropicDirect (commercial default) and MockProvider (tests). BedrockSydney lands with IRAP work (Phase 3); VertexAU with Verify fallback work (Scout+Verify sprints); LocalLLMProvider is Phase 3+.
Prompt caching (AI-09 guardrail, architectural invariant #15) must be structured correctly from the first call — system prompt + tool schemas are permanent-cache candidates; org context is per-org cache; session history + user query are uncached. Getting this right now captures the 90% discount; bolting it on later means drift.
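The cache-breakpoint layout above can be sketched as a request body for the Anthropic Messages API. This is a minimal illustration only, not the story's PromptBuilder contract; the module, function, and argument names here are hypothetical.

```elixir
defmodule CacheLayoutSketch do
  # Hypothetical helper showing the AI-09 layout: three byte-stable
  # system parts carry cache_control; history + user query never do.
  def request_body(static_system, tool_schemas_json, org_context, history, user_query) do
    %{
      model: "claude-sonnet-4-6",
      max_tokens: 1024,
      system: [
        # Permanent-cache candidates (identical bytes on every call)
        %{type: "text", text: static_system, cache_control: %{type: "ephemeral"}},
        %{type: "text", text: tool_schemas_json, cache_control: %{type: "ephemeral"}},
        # Per-org cache (stable within one org, varies across orgs)
        %{type: "text", text: org_context, cache_control: %{type: "ephemeral"}}
      ],
      # Uncached: session history + current user query
      messages: history ++ [%{role: "user", content: user_query}]
    }
  end
end
```

The point the sketch captures is the ordering: every `cache_control` breakpoint sits before the first dynamic byte, so a change to the user query never invalidates the cached system prefix.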
Scope

In scope:

- `FinnestAgents.AiProvider` behaviour with callbacks:
  - `@callback classify(intent, context) :: {:ok, classified_intent} | {:error, reason}`
  - `@callback generate(messages, tools, opts) :: {:ok, response} | {:error, reason}`
  - `@callback stream(messages, tools, opts) :: Enumerable.t() | {:error, reason}`
- `FinnestAgents.AiProvider.AnthropicDirect` adapter:
  - Finch HTTP pool; requests to `https://api.anthropic.com/v1/messages`
  - Auth via `ANTHROPIC_API_KEY` (Bitwarden)
  - Prompt caching structure (AI-09):
    - System prompt first part: static system prompt (`cache_control: {type: "ephemeral"}`)
    - System prompt second part: MCP tool schemas JSON (`cache_control: {type: "ephemeral"}`)
    - System prompt third part: per-org context (`cache_control: {type: "ephemeral"}`)
    - Messages: user message + session history (NO `cache_control`)
  - Claude model: `claude-sonnet-4-6` default; configurable per-task
  - `generate/3` returns structured `%FinnestAgents.Response{content, tool_calls, tokens, cache_stats, cost_aud, model}`
  - `stream/3` returns an Enumerable yielding `{:token, text}`, `{:tool_call, %{...}}`, `{:complete, response}` events
  - Cost calculation: input tokens × model_input_rate + output tokens × model_output_rate; cached tokens at 10% of the normal rate
- `FinnestAgents.AiProvider.MockProvider` adapter:
  - Deterministic responses keyed by input hash
  - Cost = 0; tokens synthesised
  - Used in all tests not explicitly integration-gated
- `FinnestAgents.PromptBuilder` helper module: `build/3` assembles system prompt parts + org context + history + user message in the correct cache-breakpoint order
- `FinnestAgents.PromptCache` GenServer: aggregates `cache_creation_input_tokens` + `cache_read_input_tokens` per request; exposes a hit-rate metric (target ≥70%)
- Replace the `ClaudeClient` placeholder in the Agents Supervisor (F-012) with the real AnthropicDirect adapter
- `config/runtime.exs`: `config :finnest, :ai_provider, FinnestAgents.AiProvider.AnthropicDirect` (commercial); `MockProvider` (test)
- Architecture test `ai_provider_port_test.exs`: both adapters implement the behaviour contract
- Architecture test `prompt_cache_structure_test.exs`: `PromptBuilder.build/3` output has `cache_control` on system parts, absent on user messages (regression protection for AI-09)
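The port itself is small. A sketch of the behaviour module, with loose placeholder types standing in until the `%FinnestAgents.Response{}` struct exists:

```elixir
defmodule FinnestAgents.AiProvider do
  @moduledoc """
  Hexagonal port for AI providers (ADR-006-F). Sketch only: the three
  callbacks match the scope above; the concrete types are placeholders.
  """

  @type classified_intent :: term()
  @type response :: struct()
  @type reason :: term()

  @callback classify(intent :: term(), context :: map()) ::
              {:ok, classified_intent} | {:error, reason}

  @callback generate(messages :: list(), tools :: list(), opts :: keyword()) ::
              {:ok, response} | {:error, reason}

  @callback stream(messages :: list(), tools :: list(), opts :: keyword()) ::
              Enumerable.t() | {:error, reason}
end
```

Adapters then declare `@behaviour FinnestAgents.AiProvider`, which is what the `ai_provider_port_test.exs` architecture test can assert against.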
Out of scope:
- BedrockSydney adapter (Phase 3 IRAP)
- VertexAU adapter (Scout+Verify sprints)
- LocalLLMProvider (Phase 3+)
- Failover chain logic (add when BedrockSydney + VertexAU exist)
- Full cost dashboard (F-020 or later)
- Response caching in ETS (Scout+Verify sprints)
Technical Notes

- Module namespace: `FinnestAgents.*` (flat), not `Finnest.Agents.*` (dotted) — see F-012 Technical Notes for the Boundary rationale. All `AiProvider`, `AiProvider.AnthropicDirect`, `AiProvider.MockProvider`, `PromptBuilder`, `PromptCache`, and `Response` modules nest under `FinnestAgents`.
- Default model: `claude-sonnet-4-6` as the standing default for Tier C traffic (classification + routine generation — good speed/cost). Per-task override to `claude-opus-4-7` (or later) for high-stakes reasoning: ambiguous intent classification, edge-case composition, compliance-sensitive generation. Caller selects via `opts[:model]` in `generate/3` and `stream/3`. Haiku intentionally not wired in F-013 — revisit if Tier B fallback needs an even-cheaper router tier later.
- Finch config: pool size 5 for `AnthropicDirect`; timeout 30s (long, to accommodate streaming); keep-alive 60s
- Claude message format: see the Anthropic API docs for the `cache_control` beta header
- Required header for caching: `anthropic-beta: prompt-caching-2024-07-31` (check the current spec at story pickup — may have changed)
- Token counting for cost: Anthropic returns `usage.input_tokens`, `usage.output_tokens`, `usage.cache_creation_input_tokens`, `usage.cache_read_input_tokens` — sum into `cost_aud` via the rate table (store in config)
- Streaming via Server-Sent Events; Finch supports streaming bodies; parse incrementally
- PromptBuilder must produce identical system prompt bytes across calls for cache hits (same ordering, same tools, same context). Stable ordering is the crucial detail.
- MockProvider should have a hook for tests to inject specific responses: `MockProvider.expect(input_pattern, response)`
- Never log prompt content in plaintext (PII risk); log `prompt_hash` only (Commandment #24)
- AI-03/AI-04 tenant injection: `context.org_id` comes from the session — never from user input. PromptBuilder asserts org_id presence.
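The cost rule above (input and output at full rate, cache reads at 10%) can be sketched as follows. The AUD rates here are invented placeholders for the config rate table, and billing `cache_creation_input_tokens` at the full input rate is an assumption — the story only fixes the 10% read discount.

```elixir
defmodule CostSketch do
  # Placeholder rates in AUD per million tokens; real values belong in
  # the config rate table, not in code.
  @rates %{"claude-sonnet-4-6" => %{input: 4.5, output: 22.5}}

  # Story rule: cache-read tokens are billed at 10% of the input rate.
  # Assumption: cache-creation tokens are billed at the full input rate.
  def cost_aud(model, usage) do
    %{input: in_rate, output: out_rate} = Map.fetch!(@rates, model)

    (usage.input_tokens * in_rate +
       usage.cache_creation_input_tokens * in_rate +
       usage.cache_read_input_tokens * in_rate * 0.10 +
       usage.output_tokens * out_rate) / 1_000_000
  end
end
```

With a high cache hit rate, most input tokens land in the 10% bucket, which is where the "90% discount" in the story title comes from.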
Dependencies
- Blocked by: STORY-F-012 (Agents Supervisor + BudgetGuard), STORY-F-005 (ANTHROPIC_API_KEY in Bitwarden)
Acceptance Criteria

- `FinnestAgents.AiProvider` behaviour defined with 3 callbacks; documented with typespecs
- `AnthropicDirect` adapter makes a real round-trip call to Anthropic's `/v1/messages` and returns a parsed response
- Prompt cache integration: system prompt + tool schemas appear in the Anthropic response's `cache_creation_input_tokens` on the first call; in `cache_read_input_tokens` on subsequent calls within TTL
- Cache hit rate >50% by the 3rd call in a session (given stable system + tools + org context)
- `MockProvider` deterministic; `MockProvider.expect/2` + `reset/0` work for test setup/teardown
- `PromptBuilder.build/3` places static parts FIRST with `cache_control`, dynamic parts LAST without — asserted by unit test
- `PromptCache` GenServer reports the hit-rate metric; exposed via `get_stats/0`
- `generate/3` returns `%Response{}` with populated tokens, cache_stats, cost_aud
- `stream/3` returns a working Enumerable yielding `{:token, _}` events
- BudgetGuard (F-012) integration: AnthropicDirect calls `BudgetGuard.check(org_id)` before the API call; rejects with `:budget_exceeded` if over limit
- `BudgetGuard.record_spend` called after each successful call
- Architecture tests pass (adapter contract + prompt structure)
- Prompt content never appears in logs — only the hash
- `mix format`, `credo --strict`, `dialyzer`, `mix boundary` all green
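The budget-gating criteria above reduce to a `with` chain: check before the API call, record spend only after success. In this sketch, `FakeBudgetGuard` and the injected `api_call` function are stand-ins for the F-012 BudgetGuard and the HTTP round-trip; the `record_spend/2` arity is an assumption.

```elixir
defmodule FakeBudgetGuard do
  # Test stand-in for the F-012 BudgetGuard; the real module tracks
  # per-org spend in the Agents Supervisor tree.
  def check("org-over-budget"), do: {:error, :budget_exceeded}
  def check(_org_id), do: :ok
  def record_spend(_org_id, _cost_aud), do: :ok
end

defmodule GuardedCallSketch do
  # check/1 runs first, so a rejected org never reaches the network;
  # record_spend/2 runs only on a successful response.
  def generate(org_id, api_call, guard \\ FakeBudgetGuard) do
    with :ok <- guard.check(org_id),
         {:ok, response} <- api_call.() do
      guard.record_spend(org_id, response.cost_aud)
      {:ok, response}
    end
  end
end
```

Because `check/1` short-circuits the `with`, the "no network call made" budget test can inject an `api_call` that raises and still expect `{:error, :budget_exceeded}`.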
Testing Requirements

- Unit: PromptBuilder structure; MockProvider determinism; PromptCache aggregation
- Integration (gated by `ANTHROPIC_LIVE_TEST=true`): real call to Anthropic returns the expected shape; cache behaviour observable
- Property: random inputs → MockProvider always returns `{:ok, _}` or `{:error, _}`; no raises
- Budget test: set org budget to 0 → AnthropicDirect refuses the call with a budget error; no network call made
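The determinism requirement can be sketched as responses keyed by an input hash, with an `expect`-style override table. Holding expectations in a plain map argument (rather than in a process, as the real MockProvider would) is a simplification for illustration.

```elixir
defmodule MockProviderSketch do
  # Deterministic stand-in: the same messages always hash to the same
  # key, so the same input always yields the same response.
  def generate(messages, _tools, _opts, expectations \\ %{}) do
    key = :erlang.phash2(messages)

    case Map.fetch(expectations, key) do
      # Test-injected response wins (the expect/2 hook, simplified)
      {:ok, response} -> {:ok, response}
      # Default: synthesised response with zero tokens and zero cost
      :error -> {:ok, %{content: "mock-#{key}", tokens: %{input: 0, output: 0}, cost_aud: 0}}
    end
  end
end
```

Any input shape hashes cleanly, so this satisfies the property requirement (`{:ok, _}` for arbitrary input, no raises) by construction.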
References

- `../architecture/agents.md` §ClaudeClient, §Prompt Caching
- `../adrs/adr-006-F-hexagonal-ports-for-external-integrations.md`
- `../adrs/adr-0010-australian-data-residency-ai-providers.md` (foundation, Bedrock comes later)
- `../10-GUARDRAILS.md` AR-14, AR-15, AI-07, AI-08, AI-09
- Architectural invariant #15 — prompt-cache-optimal structure
- Anthropic API docs: https://docs.anthropic.com/en/api/messages