STORY-F-013: AiProvider behaviour + AnthropicDirect + MockProvider adapters

Epic: Agent Infrastructure | Priority: Must Have | Story Points: 3 | Status: Not Started | Assigned To: Unassigned | Created: 2026-04-17 | Sprint: 3


User Story

As a developer, I want a hexagonal AiProvider behaviour with working AnthropicDirect and MockProvider adapters, structured for Anthropic prompt caching (AI-09), so that F-017 Agent Chat can make real Claude calls, tests can stay deterministic, and the 90% prompt-cache discount is captured from day one.


Description

Background

ADR-006-F + ADR-010 commit to a hexagonal AiProvider port with five planned adapters (AnthropicDirect, BedrockSydney, VertexAU, MockProvider, LocalLLMProvider). This story lands the two needed for Phase 0: AnthropicDirect (commercial default) and MockProvider (tests). BedrockSydney lands with IRAP work (Phase 3); VertexAU with Verify fallback work (Scout+Verify sprints); LocalLLMProvider is Phase 3+.

Prompt caching (AI-09 guardrail, architectural invariant #15) must be structured correctly from the first call — system prompt + tool schemas are permanent-cache candidates; org context is per-org cache; session history + user query are uncached. Getting this right now captures the 90% discount; bolting it on later means drift.

Scope

In scope:

  • FinnestAgents.AiProvider behaviour with callbacks:
    • @callback classify(intent, context) :: {:ok, classified_intent} | {:error, reason}
    • @callback generate(messages, tools, opts) :: {:ok, response} | {:error, reason}
    • @callback stream(messages, tools, opts) :: Enumerable.t() | {:error, reason}
  • FinnestAgents.AiProvider.AnthropicDirect adapter:
    • Finch HTTP pool; requests to https://api.anthropic.com/v1/messages
    • Auth via ANTHROPIC_API_KEY (Bitwarden)
    • Prompt caching structure (AI-09):
      • System prompt part 1: static system prompt (cache_control: {type: "ephemeral"})
      • System prompt part 2: MCP tool schemas JSON (cache_control: {type: "ephemeral"})
      • System prompt part 3: per-org context (cache_control: {type: "ephemeral"})
      • Messages: user message + session history (NO cache_control)
    • Claude model: claude-sonnet-4-6 default; configurable per task
    • generate/3 returns a structured %FinnestAgents.Response{content, tool_calls, tokens, cache_stats, cost_aud, model}
    • stream/3 returns an Enumerable yielding {:token, text}, {:tool_call, %{...}}, and {:complete, response} events
    • Cost calculation: input tokens × model_input_rate + output tokens × model_output_rate; cached tokens billed at 10% of the normal rate
  • FinnestAgents.AiProvider.MockProvider adapter:
    • Deterministic responses keyed by input hash
    • Cost = 0; tokens synthesised
    • Used in all tests not explicitly integration-gated
  • FinnestAgents.PromptBuilder helper module: build/3 assembles system prompt parts + org context + history + user message in the correct cache-breakpoint order
  • FinnestAgents.PromptCache GenServer: aggregates cache_creation_input_tokens + cache_read_input_tokens per request; exposes a hit-rate metric (target ≥70%)
  • Replace the ClaudeClient placeholder in the Agents Supervisor (F-012) with the real AnthropicDirect adapter
  • config/runtime.exs: config :finnest, :ai_provider, FinnestAgents.AiProvider.AnthropicDirect (commercial); MockProvider (test)
  • Architecture test ai_provider_port_test.exs: both adapters implement the behaviour contract
  • Architecture test prompt_cache_structure_test.exs: PromptBuilder.build/3 output has cache_control on system parts and none on user messages (regression protection for AI-09)
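
The three callbacks could be declared roughly as follows. This is a sketch, not the final contract: the message/tool type shapes and the `impl/0` lookup helper are assumptions, and `FinnestAgents.Response.t()` is the struct this story introduces.

```elixir
defmodule FinnestAgents.AiProvider do
  @moduledoc """
  Hexagonal port for AI providers (ADR-006-F / ADR-010).
  Phase 0 adapters: AnthropicDirect (commercial), MockProvider (test).
  """

  # Assumed shapes — refine once the real typespecs land.
  @type message :: %{role: String.t(), content: term()}
  @type tool :: map()
  @type opts :: keyword()

  @callback classify(intent :: term(), context :: map()) ::
              {:ok, classified_intent :: term()} | {:error, term()}

  @callback generate(messages :: [message()], tools :: [tool()], opts()) ::
              {:ok, FinnestAgents.Response.t()} | {:error, term()}

  @callback stream(messages :: [message()], tools :: [tool()], opts()) ::
              Enumerable.t() | {:error, term()}

  @doc "Resolve the configured adapter (AnthropicDirect in prod, MockProvider in test)."
  def impl, do: Application.fetch_env!(:finnest, :ai_provider)
end
```

Callers go through `FinnestAgents.AiProvider.impl().generate(...)` so the adapter swap stays a one-line config change, which is the point of the port.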

Out of scope:

  • BedrockSydney adapter (Phase 3 IRAP)
  • VertexAU adapter (Scout+Verify sprints)
  • LocalLLMProvider (Phase 3+)
  • Failover chain logic (add when BedrockSydney + VertexAU exist)
  • Full cost dashboard (F-020 or later)
  • Response caching in ETS (Scout+Verify sprints)

Technical Notes

  • Module namespace: FinnestAgents.* (flat), not Finnest.Agents.* (dotted) — see F-012 Technical Notes for the Boundary rationale. All AiProvider, AiProvider.AnthropicDirect, AiProvider.MockProvider, PromptBuilder, PromptCache, Response modules nest under FinnestAgents.
  • Default model: claude-sonnet-4-6 as the standing default for Tier C traffic (classification + routine generation — good speed/cost). Per-task override to claude-opus-4-7 (or later) for high-stakes reasoning: ambiguous intent classification, edge-case composition, compliance-sensitive generation. Caller selects via opts[:model] in generate/3 and stream/3. Haiku intentionally not wired in F-013 — revisit if Tier B fallback needs an even-cheaper router tier later.
  • Finch config: pool size 5 for the AnthropicDirect pool; receive timeout 30s (generous, to accommodate streaming); keep-alive 60s
  • Claude message format: see Anthropic API docs for cache_control beta header
  • Required header for caching: anthropic-beta: prompt-caching-2024-07-31 (check current spec at story pickup — may have changed)
  • Token counting for cost: Anthropic returns usage.input_tokens, usage.output_tokens, usage.cache_creation_input_tokens, usage.cache_read_input_tokens — sum into cost_aud via rate table (store in config)
  • Streaming via Server-Sent Events; Finch supports streaming bodies; parse incrementally
  • PromptBuilder must produce identical system prompt bytes across calls for cache hits (same ordering, same tools, same context). Stable ordering is the crucial detail.
  • MockProvider should have a hook for tests to inject specific responses: MockProvider.expect(input_pattern, response)
  • Never log prompt content in plaintext (PII risk); log the prompt_hash only (Commandment #24)
  • AI-03/AI-04 tenant injection: context.org_id comes from session — never from user input. PromptBuilder asserts org_id presence.

Dependencies

  • Blocked by: STORY-F-012 (Agents Supervisor + BudgetGuard), STORY-F-005 (ANTHROPIC_API_KEY in Bitwarden)

Acceptance Criteria

  • FinnestAgents.AiProvider behaviour defined with 3 callbacks; documented with typespecs
  • AnthropicDirect adapter makes a real round-trip call to Anthropic's /v1/messages and returns parsed response
  • Prompt cache integration: system prompt + tool schemas appear in the Anthropic response's cache_creation_input_tokens on the first call and in cache_read_input_tokens on subsequent calls within the TTL
  • Cache hit rate >50% by the 3rd call in a session (given stable system+tools+org context)
  • MockProvider deterministic; MockProvider.expect/2 + reset/0 work for test setup/teardown
  • PromptBuilder.build/3 places static parts FIRST with cache_control, dynamic parts LAST without — asserted by a unit test
  • PromptCache GenServer reports hit-rate metric; exposed via get_stats/0
  • generate/3 returns %Response{} with populated tokens, cache_stats, cost_aud
  • stream/3 returns working Enumerable yielding {:token, _} events
  • BudgetGuard (F-012) integration: AnthropicDirect calls BudgetGuard.check(org_id) before API call; rejects with :budget_exceeded if over limit
  • BudgetGuard.record_spend called after each successful call
  • Architecture tests pass (adapter contract + prompt structure)
  • Prompt content never appears in logs — only hash
  • mix format, credo --strict, dialyzer, mix boundary all green
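
The BudgetGuard criteria above imply a guard-then-call-then-record flow inside the adapter, roughly as sketched below. `BudgetGuard.check/1`, `record_spend/2`, and `do_api_call/3` are assumed names and arities pending the F-012 API.

```elixir
# Sketch: inside FinnestAgents.AiProvider.AnthropicDirect.
def generate(messages, tools, opts) do
  org_id = Keyword.fetch!(opts, :org_id)

  with :ok <- FinnestAgents.BudgetGuard.check(org_id),
       {:ok, response} <- do_api_call(messages, tools, opts) do
    # Spend is recorded only after a successful call.
    FinnestAgents.BudgetGuard.record_spend(org_id, response.cost_aud)
    {:ok, response}
  else
    # Budget check failed before any network I/O — no API call was made.
    {:error, :budget_exceeded} = err -> err
    {:error, reason} -> {:error, reason}
  end
end
```

The ordering matters for the budget test below: a zero budget must short-circuit before `do_api_call/3` is ever reached.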

Testing Requirements

  • Unit: PromptBuilder structure; MockProvider determinism; PromptCache aggregation
  • Integration (gated by ANTHROPIC_LIVE_TEST=true): real call to Anthropic returns expected shape; cache behaviour observable
  • Property: random inputs → MockProvider always returns {:ok, _} or {:error, _}; no raises
  • Budget test: set org budget to 0 → AnthropicDirect refuses call with budget error; no network call made
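
A sketch of how the MockProvider unit tests might use `expect/2` and `reset/0` from the acceptance criteria. How `expect/2` matches an injected response against incoming messages (here, by user-message content) is an assumption.

```elixir
defmodule FinnestAgents.MockProviderTest do
  use ExUnit.Case, async: true

  alias FinnestAgents.AiProvider.MockProvider

  setup do
    # Clear injected expectations between tests.
    on_exit(fn -> MockProvider.reset() end)
    :ok
  end

  test "injected response is returned for a matching input" do
    MockProvider.expect("what is my balance?", {:ok, %{content: "stubbed"}})

    assert {:ok, %{content: "stubbed"}} =
             MockProvider.generate(
               [%{role: "user", content: "what is my balance?"}],
               [],
               []
             )
  end

  test "same input always yields the same response (determinism)" do
    msgs = [%{role: "user", content: "hello"}]
    assert MockProvider.generate(msgs, [], []) == MockProvider.generate(msgs, [], [])
  end
end
```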

References