# The 42 Commandments of Lean AI-Assisted Development
- **Status:** Active
- **Date:** 2026-04-11
- **Purpose:** Philosophical foundation for all development decisions. Every rule in CLAUDE.md and every guardrail in 10-GUARDRAILS.md flows from these principles.
- **Audience:** Developers (human and AI), architects, reviewers.
These commandments exist because AI-assisted development removes the natural friction that historically prevented bloat, over-engineering, and unmaintainable code. When generating code is nearly free, discipline becomes the only constraint. Without these principles, AI-assisted projects inevitably reach a point where they must be scrapped and rewritten.
Every commandment prevents a specific failure mode. None are aspirational — each addresses a real pattern that kills software projects.
## I. Code — How You Write It

### 1. Code is a liability, not an asset
Every line you ship is a line you maintain, test, debug, and eventually migrate. AI makes writing code nearly free, which makes this principle more important, not less — the friction that used to prevent bloat is gone.
Failure mode it prevents: Codebase grows faster than the team's ability to understand it. Maintenance cost exceeds development cost.
Test: Can you delete this file and nothing breaks? Then delete it.
### 2. No speculative abstractions — wait for proof
Don't create interfaces, base classes, contracts, or design patterns until 2+ concrete implementations exist. A function is simpler than a class. A class is simpler than an interface with one implementor. AI loves to generate abstraction hierarchies for things that have exactly one concrete use.
Failure mode it prevents: Architecture astronautics — layers of indirection that serve no current purpose and make the code harder to follow.
Test: Does this interface/base class have more than one implementation? No? Inline it.
### 3. You must understand every line you ship
This is active, not passive. It's not enough that the code could be understood — someone must actually understand it before it merges. Research shows developers who delegate code generation to AI score 40% on comprehension tests vs 65% for those who use AI for conceptual inquiry. "Tests pass" is not the same as "I understand what this does."
Failure mode it prevents: Comprehension debt — the growing gap between how much code exists and how much any human genuinely understands.
Test: Can the person who approved this PR explain what it does and why, without looking at the code?
### 4. Solve the problem you have, not the problem you might have
YAGNI — You Aren't Gonna Need It. AI will happily build configurable, extensible, plugin-based architectures for things that need to work exactly one way. Build the simplest thing that solves today's problem. If tomorrow's problem is different, refactor then — with the benefit of actually understanding the requirement.
Failure mode it prevents: Speculative complexity that makes the codebase harder to change when the real requirement arrives (because it inevitably differs from what was anticipated).
Test: Is anyone asking for this flexibility right now? No? Hardcode it.
### 5. Consistency over cleverness
The 50th file should look exactly like the 1st file. AI generates novel solutions each session because it optimises for the current prompt, not for project coherence. A codebase needs boring, repetitive patterns that anyone can follow without studying each file individually.
Failure mode it prevents: Every file becomes a unique snowflake. New developers (human or AI) must study each file individually instead of recognising a familiar pattern.
Test: Does this new file follow the same structure as the gold-standard files? If not, why not?
### 6. Delete before you add — but understand before you delete
Before writing new code, ask: can I solve this by removing something? The best refactoring makes the codebase smaller. But before deleting existing code, understand why it exists (Chesterton's Fence). AI confidently refactors code without understanding the historical reason something was built that way.
Failure mode it prevents: Monotonic growth (never deleting) on one side; breaking hidden invariants by removing code you didn't understand on the other.
Test: Is the codebase smaller after this sprint than before? If it grew, did it need to?
### 7. Three is a pattern, one is just code
Don't extract a helper until you've written the same thing three times. Don't create a base class until three classes share the same structure. AI will extract abstractions after seeing something once because it pattern-matches aggressively.
Failure mode it prevents: Premature abstraction that obscures simple code behind unnecessary indirection, making future changes harder because they must satisfy the abstraction's contract.
Test: How many callers does this helper/trait/base class have? Fewer than 3? Inline it.
## II. System — How It Fits Together

### 8. Configuration is complexity in disguise
Every config() call, every .env variable, every feature flag is a decision someone has to understand, document, and get right per environment. AI adds configuration because it's "flexible." But 50 env vars means 50 things that can be misconfigured in production.
Failure mode it prevents: Configuration sprawl where nobody knows which values are required, which are optional, and what happens when they're wrong.
Test: Could this be a constant instead of a config value? Then make it a constant.
### 9. Tests prove behaviour, not implementation
Write tests that verify what the system does, not how it does it. AI writes tests that mirror the implementation — checking that specific methods were called with specific arguments. When you refactor the internals, these tests break even though behaviour is unchanged, creating fear of refactoring.
Failure mode it prevents: Brittle test suites that discourage refactoring because every internal change breaks dozens of tests that weren't testing real behaviour.
Test: If you completely rewrite the internals of this service, do the tests still pass? They should.
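The distinction fits in a few lines. A minimal, language-agnostic sketch in Python (the DiscountService and its 10%-at-100 rule are hypothetical):

```python
class DiscountService:
    """Hypothetical service: 10% off orders of 100 or more."""

    def apply(self, total: float) -> float:
        # Internal detail -- could become a lookup table or a strategy
        # object tomorrow without changing observable behaviour.
        return total * 0.9 if total >= 100 else total


# Behavioural tests: they assert WHAT the service does, so a complete
# rewrite of the internals leaves them green.
def test_discount_at_threshold():
    assert DiscountService().apply(100) == 90.0

def test_no_discount_below_threshold():
    assert DiscountService().apply(99) == 99
```

The brittle alternative asserts *how*: mocking internal helpers and checking they were called with specific arguments. Those tests break on every refactor even when behaviour is unchanged.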
### 10. Every public method, route, and config option is a promise
Minimise what you expose. Every public API, every route, every event is a contract that future code might depend on. Once something is public, removing it is a breaking change. AI creates public methods freely — but surface area is a cost, not a feature.
Failure mode it prevents: Accidental coupling where internal details become load-bearing because they were unnecessarily exposed.
Test: Could this method be private/protected? Could this config be internal? Then restrict it.
### 11. Ubiquitous language — one word per concept, everywhere
Code, docs, conversations, and UI should use the same term for the same thing. AI drifts naturally: "candidate" in one service, "applicant" in another, "person" in a third. Over 50 sprints this becomes a translation tax that compounds with every new developer.
Failure mode it prevents: Semantic confusion where the same concept has different names in different parts of the system, causing bugs when developers assume two different names mean different things (or the same thing).
Test: Does the project glossary exist? Does this PR use terms consistently with it?
### 12. Error handling needs one philosophy, applied consistently
A project needs one answer to: what's recoverable? What should crash? What degrades gracefully? AI generates inconsistent error handling because each session makes its own judgment call. Without a stated philosophy, the codebase becomes unpredictable under failure.
Failure mode it prevents: Some services silently swallow errors while others crash on transient failures. No one can predict what happens when something goes wrong.
Test: Can you describe the project's error handling philosophy in one sentence? Does this code follow it?
### 13. Every decision needs an exit path
Before adopting a tool, framework, or service, ask: how do we migrate away from it? If the answer is "rewrite everything," the coupling is too tight. This is broader than database migrations — it applies to architectural decisions, dependency choices, and provider selections.
Failure mode it prevents: Lock-in that turns a voluntary technology choice into an inescapable constraint when the technology becomes unmaintained, expensive, or unsuitable.
Test: How long would it take to swap this out? If the answer is "months," add an abstraction layer.
### 14. Dependencies age faster than your code
Your code might be clean, but every dependency will release breaking changes. PHP 8 becomes PHP 10. Laravel 13 becomes Laravel 17. If your code reaches into framework internals instead of using public APIs, every major upgrade becomes a rewrite risk. The thinner the coupling to the framework, the longer the project lives.
Failure mode it prevents: Upgrade paralysis — the project can't adopt security patches or new framework versions because too much code depends on internal framework behaviour.
Test: Does this code use a documented public API, or does it reach into framework internals?
### 15. Prefer boring technology
Choose well-understood, widely adopted tools over newer, faster, or shinier alternatives. Boring technology is predictable, debuggable, and hireable. Every "interesting" technology choice is a bet that the ecosystem will still exist and be maintained in 5 years.
Failure mode it prevents: Adopting tools that have impressive benchmarks but small communities, poor documentation, or uncertain long-term maintenance — leaving the project stranded.
Test: Can you hire someone who already knows this technology? Is it in the top 10 for its category? Has it existed for 5+ years?
### 16. Data models outlive everything
Controllers get rewritten, services get refactored, views get redesigned. But database tables persist forever. A bad schema decision in Sprint 5 becomes load-bearing by Sprint 50 with millions of rows and dozens of dependent queries. Schema changes deserve 10x the scrutiny of code changes.
Failure mode it prevents: Schema decisions made with the same casualness as code decisions, creating permanent structural problems that can only be fixed with expensive, risky data migrations.
Test: Would you be comfortable with this table structure in 3 years with 10x the data?
## III. Operations — How You Keep It Alive

### 17. Feature flags are technical debt with a timer
Every flag is a branch in your codebase where both paths must be maintained and tested. 13 flags means 2^13 = 8,192 possible combinations of states that can interact in unexpected ways. Flags must die within one sprint of full rollout. If a flag lives longer than 90 days, it's not a flag — it's a permanent fork you're pretending is temporary.
Failure mode it prevents: Flag accumulation where nobody knows which flags are active, which are stale, and which interact with each other — eventually creating untestable combinatorial state.
Test: How old is this flag? If older than 90 days, resolve it now.
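The 90-day timer can be enforced by CI rather than by memory. A minimal sketch in Python (the flag name, date, and registry shape are all hypothetical):

```python
from datetime import date, timedelta

MAX_FLAG_AGE = timedelta(days=90)

# Each flag records the date it was introduced.
FLAGS = {
    "new_checkout_flow": date(2026, 1, 15),
}

def stale_flags(today: date, flags: dict[str, date] = FLAGS) -> list[str]:
    """Flags older than the 90-day budget -- a CI job can fail on any hit."""
    return [name for name, born in flags.items() if today - born > MAX_FLAG_AGE]
```

Wiring this into CI turns "we should clean up old flags" into a build failure nobody can ignore.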
### 18. If you can't see it running, you can't maintain it
Code quality is necessary but not sufficient. If production has no logging, no error tracking, no metrics, no alerting, then bugs are invisible until users report them. Observability is not optional infrastructure — it's a prerequisite for maintenance.
Failure mode it prevents: Silent failures, performance degradation, and security incidents that go undetected because nobody instrumented the system to report them.
Test: If this service fails at 3am, how do we find out? If the answer is "a user emails us," the observability is insufficient.
### 19. Documentation is a product, not a byproduct
AI generates docs easily, which makes them feel free. But stale docs are worse than no docs because they actively mislead. Every document must have an owner, a freshness mechanism, or an expiry date. If you can't keep it current, don't write it.
Failure mode it prevents: A repository full of documentation that was accurate when written but now describes a system that no longer exists, leading developers to make decisions based on outdated information.
Test: When was this document last verified against the code? If you don't know, it's stale.
## IV. Process — How You Decide What to Build

### 20. The human decides what to build; the AI decides how
The most dangerous AI capability isn't writing bad code — it's writing good code for the wrong thing. Scope creep is invisible when implementation is instant. The human's job is saying "no, we don't need that" and "stop, that's enough." The AI's job is implementing the agreed scope efficiently.
Failure mode it prevents: Building features nobody asked for because the AI suggested them and implementation was "free."
Test: Did the user explicitly ask for this? If not, don't build it.
### 21. Build for your team, not your fantasy team
Abstractions that coordinate a 20-person team are overhead for a 2-person team. The level of process, indirection, and architectural ceremony should match the people who actually maintain the system. Over-engineering for a team size you don't have creates complexity without the coordination benefits.
Failure mode it prevents: Enterprise-grade architecture in a startup-stage product, where the overhead of the architecture exceeds the value of the features it enables.
Test: Does this abstraction help at our current team size, or only at 10x?
### 22. Ship the smallest change that delivers value
Not the complete feature — the thinnest vertical slice that a user can benefit from. Horizontal layers (build all the models, then all the services, then all the controllers) delay feedback and increase risk. Thin vertical slices (one endpoint, end to end) deliver value early and validate assumptions.
Failure mode it prevents: Large, all-or-nothing releases where three months of work ships at once and any single bug blocks the entire feature.
Test: Could we ship half of this and still deliver value? Then do that first.
## V. Security & Data — How You Protect It

### 23. PII has a classification tier and handling rules
Not all data is equal. Authentication credentials, candidate personal data, operational data, and public content each require different handling. Without explicit classification, AI treats all data the same — logging PII, exposing sensitive fields in APIs, and storing credentials alongside operational data.
Failure mode it prevents: PII in logs, unencrypted sensitive data, and compliance violations that aren't caught until an audit or a breach.
Test: What classification tier is this data? What are the handling rules for that tier?
### 24. Logging must never contain PII — log by ID only
Log candidate_pool_id=42, never john.smith@example.com. This is non-negotiable regardless of log destination. If logs are ever shipped to an external aggregator, PII leaves the security boundary. AI naturally generates log messages that include the human-readable value because it's more descriptive.
Failure mode it prevents: PII scattered across log files, external aggregators, and error tracking services — creating a compliance liability that's nearly impossible to clean up retroactively.
Test: Grep the codebase for Log:: calls. Do any contain email, phone, or name variables?
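The grep can be promoted to a guard at the logging boundary itself. A minimal Python sketch (deliberately crude: the pattern only catches email addresses; a real scanner would also cover phone numbers, names, and other PII shapes):

```python
import re

# Rough email pattern -- illustrative, not exhaustive PII detection.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def safe_log(message: str) -> str:
    """Refuse any log line that contains an email address."""
    if EMAIL.search(message):
        raise ValueError("PII detected in log message -- log by ID instead")
    return message
```

Failing loudly in development is the point: the ID-only habit forms before the first log line ever reaches an external aggregator.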
### 25. Rate limit every public endpoint
Every endpoint accessible without authentication must be rate-limited. This includes forms, webhooks, API routes, and assessment links. AI generates functional routes without considering abuse — brute-force attacks, credential enumeration, and spam submissions.
Failure mode it prevents: Abuse of public endpoints for credential stuffing, spam, enumeration attacks, or resource exhaustion.
Test: Does this route have throttle: middleware? If it's public and doesn't, add it.
### 26. Secrets never in code — environment only
API keys, database passwords, encryption keys, and tokens live in environment variables, managed by a secrets manager. Never in source code, config files, or Docker images. AI occasionally generates placeholder secrets that look like real values, or hardcodes values "for now."
Failure mode it prevents: Credentials committed to git history (which is permanent even after deletion), leaked via Docker images, or exposed in config files.
Test: Run gitleaks. Does this PR contain anything that looks like a secret?
### 27. Security framework defaults are load-bearing — never disable them
CSRF protection, Blade escaping, parameterised queries, and session security exist because decades of security research proved they're necessary. AI sometimes disables these for convenience ("just use {!! !!} here" or "disable CSRF for this API route"). The defaults are not optional.
Failure mode it prevents: XSS, CSRF, SQL injection, and session hijacking vulnerabilities introduced by disabling protections that were on by default.
Test: Does this code disable or bypass a security default? If yes, reject it unless there's an explicit, documented, reviewed justification.
### 28. External data sources are read-only
Connections to databases, APIs, or services owned by other systems must be read-only at every level: connection credentials, model configuration, and code patterns. One accidental write to a production external database can corrupt another team's data irreversibly.
Failure mode it prevents: Accidental writes, migrations, or schema changes to databases owned by other systems.
Test: Do external models have write protection (ReadOnlyModel trait or equivalent)? Do connection credentials have write permissions? They shouldn't.
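The code-level guard is a few lines in any ORM. A Python sketch of the ReadOnlyModel idea (class names are hypothetical; in an Eloquent codebase this would be a trait that overrides the model's write methods):

```python
class ReadOnlyError(RuntimeError):
    pass

class ReadOnlyModel:
    """Mixin sketch: every write path raises before touching the database."""

    def _refuse(self, *args, **kwargs):
        raise ReadOnlyError(
            f"{type(self).__name__} maps data owned by another system"
        )

    save = update = delete = _refuse

class ExternalCandidate(ReadOnlyModel):
    """Hypothetical model backed by another team's database."""
```

Defence in depth still applies: the connection credentials should lack write permissions even with the guard in place, so a bypassed or forgotten trait cannot corrupt the other team's data.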
### 29. One request, one correlation ID — traceability across the entire stack
Every HTTP request generates a UUID that follows the request through controllers, services, queue jobs, and log entries. Without this, debugging production issues requires manually correlating timestamps across log files — which becomes impossible at scale.
Failure mode it prevents: Inability to trace a production error back to the request that caused it, especially when queue jobs are involved.
Test: Does the log output include a correlation ID? Can you trace a request from entry to completion?
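The mechanism is small: assign the ID once at the boundary and make it ambiently available downstream, rather than threading it through every signature. A Python sketch using a context variable (names are illustrative):

```python
import uuid
from contextvars import ContextVar

# One context variable carries the ID across the whole call chain,
# including async code; queue jobs would receive it in their payload.
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")

def handle_request(handler):
    """Assign a fresh UUID at the boundary; everything downstream reads it."""
    token = correlation_id.set(str(uuid.uuid4()))
    try:
        return handler()
    finally:
        correlation_id.reset(token)

def log(message: str) -> str:
    """Every log line is automatically tagged with the current request's ID."""
    return f"[{correlation_id.get()}] {message}"
```

With this shape, grepping the logs for one UUID reconstructs a request's entire journey, controllers through queue jobs.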
## VI. Architecture — How You Structure It

### 30. No vendor lock-in — access cloud services through framework abstractions
Use the Storage facade, not the S3 SDK directly. Use the Queue facade, not SQS directly. Use the Prism abstraction, not the Bedrock SDK directly. If swapping a provider requires changing more than configuration, the coupling is too tight.
Failure mode it prevents: Provider-specific code scattered throughout the codebase, making migration to a different provider a rewrite rather than a config change.
Test: If we switched from AWS to GCP tomorrow, which files would need to change? If more than config files, the abstraction is leaking.
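The shape of such an abstraction, sketched in Python (the Storage protocol and all names are hypothetical; in Laravel the Storage facade plays this role):

```python
from typing import Protocol

class Storage(Protocol):
    """The only storage surface application code may depend on."""
    def put(self, path: str, contents: bytes) -> None: ...
    def get(self, path: str) -> bytes: ...

class InMemoryStorage:
    """One implementation; an S3- or GCS-backed class has the same shape."""
    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}
    def put(self, path: str, contents: bytes) -> None:
        self._files[path] = contents
    def get(self, path: str) -> bytes:
        return self._files[path]

def export_report(storage: Storage, data: bytes) -> None:
    # Depends only on the protocol, so swapping providers is a wiring
    # change in one place, not a codebase-wide edit.
    storage.put("reports/latest.csv", data)
```

The in-memory implementation doubles as a test double, which is a second payoff of the same boundary.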
### 31. Domain boundaries enforced by tooling, not convention
Convention-based boundaries ("don't import from other domains") erode under pressure. Tooling-based boundaries (Deptrac violations block merge) don't. If a boundary matters, automate its enforcement. Humans forget rules; CI doesn't.
Failure mode it prevents: Gradual erosion of domain isolation as deadlines pressure developers to take shortcuts that "just this once" cross a boundary.
Test: Is this architectural rule enforced by CI? If it's only in documentation, it will eventually be violated.
### 32. Gold-standard files are the pattern — match them
Every file type (controller, model, service, test, view) has a gold-standard example. New files must match the structure, naming, and patterns of the gold standard. Novel approaches require explicit justification. This is how consistency scales.
Failure mode it prevents: Pattern drift where each developer (or AI session) invents a slightly different approach, and the codebase becomes a museum of different coding styles.
Test: Put this new file side-by-side with the gold standard. Do they look structurally identical?
### 33. Constants over config — if it doesn't change per environment, hardcode it
Not everything needs to be configurable. If a value is the same in local, staging, and production, it's a constant. AI defaults to config() and env() because they're "flexible." But every config value is a deployment-time decision that someone can get wrong.
Failure mode it prevents: Configuration values that never actually vary across environments but still require documentation, validation, and mental overhead.
Test: Does this value differ between environments? No? Make it a class constant.
### 34. Separate what changes from what doesn't — isolate volatility
Put stable code and volatile code in different places. Business rules change often; database schema rarely. UI layouts change seasonally; authentication flows don't. When volatile and stable code are mixed, changing the volatile part risks breaking the stable part.
Failure mode it prevents: Routine UI changes breaking core business logic because they're in the same file, or authentication changes requiring testing of unrelated features.
Test: If the business rules change, how many files need to change? If the answer is "everything," the volatility isn't isolated.
### 35. Gall's Law — complex systems evolve from simple ones that worked first
A complex system designed from scratch never works. Start with the simplest version that solves the problem. Prove it works. Then extend it incrementally, verifying at each step. This is the antidote to big-bang architectures that look elegant on a whiteboard but fail in production.
Failure mode it prevents: Designing an elaborate system upfront that collapses under the weight of untested assumptions. Building version 5 before version 1 has proven the concept.
Test: Did the simple version work before we added this complexity?
### 36. Migrations must be reversible
Every database migration must implement a working down() method. This isn't just about rollbacks — it's about the discipline of thinking through how to undo a change before making it. If you can't undo it, you haven't fully understood what you're doing.
Failure mode it prevents: Irreversible schema changes that can't be rolled back when a deploy goes wrong, turning a bad deploy into a data incident.
Test: Does down() exist and actually reverse the up()? Run it and verify.
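The discipline can be stated as a property: up followed by down restores the original schema. A toy Python sketch where a schema is just a dict of table names to columns (table and column names are illustrative):

```python
def up(schema: dict) -> dict:
    """Add an archived_at column to the candidates table."""
    new = {table: list(cols) for table, cols in schema.items()}
    new["candidates"].append("archived_at")
    return new

def down(schema: dict) -> dict:
    """Exactly reverse up(): drop the column again."""
    new = {table: list(cols) for table, cols in schema.items()}
    new["candidates"].remove("archived_at")
    return new
```

The reversibility check is then one line, `down(up(schema)) == schema`, which is precisely what "run it and verify" means for a real migration pair.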
## VII. Quality & Measurement — How You Know It's Working

### 37. Measure code health, not just output
Track metrics that reveal codebase health over time: lines of code per sprint (is the codebase growing faster than features?), dependency count (is it increasing without justification?), test count vs code count ratio (are tests keeping up?). If you only measure velocity, you'll optimise for speed at the expense of sustainability.
Failure mode it prevents: Invisible degradation where the codebase grows 20% per sprint but features grow 5%, and nobody notices until maintenance consumes all available time.
Test: Is the codebase larger this month than last? Did feature value grow proportionally?
### 38. Every queue job must be idempotent
A job must be safe to run more than once with the same input. Network failures, container restarts, and queue retries mean jobs will occasionally run multiple times. If a job creates duplicate records, sends duplicate emails, or charges twice on retry, it's not production-ready.
Failure mode it prevents: Duplicate emails, double charges, corrupted data, and other side effects from jobs that assume they'll only run once.
Test: Run this job twice with the same input. Is the result identical to running it once?
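The standard mechanism is an idempotency key checked before the side effect. A minimal Python sketch (names are illustrative; in production the "processed" set would be a unique-keyed database table or cache entry, ideally written in the same transaction as the side effect):

```python
processed: set[str] = set()
sent: list[str] = []

def send_welcome_email(job_id: str, candidate_pool_id: int) -> None:
    """Safe to retry: the job_id acts as an idempotency key."""
    if job_id in processed:
        return  # retry or duplicate delivery -- side effect already happened
    processed.add(job_id)
    sent.append(f"welcome -> candidate_pool_id={candidate_pool_id}")
```

Running the job twice with the same job_id produces exactly one email, which is the test the commandment prescribes.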
### 39. Run the full test suite, not just the file you changed
Tests that pass in isolation frequently fail in the full suite due to shared database state, Faker seed collisions, or test ordering dependencies. The --filter flag is for rapid development iteration. The full suite is the quality gate. If a test passes under --filter but fails in the full suite, it has an isolation bug.
Failure mode it prevents: CI failures that "work on my machine" because the developer only ran the tests they wrote, missing cross-test interactions.
Test: Did you run the full test suite before committing, or just --filter?
### 40. Churn is a signal — code discarded within 2 weeks means something is wrong
If code is written and then deleted or substantially rewritten within two weeks, that's not "fast iteration" — it's rework caused by insufficient understanding before implementation. AI-assisted development amplifies this pattern because writing is cheap and thinking feels expensive. But the thinking is the work.
Failure mode it prevents: Write-discard cycles that feel productive but deliver no net value, wasting review time, CI resources, and mental energy.
Test: How much code from last sprint survived to this sprint? If less than 80%, the planning was insufficient.
### 41. Every feature must be deletable
If removing a feature requires rewriting the system, the coupling is wrong. Features should be additive — turning one off should not break others. This is the ultimate test of good architecture: can you subtract without collapsing?
Failure mode it prevents: Features that become permanent not because they're valuable but because removing them is too expensive, leading to a codebase full of unused capabilities that still incur maintenance cost.
Test: What would break if we deleted this feature entirely? If the answer is "other features," the boundaries are wrong.
### 42. The answer is 42.
After 41 rules about discipline, rigour, and restraint — remember that nobody has all the answers. Software is built by humans for humans, with AI as a tool. The best engineering decisions come from humility: knowing what you don't know, questioning what you think you know, and being willing to change course when evidence says you're wrong.
The goal isn't perfection. It's building something that lasts.
## Quick Reference
| # | Commandment | Category |
|---|---|---|
| 1 | Code is a liability | Code |
| 2 | No speculative abstractions | Code |
| 3 | Understand every line you ship | Code |
| 4 | Solve the problem you have (YAGNI) | Code |
| 5 | Consistency over cleverness | Code |
| 6 | Delete before you add; understand before you delete | Code |
| 7 | Three is a pattern, one is just code | Code |
| 8 | Configuration is complexity | System |
| 9 | Tests prove behaviour, not implementation | System |
| 10 | Surface area is a cost | System |
| 11 | Ubiquitous language | System |
| 12 | One error handling philosophy | System |
| 13 | Every decision needs an exit path | System |
| 14 | Dependencies age faster than your code | System |
| 15 | Prefer boring technology | System |
| 16 | Data models outlive everything | System |
| 17 | Feature flags are debt with a timer | Operations |
| 18 | Can't see it? Can't maintain it | Operations |
| 19 | Documentation is a product | Operations |
| 20 | Human decides what, AI decides how | Process |
| 21 | Build for your team size | Process |
| 22 | Ship the smallest valuable change | Process |
| 23 | PII has classification tiers | Security |
| 24 | Log by ID, never PII | Security |
| 25 | Rate limit every public endpoint | Security |
| 26 | Secrets never in code | Security |
| 27 | Never disable security defaults | Security |
| 28 | External data is read-only | Security |
| 29 | One correlation ID per request | Security |
| 30 | No vendor lock-in | Architecture |
| 31 | Boundaries enforced by tooling | Architecture |
| 32 | Match the gold-standard files | Architecture |
| 33 | Constants over config | Architecture |
| 34 | Isolate volatility | Architecture |
| 35 | Start simple, evolve (Gall's Law) | Architecture |
| 36 | Migrations must be reversible | Architecture |
| 37 | Measure code health | Quality |
| 38 | Idempotent queue jobs | Quality |
| 39 | Full test suite before commit | Quality |
| 40 | Churn is a signal | Quality |
| 41 | Every feature must be deletable | Quality |
| 42 | The answer is 42 | — |
## Sources
These commandments were developed through structured brainstorming grounded in:
- Direct experience from 39 sprints of AI-assisted development on this project (463 points, 1,028 tests)
- Saidev Pre-Production Audit findings (110 findings across 6 categories)
- Comprehension Debt — Addy Osmani
- 2025 DORA State of AI-Assisted Software Development — Google
- AI-Generated Code Creates New Wave of Technical Debt — InfoQ
- The Codebase That Lasts Twice As Long — Jens Roland
- Hacker Laws — Gall's Law, Conway's Law, Chesterton's Fence
- Vibe Coding Security Crisis — Cloud Security Alliance
This document is the philosophical foundation. CLAUDE.md is the operational checklist. 10-GUARDRAILS.md is the enforcement reference. All three must stay aligned.