by The Digital Organizer
Systematic Programming
A research-backed system for building production software through AI agents. Less tooling, not more. Depth over breadth. Architecture over speed.
The Paradox
The Tools That Feel Helpful Are Hurting You
Here is the most important thing this research uncovered: the tools that feel most productive are often the ones that cause the most damage.
A randomized controlled trial by METR found that experienced developers using AI coding tools were 19% slower than developers working without them — despite believing they were 24% faster. Read that again. The gap between perception and reality was 43 percentage points.
This isn't a story about AI being bad. It's a story about AI being powerful enough to create the illusion of productivity while quietly eroding the architecture of your software. Without a system, every AI session is a fresh start — and fresh starts produce inconsistent, tangled, unmaintainable code.
Experienced developers using AI tools completed tasks 19% slower in the METR randomized trial — while believing they were faster. (METR, 2025)

AI-generated code showed an 8-fold increase in copy-pasted code, with refactoring dropping from 25% to under 10% of all changes. (GitClear, 211M lines)

Productivity peaks with 1–3 AI tools, then declines. BCG calls it "AI Brain Fry" — more tools means more mental overhead, not more output. (BCG, N=1,488)

The "Army of Juniors" Problem
The security research firm Apiiro analyzed AI-generated code and found 322% more privilege escalation paths and 153% more design flaws compared to human-written code. The Ox Security report gave it a name: the "Army of Juniors" problem. AI-generated code is highly functional but systematically lacks architectural judgment.
Every time you start a new AI session without context, you're hiring a brilliant but amnesiac junior developer. They'll write working code. They'll write it fast. And they'll make decisions that contradict what the last session built — because they have no idea what the last session built.
Without a system: each AI session starts from zero → inconsistent patterns → code duplication → architectural drift → bugs compound → more time fixing than building.

With a system: every session inherits context → consistent architecture → automated quality gates → bugs caught at commit → each session builds on the last.
The Foundation
Four Thinkers Who Saw This Coming
The problems AI creates for software aren't new. Four thinkers — writing decades before ChatGPT existed — described exactly the failure modes we're seeing now. Their frameworks explain why AI-assisted development goes wrong and, more importantly, how to fix it.
John Ousterhout
Strategic vs. Tactical Programming
In April 2025, Ousterhout directly addressed AI coding tools, calling them "akin to tactical tornadoes — they code fast, fix issues fast, while creating new issues and adding tech debt." His framework, from A Philosophy of Software Design, draws a sharp line between two modes of building software.
Tactical programming optimizes for the immediate goal: make it work, make it work now. This is exactly how AI operates — it optimizes for your current prompt, not your long-term system. Strategic programming optimizes for the design first: the primary goal is great architecture that also works.
His key insight for AI-assisted work: software design is a decomposition problem — breaking large systems into pieces that can be built independently. AI is excellent at building the pieces. It cannot decide how to break the system apart. That's your job.
Rich Hickey
Simple vs. Easy
Hickey makes a distinction that explains nearly every AI failure mode. "Simple" means one fold — it's objective, about lack of entanglement. "Easy" means near, familiar — it's subjective, about what's close to what you already know. AI makes everything easy. It does not make things simple.
An AI agent will happily generate code that works but has deeply interleaved concerns, hidden coupling, and tangled state. The "complecting" (Hickey's term for tangling things together) is invisible because the code passes tests and appears functional. But every time you ask AI to modify one part, it breaks three others — because the parts were never truly separate.
His "Hammock Driven Development" approach: state the problem clearly, research broadly, identify trade-offs, then sleep on it. Don't rush from idea to AI prompt. Write down the problem. Let the design marinate. Your background mind makes connections that a prompt can't capture.
Sandi Metz
What Does Your Code Know About?
Metz asks the question AI never considers: "What does this piece of code know about?" Every function, every file, every module has a set of things it "knows" — other parts of the system it depends on. The more it knows about, the more fragile it becomes. Change one thing, and everything that "knew about" it breaks.
Her four rules are mechanical enough to be a checklist: Classes under 100 lines. Methods under 5 lines. No more than 4 parameters. No more than 4 instance variables. Break them only if you've tried not to. These rules catch exactly the kind of sprawl AI produces — oversized classes, methods doing too many things, deeply coupled parameter lists.
For AI-assisted work, Metz's lens is the simplest code review tool you have: when the AI writes a function, ask "what does this function know about?" If the answer is "everything," it needs to be broken apart.
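Three of Metz's four rules can be approximated mechanically with stock ESLint core rules (`max-lines`, `max-lines-per-function`, and `max-params` are real rules; the thresholds below are hers). Note the caveats: `max-lines` counts lines per file rather than per class, and ESLint core has no instance-variable rule, so those two stay manual review questions. A sketch:

```json
// .eslintrc.json fragment (ESLint permits comments in its JSON config).
// Encodes Metz's thresholds as warnings; tighten to "error" once the codebase complies.
{
  "rules": {
    "max-lines": ["warn", 100],
    "max-lines-per-function": ["warn", 5],
    "max-params": ["warn", 4]
  }
}
```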
Donella Meadows
Leverage Points — Where to Intervene
Meadows ranked 12 places to intervene in any system, from weak to powerful. Most attempts to improve AI-assisted development operate at the weakest leverage point: tweaking prompts and adjusting parameters. That's like adjusting the thermostat when the building has no insulation.
More powerful interventions sit higher in her hierarchy. Information flows (Level 6): give every AI session access to architecture docs and design decisions — this is what CLAUDE.md files accomplish. Rules (Level 5): coding standards, linters, and automated tests that constrain AI output. Self-organization (Level 4): modular architectures that AI can extend without breaking. Goals (Level 3): shifting from "generate working code" to "generate maintainable, simple code."
The deepest shift is at Level 2 — the paradigm itself. Moving from "AI writes code for me" to "I am the architect, AI implements my designs." Don't tweak prompts. Invest in architecture documentation, automated testing, and the mindset that you are the designer, not a prompt-writer.
The Framework
The Three-Layer System
The research converges on a system with three layers, ordered by how strongly each one enforces its rules rather than merely aspiring to them. The bottom layer is weightless after setup. The middle layer takes a few minutes per session. The top layer is where the real thinking happens — and it's the only part that should require cognitive effort.
Strategic Design
The human's irreplaceable job
Architectural Infrastructure
Update as the project evolves
Mechanical Enforcement
Set up once, never think about it
Mechanical Enforcement
Set up once, never think about it
These are forcing functions — they physically prevent violations regardless of what the AI or the human wants in the moment. Install them once with AI's help, and they become permanent quality gates on all future work.
- Husky + lint-staged — runs code quality checks on every commit. Commit literally fails if checks don't pass.
- TypeScript strict mode — code that violates type rules won't compile. Zero maintenance.
- Claude Code hooks — shell scripts that block dangerous commands before they execute.
- GitHub Actions CI — tests run on every push. Failed tests block merges.
Compliance: ~95–100%. These can't be talked out of, forgotten, or deprioritized. That's the point.
Architectural Infrastructure
Update as the project evolves
These are the context documents that give every fresh AI session the knowledge it needs without reading the entire codebase. A few minutes per session to maintain.
- CLAUDE.md — hot memory. Under 200 lines of conventions, commands, and structure. Loaded every session.
- AGENTS.md files — per-directory intent. What this folder does, what contracts it follows, what breaks if you change it.
- docs/ folder — cold memory. One spec per major feature, retrieved on demand.
- Architecture Decision Records — simple markdown files: "we chose X because Y." Future sessions can understand why.
- Progress log — updated at the end of each session: "state of work" and "next tiny action."
Strategic Design
The human's irreplaceable job
This is the only layer that should require real cognitive effort. It's also the only layer AI cannot do for you.
- Decompose problems before prompting — decide how the system should be broken apart (Ousterhout).
- Demand simplicity, not just functionality — ask "is this simple or just easy?" (Hickey).
- Review AI output against Metz's rules — classes under 100 lines, methods under 5 lines.
- Use phase gates for multi-step features — don't let AI build everything in one shot.
- Spend innovation tokens deliberately — one tool, mastered deeply.
Why this layering matters: Kathy Sierra's Collapse Zone
Sierra's research shows that under cognitive load, only fully automated or unconscious behaviors survive. Anything that requires you to "remember to check" will fail on your worst day. Layer 1 is automated. Layer 2 is near-automatic. Layer 3 is where you invest your limited cognitive energy — on the work that actually matters.
The Tools
What They Are and Why They're Here
Each tool below has a specific job. We explain what it is (in plain language), why it matters for AI-assisted development, and how it works. If you've never heard of npm or TypeScript, you'll still understand why each tool earns its place.
What it is: Husky is a "git hook manager." Git (the version control system that tracks every change to your code) supports "hooks" — scripts that run automatically at specific moments, like right before you save a change. Husky makes it easy to set up and share these hooks across a project.
Why it matters: Think of Husky as a bouncer at the door of your codebase. Before any change can be saved (committed), Husky runs your quality checks. If the checks fail, the commit is rejected. It doesn't matter if you're tired, if the AI forgot a rule, or if you're rushing before a deadline. The bouncer doesn't care about your reasons — either the code passes or it doesn't get in.
How it works: You install it once (by running npm install -D husky in your project). Then you tell it which checks to run before each commit. With 10 million weekly downloads, it's the industry standard. Ask Claude Code to set it up and you'll never think about it again.
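Concretely, with husky v9 the setup is `npm install -D husky` followed by `npx husky init`, which wires a `prepare` script into package.json and creates the hook file. What lives in your repo afterwards is tiny; a sketch (the lint-staged line assumes the next tool in this section):

```shell
# .husky/pre-commit: runs before every commit; a non-zero exit rejects the commit
npx lint-staged
```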
What it is: A tool that runs code quality checks (called "linters") only on the files you're about to commit — not the entire project. It's the partner to Husky: Husky triggers the check, lint-staged makes it fast by narrowing the scope.
Why it matters: Without lint-staged, every commit would scan your entire codebase, which could take minutes on a large project. With it, checks run in seconds because they only look at what changed. This means you'll actually keep the checks turned on instead of disabling them because they're too slow.
How it works: Paired with Husky's pre-commit hook. When you commit code, Husky fires, lint-staged identifies which files are changing, and runs the configured linters on just those files. If any fail, the commit is blocked. Fast feedback, narrow scope.
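Its configuration is just a mapping from file globs to commands; a sketch in package.json (the globs and commands are illustrative, adapt them to your project):

```json
{
  "lint-staged": {
    "*.{ts,tsx}": ["eslint --fix", "prettier --write"],
    "*.{css,md,json}": "prettier --write"
  }
}
```

Each changed file matching a glob is passed to the listed commands in order; if any command exits non-zero, the commit is blocked.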
What they are: ESLint catches code mistakes (unused variables, unreachable code, potential bugs). Prettier enforces code formatting (indentation, line breaks, quote styles). Together, they ensure every piece of code follows the same conventions, regardless of which AI session wrote it.
Why they matter: AI sessions don't have preferences — they'll write code in whatever style seems appropriate in the moment. Session 1 might use single quotes; Session 47 might use double quotes. ESLint + Prettier eliminate this drift. Every file looks the same, every variable follows the same naming pattern, every function is formatted identically.
How they work: Configuration files (.eslintrc and .prettierrc) define your rules. When triggered by Husky + lint-staged, Prettier auto-formats your code and ESLint flags any violations. Most violations are auto-fixed — no human intervention needed.
What it is: TypeScript is a version of JavaScript that adds "types" — labels that describe what kind of data a variable holds (a number, a string, a user object). "Strict mode" is a compiler setting that makes TypeScript much more rigorous about checking these types. Code that violates the type rules literally won't compile.
Why it matters: Imagine you have a function that expects a user's email address. Without strict mode, you could accidentally pass it a number, and nothing would catch the error until a real user hit the bug. With strict mode, the compiler catches it instantly — before the code ever runs. For AI-generated code, this is critical because AI frequently makes type errors that look correct but break at runtime.
How it works: One setting in your tsconfig.json file: "strict": true. That's it. Zero ongoing maintenance. The compiler becomes your safety net, catching entire categories of bugs that AI would otherwise introduce silently.
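The setting itself, annotated with a few of the individual checks it turns on (a representative subset; `strict` enables the whole strict family, listed in the TypeScript compiler-options docs):

```json
// tsconfig.json (tsconfig files may contain comments)
{
  "compilerOptions": {
    "strict": true
    // Shorthand that enables, among others:
    //   noImplicitAny: every value must have a known type
    //   strictNullChecks: null and undefined must be handled explicitly
    //   strictFunctionTypes: function parameter types are checked soundly
  }
}
```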
What it is: madge scans your code's import statements and builds a complete map of which files depend on which other files. Think of it as an X-ray of your project — it shows the hidden connections between components that you can't see by reading individual files.
Why it matters: When AI modifies a file, it can't see what else depends on that file. madge can. It answers the question every developer needs: "If I change this file, what else might break?" It also detects circular dependencies (A depends on B, B depends on A) and orphan files (code that nothing uses anymore).
How it works: One command: npx madge --json src/ --ts-config tsconfig.json. Wire it into an automated hook so it regenerates silently after every coding session. Auto-generated, auto-maintained — no documentation burden.
What they are: Shell scripts that run automatically when Claude Code does things — before it executes a command, after it edits a file, when it finishes responding. They're configured in a settings file and fire every single time, with no way for Claude to skip them.
Why they matter: Writing rules in CLAUDE.md is aspirational — research shows even the best AI follows fewer than 30% of instructions perfectly under pressure. Hooks are mechanical. A PreToolUse hook that blocks dangerous commands fires 100% of the time. The Tsinghua AGENTIF benchmark confirmed: AI compliance drops as instructions get longer. Hooks don't have that problem.
The three hook types:
- PreToolUse — runs before Claude executes an action. Return exit code 2 to block it entirely.
- PostToolUse — runs after Claude edits a file. Auto-run a linter on every edit.
- Stop — runs when Claude finishes responding. Inject guidance for the next turn.
Think of them as invisible guardrails. Claude can't bypass them, even with --dangerously-skip-permissions.
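The PreToolUse pattern can be sketched as a small shell guard. The JSON-on-stdin shape and the meaning of exit code 2 follow Claude Code's hooks interface; everything else below (the deny patterns, the crude substring matching) is an illustrative assumption:

```shell
#!/bin/sh
# Core of a PreToolUse guard: decide whether a shell command should be blocked.
# The deny patterns and plain substring matching are illustrative; a real hook
# keeps a longer deny list (see claude-guardrails below).
check_command() {
  case "$1" in
    *"rm -rf /"*|*"git push --force"*|*"chmod -R 777 /"*)
      echo "blocked by pre-tool-use hook: $1" >&2
      return 2   # in the real hook script this is `exit 2`, which blocks the call
      ;;
  esac
  return 0
}

# In the actual hook, the pending call arrives as JSON on stdin, roughly:
#   {"tool_name": "Bash", "tool_input": {"command": "..."}}
# The script extracts .tool_input.command (e.g. with jq) and runs:
#   check_command "$cmd" || exit 2
```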
What it is: An open-source package (dwarvesf/claude-guardrails) that installs 3 hooks with 15 deny rules out of the box. It blocks things like deleting your home directory, hardcoding API keys, editing .env files, and force-pushing to protected branches.
Why it matters: You shouldn't have to think about every possible destructive action AI might take. claude-guardrails is the "pit of success" pattern in action — a term from Microsoft Research meaning "make the correct behavior the easiest behavior." Install it and you fall into safe practices by default.
How it works: One command: npx claude-guardrails. It configures your Claude Code settings with PreToolUse hooks that regex-match dangerous bash commands and block them before execution. Setup time: about 30 seconds.
What they are: Simple markdown files that capture a decision, the context behind it, the alternatives you considered, and the consequences. Format: Title, Status (accepted/rejected/superseded), Context, Decision, Consequences. Recommended by AWS, Microsoft Azure, and Google Cloud as the standard for decision tracking.
Why they matter: Code tells you what was built. Comments sometimes tell you how. Neither tells you why. Without ADRs, future AI sessions (and future you) will look at a design choice, not understand the reasoning, and change it — reintroducing the exact problem the original decision solved.
How they work: A folder of numbered markdown files: docs/decisions/001-use-firebase-over-supabase.md. Write one whenever you choose between two reasonable approaches, delete functionality, change defaults, or make any decision a future reader might question. Keep them in the same commit as the code they describe — decisions captured in the commit flow stick; separate documentation tasks don't.
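A complete ADR can fit in ten lines; a sketch using the filename from above (the content is invented for illustration):

```markdown
# 001: Use Firebase over Supabase

Status: accepted

## Context
We need auth plus a realtime database, maintained by one person.

## Decision
Use Firebase: one SDK covers auth, Firestore, and hosting.

## Consequences
Vendor lock-in on Firestore query semantics; revisit if we ever need SQL joins.
```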
What they are: A tiered system that gives AI agents context about your project. CLAUDE.md sits at the project root — it's the "hot memory" loaded at the start of every session. It contains conventions, key commands, project structure, and rules. AGENTS.md files sit in subdirectories — they describe what that folder does, what contracts it follows, and what breaks if you change it.
Why they matter: A companion study found that AGENTS.md presence is associated with a 29% reduction in AI runtime and 17% reduction in token usage. The Codified Context paper (arXiv 2602.20478) validated this exact pattern across 283 development sessions building a 108,000-line system. Documentation isn't optional overhead — it's load-bearing infrastructure that AI agents depend on.
How they work: CLAUDE.md under 200 lines. One AGENTS.md per major directory. Update them as the project evolves — when you add a new convention, change a contract, or discover a gotcha. Think of CLAUDE.md as the briefing a new team member gets on day one, and AGENTS.md files as the tribal knowledge each team has about their area.
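A skeleton of the shape (the project name, commands, and conventions are invented for illustration; there is no required format):

```markdown
<!-- CLAUDE.md: hot memory, kept under 200 lines -->
# Project: Acme Tracker

## Commands
- `npm run dev`: start the local server
- `npm test`: run the test suite (must pass before commit)

## Conventions
- TypeScript strict mode; no `any`
- One component per file; tests colocated as `*.test.ts`

## Structure
- `src/api/`: route handlers (details in src/api/AGENTS.md)
- `src/db/`: data access only; UI code never imports from here
```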
What it is: GitHub Actions is a built-in automation system that runs tasks whenever code is pushed. "CI" stands for Continuous Integration — the practice of automatically testing every change before it enters the main codebase. You define the steps in a YAML file, and GitHub runs them on their servers.
Why it matters: CI is the ultimate forcing function because it operates outside your local environment. Husky can theoretically be bypassed on your own machine. CI cannot. It runs on GitHub's servers, and you can configure it so that code with failing tests physically cannot be merged into your main branch. Research consistently shows: CI works because you can't bypass it. Checklists fail because they require willpower.
How it works: A file in .github/workflows/ defines your pipeline: install dependencies, run linter, run tests, check types. Every push triggers it. A green checkmark means it passed; a red X means something broke. Configure branch protection rules to require passing CI before merges, and you have an automated quality gate that no one — human or AI — can circumvent.
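A minimal pipeline sketch (the action versions, Node version, and script names are assumptions; adapt them to your project):

```yaml
# .github/workflows/ci.yml
name: CI
on: [push, pull_request]
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npm ci
      - run: npm run lint        # ESLint + Prettier
      - run: npx tsc --noEmit    # TypeScript strict-mode check
      - run: npm test            # with branch protection, failures block the merge
```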
The Enforcement Pyramid
Not all enforcement is equal. The Tsinghua AGENTIF benchmark found that even the best AI follows fewer than 30% of instructions perfectly. Here's how each mechanism actually performs:
| Layer | Mechanism | Compliance | Bypass route |
|---|---|---|---|
| Forcing function | TypeScript strict mode, CI gates | ~100% | None: code won't compile or merge |
| Forcing function | PreToolUse hooks (block commands) | ~100% | None: cannot be bypassed |
| Mechanical | PostToolUse hooks (auto-run linter) | ~95% | Only by disabling hooks entirely |
| Semi-mechanical | Skills with embedded shell commands | ~80% | AI may not invoke them |
| Aspirational | CLAUDE.md rules (prose) | ~60% | Context pressure, conflicting signals |
The Process
How Phases Chain Together
The biggest source of chaos in AI-assisted development isn't bad code — it's bad handoffs. Session 1 makes decisions. Session 2 doesn't know about them. Session 3 contradicts both. The workflow solves this by turning each phase into a file that the next phase reads.
Native deterministic chaining doesn't exist in Claude Code — Anthropic confirmed this is by design. But three patterns create reliable phase transitions without it.
Slash Commands as Phase Gates
Simplest — the human decides when to advance
Create separate slash commands for each phase: /phase1-data-model, /phase2-api, /phase3-ui. Each command defines its scope, expected inputs, and deliverables. A Stop hook reads a state file and suggests what to run next: "Phase 1 complete. Review the data model, then run /phase2-api."
The human-in-the-loop gate is the act of typing the next command. You review the output, decide it's ready, and trigger the next phase. This maps directly to GitHub Actions' environment approval pattern — workflows pause between stages until a human approves.
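In Claude Code, a custom slash command is just a markdown file in `.claude/commands/`; a sketch of a phase-gate command (the scope, file paths, and deliverables are illustrative):

```markdown
<!-- .claude/commands/phase2-api.md, invoked as /phase2-api -->
Read docs/plans/current-feature.md and the Phase 1 output in docs/plans/data-model.md.

Scope: implement the API endpoints described in the plan. Do not touch UI code.

Deliverables:
1. Route handlers with tests
2. Updated AGENTS.md in src/api/
3. Append "state of work" and "next tiny action" to docs/progress.md
```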
The Planning Depth Ladder
Match process weight to task size
Not everything needs the full pipeline. The key principle from Forever Alone Programming: "Process must be lighter than the work it governs."
| Task Size | Process | Example |
|---|---|---|
| Quick fix | Fix it → commit → ship | Typo, dependency bump, small bug with obvious cause |
| Small feature | Brief plan → implement → ship | Add a button, modify a form, update a query |
| Big feature | Think → Plan → Work → Learn → Ship | New page, new API endpoint, multi-file refactor |
| Architectural change | Deep research → ADR → Full pipeline | Database migration, auth system swap, new infrastructure |
Files, Not Context
Phase handoffs survive session boundaries
Each phase writes its output to a file on disk, then recommends the next step. When you clear context and start a new session, the next agent reads the file. Nothing lives only in conversation memory.
| Phase | Output Artifact | Next Step |
|---|---|---|
| Think | Design doc in docs/plans/ | Clear context → Plan |
| Plan | Plan file with execution strategy | Clear context → Execute |
| Work | Code committed to branch | Clear context → Learn or Ship |
| Learn | Solution doc in docs/solutions/ | Clear context → Ship |
The McKinsey/QuantumBlack team found that agents routinely skip steps, create circular dependencies, or get stuck in analysis loops when allowed to self-orchestrate across phases. Their recommendation: use a "conventional, rule-based workflow engine" for phase transitions, with agents handling execution within each bounded phase.
Tool Evaluation
The FOMO Firewall
Dan McKinley's "Choose Boring Technology" argues that every developer gets about three innovation tokens. Each unfamiliar technology spends one. For a solo developer, you arguably get 1–2 — there's no team to absorb the learning curve.
The FOMO Firewall is a 5-minute decision protocol synthesized from the ThoughtWorks Technology Radar, Bezos's one-way/two-way door framework, McKinley's innovation tokens, and the IETF's "rough consensus and running code." Use it every time you're tempted by a new tool.
The "Running Code" Test (30 seconds)
Have I seen this tool solve a real problem I actually have right now? Can I name the specific task it will improve this week? If both are "no," bookmark it and revisit in 30 days.
The Bezos Door Test (30 seconds)
Is this a one-way door (deep integration, data migration, workflow restructuring) or a two-way door (easy to try, easy to remove)? Two-way doors get a timeboxed 2-hour trial. One-way doors require full evaluation.
The Innovation Token Test (1 minute)
Does this spend one of my ~2 tokens? Does it overlap with something I already use? What's the cognitive overhead? If you're already carrying 2+ unfamiliar tools, stop.
The Radar Placement (1 minute)
Place the tool explicitly on the Assess → Trial → Adopt → Hold spectrum. Write it down. A tool cannot jump from Assess to Adopt — it must prove itself in Trial first with real production usage.
The Exit Plan (1 minute)
Can I remove it in under an hour? Does it store data in a proprietary format? What happens if this tool disappears tomorrow? If you can't exit cleanly, you can't enter safely.
Red flags that signal a FOMO spiral
- Reading about tools instead of building with them
- Three or more tools in "Trial" simultaneously
- Comparing tools you haven't actually used
- The appeal is "everyone uses it" rather than "this solves problem X"
The IETF's principle: "rough consensus and running code." Don't adopt until you have running code — proven, working value in your actual workflow. Watching a demo doesn't count.
Accessibility
Process That Survives Your Worst Day
This section isn't optional. It's core to the methodology. A system designed to survive cognitive depletion works for everyone. On your best day, these practices make you faster. On your worst day, they keep you from losing work. That's not an accessibility feature — it's the whole point.
These strategies come from developers with ADHD who reported what actually works in practice — not generic productivity advice, but specific patterns validated against the realities of executive function challenges.
TDD as External Memory
Multiple ADHD developers called test-driven development "a career-saver." Tests serve as external working memory — they hold the definition of "done" so executive function doesn't have to. Write the test first, then make it pass. One small failing test at a time prevents the overwhelm of holding the whole picture in your head.
Micro-Handoff Shutdown Ritual
End every work block with two bullet points: "state of work" and "next tiny action." This prevents the blank-page restart that paralyzes ADHD brains. When you come back — whether in 10 minutes or 10 days — you know exactly what to do next. No decision fatigue, no "where was I?"
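The whole artifact can be two bullets; a sketch (the file name and contents are invented for illustration):

```markdown
<!-- docs/progress.md, appended at the end of each work block -->
## 2025-06-14 16:40
- State of work: users endpoint returns paginated results; page size hardcoded
- Next tiny action: write the failing test for a configurable page size
```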
The 50/10 Cadence
Work 50 minutes, break 10. At every break point, write the single next step before stepping away. This exploits ADHD's relationship with novel starts — making "what do I do next?" trivially easy eliminates the restart paralysis. The break isn't optional. Neither is the next-action note.
Automation as Accessibility
"Automate tasks you have to do frequently or on a schedule, especially if you'll forget them on your worst day." CI/CD pipelines, linters, formatters, dependency updates, deployment scripts. For someone with ADHD, "worst day" isn't hypothetical — it's any day where executive function is depleted. Automation isn't a nice-to-have. It's a requirement.
Git Hygiene as Impulse Control
Branch protections prevent impulsive pushes to main. git stash for messy experiments. Always work in feature branches that can be destroyed. These are mechanical guardrails against the ADHD tendency to ship half-finished work during a hyperfocus burst. The tooling doesn't judge or require willpower — it just makes the safe path the default path.
The Proof
The Codified Context Pattern
If you want proof that a non-engineer can build and maintain production software through AI agents, this is it.
Aristidis Vasilopoulos — a chemistry expert, not a software engineer — built a 108,000-line C# distributed system across 283 development sessions using Claude Code as the sole code-generation tool. His paper (arXiv 2602.20478, February 2026) documents exactly how he did it, and his core thesis changes how you think about documentation: documentation is infrastructure — load-bearing artifacts that AI agents depend on.
Tier 1: Hot Memory
A project constitution loaded every session. Conventions, retrieval hooks, orchestration protocols. Equivalent to CLAUDE.md — always in context.
Tier 2: Warm Memory
19 specialized domain-expert agents, each with embedded project-specific knowledge. The save system agent "knows" saves; the UI agent "knows" routing.
Tier 3: Cold Memory
34 on-demand specification documents in a docs/ knowledge base. Retrieved via keyword triggers or explicit requests. Detailed but only loaded when needed.
The pattern works because it matches how human organizations manage knowledge: everyone gets the company handbook (hot), each team has its own expertise (warm), and detailed specs live in a shared drive for when you need them (cold). AI agents operate under the same constraints — limited context windows, no persistent memory, fresh starts every session — so they benefit from the same tiered approach.
A companion study found that this approach yields a 29% reduction in AI runtime and 17% reduction in token usage. Your context documents aren't overhead — they're performance optimization. Every line of CLAUDE.md saves tokens downstream by preventing the AI from exploring dead ends, asking clarifying questions, or making wrong assumptions.
The Failures
What NOT to Do
Success stories get the headlines. Failure patterns teach the lessons. These are documented cases where AI-assisted development went wrong — and the specific mistakes that caused it.
The Roadtrip Ninja Case Study
100K+ lines — productivity cratered
A detailed case study of "Roadtrip Ninja" — a project built entirely with Claude Code — found that AI productivity gains "cratered" as the codebase grew. The developer spent more time managing Claude than building features.
At 70,000 lines, Claude would "randomly decide to implement authentication differently, switch database patterns mid-feature" despite CLAUDE.md being referenced every prompt. The root cause: no mechanical enforcement. Rules were aspirational, compliance was voluntary, and as context pressure grew, the AI increasingly ignored its own instructions.
The GitClear Data
211 million lines analyzed
GitClear analyzed 211 million lines of code across the 2020–2024 period and found an 8-fold increase in code duplication coinciding with AI tool adoption. Copy-pasted code exceeded moved code for the first time in 20 years. Refactoring dropped from 25% to under 10% of all changed lines.
The implication: AI doesn't refactor. It copies. Every time you ask AI to "add a feature like the one over there," it copies the code instead of extracting a shared component. Without the strategic layer (you deciding what to share and what to separate), duplication compounds session by session.
Why More Tools Make You Slower
BCG "AI Brain Fry" study
BCG's study of 1,488 workers found that high-oversight AI use produces 14% more mental effort, 12% greater mental fatigue, and 19% greater information overload compared to non-AI tasks. Productivity peaks at 1–3 tools, then declines.
Aaron Brethorst's 2025 update to "Choose Boring Technology" nailed it: "In the AI era, there's an additional risk: the false confidence that comes from having an AI tool that can generate seemingly professional code for any technology stack." Just because AI can use a tool doesn't mean you should add it to your stack.
Hand-Maintained Registries Always Rot
Registry maintenance research
Research into documentation systems found that manually maintained registries universally fail. Systems evolve faster than anyone can document manually. The only registries that survive long-term are either auto-generated from code (like madge's dependency graph) or updated in the same commit as the code they describe (like ADRs).
If your system requires someone to "remember to update the documentation," it will decay. The ThoughtWorks study of successful organizations confirmed: only processes embedded in the commit flow persist. Everything else is a wish.
The Most Counterintuitive Finding
The path to professional-quality software through AI agents requires less tooling, not more. BCG's data on productivity decline, McKinley's innovation tokens, Sierra's collapse zone, and Ousterhout's "tactical tornado" critique all converge on the same point: depth with one well-constrained tool outperforms breadth across many. The Codified Context paper proves a non-engineer can build and maintain a 108,000-line production system — but only by treating documentation as infrastructure and investing in the architectural decomposition that AI cannot do for itself.