I let AI write my entire backend. Here's the code review.
TL;DR
AI-generated backend code has 1.7x more issues and 2.74x more XSS vulnerabilities than human-written code. The core pattern is over-engineering: premature abstractions, phantom error handling, and config sprawl, all of which pass every quality check.
I shipped a five-endpoint API in one afternoon. Authentication, CRUD operations, file uploads, webhooks, and a health check. The code worked. The tests passed. The linter was clean. Then I actually read it. What I found was not broken code — it was over-engineered code. Every endpoint had abstractions nobody asked for, error handling for scenarios that could not happen, and configuration for features that did not exist.
Why does AI over-abstract by default?
The pattern is consistent: ask for a simple CRUD endpoint and you get a full repository pattern with interfaces, factories, error types, and middleware. Ask for a login page and you get OAuth, session management, CSRF protection, rate limiting, and audit logging. The AI is not wrong — these are real concerns. But you did not ask for them, you do not need them yet, and now you maintain them forever. Every unnecessary abstraction is an increase in surface area you did not consent to.
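To make that gap concrete, here is a minimal sketch assuming an Express and node-postgres stack — the post does not name one, so the stack, route, and table are illustrative. The comment block is the skeleton the model tends to emit; the handler below it is what the request actually called for.

```typescript
import express from "express";
import { Pool } from "pg";

// What the model tends to emit for "get a user by id" (skeleton only):
//   interface UserRepository { findById(id: string): Promise<User | null>; }
//   class PostgresUserRepository implements UserRepository { ... }
//   class UserRepositoryFactory { static create(): UserRepository { ... } }
//   class UserNotFoundError extends Error { ... }
// ...plus middleware wiring, none of it required by the prompt.

const app = express();
const db = new Pool(); // connection settings from the standard PG* env vars

// What the prompt actually called for: one handler, one query.
app.get("/users/:id", async (req, res) => {
  const { rows } = await db.query(
    "SELECT id, name FROM users WHERE id = $1",
    [req.params.id]
  );
  if (rows.length === 0) {
    res.status(404).json({ error: "user not found" });
    return;
  }
  res.json(rows[0]);
});

app.listen(3000);
```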
The patterns I found in my own code
- Premature abstraction: a UserRepository class with an interface, a factory, and dependency injection, for an app with one user type and one database.
- Phantom error handling: try-catch blocks around operations that cannot throw, with custom error classes that log to a monitoring service I do not have.
- Unnecessary dependencies: three npm packages imported for functionality that exists in the standard library.
- Config sprawl: environment variables for twelve settings, nine of which have hardcoded defaults that will never change.

Every line compiled. Every line was unnecessary.
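Here is the phantom-error-handling pattern in miniature. Every name is a stand-in (ConfigError, monitoring, the PORT logic), and the monitoring client is stubbed so the snippet runs; in the generated code it pointed at a service that did not exist.

```typescript
// Stub standing in for the monitoring client the generated code assumed;
// no such service existed in the project.
const monitoring = { report: (err: Error) => console.error(err) };

class ConfigError extends Error {}

// Generated version: a try/catch wrapped around code that cannot throw.
function getPortGenerated(): number {
  try {
    // parseInt never throws — it returns NaN on bad input —
    // so the catch block below is unreachable.
    const parsed = parseInt(process.env.PORT ?? "3000", 10);
    return Number.isNaN(parsed) ? 3000 : parsed;
  } catch {
    monitoring.report(new ConfigError("failed to read PORT"));
    return 3000;
  }
}

// The same behavior without the ceremony.
const port = Number(process.env.PORT) || 3000;

console.log(getPortGenerated(), port);
```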
Why is plausible code more dangerous than buggy code?
AI does not write bad code. It writes plausible code that does more than you asked. That is worse than a bug, because bugs fail loudly. Over-engineered code passes every check — tests, linting, type safety, code review by anyone who is not paying close attention. It looks professional. It looks thorough. It looks like exactly the kind of code that gets approved in a pull request and maintained for years.
The question is not "does it work?"
The real review question for AI-generated code is not "does it work?" but "should this exist?" Before merging any AI output, apply this checklist:

- Does every file serve a stated requirement?
- Does every abstraction have more than one consumer?
- Does every error handler catch an error that can actually occur?
- Does every dependency replace more code than it adds?

If the answer to any of these is no, delete it. The code that should not exist is more expensive than the code that does not work.
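As one worked example of the dependency question: the post mentions three npm packages duplicating the standard library without naming them, so the packages below are plausible stand-ins, though the Node version notes are accurate.

```typescript
// Before (typical AI output): packages for things Node already ships.
//   import fetch from "node-fetch";           // global fetch since Node 18
//   import { v4 as uuidv4 } from "uuid";      // crypto.randomUUID() since Node 14.17
//   import cloneDeep from "lodash.clonedeep"; // structuredClone() since Node 17

// After: standard library only, zero dependencies added.
import { randomUUID } from "node:crypto";

const record = { id: randomUUID(), tags: ["a", "b"] };
const copy = structuredClone(record);
const res = await fetch("https://example.com/health"); // hypothetical URL
console.log(res.status, copy.id);
```

Each of the three commented imports fails the test: it adds a dependency to replace a one-line standard-library call.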
How DriftLess catches this before generation
DriftLess prevents over-engineering at the source. Scope locking defines what the session will produce — and implicitly, what it will not. When the AI starts generating abstractions beyond the stated scope, drift detection flags it before the code exists. The reviewer agent evaluates output against your goal, not against "best practices" that may not apply to your context. The result is code that does what you asked, nothing more.
Ship code that should exist. Delete the rest.