What happens when you use AI not to ship faster, but to build better? I tracked 424 commits over 11 weeks to find out.
The Experiment
Context first: I'm an engineering manager, not a full-time developer. These 424 commits happened in the time I could carve out between meetings, planning, and leadership work. The applications are production internal systems (monitoring dashboards, inventory management, CLI tools, chatbot backends) used by real teams, but not high-criticality systems where a bug directly impacts external customers or revenue.
Important nuance: I also act as Product Manager for the Platform team that owns these applications. This means I define the problems and implement the solutions, without the friction or information loss that typically exists in stream-aligned teams where PM and developer are separate roles. This setup favors faster iteration and tighter feedback loops (though it's worth noting this isn't representative of how most teams operate).
From November 2025 to January 2026, I wrote 424 commits across 6 repositories, spanning 44 active days (with Christmas holidays in the middle). Every single line of code was written with AI assistance: Cursor, Claude Code, the works. These weren't toy projects or weekend experiments. These were real systems evolving under active use.
The repositories varied wildly in maturity: from a 13-day-old Go service to a 5.6-year-old Python system with over 12,000 commits in its history. Half were greenfield projects under 6 months old; half were mature codebases years into their lifecycle. Combined, they represent ~107,000 lines of production code. These are small-to-medium projects. That's how our platform team works: we prefer composable systems over monoliths.
The period was intense: 9.6 commits per day average, almost double my historical pace. But AI didn't just make me faster at writing code. It fundamentally changed what kind of code I wrote.
I tracked everything. Every commit was categorized using a combination of commit message analysis, file change patterns, and manual review. Claude Sonnet 4.5 helped automate the initial categorization, which I then validated. And when I analyzed the data, I found something I wasn't expecting.
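To make that concrete, here's a minimal sketch of what a rules-based first pass could look like. My actual pass used Claude Sonnet 4.5 plus manual review; the keywords, path patterns, and the `categorize` helper below are hypothetical stand-ins, shown only to illustrate the shape of the task:

```python
import re

# Illustrative first-pass rules: (message keyword regex, file-path regex).
# These are hypothetical stand-ins, not the actual categorization rules.
RULES = {
    "tests":          (r"\btests?\b|\btesting\b", r"(^|/)tests?/|_test\.(py|go)$"),
    "documentation":  (r"\bdocs?\b|\breadme\b", r"\.md$|(^|/)docs/"),
    "cleanup":        (r"\bremove\b|\bdelete\b|\bdead code\b", None),
    "infrastructure": (r"\bci\b|\bpipeline\b|\btooling\b", r"(^|/)\.github/|Makefile$"),
    "refactoring":    (r"\brefactor\b|\bextract\b|\brename\b", None),
    "configuration":  (r"\bconfig\b|\bsettings\b", r"\.(ya?ml|toml|ini|env)$"),
    "security":       (r"\bsecurity\b|\bcve-\d+\b|\bvulnerab", None),
    "functionality":  (r"\badd\b|\bfeature\b|\bimplement\b|\bsupport\b", None),
}

def categorize(message: str, files: list[str]) -> set[str]:
    """First-pass labels for one commit, meant to be validated by a human."""
    labels = set()
    for label, (msg_pat, path_pat) in RULES.items():
        if re.search(msg_pat, message, re.IGNORECASE):
            labels.add(label)
        if path_pat and any(re.search(path_pat, f) for f in files):
            labels.add(label)
    return labels

print(categorize("Add inventory export and cover it with tests",
                 ["inventory/export.py", "tests/test_export.py"]))
# e.g. {'functionality', 'tests'}
```

A pass like this gets you a rough cut in minutes; the manual validation step is where the honesty comes from.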
The Balance
For every hour I spent on new features, I spent over four hours on tests, documentation, refactoring, security improvements, and cleanup.
22.7% functionality. 98.3% sustainability.
Yes, that adds up to more than 100%. That's not an error: it's the reality of how development actually works. When I develop a feature, the same commit often includes tests, documentation updates, and code cleanup. The numbers reflect that commits are multidimensional: the categories aren't mutually exclusive.
The ratio: 0.23:1 (Functionality:Sustainability)
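For the arithmetic-minded, here is one plausible way to aggregate multi-label commits into numbers like these. This is my reading of the scheme (binary labels per commit, percentages summed per category), not a spec; note how overlapping labels push the sustainability total past 100%:

```python
SUSTAINABILITY = {"tests", "documentation", "cleanup", "infrastructure",
                  "refactoring", "configuration", "security"}

# Hypothetical labeled commits; in practice they come from a pass like the
# categorize() sketch above, followed by manual validation.
commits = [
    {"functionality", "tests", "documentation"},
    {"tests", "cleanup"},
    {"refactoring"},
    {"functionality", "tests"},
]

n = len(commits)
func_pct = 100 * sum("functionality" in c for c in commits) / n
# Sustainability sums the per-category shares, so a commit touching both
# tests and docs counts twice -- which is why totals can exceed 100%.
sust_pct = sum(100 * sum(cat in c for c in commits) / n for cat in SUSTAINABILITY)
print(f"functionality {func_pct:.1f}%, sustainability {sust_pct:.1f}%, "
      f"ratio {func_pct / sust_pct:.2f}:1")
# functionality 50.0%, sustainability 150.0%, ratio 0.33:1
```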
This wasn't accidental. This was a deliberate experiment in sustainable velocity. And AI made it possible.
Breaking Down the 98.3%
*Figure: 8-Dimensional Commit Categorization*
When I say "sustainability," I mean 7 specific, measurable categories (functionality itself is the 8th dimension in the chart above):
- Tests (30.7%): the largest single category
- Documentation (19.0%): READMEs, API docs, inline comments
- Cleanup (13.8%): removing dead code, unused features, simplification
- Infrastructure (12.0%): CI/CD, scripts, tooling improvements
- Refactoring (11.5%): structural improvements, better abstractions
- Configuration (8.1%): environment variables, settings, build configs
- Security (3.2%): vulnerability fixes, security audits, input validation
These aren't "nice-to-haves." They're the foundation that makes the 22.7% of new functionality actually sustainable.
What Changed (And What Didn't)
Here's what I learned: tests and feedback loops were always important. Good engineers always knew this. The barrier wasn't understanding, it was economics and time.
What was true before AI:
- Fast feedback loops were critical for velocity
- Comprehensive tests enabled confident iteration
- Documentation reduced knowledge silos
- Some teams invested in this; many couldn't justify the cost, even knowing that sustainable software requires sustained investment in technical practices
What changed with AI:
- The barrier to entry dropped dramatically
- Building that feedback infrastructure became fast
- Maintaining quality became economically viable for small teams
- The excuse of "not enough time" largely disappeared
What didn't change:
- Discipline is still our responsibility
- The choice to balance features vs sustainability is still ours
- AI doesn't automatically make us write tests: we have to choose to
- The default behavior is still "ship more features faster" until technical debt forces a halt
The insight: AI removed the last excuse. Now it's about discipline, not capability.
For me, as a manager who codes in limited time, this changed everything. I can afford to build the feedback infrastructure that lets me iterate fast. The 0.23 ratio isn't a constraint, it's what enables the velocity I'm experiencing.
Negative Code: Simplification as a Feature
Here's another data point: 55,407 lines deleted out of 135,485 total lines changed.
That's 40.9% deletions. For every 3 lines I wrote, I deleted 2.
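If you want the equivalent numbers for your own repositories, they fall out of `git log --numstat`. A minimal sketch, assuming it runs inside a clone (add `--since`/`--until` to bound the window):

```python
import subprocess

# --numstat prints "added<TAB>deleted<TAB>path" per file for each commit
# ("-" for binary files). Run inside the repository you want to measure.
out = subprocess.run(
    ["git", "log", "--numstat", "--pretty=format:"],
    capture_output=True, text=True, check=True,
).stdout

added = deleted = 0
for line in out.splitlines():
    parts = line.split("\t")
    if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
        added += int(parts[0])
        deleted += int(parts[1])

total = added + deleted
print(f"{added} insertions, {deleted} deletions: "
      f"{100 * deleted / total:.1f}% of changed lines were deletions")
```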
Some deletions were refactoring: replacing 100 lines of messy code with 20 clean ones. But many were something else: removing features that didn't provide enough value.
One repository, chatcommands, has net negative growth: the codebase got smaller despite active development. It's not alone. ctool also shrank during this period.
This connects to two concepts I've written about before:
Basal Cost of Software: Every line of code has an inherent maintenance cost. It needs to be understood, tested, debugged, and updated. The best way to reduce basal cost is to have less code.
Radical Detachment: Software is a liability to minimize, not an asset to maximize. The goal isn't more code, it's the right amount of code to solve the problem.
Before AI, deleting features was expensive:
- Understanding old code took hours (documentation outdated)
- Tracing dependencies was manual and error-prone
- Verifying nothing broke was unreliable with incomplete test suites
- Updating docs and configs was tedious
Features became immortal. Once added, they never left, even at zero usage.
With AI, deletion becomes viable:
- Trace dependencies in minutes, not hours
- Comprehensive tests catch breaking changes immediately
- Documentation updates happen alongside code changes
- The entire deletion commit includes proper cleanup
The 13.8% cleanup category isn't just removing dead imports. It's removing dead features. Entire endpoints. Unused UI components. Configuration options nobody sets.
I call this Negative Velocity: making the codebase smaller, simpler, and faster, not just adding more.
This aligns with lean thinking about waste elimination. Every unused feature is waste: it increases build times, slows down tests, complicates mental models, and raises the basal cost of the system. Each line of code creates drag on everything else. By deleting features, we're not just cleaning up: we're reducing the ongoing cost of ownership. Fewer features means faster comprehension, simpler debugging, easier onboarding, and less surface area for bugs.
I'd deleted code before, but AI reduced the friction enough to make it routine instead of occasional. Deletion went from expensive to viable. We can finally afford to minimize the liability at the pace it deserves.
The best code is no code. Now we can actually afford to delete it.
The Metrics at a Glance
The key numbers:
- 424 total commits across 44 active days (November 2025 - January 2026)
- 9.6 commits per day average: nearly double typical velocity
- Ratio Func:Sust = 0.23:1 (1 hour features, >4 hours sustainability)
- Average Functionality: 22.7% per commit
- Average Sustainability: 98.3% per commit (multidimensional, not mutually exclusive)
- 135,485 total lines changed (80,078 insertions, 55,407 deletions)
- 40.9% deletion ratio: for every 3 lines written, 2 deleted
These aren't aspirational numbers. These are the actual patterns from an intensive 11-week period of AI-assisted development in production repositories.
Different Projects, Different Profiles
Not every project should have the same ratio. Context matters.
- inventory (0.42:1): more feature-focused; a greenfield project in active development
- plt-mon (0.25:1): test-heavy; a mature monitoring system that needs reliability
- ctool-cli (0.16:1): a CLI tool with emphasis on tests and robustness
- chatcommands (0.15:1): maintenance-focused, with net negative code growth (-1,809 lines)
- cagent (0.13:1): a new project with emphasis on quality from day one
- ctool (0.09:1): minimal feature work; heavy focus on infrastructure and cleanup
The chatcommands profile is particularly interesting: 31.5% of effort went to cleanup, and the repository actually shrank by 1,809 lines over this period. This isn't a dying project, it's a maturing one. Features were removed intentionally because they weren't providing value. The codebase got simpler, faster, and more maintainable.
The plt-mon repository maintains a 1.15:1 test-to-feature ratio: tests slightly outpace features. This is a production monitoring system where reliability matters, and the balance reflects steady feature growth with corresponding test coverage.
The ratio should reflect the project's phase and needs. AI makes all of these profiles viable without sacrificing quality or velocity.
What I Learned
After 11 weeks and 424 commits, here's what I've discovered:
Real velocity comes from fast feedback loops. Not from writing code faster, but from being able to iterate confidently and quickly. The 98.3% investment in sustainability isn't overhead, it's what enables speed.
AI changed what became economically viable. Before, building comprehensive test coverage as a manager with limited coding time would have been impossible. Now I can afford to build both the features and the safety net at sustainable pace. The barrier dropped; the discipline remains my responsibility.
Speed ≠ Velocity. Speed is how fast you move. Velocity is speed in the right direction. A team shipping 10 features per week with zero tests is moving fast toward a rewrite. A team shipping 3 features per week with comprehensive test coverage is moving fast toward sustainability.
What you optimize for gets amplified. My hypothesis: AI amplifies our choices. If you optimize for feature velocity, you'll accumulate technical debt faster. If you optimize for sustainable velocity (balancing features with quality infrastructure), you'll build healthier systems faster. I've seen this play out in my own work, though I don't claim this is universal.
Deletion is a feature. With lower barriers to understanding and changing code, we can finally afford to make codebases smaller. Net negative growth isn't stagnation, it's maturity.
The right ratio depends on context. My 0.23:1 ratio works for internal systems with moderate criticality, developed by a manager in limited time. Your context is different. The point isn't to copy my numbers, it's to be intentional about the balance.
This is still an experiment. I don't know if this approach scales to all teams or all types of systems. What I do know: for my context, over these 11 weeks, this balance produced the fastest sustainable velocity I've experienced in my career.
The shift wasn't learning new practices: I'd practiced TDD and built for sustainability for years. But as a manager coding in limited time, I always had to compromise. I wrote tests, but not as many as I wanted. I refactored, but not as thoroughly. I documented, but not as completely. AI didn't change what I valued; it changed what I could afford to do. The discipline I'd always practiced could finally match the standard I'd always wanted.
Your Turn
I don't have universal answers. But I do have a suggestion:
Measure your balance. Be intentional about it.
Track your next month of commits. Categorize them honestly. Calculate your Functionality:Sustainability ratio.
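If it helps, here's a sketch of that exercise wired together from the illustrative `categorize` helper and `SUSTAINABILITY` set sketched earlier. Treat it as a starting point, not a finished tool:

```python
import subprocess

# One block per commit: a "@@subject" marker line, then the touched files.
# ("@@" is a crude delimiter -- fine for a sketch, not for subjects containing it.)
log = subprocess.run(
    ["git", "log", "--since=1 month ago", "--name-only",
     "--pretty=format:@@%s"],
    capture_output=True, text=True, check=True,
).stdout

func = sust = total = 0
for block in log.split("@@")[1:]:
    lines = [ln for ln in block.splitlines() if ln.strip()]
    message, files = lines[0], lines[1:]
    labels = categorize(message, files)      # illustrative helper from above
    total += 1
    func += "functionality" in labels
    sust += len(labels & SUSTAINABILITY)     # one commit can count many times

print(f"{total} commits: functionality {100 * func / total:.1f}%, "
      f"sustainability {100 * sust / total:.1f}%, "
      f"ratio {func / max(sust, 1):.2f}:1")
```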
The number itself matters less than the awareness. Are you making conscious choices about where AI velocity goes? Are you building the feedback infrastructure that enables sustainable speed? Are you just shipping faster, or are you building better systems faster?
For me, the answer has been clear: investing heavily in tests, documentation, and simplification has made me faster, not slower. The 98.3% isn't overhead, it's the engine.
Your mileage may vary. Your context is different. But the question is worth asking:
What kind of engineering does AI make viable for you that wasn't before?