Introduction
Have you wondered if AI tools are actually helping programmers get more done, or if it's all just hype? I mean, we've got things like ChatGPT, Cursor, Windsurf, Claude Code, Gemini, Aider, and others that can write code for you. Sounds awesome, right? But is it really boosting productivity, or could it sometimes make things worse? In my experience, the coding results can be quite impressive, but AI goes off the rails often enough that I use `git commit` to create recovery points. A year ago, I lost three weeks of "amazing progress" when it all went horribly wrong. Twice, I've had to hire developers to finish a project or untangle it. We are hearing more about AI best practices nowadays, aimed at preventing or minimizing these problems. For those who use AI coding assistants, I thought this was an important topic, and the fact that it comes from Stanford University lends it fair credibility.
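For anyone new to this habit, here is roughly what I mean by recovery points. This is just an illustrative sketch with a throwaway repo and made-up file; the point is to commit before every AI session so you can always roll back:

```shell
# Throwaway repo for demonstration; the path and file are illustrative.
rm -rf /tmp/ai-checkpoint-demo
mkdir -p /tmp/ai-checkpoint-demo && cd /tmp/ai-checkpoint-demo
git init -q && git config user.email demo@example.com && git config user.name Demo

# 1. Checkpoint a known-good state before letting the AI loose.
echo 'def add(a, b): return a + b' > app.py
git add -A && git commit -q -m "checkpoint: working baseline"

# 2. Simulate an AI session that goes off the rails.
echo 'def add(a, b): retrun a ++ b  # oops' > app.py

# 3. Roll back to the last recovery point.
git checkout -q -- app.py
cat app.py   # prints: def add(a, b): return a + b
```

If the damage was already committed, `git reset --hard <checkpoint>` gets you back the same way.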
I recently came across a fascinating YouTube video from a talk at the AI Engineer World's Fair. It's called "Does AI Actually Boost Developer Productivity? (100k Devs Study)" by Yegor Denisov-Blanch, a researcher at Stanford University (here). The video breaks down a huge study of nearly 100,000 developers from over 600 companies. They looked at real coding data from private projects (not just public repos on GitHub) to see if AI assistants are living up to the buzz. Spoiler: the results are mixed. In this post, I'll walk you through the key findings, issues with the study, ways AI coding could improve, and what it all means for us as we witness the Beast System continue to rise. Remember, the field is moving so fast that these backward-looking studies are already losing relevance. What I will continue to post on are AI software development patterns with no human in the loop. Coding, fixing, testing, and deployment can run in parallel using AI agents nearly continuously. The pace of change is already nearly vertical!
The Main Results: What Did the Study Find?
The study is all about measuring "productivity" in coding. But what does that even mean? It's not just how many lines of code you write; that could be junk code. Instead, the researchers built a model that analyzes changes in code pulled from Git, the version control system most projects use. They score changes on quality, maintainability, and the actual functionality they add. They also track "rework," which is when you have to go back and fix stuff soon after writing it: basically, wasteful tweaks that don't add real value. This happens to me regularly.
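The study's actual model isn't public, so as my own toy illustration (not the researchers' method), here is one crude way to put a number on "rework": count how many lines a quick follow-up commit deletes from code that was just written. The repo and file names are stand-ins:

```shell
# Toy repo: one "feature" commit followed by a quick patch commit.
rm -rf /tmp/rework-demo && mkdir -p /tmp/rework-demo && cd /tmp/rework-demo
git init -q && git config user.email demo@example.com && git config user.name Demo

# Commit 1: three fresh lines of "functionality".
printf 'alpha\nbeta\ngamma\n' > feature.txt
git add -A && git commit -q -m "feat: initial version"

# Commit 2, soon after: two of those lines get rewritten.
printf 'alpha\nBETA\nGAMMA\n' > feature.txt
git add -A && git commit -q -m "fix: patch mistakes"

# Crude rework proxy: lines the follow-up commit deleted
# (column 2 of git's --numstat output).
reworked=$(git show --numstat --format= HEAD | awk '{print $2}')
echo "reworked lines: $reworked"   # prints: reworked lines: 2
```

A real metric would weight this by how recently the deleted lines were written, but the idea is the same: churn shortly after authorship is work that didn't stick.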
The big headline? On average, AI coding assistants give developers a 15-20% productivity boost across industries. That's solid: imagine finishing your work 15-20% faster! But it's not the same for everyone. Claims that developers see a tenfold (10x) boost in productivity aren't helpful without that context. Some teams saw huge jumps, while others actually got less productive. Why? It depends on a few key factors.
First, task complexity and project type play a massive role. “Greenfield” projects are brand-new ones, like starting a fresh app from scratch. “Brownfield” are existing projects where you’re adding to or fixing old code. For simple, low-complexity greenfield tasks, AI can boost productivity by 30-40%. It’s great at spitting out boilerplate code (the boring, repetitive stuff). But for high-complexity tasks in those new projects, it’s only 10-15%. Switch to brownfield: Low-complexity gets 15-20%, but high-complexity brownfield tasks? Just 0-10%, and sometimes it even decreases productivity. The study pulled this from 136 teams across 27 companies, so that’s a strong sample size.
Why the drop in tough, existing projects? AI often generates code that needs fixing later, leading to more rework. In one example from a 120-developer team, output went up after adopting AI, but a chunk of it was just patching mistakes the AI introduced. Net gain: still positive, but not as big as it looks.
Second, the programming language matters a lot. AI shines with popular ones like Python, Java, JavaScript, or TypeScript: gains of 20% for simple tasks and 10-15% for complex ones. These languages have tons of training data for AI models. But for niche or old-school languages like COBOL, Haskell, or Elixir? Gains are tiny or even negative, especially on hard tasks. AI just isn't as smart there because it hasn't "seen" as much code in those languages. I think this is one of the biggest opportunities: providers could shard large Mixture-of-Experts models into smaller experts distilled for a specific language. I'd pay for that!
Third, codebase size is a game-changer. In bigger projects with massive amounts of code, AI struggles because of "context window" limits: basically, how much info the AI can process at once. In huge codebases it can't grasp the full picture, leading to more errors and less help. AI performs better in smaller codebases, or when large ones are broken up into smaller chunks. Overall, the study says AI is replacing some coders, but humans will still have a role, especially at the senior level. Think of AI as a junior developer that helps with specific, less rigorous portions of the code, with a senior dev looking over all the merge requests.
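To make the chunking idea concrete, here is a trivial sketch. The file and the 200-line chunk size are arbitrary stand-ins for however you'd actually slice a real codebase (by module, by function, etc.):

```shell
# Stand-in for one oversized source file (1,000 lines).
rm -rf /tmp/chunk-demo && mkdir -p /tmp/chunk-demo && cd /tmp/chunk-demo
seq 1 1000 | sed 's/^/line /' > big_module.txt

# Split into 200-line chunks that each fit a model's context budget.
split -l 200 big_module.txt chunk_

ls chunk_*          # chunk_aa chunk_ab chunk_ac chunk_ad chunk_ae
wc -l < chunk_aa    # prints: 200
```

Five focused prompts over five chunks generally beat one prompt stuffed with the whole file, for exactly the context-limit reasons the study describes.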
Top 5 Issues Found
No study’s perfect, and this one got some heat in online discussions (like on Reddit and Hacker News). People pointed out flaws that could make the results less reliable. I’ll group them into the top 5, ranked from most serious to least (based on how often they came up and their potential impact).
Lack of Full Transparency on Methodology: Denisov-Blanch doesn't share the exact details of how the productivity model works; no full paper or preprint has been released yet. Critics say this makes the results hard to verify or replicate. Plus, the researcher has a business background, not a coding one, which raises questions about bias.
Potential Confounders Not Fully Controlled: With 100k devs, it’s huge, but things like team experience, company culture, or exact AI tools used might skew results. For example, newer AI models (like Claude Opus 4) weren’t included, so the study might already be outdated as tech evolves fast. This is true—models are getting way better at coding recently.
Overemphasis on Quantitative Metrics, Missing Qualitative Ones: The study focuses on hard numbers like rework but ignores softer benefits, like how AI reduces mental fatigue or lets coders focus on big-picture ideas. Devs might feel happier or less burned out, which could indirectly boost long-term productivity. But I argue the opposite: in my experience, development speed is so high that it takes an inordinate amount of concentration to keep up with the flow of prompts. There's a point, after 5-6 hours straight, where I have to break for the day. Developing complex projects this way is very demanding.
Possible Sponsorship Bias: Some folks wondered if sponsors (though not named) influenced the findings, as the researcher has made bold claims before (like 9% of devs “do nothing” based on commits). Big studies like this often have funding ties.
Reliance on Git Data Limits Scope: Not all coding happens in Git, and the model might miss non-code work like planning or debugging outside commits. Plus, surveys were dismissed as unreliable, but maybe combining them could give a fuller picture.
These aren’t deal-breakers, but they remind us to take even “big data” studies with a grain of salt.
How Could AI Coding Get Better?
The study and discussions highlight ways to make AI assistants more effective. First, beef up context handling. Current AIs have limited "memory" for big codebases; future models need bigger context windows or smarter ways to summarize projects. Companies like OpenAI are already working on this. Gemini offers a 1M-token context, which is very significant. But a huge context opens a "needle in the haystack" problem: the model concentrates on the beginning and end of the context and often loses focus in the middle. When this happens, the AI is no longer great at tasks and has to be "steered" by the developer to keep forward progress possible.
Second, train AI on more diverse languages and tasks. Right now, it’s biased toward popular languages with lots of adoption and users. Feeding models more data on niche or legacy languages (like COBOL) could close the gap.
Third, focus on reducing rework. AI should get better at generating bug-free code, maybe by integrating testing tools automatically. Also, teach users better prompting, like giving clear instructions to avoid vague outputs. I'm looking into the BMAD method to make a controlled leap into full agentic coding (here). Wish me luck!
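As one sketch of what "integrating testing tools automatically" could look like, a git pre-commit hook can refuse AI output that fails your tests before it ever lands. The repo, file names, and the grep-based stand-in "test suite" below are purely illustrative:

```shell
# Toy repo with a pre-commit hook that gates AI-generated changes.
rm -rf /tmp/hook-demo && mkdir -p /tmp/hook-demo && cd /tmp/hook-demo
git init -q && git config user.email demo@example.com && git config user.name Demo

# Stand-in "test suite": checks the expected behavior is present.
cat > run_tests.sh <<'EOF'
#!/bin/sh
grep -q 'return a + b' app.py
EOF
chmod +x run_tests.sh

# Pre-commit hook: refuse to commit anything that fails the tests.
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
./run_tests.sh || { echo "tests failed - commit blocked"; exit 1; }
EOF
chmod +x .git/hooks/pre-commit

# Broken AI output: the hook rejects this commit.
echo 'def add(a, b): return a ++ b' > app.py
git add -A
git commit -q -m "AI change" && echo committed || echo blocked

# Fixed code: this commit goes through.
echo 'def add(a, b): return a + b' > app.py
git add -A
git commit -q -m "AI change, verified" && echo committed
```

The same gate works with a real test runner in place of the grep; the point is that nothing the AI writes reaches the history until the tests say so.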
Fourth, blend AI with human skills. Don’t rely on it for everything; use it for boilerplate or research, but build your own understanding to avoid dependency. Schools could add AI ethics and effective use to coding classes.
Finally, companies should adopt AI thoughtfully—start small, measure real impacts, and train teams. Patience is key; gains might take time as tools improve. Don’t just FOMO.
Key Takeaways: The Big Three
AI Boosts Productivity, But It’s Contextual: Expect 15-20% average gains, but way more for simple, new projects in common languages—and less (or none) for complex, older ones.
Watch Out for Hidden Costs Like Rework: AI can speed up initial coding but create fixes later, so net benefits depend on the task.
Future Improvements Are Coming—And We Can Help: By fixing context limits and biases, AI could get even better.
Conclusion
In summary, this Stanford study cuts through the AI hype: Yes, tools like coding assistants can make developers 15-20% more productive on average, especially in straightforward scenarios with popular languages and smaller codebases. But for tricky, established projects or rare languages, the gains shrink—or vanish—due to errors and rework. Faults like unclear methods and missed qualitatives mean we shouldn’t take it as definitive progress, but it’s a solid start. With tweaks like better context and training, AI could evolve into an even stronger ally.
There’s one more note on this. I have been able to code projects in technical stacks, languages, and methods I was never formally trained on. It is incredibly empowering, because I want the best standards and best practices, period; I don’t have to be constrained by my own personal or professional experience. That is a game changer.
#Maranatha
YBIC,
Scott
I’m praying for your joy brother. Here is some worship music to help kickstart you. 🙌
https://youtu.be/o8Gds6lBick?si=_I2pa8mw5bIIDK7g