Wait a minute — developers who use genAI tools are slower?
According to Nvidia CEO Jensen Huang, thanks to AI, “The world will be more productive. There will be higher GDP. There will be more jobs. But every job will be augmented by AI.”
And according to Doug Matty, the US Department of Defense’s Chief AI Officer, the generative AI (genAI) program Grok can help us guard against our enemies, and “maintain strategic advantage over our adversaries.”
Yeah. Right.
Sure, the genAI revolution has been very good for Nvidia, which is now worth $4 trillion and counting. It’s also been fine and dandy for Grok, which landed a $200 million contract only days after Elon Musk’s prize AI program went full MechaHitler.
But what about the people who, you know, actually have to use AI tools to get work done? Funny thing, that: it’s not working out so well for them.
Take, for example, programming. That’s one of the things AI is supposed to be really good at. Microsoft CEO Satya Nadella confidently asserts that Copilot is “transforming the developer experience” and that it now contributes to over 30% of Microsoft’s codebase.
Does it, really? A recent study found that genAI tools made experienced open-source developers less effective, not more. The study, Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity, was conducted by the nonprofit AI research group METR. It involved 16 open-source developers with an average of more than 10 years of experience. In short, these were experts set to work on genuine issues, such as bug fixes and new features, in code repositories they knew well.
Using genAI tools such as Cursor Pro and Claude 3.5/3.7 Sonnet, the programmers predicted that AI would cut their task time by 24%. Even after finishing the study, they still believed it had sped them up by 20%. They were wrong. Badly wrong.
In reality, it took them 19% longer to complete tasks with AI than without it. The slowdown was most pronounced on tasks expected to take roughly six hours or less.
What happened? First, the developers burned time prompting genAI. One of the things people praising the technology to high heaven overlook is that writing good programming prompts is hard. As Domenic Denicola, a Google Chrome developer and the jsdom maintainer who was in the study, wrote, he was surprised at just “how bad the models are at implementing web specifications.”
Developers also wound up spending more time than expected reviewing generated code and fixing errors, including — surprise, surprise! — potential security holes.
Why are we so sure genAI makes things faster? Well, never forget that genAI companies are selling efficiency and productivity. The faster and more efficient their tools appear, the more likely you are to buy them. Advertising hype aside, the researchers also noted that “While coding/agentic benchmarks have proven useful for understanding AI capabilities, they typically sacrifice realism for scale and efficiency.” This study, by contrast, measured real work.
This is not the first study to reach these conclusions. Google’s 2024 DevOps Research and Assessment (DORA) report found that while genAI coding tools sped up code reviews, the resulting code was often too poor to push to production. Teams using the tech also reported more errors. In short, the programmers had to burn time cleaning up the bots’ messes.
Over on the r/programming subreddit, none of this surprised the developer regulars who hang out there. The most popular post started, “My experience is [AI] can produce 80% in a few minutes, but it takes ages to remove duplicate code, bad or non-existent system design, and fixing bugs. After that, I can finally focus on the last 20% missing to get the feature done. I’m definitely faster without AI in most cases.”
Sure, for quick-and-dirty code, genAI can deliver a program that might work. The problem is that “might work” isn’t good enough for working software.
It’s not working out for beginning, AI-first “vibe” coders, either. As technical writer Kaustubh Saini recently wrote after one catastrophic vibe-coding failure, this is “the core problem with vibe coding: it produces developers who can generate code but can’t understand, debug, or maintain it. When AI-generated code breaks, these developers are helpless.”
Yes, they are.
It’s the exact same problem in other fields; it’s just not as obvious.
I see this constantly in writing. More and more reports, stories, and news articles are badly written and riddled with errors. For now, companies and second-rate publishers are getting away with producing genAI slop. People don’t check for mistakes, and the work can sound plausible if you don’t know any better.
It’s almost certainly true in whatever field you work in, as well. The long and short of it is that genAI, at best, can be a useful aid to getting work done. Never, ever mistake it for a replacement for real expertise. It’s not.