Teachers across the U.S. will start returning to work in the next few weeks. Maybe you'll return soon. And when you return, maybe you'll hear from a colleague about how you just have to try this AI tool. Or maybe it'll be a consultant telling you this is what you have to do to avoid being left behind. Here's a fun new piece of evidence to consider if that happens to you.
An organization called Model Evaluation and Threat Research (METR) ran a study on how AI affects coders' productivity. They recruited a bunch of experienced coders and had them prepare a list of real tasks they needed to complete, along with an estimate of how long they thought each task would take. Each task was then randomly assigned to be done either without AI or with AI allowed.
Long story short: using AI slowed coders down. On average, they finished tasks faster when AI wasn't allowed. Probably not what you expected! File this under "maybe AI isn't changing the world as much as some people think." This is just one study and it has its limitations, though it's also rigorous experimental evidence. You can read a more thorough description here if you're interested.
But there's an important note about the study that's worth emphasizing, especially when that eager colleague of yours tells you how amazing AI has been at planning their lessons or whatever. Coders completed tasks 19% slower (on average) when they were allowed to use AI. But the study also asked coders to self-evaluate whether they thought AI made them faster or slower. On average, when coders were using AI, they thought they were going 20% faster. Let me say that again. Coders thought AI was making them faster, but it was actually making them slower. The only reason the researchers know it slowed coders down is that they screen-recorded all of the work and timed everything themselves.
The truth is, humans aren't very good at metacognition. We aren't very good at recognizing what we're good at and bad at. We aren't very good at judging our own productivity. We aren't very good at judging our own learning. It would be nice if we were, but we're not.
And when it comes to AI, we are constantly bombarded by an enormous hype machine trying to convince us that AI is the future, that the clock is ticking on regular analog humans, that we need to hop on board or fall behind. That hype machine biases us to assume that AI is, in fact, intelligent, and that whatever it outputs must be better than what we would've produced ourselves.
This is just one study on coding. It doesn't follow that using AI is always bad. I wrote a post a few months ago on using AI to generate word problems. I plan to continue doing that this school year. I use AI to translate when I'm communicating with families who don't speak English. But that's probably it. I've experimented with generating materials beyond word problems, and I just don't think it produces quality resources. If I give AI a quick prompt and use what pops out, it's typically not very good. If I spend a while prompting again and again to get something better, I might as well write it myself.1 Everything else I've tried to use AI for follows the same pattern: it's either bad, or too much work to make it good. Word problems are my one exception because they're so labor-intensive to write, and when I write them myself they often come out too repetitive. Make your own decisions. I'll keep experimenting, and I'm sure I'll find some new uses as time goes on. But remember that it's easy to fall for something new and shiny, and to convince yourself that the newness and shininess mean it must be better than what you did before.
My argument is that we should approach AI use with intense skepticism, and we should reserve extra intense skepticism for any suggestion that we should have students use AI. This post is a reminder of our biases, not an ironclad rule. If you head back to work and that eager colleague or consultant or whoever tells you about this amazing AI tool that you absolutely have to use, I hope you remember this little nugget and approach it with healthy skepticism.
Craig Barton has a nice post on creating resources with AI here. It's worth reading if you want to give AI-created resources a shot. But Craig emphasizes that it often takes multiple rounds of prompts and edits before you get something useful, which maybe just reinforces my thesis. Still, Craig's approach is the best I've seen and it's worth checking out.