Hattie's Visible Learning Research, and Reading Research Critically
Or, don't let someone tell you that the jigsaw method is the 7th most important influence on student achievement
I had a PD day today and sat through a session on John Hattie's Visible Learning research. The presenter kept mentioning that the "jigsaw method" has an effect size of 1.20, that jigsaw is an incredibly effective practice, and that we should all be using it. They were referencing this table of influences on student achievement (link for full table):
I tried to question whether jigsaws are actually that effective and got shut down. This was actually the second PD day this year where I've had to listen to a presentation about Hattie's research, one from this outside consultant and one from my principal. I'm not totally opposed to using his research, but it also doesn't seem like either presenter understood the research very well or was able to engage with critiques of it. I read Hattie's book a while back, but I still struggled to articulate my concerns clearly today. I thought it would be a good exercise to put together a blog post explaining what Hattie's work tries to do and what my concerns are with taking his effect sizes at face value. If nothing else, I hope it will help me articulate my criticisms more clearly next time. Hopefully this is useful to someone else out there hearing a similar presentation.
Who is John Hattie and why should we care about him?
Hattie is an Australian researcher who wrote a summary of education research called Visible Learning in an attempt to identify the practices that have the largest impact on student learning. Lots and lots of researchers have conducted research studies on education. Sometimes, researchers do a "meta-analysis," which means they look at a large group of studies and try to summarize the research on a topic: for instance, they might look at the hundreds of studies on homework and try to summarize all of that research in a tidy little paper. Hattie wrote a "synthesis," which is a summary of meta-analyses. It would be overwhelming to try to capture every detail from all of those meta-analyses, so Hattie uses something called an "effect size" to communicate how much impact a practice has on student achievement. It sounds pretty impressive at a glance: "The Visible Learning research synthesizes findings from 1,400 meta-analyses of 80,000 studies involving 300 million students, into what works best in education."
What is an effect size?
This is a normal distribution:
Some things in the world are normally distributed, lots of things are close to a normal distribution, and plenty of stuff isn't normally distributed at all but we pretend it is because it's convenient. In a normal distribution about 68% of data points are within one standard deviation of the mean, 95% are within two standard deviations, and 99.7% are within three standard deviations. The standard deviation measures how spread out the data is around the mean, so we can use it to compare changes in different contexts. An effect size answers the question, "how many standard deviations did the average individual change in this study compared to a control group?" An effect size of 1 means that someone at the 50th percentile moved to about the 84th percentile. This doesn't mean that every individual moved exactly one standard deviation, and because of the way standard deviation works it doesn't mean that everyone moved by exactly 34 percentiles. It's just an average, a way to compare the size of a change between different contexts.
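To make that concrete, here's a quick Python sketch (mine, not anything from Hattie). It treats scores as normally distributed and uses the normal CDF to see where a student at a given percentile would land after moving some number of standard deviations.

```python
from statistics import NormalDist

def percentile_after_shift(start_percentile: float, effect_size: float) -> float:
    """Where a student at start_percentile lands after moving
    effect_size standard deviations up a normal distribution."""
    z = NormalDist().inv_cdf(start_percentile / 100)  # starting point as a z-score
    return NormalDist().cdf(z + effect_size) * 100    # new percentile

print(round(percentile_after_shift(50, 1.0)))   # 50th -> ~84th percentile
print(round(percentile_after_shift(50, 0.4)))   # 50th -> ~66th percentile
print(round(percentile_after_shift(84, 1.0)))   # 84th -> ~98th, not 84 + 34
```

Notice the last line: the same effect size moves a median student 34 percentiles but a student already at the 84th percentile only about 14, which is why "everyone moved 34 percentiles" is the wrong way to read it.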
Is there a certain effect size that is equivalent to a year of learning?
No. That's ridiculous. Hattie found that the average effect size of the studies he looked at was 0.4, so he suggested that 0.4 might be equivalent to a year of learning. Lots of other people have tried to pick a number that they can use to represent a year of learning when looking at research. It's really because people like to say "equivalent to 1.5 years of learning" rather than "effect size of 0.6" since it sounds nicer. You can't measure years of learning like that. If you're going to use effect sizes it's best not to think of them in terms of years of learning at all.
Ok but what is an effect size really measuring?
Here's the thing: the meaning of the effect size depends on what the study is measuring. Hattie's largest effect sizes are over 1, with a lot of desirable effects in the 0.6 to 0.7 range. In 2021 the mean SAT score was 1060, with a standard deviation of 217. Even just 0.6 standard deviations is 130 points on the SAT. If anyone came up with an intervention that increased the average student's SAT score by 130 points, that would be an educational miracle.
Another example: when I was student teaching I taught at a summer school, and we were required to give the end-of-summer test at the beginning of the summer as well. The goal was to gather data to show how much students had grown. My students' starting average was 31%. The ending average was 82%. The standard deviation was 15%. That's an effect size of 3.4, which by Hattie's 0.4-per-year benchmark would be more than eight years of learning. Did I teach students 8 years of math as a student teacher in summer school? No. I just gave students a test full of questions they didn't know how to do, and then taught them how to do them. Most studies use some sort of measure in between these two extremes. SAT scores are really hard to influence. A test you make up yourself is really easy to influence.
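Here's that arithmetic as a small Python sketch, using the numbers above. This is the simplest possible pre/post effect size (growth divided by the standard deviation, with no control group and no pooled standard deviation), which is roughly what my summer school calculation was doing.

```python
def simple_effect_size(pre_mean: float, post_mean: float, sd: float) -> float:
    """Naive pre/post effect size: growth measured in standard deviations.
    No control group, no pooled standard deviation."""
    return (post_mean - pre_mean) / sd

# Summer school numbers from above: 31% -> 82%, standard deviation of 15 points
d = simple_effect_size(31, 82, 15)
print(round(d, 1))        # 3.4 standard deviations
print(round(d / 0.4, 1))  # ~8.5 "years of learning" at Hattie's 0.4 per year

# SAT numbers from above: what an effect size of 0.6 means on a 217-point SD
print(round(0.6 * 217))   # ~130 points
```

Same formula, wildly different meanings depending on the test behind it.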
There are lots of other reasons effect sizes are hard to compare. It's easier to get a large effect from a small sample. Many studies look at a specific subgroup rather than a representative sample of students. Some studies don't use control groups at all. But the point is that an effect size of 0.6 could be incredibly impressive in one context and disappointing in another. Studies are hard to compare. Effect size is one way to do it, but it's far from perfect.
Can you give some examples of how those effect sizes can be misleading?
Hattie says that the "jigsaw method" has an effect size of 1.20. Here is the page on his website about the jigsaw method. There is only one meta-analysis cited. It was conducted in Turkey, looking at Turkish studies on the jigsaw method run between 2005 and 2012. They only found 11 relevant studies, mostly conducted at middle schools or universities. The meta-analysis does not mention what the outcome measures were that resulted in such a high effect size, and I got tired of trying to translate different Turkish PDFs. I'm skeptical; I'd like more than 11 Turkish studies with unclear outcome measures to believe that the jigsaw method is the 7th most powerful educational technique in existence.
Second example: Hattie says that "mnemonics" have an effect size of 0.76. Does teaching mnemonics help students make two years of progress in a single year? Of course not. Hattie's website references four meta-analyses. Three report effect sizes between 0.45 and 0.64, but the fourth has an effect size of 1.62. That's not a great start. That fourth meta-analysis focuses on students with disabilities and acknowledges that many of the studies were run with individual students being shown a list of words to remember, with mnemonics and without. The review does include some classroom studies and claims significant effect sizes for those as well. It's an interesting paper; I honestly think it's worth reading. But again, the paper doesn't share what exactly the assessments were measuring. If, like the studies with individual students, they showed a large effect size for remembering words from a list, that's very different from influencing regular classroom instruction by the same magnitude. More broadly, any study of mnemonics is going to focus on topics that lend themselves to mnemonics: the studies in that fourth review focused largely on history, and used really interesting and memorable images. Those choices bias the results and inflate the effect size. If a random teacher reads "oh, mnemonics will help my students make two years of progress" and throws some mnemonics onto a slideshow, they shouldn't expect to see much of a difference. I'm pretty skeptical of mnemonics in general, and looking through these studies made me curious; maybe I dismissed them too quickly. But the inconsistency of the effect sizes, questions about whether they work in all contexts, and questions about what outcome the effect sizes are measuring all mean that I should be hesitant to adopt the practice without understanding it better.
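To show the shape of the averaging problem, here's a toy calculation. The 0.55 in the middle is made up; all I know is that three of the meta-analyses fall between 0.45 and 0.64 and the fourth is 1.62, and I don't know how Hattie actually weights meta-analyses to land on 0.76. The point is just that one outlier drags a simple average well above most of the underlying numbers.

```python
# Effect sizes from the four meta-analyses Hattie cites for mnemonics.
# The 0.55 is a made-up placeholder for illustration; the other three
# values come from the ranges described above.
meta_analyses = [0.45, 0.55, 0.64, 1.62]

simple_average = sum(meta_analyses) / len(meta_analyses)
print(simple_average)  # ~0.8, higher than three of the four inputs

without_outlier = sum(meta_analyses[:-1]) / 3
print(without_outlier)  # ~0.55, a very different headline number
```

One headline number hides the fact that the evidence is mostly a cluster of modest effects plus one unusual study population.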
What are some other problems with Hattie's approach?
Hattie takes all the studies on a topic and averages them together, while a lot of research is trying to figure out the best way to implement a particular strategy. A good example is Kluger & DeNisi's 1996 meta-analysis on feedback. They found that the average effect size of studies on feedback was 0.42. But the surprising part is that one-third of studies had a negative effect size. That means one-third of the time, when a study tried to figure out if a type of feedback helped, it actually hurt. One-third! Rather than asking, "what is the effect size of feedback?", a better question to ask is "which feedback strategies are more effective, and which feedback strategies are less effective?"
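A quick simulation makes the point. The spread here is made up; this isn't Kluger & DeNisi's actual data, just an illustration that a mean effect of 0.42 is perfectly compatible with a third of studies showing harm.

```python
import random

random.seed(0)

# Made-up distribution of study-level feedback effects: mean 0.42 with a lot
# of spread between studies. NOT Kluger & DeNisi's actual data.
studies = [random.gauss(mu=0.42, sigma=1.0) for _ in range(10_000)]

mean_effect = sum(studies) / len(studies)
share_negative = sum(e < 0 for e in studies) / len(studies)

print(round(mean_effect, 2))      # ~0.42: "feedback works" on average
print(round(share_negative, 2))   # ~0.34: but about a third of studies hurt
```

The average alone can't tell you whether you're looking at a reliably helpful practice or a coin flip between helping and hurting, which is why the implementation question matters more than the headline effect size.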
How should I use Hattie's work without misinterpreting it?
Part of the problem is that deciding "what works" in education is hard. It's hard to compare the research on different influences on student achievement. Effect sizes are a reasonable place to start, and Hattie's Visible Learning website can serve as something like a reference library of research. And sure, something that has a higher effect size is probably a better bet than something with a lower effect size. But no teacher should look at Hattie's list, pick a few things at the top, and assume they will help their students learn two or three times as fast. Instead, it's worth using Hattie's work as a helpful collection of research to explore. Most schools give homework. Hattie gives homework an effect size of 0.29. It's reasonable to guess that homework can help students learn, but isn't the most important thing to focus on. Then you could look at some of the specific studies on his website and try to understand how to use homework effectively. (You could also read Michael Pershan’s great homework lit review.) Hattie has said that teachers and school leaders shouldn't just look at his table, pick the stuff at the top, and blindly start using it. I doubt he intended his work to become as widely referenced as it is. It's a reasonable place to start, but it's not where the conversation should end.