When we announced our research results last week, Audrey Watters was one of the first to cover it. Shortly thereafter, Justin Reich wrote a very thoughtful review of our research and response to Audrey’s blog post at his EdTechResearcher blog. Others, through comments made in post comments, blogs, emails, and conversations, have asserted that we (Auburn School Department) have made claims that our data don’t warrant.
I’d like to take a moment and respond to various aspects of that idea.
But first, although it may appear that I am taking on Justin’s post, that isn’t quite true (or fair to Justin). Justin’s is the most public comment, so the easiest to point to. But I actually believe that Justin’s is a quite thoughtful (and largely fair) critique from a researcher’s perspective. Although I will directly address a couple things Justin wrote, I hope he will forgive me for seeming to hold up his post as I address larger questions of the appropriateness of our claims from our study.
Our Research Study vs. Published Research -
Our results are initial results. There are a lot of people interested in our results (even the initial ones – there are not a lot of randomized control trials being done on iPads in education), so we decided to share what we had so far in the form of a research summary and a press release. But neither of these would be considered “published research” by a researcher (and we don’t either – we’re just sharing what we have so far). Published research is peer reviewed and has to meet standards for the kinds of information included. We actually have more data to collect and analyze (including more analyses on the data we already have) before we’re ready to publish.
For example, Justin was right to point out that we shared no information about scales for the ten items we measured. As such, some of the measures may seem much smaller than when compared proportionally to their scale (because some of the scales are small), and we were not clear that it is inappropriate to try to make comparisons between the various measures as represented on our graph (because the scales are different). In hindsight, knowing we have mostly a lay audience for our current work, perhaps we should have been more explicit around the ten scales and perhaps created a scaled chart…
Mostly, I want my readers to know that even if I’m questioning some folks’ assertions that we’re overstating our conclusions, we are aware that there are real limitations to what we have shared to date.
Multiple Contexts for Interpreting Research Results -
I have this debate with my researcher friends frequently. They say the only appropriate way to interpret research is from a researcher’s perspective. But I believe that it can and should also be interpreted as well from a practitioner’s perspective, and that such interpretation is not the same as a researcher’s. There is (and should be) a higher standard of review by researchers and what any results may mean. But practical implementation decisions can be made without such a high bar (and this is what makes my researcher friends mad, because they want everyone to be just like them!). This is just like how lawyers often ask you to stand much further back from the legal line than you need to. Or like a similar debate mathematicians have: if I stand some distance from my wife, then move half way to her, then move half way to her again, and on and on, mathematicians would say (mathematically) I will never reach her (which is true). On the other hand, we all know, I would very quickly get close enough for practical purposes!
Justin is very correct in his analysis of our research from a researcher’s perspective. But I believe that researchers and practitioners can, very appropriately, draw different conclusions from the findings. I also believe that both practitioners and researchers can overstate conclusions from examining the results.
I would wish (respectfully) that Justin might occasionally say in his writing, “from a researcher’s perspective…” If he lives in a researcher world, perhaps he doesn’t even notice this, or thinks it implied or redundant. But his blog is admittedly not for an audience of researchers, but rather for an audience of educators who need help making sense of research.
Reacting to a Lay Blog as a Researcher -
I think Justin has a good researcher head on him and is providing a service to educators by analyzing education research and offering his critique. I’m a little concerned that some of his critique was directed at Audrey’s post rather than directly at our research summary. Audrey is not a researcher. She’s an excellent education technology journalist. I think her coverage was pretty on target. But it was based on interviews with the researchers, Damian Bebell (one of the leading researchers on 1to1 learning with technology), Sue Dorris, and me, not a researcher’s review of of our published findings. At one point, Justin suggests that Audrey is responding to a graph in our research summary (as if she were a researcher). I would suggest she is responding to conversations with Damian, Sue, and me (as if she were a journalist). It is a major fallacy to think everyone should be a researcher, or think and analyze like one (just as it is a fallacy that we all should think or act from any one perspective, including as teachers, or parents, etc). And it is important to consider individual’s context in how we respond to them. Different contexts warrant different kinds of responses and reactions.
Was It The iPads or Was It Our Initiative -
Folks, including Audrey, asked how we knew what portion of our results were from the iPads and which part from the professional development, etc. Our response is that it is all these things together. The lessons we learned from MLTI, the Maine Learning Technology Initiative, Maine’s statewide learning with laptop initiative, that has been successfully implemented for more than a decade, is that these initiatives are not about a device, but about a systemic learning initiative with many moving parts. We have been using the Lead4Change model to help insure we are taking a systemic approach and attending to the various parts and components.
That said, Justin is correct to point out that, from a research (and statistical) perspective, our study examined the impact that solely the iPad had on our students (one group of students had iPads, the other did not).
But for practitioners, especially those who might want to duplicate our initiative and/or our study, it should be important to note that, operationally, our study studied the impact of the iPad as we implemented them, which is to say, systemically, including professional development and other components (Lead4Change being one way to approach an initiative systemically).
It is not unreasonable to expect that a district who simply handed out iPads would have a hard time duplicating our results. So although, statistically, it is just the iPads, in practice, it is the iPads as we implemented them as a systemic initiative.
Statistical Significance and the Issue of “No Difference” in 9 of the 10 Tests -
The concept of “proof” is almost nonexistent in the research world. The only way you could prove something is if you could test every possible person that might be impacted or every situation. Instead, researchers have rules for selecting some subset of the entire population, rules for collecting data, and rules for running statistical analyses on those data. Part of why these rules are in place is because, when you are only really examining a small subset of your population, you want to try to control for the possibility that pure chance got you your results.
That’s where “statistical significance” comes in. This is the point at which researchers say, “We are now confident that these results can be explained by the intervention alone and we are not worried by the impact of chance.” Therefore, researchers have little confidence in results that do not show statistical significance.
Justin is right to say, from a researcher’s perspective, that a researcher should treat the 9 measures that were not statistically significant as if there were no difference in the results.
But that is slightly overstating the case to the rest of the world who are not researchers. For the rest of us, the one thing that is accurate to say about those 9 measures is that these results could be explained by either the intervention or by chance. It is not accurate for someone (and this is not what Justin wrote) to conclude there is no possitive impact from our program or that there is no evidence that the program works. It is accurate to say we are unsure of the role chance played on those results.
This comes back to the idea about how researchers and practitioners can and should view data analyses differently. When noticing that the nine measures trended positive, the researcher should warn, “inconclusive!”
It is not on a practitioner, however, to make all decisions based solely on if data is conclusive or not. If that were true, there would be no innovation (because there is never conclusive evidence a new idea works before someone tries it). A practitioner should look at this from the perspective of making informed decisions, not conclusive proof. “Inconclusive” is very different from “you shouldn’t do it.” For a practitioner, the fact that all measures trended positive is itself information to consider, side by side with if those trends are conclusive or not.
“This research does not show sufficient impact of the initiative,” is as overstated from a statistical perspective, as “We have proof this works,” is from a decision-maker’s perspective.
We don’t pretend to have proof our program works. What is not overstated, and appropriate conclusions from our study, however, and is what Auburn has stated since we shared our findings, is the following: Researchers should conclude we need more research. But the community should conclude at we have shown modest positive evidence of iPads extending our teachers’ impact on students’ literacy development, and should take this as suggesting we are good to continue our program, including into 1st grade.
We also think it is suggestive that other districts should consider implementing their own thoughtfully designed iPads for learning initiatives.