Earlier this month, Ben Goldacre’s government-backed report into the use of evidence in education was published. In this, and a subsequent post, I will respond to some of the key points and arguments.
Firstly, I will highlight the part that is to be welcomed. The report suggests that teaching could become an evidence-based profession and that, as in medicine, evidence would aid informed decision-making; that expertise grounded in a grasp of the evidence would allow the profession to be more, rather than less, autonomous; and that teachers, by identifying the important questions from the frontline, could be the driving force in setting research priorities. There are practical suggestions, such as training teachers in research methods, finding ways to disseminate research findings and helping teachers to work with researchers.
This vision is a welcome change from the current dynamic between researchers and teachers. In my experience, teachers tend to encounter research in two unhelpful ways. Disastrously, dubious research is presented as the source of the latest initiatives and fads, as a reason to overrule professional judgement and embrace an idea suggested and endorsed by somebody who has either never taught full time or long since fled the classroom. In this model, research serves the interests and ideas of researchers and is used as a method of advocacy for, or an excuse for enforcement of, the latest fad. I can think of few bad ideas in teaching that weren’t, at some point, presented as the product of definitive research. Sometimes the research doesn’t really exist (e.g. Brain Gym). Sometimes the research itself is the worthless work of propagandists (e.g. Jo Boaler’s work on maths teaching). Sometimes the research is reputable but the interpretation is worthless (see countless forms of nonsense claiming to be based on Carol Dweck’s work). But the effect of this relationship has been to lower teacher autonomy and to make teachers instinctively sceptical of academics. Few teachers will change their ideas simply because of research, which is something of an irony, given that our ideas as a profession often simply reflect the work of some long since discredited researcher from a previous generation. A lot of the bad teaching methods enforced by OFSTED may have their origins in this kind of relationship between quack researchers and teachers.
The other relationship we see between teaching and research is that which broadly goes under the title of “action research”. Under this heading, some poor teacher who has foolishly decided to embark upon a master’s degree in their spare time is persuaded to carry out their own research project. Typically, this will be statistically worthless, involve lots of questionnaires and be considered worthwhile only if it shows interest in some current initiative or gimmick. While this may give the teacher some insight into their own situation, it is unlikely ever to produce generalisable research results or anything more persuasive than the personal opinion of any other member of the teaching profession.
Overall, the “research architecture” suggested in the report is the most useful contribution to debate. The idea of a teaching profession setting the questions, and researchers investigating them, seems to be turning an upside down situation the right way round.
The less helpful discussion prompted by the report is that about Randomised Controlled Trials (RCTs): experiments conducted by applying different interventions to different groups of people, assigned at random, and comparing the results. These have been particularly effective in medicine, where they are used to evaluate new drugs and other interventions. In the debate over RCTs that has followed the report, the anti-RCT side has tended to offer responses which either rule out evidence or RCTs completely, whether arbitrarily or for a reason related to a genuine difficulty but without any proper analysis of how great that difficulty is. The pro-RCT side has tended to offer arguments which amount to little more than a confidence that problems overcome in medicine can be overcome here, that more trials can overcome the difficulties, and the claim (without analysis) that the advantages will outweigh the costs. Goldacre acknowledges that qualitative research may be useful in explaining why certain interventions are effective (something I tend to doubt) but not that there are quantitative alternatives to RCTs that may prove more practical in many educational contexts.
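The logic of random assignment can be illustrated with a toy simulation (my own sketch, not from the report; all numbers and names here are made up for illustration). Because pupils are shuffled into the two arms at random, any pre-existing differences between them average out, and the difference in outcomes estimates the effect of the intervention alone:

```python
import random

random.seed(0)

def simulate_rct(n_per_arm=200, true_effect=2.0):
    """Toy RCT: pupils get a baseline score; the intervention arm
    receives a small additional boost. Random assignment means the
    arms differ, on average, only by the intervention's effect."""
    pupils = [random.gauss(50, 10) for _ in range(2 * n_per_arm)]
    random.shuffle(pupils)                       # random assignment
    control = pupils[:n_per_arm]
    treated = [score + true_effect for score in pupils[n_per_arm:]]
    # Estimated effect: difference in mean outcomes between arms
    return sum(treated) / n_per_arm - sum(control) / n_per_arm

print(simulate_rct())  # a noisy estimate in the vicinity of 2.0
```

Note that a single trial of this size still gives a noisy estimate; only across many (or larger) trials does the average settle near the true effect, which is part of why the cost question below matters.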
I’m happy to accept most of the points in the report justifying RCTs as the best way to test an intervention. However, much of the debate in and around the paper seems to ignore, or put off answering, some absolutely crucial questions about RCTs, mainly concerning which hypotheses are to be tested and what level of resources is to be devoted to testing them. I realise it can be argued that these are debates for further down the road, and that the first step is simply to accept the principle of RCTs. However, if we fail to look at these questions first then we end up talking at cross-purposes for most of the discussion. What we test, and how much we can spend testing it, shapes both the usefulness and the ethics of RCTs.
In my next blogpost I will consider some of the questions that need to be considered in order to evaluate Goldacre’s case for increased use of RCTs.
Here I continue with my response to Ben Goldacre’s report on evidence in education.
The usefulness of RCTs (Randomised Controlled Trials) in education cannot be determined without confronting a large number of prior questions which have largely been avoided.
1) What type of debates are to be resolved by RCTs?
It would appear that despite the excellent job he did debunking Brain Gym, Goldacre has not realised that much, or even most, educational debate is about worthless nonsense that can already be shown to be wrong long before the RCT stage. He assumes that education is like medicine was in the 1970s, whereas it is probably more like medicine in the 1370s. He assumes that we have clear aims and sound theories which need to be refined with better empirical research to identify those situations where we have been misled. However, it would be fairer to say we are at war in education over our ultimate aims and over the underlying theories. We are not 1970s doctors needing information about the effectiveness of certain drugs, we are medieval doctors trying to find the correct balance of the four humours. Teachers (and educationalists) don’t agree on aims; they don’t agree on what existing evidence shows, and they don’t always have the time and resources to implement an idea even if it is proven to be effective (for instance, look at what most teachers agree is the most effective sort of marking and what marking most teachers can actually get done in a week).
What do we actually need to be finding out? Should we be testing ideas that are obviously wrong but popular (e.g. Brain Gym)? Should we be testing ideas that are not particularly plausible to teachers but are popular with educationalists or OFSTED inspectors (e.g. group work)? Should we be testing ideas that are supported by cognitive psychology but rarely applied (e.g. use of tests to aid recall), or ideas that are common but psychologically implausible (e.g. discovery learning)? What constitutes success for an intervention? I wrote here about the kinds of goalpost-shifting arguments we have over teaching methods, and RCTs do not provide an obvious end to that debate. Whether an RCT will give useful data or waste resources will depend on resolving these issues, issues which, as a profession, we seem to be getting nowhere with. Additionally, there are ethical issues. Goldacre suggests that these are not insurmountable but that resolving them “requires everyone involved in education to recognise when it’s time to honestly say ‘we don’t know what’s best here’”. Easily said, but how many debates are there in education where everyone, or even enough people to conduct research, actually say that?
2) What are the gains to accuracy from RCTs, relative to the costs?
A lot of the arguments for RCTs assume that they are justified by being more accurate than the alternatives and that this will apply in all cases. However, this needs to be considered alongside the difficulties which make RCTs in education practically difficult. Most interventions will happen at a whole-class level, making it harder to isolate the effects. We will be unable to “blind” trials (i.e. ensure that those delivering an intervention don’t know whether they are doing so or instead just delivering a placebo). We don’t have the resources of the drug companies to fund trials. I think Goldacre is right to suggest that those who say RCTs in education cannot possibly work for reasons such as these have it wrong. But these reasons do mean that before we can begin we have to accept that the level of accuracy of any given RCT might not justify the cost. We cannot simply say that, as RCTs are more accurate than non-randomised trials, we can safely ignore non-randomised research even if it is abundant and overwhelmingly pointing in one direction. We are not comparing the perfectly accurate with the utterly inaccurate; we are comparing differing degrees of accuracy. In the past I have heard Ben Goldacre declare that the effectiveness of phonics is, as yet, undecided because there have not been enough results from RCTs, even though there have been hundreds of non-randomised studies. This argument might work if RCTs were perfect and non-randomised studies were always useless. It is less convincing when we admit we are actually considering imperfect data, even when we look at RCTs. How much data can we afford to throw out in the hope that RCTs will provide a definitive answer at some point in the future? We can even take this point to the extreme of asking how much more accurate RCTs are compared with the opinions teachers form from experience.
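The cost of accuracy can be made concrete with some standard textbook arithmetic (my own illustration; the numbers are invented and the functions hypothetical). Halving a trial’s margin of error requires roughly quadrupling the number of pupils, and randomising whole classes rather than individual pupils inflates the uncertainty further, because pupils in the same class resemble one another:

```python
import math

def standard_error(sd=10.0, n_per_arm=100):
    """Standard error of the difference in mean outcomes between two
    arms, each with n_per_arm pupils whose outcomes have spread sd."""
    return sd * math.sqrt(2.0 / n_per_arm)

def design_effect(class_size=25, icc=0.2):
    """Variance inflation when randomising whole classes rather than
    individual pupils (icc = intra-class correlation: how much pupils
    in the same class resemble each other)."""
    return 1 + (class_size - 1) * icc

# Halving the standard error requires quadrupling the sample:
for n in (100, 400, 1600):
    print(n, round(standard_error(n_per_arm=n), 2))

# Randomising 25-pupil classes with icc 0.2 inflates variance
# nearly six-fold, so each pupil counts for far less than one
# independent observation:
print(round(design_effect(), 1))
```

On these illustrative figures, a whole-class trial needs several times as many pupils as an individually randomised one to reach the same precision, which is exactly why the accuracy gained per pound spent cannot simply be assumed.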
It is often assumed that, just as doctors were fooled by the placebo effect or by natural variation in outcomes, teachers are constantly mistaken about the effectiveness of their teaching. This has not actually been established for all teachers making judgements in all ways. Most teachers would sooner seek advice from experienced peers than reserve judgement on all matters, waiting for RCTs that may never happen.
3) What research do we have the resources for?
Underlying the previous two questions is the issue of resources. If we can conduct enough RCTs it makes it easier to choose which questions to address. If we can make the RCTs large enough, or replicate them, then we can address questions of accuracy. There is money available for RCTs, but there will always need to be rationing because debates in education seem to expand continually, and even with unlimited funding there may still be a lack of trained researchers. We need to set priorities, which is what makes it so hard to proceed without resolving the previous two questions. We also need to have some idea of the potential benefits of RCTs. Too much of this discussion does seem to present RCTs as a magic bullet, where the benefits will definitely outweigh the costs, even if we use them indiscriminately. However, we constantly have to ask which applications of RCTs will provide the greatest benefit. Again, that requires asking the previous two questions. There is no point using RCTs to resolve questions where the answer is already available, or where teachers can find a reliable answer just from experience. There is little point in investing in RCTs in areas where the results from non-randomised trials are already convincing (e.g. phonics). And what if it turns out we can get reliable results from much cheaper methods, such as training teachers to better evaluate their own practice or from testing psychological theories in the lab?
4) Who are we trying to persuade here?
There’s no getting away from the fact that teachers have learned to be sceptical of education research. Often this has been for very good reason (as I explained in my previous blogpost). Sometimes it is purely out of stubbornness. Anyone following the debate over phonics will have noticed that there is a hardcore of educationalists and teachers who simply cannot be persuaded of the benefits of phonics by reason, evidence or even direct personal experience of its effectiveness. As well as those who are irrational about methods, I mentioned under the first question that there are also a variety of beliefs about aims. It’s all very well declaring that RCTs are the best evidence, but how often are disputes in education about the quality of evidence, and how often are they about ideology? There is no point in spending money on an RCT which will, whatever it shows, be ignored by all the people who most need to be persuaded. Does anyone think that even one phonics denialist will be persuaded if, on top of the hundreds of non-randomised studies showing the effectiveness of phonics, there were a few more RCTs? Perhaps this is what Goldacre means to address by talking about “culture change” in education, however, it does leave me wondering if a focus on RCTs is like hoping a particularly accurate globe will persuade flat-earthers. The time and resources spent on conducting RCTs might be better spent on persuading teachers to accept the scientific method, or the basics of cognitive psychology, rather than looking for an unidentified degree of improvement in the quality of empirical studies.
Having followed some of the debate that has happened since the report came out, I may be over-emphasising the issue of RCTs here. I do accept that, where practical, they may be the best form of evidence. There are certain questions, such as those about expensive interventions affecting individual students rather than whole classes, where they are both practical and suited to the task. However, there are enough practical difficulties with making good use of RCTs more widely that I do worry that pushing for them may distract from the more important debate. The real argument in education is often over the principle of using evidence rather than the question of which type of evidence. Pushing for the best possible evidence in all circumstances may turn out to make things worse for teachers faced with pseudo-science. Yes, if we set the bar for quality of evidence too low then we will be told that all sorts of nonsense is backed by research, but if we set it too high we will be told there is no evidence supporting interventions that have been shown to work time and time again. Either position will surrender territory to those who are convinced that their ideology must be enforced on the profession, and that research is to be used only to suit their pre-determined agenda rather than to ascertain the truth.
The post above was published in two parts.
Courtesy of Andrew Old of Scenes from the battleground