Sunday, 10 September 2017

Bishopblog catalogue (updated 10 Sept 2017)


Those of you who follow this blog may have noticed a lack of thematic coherence. I write about whatever is exercising my mind at the time, which can range from technical aspects of statistics to the design of bathroom taps. I decided it might be helpful to introduce a bit of order into this chaotic melange, so here is a catalogue of posts by topic.

Language impairment, dyslexia and related disorders
The common childhood disorders that have been left out in the cold (1 Dec 2010)
What's in a name? (18 Dec 2010)
Neuroprognosis in dyslexia (22 Dec 2010)
Where commercial and clinical interests collide: Auditory processing disorder (6 Mar 2011)
Auditory processing disorder (30 Mar 2011)
Special educational needs: will they be met by the Green paper proposals? (9 Apr 2011)
Is poor parenting really to blame for children's school problems? (3 Jun 2011)
Early intervention: what's not to like? (1 Sep 2011)
Lies, damned lies and spin (15 Oct 2011)
A message to the world (31 Oct 2011)
Vitamins, genes and language (13 Nov 2011)
Neuroscientific interventions for dyslexia: red flags (24 Feb 2012)
Phonics screening: sense and sensibility (3 Apr 2012)
What Chomsky doesn't get about child language (3 Sept 2012)
Data from the phonics screen (1 Oct 2012)
Auditory processing disorder: schisms and skirmishes (27 Oct 2012)
High-impact journals (Action video games and dyslexia: critique) (10 Mar 2013)
Overhyped genetic findings: the case of dyslexia (16 Jun 2013)
The arcuate fasciculus and word learning (11 Aug 2013)
Changing children's brains (17 Aug 2013)
Raising awareness of language learning impairments (26 Sep 2013)
Good and bad news on the phonics screen (5 Oct 2013)
What is educational neuroscience? (25 Jan 2014)
Parent talk and child language (17 Feb 2014)
My thoughts on the dyslexia debate (20 Mar 2014)
Labels for unexplained language difficulties in children (23 Aug 2014)
International reading comparisons: Is England really doing so poorly? (14 Sep 2014)
Our early assessments of schoolchildren are misleading and damaging (4 May 2015)
Opportunity cost: a new red flag for evaluating interventions (30 Aug 2015)
The STEP Physical Literacy programme: have we been here before? (2 Jul 2017)

Autism
Autism diagnosis in cultural context (16 May 2011)
Are our ‘gold standard’ autism diagnostic instruments fit for purpose? (30 May 2011)
How common is autism? (7 Jun 2011)
Autism and hypersystematising parents (21 Jun 2011)
An open letter to Baroness Susan Greenfield (4 Aug 2011)
Susan Greenfield and autistic spectrum disorder: was she misrepresented? (12 Aug 2011)
Psychoanalytic treatment for autism: Interviews with French analysts (23 Jan 2012)
The ‘autism epidemic’ and diagnostic substitution (4 Jun 2012)
How wishful thinking is damaging Peta's cause (9 June 2014)

Developmental disorders/paediatrics
The hidden cost of neglected tropical diseases (25 Nov 2010)
The National Children's Study: a view from across the pond (25 Jun 2011)
The kids are all right in daycare (14 Sep 2011)
Moderate drinking in pregnancy: toxic or benign? (21 Nov 2012)
Changing the landscape of psychiatric research (11 May 2014)

Genetics
Where does the myth of a gene for things like intelligence come from? (9 Sep 2010)
Genes for optimism, dyslexia and obesity and other mythical beasts (10 Sep 2010)
The X and Y of sex differences (11 May 2011)
Review of How Genes Influence Behaviour (5 Jun 2011)
Getting genetic effect sizes in perspective (20 Apr 2012)
Moderate drinking in pregnancy: toxic or benign? (21 Nov 2012)
Genes, brains and lateralisation (22 Dec 2012)
Genetic variation and neuroimaging (11 Jan 2013)
Have we become slower and dumber? (15 May 2013)
Overhyped genetic findings: the case of dyslexia (16 Jun 2013)
Incomprehensibility of much neurogenetics research (1 Oct 2016)
A common misunderstanding of natural selection (8 Jan 2017)
Sample selection in genetic studies: impact of restricted range (23 Apr 2017)

Neuroscience
Neuroprognosis in dyslexia (22 Dec 2010)
Brain scans show that… (11 Jun 2011)
Time for neuroimaging (and PNAS) to clean up its act (5 Mar 2012)
Neuronal migration in language learning impairments (2 May 2012)
Sharing of MRI datasets (6 May 2012)
Genetic variation and neuroimaging (1 Jan 2013)
The arcuate fasciculus and word learning (11 Aug 2013)
Changing children's brains (17 Aug 2013)
What is educational neuroscience? (25 Jan 2014)
Changing the landscape of psychiatric research (11 May 2014)
Incomprehensibility of much neurogenetics research (1 Oct 2016)

Reproducibility
Accentuate the negative (26 Oct 2011)
Novelty, interest and replicability (19 Jan 2012)
High-impact journals: where newsworthiness trumps methodology (10 Mar 2013)
Who's afraid of open data? (15 Nov 2015)
Blogging as post-publication peer review (21 Mar 2013)
Research fraud: More scrutiny by administrators is not the answer (17 Jun 2013)
Pressures against cumulative research (9 Jan 2014)
Why does so much research go unpublished? (12 Jan 2014)
Replication and reputation: Whose career matters? (29 Aug 2014)
Open code: not just data and publications (6 Dec 2015)
Why researchers need to understand poker (26 Jan 2016)
Reproducibility crisis in psychology (5 Mar 2016)
Further benefit of registered reports (22 Mar 2016)
Would paying by results improve reproducibility? (7 May 2016)
Serendipitous findings in psychology (29 May 2016)
Thoughts on the Statcheck project (3 Sep 2016)
When is a replication not a replication? (16 Dec 2016)
Reproducible practices are the future for early career researchers (1 May 2017)
Which neuroimaging measures are useful for individual differences research? (28 May 2017)
Prospecting for kryptonite: the value of null results (17 Jun 2017)

Statistics
Book review: biography of Richard Doll (5 Jun 2010)
Book review: the Invisible Gorilla (30 Jun 2010)
The difference between p < .05 and a screening test (23 Jul 2010)
Three ways to improve cognitive test scores without intervention (14 Aug 2010)
A short nerdy post about the use of percentiles (13 Apr 2011)
The joys of inventing data (5 Oct 2011)
Getting genetic effect sizes in perspective (20 Apr 2012)
Causal models of developmental disorders: the perils of correlational data (24 Jun 2012)
Data from the phonics screen (1 Oct 2012)
Moderate drinking in pregnancy: toxic or benign? (21 Nov 2012)
Flaky chocolate and the New England Journal of Medicine (13 Nov 2012)
Interpreting unexpected significant results (7 June 2013)
Data analysis: Ten tips I wish I'd known earlier (18 Apr 2014)
Data sharing: exciting but scary (26 May 2014)
Percentages, quasi-statistics and bad arguments (21 July 2014)
Why I still use Excel (1 Sep 2016)
Sample selection in genetic studies: impact of restricted range (23 Apr 2017)
Prospecting for kryptonite: the value of null results (17 Jun 2017)

Journalism/science communication
Orwellian prize for scientific misrepresentation (1 Jun 2010)
Journalists and the 'scientific breakthrough' (13 Jun 2010)
Science journal editors: a taxonomy (28 Sep 2010)
Orwellian prize for journalistic misrepresentation: an update (29 Jan 2011)
Academic publishing: why isn't psychology like physics? (26 Feb 2011)
Scientific communication: the Comment option (25 May 2011)
Publishers, psychological tests and greed (30 Dec 2011)
Time for academics to withdraw free labour (7 Jan 2012)
2011 Orwellian Prize for Journalistic Misrepresentation (29 Jan 2012)
Time for neuroimaging (and PNAS) to clean up its act (5 Mar 2012)
Communicating science in the age of the internet (13 Jul 2012)
How to bury your academic writing (26 Aug 2012)
High-impact journals: where newsworthiness trumps methodology (10 Mar 2013)
A short rant about numbered journal references (5 Apr 2013)
Schizophrenia and child abuse in the media (26 May 2013)
Why we need pre-registration (6 Jul 2013)
On the need for responsible reporting of research (10 Oct 2013)
A New Year's letter to academic publishers (4 Jan 2014)
Journals without editors: What is going on? (1 Feb 2015)
Editors behaving badly? (24 Feb 2015)
Will Elsevier say sorry? (21 Mar 2015)
How long does a scientific paper need to be? (20 Apr 2015)
Will traditional science journals disappear? (17 May 2015)
My collapse of confidence in Frontiers journals (7 Jun 2015)
Publishing replication failures (11 Jul 2015)
Psychology research: hopeless case or pioneering field? (28 Aug 2015)
Desperate marketing from J. Neuroscience (18 Feb 2016)
Editorial integrity: publishers on the front line (11 Jun 2016)
When scientific communication is a one-way street (13 Dec 2016)
Breaking the ice with buxom grapefruits: Pratiques de publication and predatory publishing (25 Jul 2017)

Social Media
A gentle introduction to Twitter for the apprehensive academic (14 Jun 2011)
Your Twitter Profile: The Importance of Not Being Earnest (19 Nov 2011)
Will I still be tweeting in 2013? (2 Jan 2012)
Blogging in the service of science (10 Mar 2012)
Blogging as post-publication peer review (21 Mar 2013)
The impact of blogging on reputation (27 Dec 2013)
WeSpeechies: A meeting point on Twitter (12 Apr 2014)
Email overload (12 Apr 2016)

Academic life
An exciting day in the life of a scientist (24 Jun 2010)
How our current reward structures have distorted and damaged science (6 Aug 2010)
The challenge for science: speech by Colin Blakemore (14 Oct 2010)
When ethics regulations have unethical consequences (14 Dec 2010)
A day working from home (23 Dec 2010)
Should we ration research grant applications? (8 Jan 2011)
The one hour lecture (11 Mar 2011)
The expansion of research regulators (20 Mar 2011)
Should we ever fight lies with lies? (19 Jun 2011)
How to survive in psychological research (13 Jul 2011)
So you want to be a research assistant? (25 Aug 2011)
NHS research ethics procedures: a modern-day Circumlocution Office (18 Dec 2011)
The REF: a monster that sucks time and money from academic institutions (20 Mar 2012)
The ultimate email auto-response (12 Apr 2012)
Well, this should be easy…. (21 May 2012)
Journal impact factors and REF2014 (19 Jan 2013)
An alternative to REF2014 (26 Jan 2013)
Postgraduate education: time for a rethink (9 Feb 2013)
Ten things that can sink a grant proposal (19 Mar 2013)
Blogging as post-publication peer review (21 Mar 2013)
The academic backlog (9 May 2013)
Discussion meeting vs conference: in praise of slower science (21 Jun 2013)
Why we need pre-registration (6 Jul 2013)
Evaluate, evaluate, evaluate (12 Sep 2013)
High time to revise the PhD thesis format (9 Oct 2013)
The Matthew effect and REF2014 (15 Oct 2013)
The University as big business: the case of King's College London (18 June 2014)
Should vice-chancellors earn more than the prime minister? (12 July 2014)
Some thoughts on use of metrics in university research assessment (12 Oct 2014)
Tuition fees must be high on the agenda before the next election (22 Oct 2014)
Blaming universities for our nation's woes (24 Oct 2014)
Staff satisfaction is as important as student satisfaction (13 Nov 2014)
Metricophobia among academics (28 Nov 2014)
Why evaluating scientists by grant income is stupid (8 Dec 2014)
Dividing up the pie in relation to REF2014 (18 Dec 2014)
Shaky foundations of the TEF (7 Dec 2015)
A lamentable performance by Jo Johnson (12 Dec 2015)
More misrepresentation in the Green Paper (17 Dec 2015)
The Green Paper’s level playing field risks becoming a morass (24 Dec 2015)
NSS and teaching excellence: wrong measure, wrongly analysed (4 Jan 2016)
Lack of clarity of purpose in REF and TEF (2 Mar 2016)
Who wants the TEF? (24 May 2016)
Cost benefit analysis of the TEF (17 Jul 2016)
Alternative providers and alternative medicine (6 Aug 2016)
We know what's best for you: politicians vs. experts (17 Feb 2017)
Advice for early career researchers re job applications: Work 'in preparation' (5 Mar 2017)

Celebrity scientists/quackery
Three ways to improve cognitive test scores without intervention (14 Aug 2010)
What does it take to become a Fellow of the RSM? (24 Jul 2011)
An open letter to Baroness Susan Greenfield (4 Aug 2011)
Susan Greenfield and autistic spectrum disorder: was she misrepresented? (12 Aug 2011)
How to become a celebrity scientific expert (12 Sep 2011)
The kids are all right in daycare (14 Sep 2011)
The weird world of US ethics regulation (25 Nov 2011)
Pioneering treatment or quackery? How to decide (4 Dec 2011)
Psychoanalytic treatment for autism: Interviews with French analysts (23 Jan 2012)
Neuroscientific interventions for dyslexia: red flags (24 Feb 2012)
Why most scientists don't take Susan Greenfield seriously (26 Sept 2014)

Women
Academic mobbing in cyberspace (30 May 2010)
What works for women: some useful links (12 Jan 2011)
The burqa ban: what's a liberal response (21 Apr 2011)
C'mon sisters! Speak out! (28 Mar 2012)
Psychology: where are all the men? (5 Nov 2012)
Should Rennard be reinstated? (1 June 2014)
How the media spun the Tim Hunt story (24 Jun 2015)

Politics and Religion
Lies, damned lies and spin (15 Oct 2011)
A letter to Nick Clegg from an ex liberal democrat (11 Mar 2012)
BBC's 'extensive coverage' of the NHS bill (9 Apr 2012)
Schoolgirls' health put at risk by Catholic view on vaccination (30 Jun 2012)
A letter to Boris Johnson (30 Nov 2013)
How the government spins a crisis (floods) (1 Jan 2014)
The alt-right guide to fielding conference questions (18 Feb 2017)
We know what's best for you: politicians vs. experts (17 Feb 2017)
Barely a good word for Donald Trump in Houses of Parliament (23 Feb 2017)

Humour and miscellaneous
Orwellian prize for scientific misrepresentation (1 Jun 2010)
An exciting day in the life of a scientist (24 Jun 2010)
Science journal editors: a taxonomy (28 Sep 2010)
Parasites, pangolins and peer review (26 Nov 2010)
A day working from home (23 Dec 2010)
The one hour lecture (11 Mar 2011)
The expansion of research regulators (20 Mar 2011)
Scientific communication: the Comment option (25 May 2011)
How to survive in psychological research (13 Jul 2011)
Your Twitter Profile: The Importance of Not Being Earnest (19 Nov 2011)
2011 Orwellian Prize for Journalistic Misrepresentation (29 Jan 2012)
The ultimate email auto-response (12 Apr 2012)
Well, this should be easy…. (21 May 2012)
The bewildering bathroom challenge (19 Jul 2012)
Are Starbucks hiding their profits on the planet Vulcan? (15 Nov 2012)
Forget the Tower of Hanoi (11 Apr 2013)
How do you communicate with a communications company? (30 Mar 2014)
Noah: A film review from 32,000 ft (28 July 2014)
The rationalist spa (11 Sep 2015)
Talking about tax: weasel words (19 Apr 2016)
Controversial statues: remove or revise? (22 Dec 2016)
The alt-right guide to fielding conference questions (18 Feb 2017)
My most popular posts of 2016 (2 Jan 2017)

Tuesday, 25 July 2017

Breaking the ice with buxom grapefruits: Pratiques de publication and predatory publishing

Guest blogpost by Ryan McKay, Department of Psychology, Royal Holloway University of London, and Max Coltheart, Department of Cognitive Science, Macquarie University

These days it is common for academics to receive invitations from unfamiliar sources to attend conferences, submit papers, or join editorial boards. We began an attack against this practice by not ignoring such invitations – by, instead, replying to them with messages selected from the output of the wonderful Random Surrealism Generator. It generates syntactically correct but surreal sentences such as “Is that a tarantula in your bicycle clip, or are you just gold-trimmed?” (a hint of Mae West there?). This sometimes had the desired effect of generating a bemused response from the inviter; but we decided more was needed.

So we used the surrealism generator to craft an absurdist critique of “impaired” publication practices (the title of the piece says as much, albeit obliquely). The first few sentences seem relevant to the paper’s title, but the piece then deteriorates rapidly into a sequence of surreal sentences (we threw in some gratuitous French and Latin for good measure, and the piece also quotes itself liberally), so that no one who read the paper could possibly believe that it was serious. We then submitted the paper to every journal that contacted either of us between 21 June and 1 July 2017 inviting us to submit a paper. There were 10 such invitations. We accepted all of them and submitted the paper, making minor changes to the title and the first couple of sentences to give the impression that the paper was somehow relevant to the interests of the journal; the bulk of the paper was always the same sequence of surreal sentences.

While we were engaged in this exercise, the blogger Neuroskeptic was doing something similar; we describe that work below. Both of us were, of course, following the honourable tradition of such submissions as those by Peter Vamplew and Christoph Bartneck. (More generally, there is a fine tradition of hoax articles intended as critiques of certain academic fields, e.g., postmodernism or theology.)

What happened then?

All ten journals responded by informing us that our manuscript had been sent out for review. We did not hear anything further from four of them. A fifth, the SM Journal of Psychiatry and Mental Health, eventually responded “The ms was plagiarized so please make some changes to the content”. We did not respond to this request, nor to a subsequent request for resubmission.

The Scientific Journal of Neurology & Neurosurgery responded by telling us that our paper had been peer-reviewed; the reviewer praised our “scientific methodology” but chided us about our poor English (specifically, they said “English should be rewritten, it is necessary a correction of typing errors (spaces)”). We ignored this advice and resubmitted. However, the journal then noticed the similarity with the article we had submitted to the International Journal of Brain Disorders and Therapy (see below for this), so ceased production of our article.

The paper was accepted by Psychiatry and Mental Disorders: “accepted for publication by our reviewers without any changes”, we were told.

The paper was accepted by Mental Health and Addiction Research, but at that point we were told that a publication fee was due. We protested on the grounds that when we had been invited to submit there had been no mention of a fee, and we said that unless a full fee waiver was granted we would take our work to a more appreciative journal. In response, we were granted a full fee waiver, and our paper was published in the on-line journal.

The SM Journal of Disease Markers also accepted the paper, and sent us proofs, which we corrected and returned. At that point, we were told that an article processing fee of US$920 was due. We protested in the same way, asking for a full fee waiver. In response, they offered a reduced fee of $520. We did not respond, so this paper, although accepted, has not been published.

The tenth journal, the International Journal of Brain Disorders and Therapy, sent us one reviewer comment. The reviewer had entered into the spirit of the hoax by providing a review which was itself surrealistic. We incorporated this reviewer’s comment about Scottish Lithium Flying Saucers, resubmitted, and the paper was accepted. The journal then noticed irregularities in some (but surprisingly not all) of the references. We replaced these problematic references with citations of recent and classic hoaxes (e.g., Kline & Saunders’ 1959 piece on “psychochemical symbolism”; Lindsay & Boyle’s recent piece on the “Conceptual Penis”), along with a citation of Pennycook et al.’s article “On the reception and detection of pseudo-profound bullshit”. The paper was then published in the on-line journal. Later, this journal asked us for a testimonial about the review process, which we supplied: "The process of publishing this article was much smoother than we anticipated".

In sum: all ten journals to which we submitted the paper sent it out for review, even though any editor had only to read to the end of the first paragraph to come across this:
“Of course, neither cognitive neuropsychiatry nor cognitive neuropsychology is remotely informative when it comes to breaking the ice with buxom grapefruits. When pondering three-in-a-bed romps with broken mules, therefore, one must refrain, at all costs, from driving a manic-depressive lemon-squeezer through ham (Baumard & Brugger, 2016).”

Of these ten journals, two tentatively accepted the paper and four fully accepted it for publication. Two of these journals have already published it.

The blogger Neuroskeptic did this a little differently. A hoax paper entitled “Mitochondria: Structure, Function and Clinical Relevance” was prepared. Unlike ours, it contained no nonsensical sentences; instead, its topic was the fictional cellular entities “midi-chlorians” (which feature in Star Wars). The paper was submitted to nine journals, and four accepted it. One of these charged a fee, which the author declined to pay; the other three charged no fee, and so the paper has been published in all three of these journals: the International Journal of Molecular Biology: Open Access (MedCrave), the Austin Journal of Pharmacology and Therapeutics (Austin) and the American Research Journal of Biosciences (ARJ). To recognise that this paper was nonsense, one would need some knowledge of cell biology. But our paper is blatantly nonsensical to any reader; and yet it achieved an acceptance rate very similar to that of Neuroskeptic’s paper (four of ten journals versus four of nine).

What can be learned from our exercise? Several things:

(a) It is clear that with these journals there is no process by which a submission is initially read by an editor to decide whether the paper should be sent out for review, because our paper could not possibly have survived any such inspection.

(b) But nor should our paper have survived any serious review process, since any reviewer reading the paper would have pointed out its nonsensical content. Only twice did a journal send us feedback from a reviewer: one review said we should discuss Lithium Flying Saucers, and the other seemed suspect to us because its criticism of our English was expressed in such poor English.

(c) In contrast to this apparent lack of human intervention in the article-handling process, there was some software intervention: some of these journals appear routinely to apply plagiarism-detection software to submitted articles.

(d) What’s in this for the journals? We assume that they exist solely to make money by charging authors. We presume that, just as they attempt to build apparently legitimate editorial boards, these journals will sometimes waive their fees so as to get some legitimate-seeming articles on their books, the better to entice others to submit.

Sunday, 2 July 2017

The STEP Physical Literacy programme: have we been here before?

One day in 2003, I turned on BBC Radio 4 and found myself listening to an interview on the Today Programme with Wynford Dore, the founder of an educational programme that claimed to produce dramatic improvements in children's reading and attentional skills. The impetus for the interview was a press release about a study published in the journal Dyslexia, reporting results from a trial of the programme with primary schoolchildren. The interview seemed more like an advertisement than a serious analysis, but the resulting publicity led many parents to sign up for the programme, both in the UK and in other countries, notably Australia.

The programme involved children doing two 10-minute sessions per day of exercises designed to improve balance and eye-hand co-ordination. These were personalised to the child, so that the specific exercises would be determined by level of progress in particular skills. The logic behind the approach was that these exercises trained the cerebellum, a part of the brain concerned with automatizing skills. For instance, when you first play the piano or drive a car, it is slow and effortful, but after practice you can do it automatically without thinking about it. The idea was that cerebellar training would lead to a general cerebellar boost, helping other tasks, such as reading, to become more automatic.

Various experts on the editorial board of Dyslexia were unhappy with the quality of the research and asked for the paper to be retracted. When no action was taken, a number of them resigned. In 2007, I published a detailed critique of the study, which by that time had been complemented by a follow-up study; this had prompted further editorial resignations.
Meanwhile, Wynford Dore, who had considerable business acumen, continued to promote the Dore Programme, writing a popular book describing its origins, and signing up celebrities to endorse it. Among these were rugby legends Kenny Logan and Scott Quinnell. In addition, Dore was in conversations with the Welsh Assembly about the possibility of rolling the programme out in Welsh schools. He had also persuaded Conservative MP Christopher Chope that the Dore programme was enormously effective but was being suppressed by government.
Various bloggers were interested in the amazing uptake of the Dore Programme, and in 2008, Ben Goldacre wrote a trenchant piece on his Bad Science blog, noting among other things that Kenny Logan was paid for some of his promotional work. The nail in the coffin of the Dore Programme was an Australian documentary in the Four Corners series, which included interviews with Dore, some of his customers, and scientists who had been involved both in the evaluation and the criticisms. The Dore business, which had been run as a franchise, collapsed, leaving many people out of pocket: parents who had paid up-front for a long-term intervention course, and staff at Dore centres, who found themselves out of a job.
The Dore programme did not die completely, however. Scott Quinnell continued to market a scaled-down version of the programme through his company Dynevor, but was taken to task by the Advertising Standards Authority for making unsubstantiated claims. Things then went rather quiet for a while.
This year, however, I have been contacted by concerned teachers who have told me about a new programme, STEP Physical Literacy, which is being promoted for use in schools, and which bears some striking similarities to Dore.  Here are some quotes from the STEP website:
  • Pupils undertake 2 ten minute exercise sessions at the start and end of each school day. The exercises focus on the core skills of balance, eye-tracking and coordination.
  • STEP is a series of personalised physical exercises that stimulate the cerebellum to function more efficiently.
  • The STEP focus is on the development of physical capabilities that should be automatic such as standing still, riding a bike or following words on a page.
In addition, STEP Physical Literacy is being heavily promoted by Kenny Logan, who features several times on the News section of the website.
As with Dore, STEP has been promoted to politicians, who argue it should be introduced into schools. In this case, the Christopher Chope role is fulfilled by Liz Smith MSP, who appears to be sincerely convinced that Scotland's literacy problems can be overcome by having children take two 10-minute sessions out of lessons to do physical exercises.
On Twitter, Ben Goldacre noted that the directors of Dynevor CIC overlap substantially with the directors of Step2Progress, which owns STEP. The two companies share the same registered address.
When asked about Dore, those involved with STEP deny any links. After I tweeted about this, I was emailed by Lucinda Roberts-Holmes, Managing Director of STEP, to reassure me that STEP is not a rebranding of Dore, and to suggest we meet so she could "talk through the various pilots and studies that have gone on both in the UK and the US as well as future research RCTs planned with Florida State University and the University of Edinburgh." I love evidence, but I find it best to sit down with data rather than have a conversation, so I replied explaining that and saying I'd be glad to take a look at any written reports. So far nothing has materialised. I should add that I have not been able to find any studies on STEP published in the peer-reviewed literature, and the account of the pilot study and case studies on the STEP website does not give me confidence that these would be publishable in a reputable journal.
In short, the evidence to date does not justify introducing this intervention into schools: there's no methodologically adequate study showing effectiveness, and it carries both financial costs and opportunity costs to children. It's a shame that the field of education is so far behind medicine in its attitude to evidence, and that we have politicians who will consider promoting educational interventions on the basis of persuasive marketing. I suggest Liz Smith talks to the Education Endowment Foundation, who will be able to put her in touch with experts who can offer an objective evaluation of STEP Physical Literacy.

8th July 2017: Postscript. A response from STEP
I have had a request from Lucinda Roberts-Holmes, Managing Director of Step2Progress, to remove this blogpost on the grounds that it contains defamatory and inaccurate information. I asked for more information on specific aspects of the post that were problematic and obtained a very long response, which I reproduce in full below. Readers are invited to form their own interpretation of the facts, based on the response from STEP (in italics) and my comments on the points raised.

Preamble: To be clear your blog in its current form includes a number of statements which are factually incorrect. In particular, the suggestion that STEP is simply a reincarnation of the Dore programme is not true as I have already explained to you (see my email of 29 June). The fact that you chose to ignore that assurance and instead publish the blog is very concerning to us. The suggestion, also, that I had chosen not to reply to your email ("so far nothing has materialised") is, I am afraid, disingenuous particularly in circumstances where you did not even set a deadline in your email and you waited only 72 hours to post your blog. Had you, of course, waited to receive a response to your email, we would have explained the correct position to you. Similarly, had you carried out an objective comparison of the two programmes you would have noted the many differences between STEP and Dore and, more significantly, identified the fact that STEP makes absolutely none of the assertions about cures for Dyslexia and other learning difficulties or any other of the hypotheses that Wynford Dore concocted. They are not the same programme evidenced not least by the fact that STEP states its programme is not a SEN learning intervention.

Comment: a) I did not state in the blog that STEP is 'simply a reincarnation of the Dore Programme'. I said it bears some striking similarities to Dore.

b) I did not ignore Lucinda's reassurance that STEP is not a rebranding of Dore. On the contrary, I stated in the blogpost that I had received that reassurance from her.

c) I did not suggest that Lucinda had chosen not to reply to my email: I simply observed that I had not so far received a response. As my blogpost points out, I had made it clear in my initial email that I did not want her to 'explain the correct position' to me. I had specifically requested written reports documenting the evidence for effectiveness of STEP.

1. Despite what Ben Goldacre may believe, Kenny Logan (KL) was not paid by the Dore programme for "promotional work". He was, in fact, a paying customer of the programme who went from being unable to read at the start of the programme to being literate by the end of it. KL was happy to share his experience publicly and was very clear with Dore that he would not be paid to do this. Whilst it is true that in 2006, he was contracted and paid by Wynford Dore for his professional input into a sports programme that he was seeking to develop that is an entirely different matter. The suggestion that KL was only promoting the Dore programme for his own financial benefit is clearly defamatory of him (and indeed of us).

I asked Ben Goldacre about this. The claim about Logan's payment for promotional work was made in a Comment is Free article in the Guardian. Ben told me it all went through a legal review at the Guardian to ensure everything was robust, and no complaints were received from Kenny Logan at the time. If the claim is untrue, then Kenny Logan needs to take this up with the Guardian. It's unclear to me why Kenny Logan promoting Dore would be defamatory of STEP, given that STEP claims to have no association with Dore.

2. The fact that KL previously promoted the Dore programme also does not support the allegation that the STEP programme is the same as the Dore programme. They are very different programmes and we are a very different organisation to Dore. Incorrectly stating that KL was paid for the promotion of Dore and trying to draw an inference that therefore he is paid to promote STEP (which he is not) is also misleading.

Comment: I made no claims that Kenny Logan is paid to promote STEP. He is a shareholder in STEP2Progress, which is a different matter.

3. Dynevor was never "Scott Quinnell's Company". Dynevor was primarily owned by Tim Griffiths and was the organisation that purchased the intellectual property rights in Dore after it went bankrupt. Tim Griffiths had no prior connection to Wynford Dore or the Dore programme but did have an interest in the link between exercise and ability to learn. As many thousands of people had been left in a difficult position when Dore collapsed into administration having purchased a programme they could not continue the directors at Dynevor agreed to commit the funding necessary to allow those who wanted to continue the programme the opportunity to do so. Scott Quinnell had a shareholding of less than 1% in Dynevor. STEP has absolutely no association with Scott Quinnell.

Comment: The role of Scott Quinnell in Dynevor is not central to my description of Dore, but this account of his role seems disingenuous. According to Companies House, Quinnell was appointed as one of two Directors of Dynevor C.I.C in 2009, and his interest in the company in 2011 was 2.6% of the shareholding, at a time when Wynford Dore had a shareholding of 4.3%.

I have not claimed that Scott Quinnell has any relationship with STEP. My account of his dealings was to provide a brief history of the problems with Dore for readers unfamiliar with the background.

4. You refer to the claims Ben Goldacre has made on Twitter that the directors of Dynevor CIC "overlap substantially" with the directors of STEP. In fact, of the 8 Directors of Dynevor only 2 hold directorships at STEP. In any event that misses the point which is that none of the directors of STEP had any association with the Dore Programme prior to the purchase of intellectual property rights in 2009.

Comment: According to Companies House, the one 'active person with significant control' in Dynevor CIC is Timothy Griffiths, and the one 'active person with significant control' in STEP2Progress is Conor Davey. If I have understood this correctly, this is based on shareholdings. Timothy Griffiths is one of four Directors of STEP2Progress, and Conor Davey is the Chairman of Dynevor CIC. Dynevor CIC and STEP2Progress have the same postal address.

It wasn't quite clear if Lucinda was saying that Dynevor CIC is now disassociated from Dore, but if that is the case, it would be wise to update the company's LinkedIn Profile, which states that the company 'provides the Dore Programme to individual clients and schools around the UK and licences the rights to provide the Dore Programme in a number of overseas countries'.

5. It is not correct to state that STEP denies any links to the Dore programme. There is, of course, a link, as there is also to the work of Dr Frank Belgau and his studies into balametrics. There is also a link to other movement programmes such as Better Movers and Thinkers and Move to Learn. What we have said is that the STEP programme is not the Dore programme and we stand by this. You may seek to draw similarities between them as I could between apples and pears.

Comment: Nowhere in my blogpost did I state that STEP denies any links to the Dore programme.

Re Belgau: I have just done a search on Web of Science that returned no articles for either author = Belgau or topic = balametrics.

6. May I also ask how you can state that "the evidence to date does not justify introducing this intervention in to schools" when you have refused so far to meet with me or even seen the evidence or read the full Pilot Study? Have you asked any teachers or head teachers who have experience of delivering the STEP Programme whether they would recommend to their peers the use of the programme in their schools?

Comment: There is a fundamental misunderstanding here about how scientists evaluate evidence. If you want to find out whether an intervention is effective, the worst thing you can do is to talk to people who are convinced that it is. There are people who believe passionately in all sorts of things: the healing powers of crystals, the harms of vaccines, the benefits of homeopathy, or the evils of phonics instruction. They will, understandably, try to convince you that they are right, but they will not be objective. The way to get an accurate picture of what works is not by asking people what they think about it, but by doing well-controlled studies that compare the intervention with a control condition in terms of children's outcomes. It is for this reason that I have been asking for any hard evidence that STEP2Progress has from properly conducted studies or information about future-planned studies, which I am told are in the pipeline. I would love to read the full Pilot Study, but am having difficulty accessing it (see below).

7. You say in your blog "It is a shame that... We have politicians who will consider promoting educational interventions on the basis of persuasive marketing" Presumably this is a reference to Liz Smith MSP (LS) who you refer to separately in the blog? For your information, LS has read the full research report of the 2015/2016 Pilot Study as well as the other case studies. In light of that information, she has indicated the she is impressed with the STEP programme and that the Scottish Government should consider piloting it and looking more widely at the impact of physical literacy on academic attainment. At the point she expressed this view there had not been any marketing of the STEP programme in Scotland so I do not understand the evidence to support the statement you make in the blog.

Comment: In this regard Liz Smith has the advantage. Although Lucinda has now sent me three emails since my blogpost appeared, in none of them did she send me the reports I had initially requested. In my latest email I asked to see the 'full research report' that Liz Smith had access to. I got this reply from Lucinda:

Dear Dorothy,

Thank you for your email. With the greatest respect, I think the first step should be for you to correct or remove your blog and apologise for the inaccuracies I have outlined below. Alongside that I repeat my offer to come and talk you through the STEP programme and the studies that have been carried out so far. As I say, we are not the same programme as the Dore programme and it is wrong to allege otherwise.

Kind regards

Nevertheless, with her penultimate email, Lucinda attached a helpful Excel spreadsheet documenting differences between Dore and STEP, as follows:

Difference 1. The Dore Programme was a paper book of 100 exercises followed sequentially. Dore's assertions that they were personalised were untrue. STEP software contains over 350 exercises delivered through an adaptive learning software platform that is individualised to the child based on previous performance. The Programme also contains 10 minutes of 1-1 time with each pupil twice per day (nurture) and involves pupils overcoming a series of physical challenges (resilience) in a non class-competitive environment (success cycle) which displays their commitment levels (engagement) and is overseen by committed members of staff who also work with them in the classroom (mentoring and translational trust building).

Comment: The question of interest is where these exercises come from. How were they developed? Usually for an adaptive learning process, one needs to do prior research to establish difficulty levels of items for children of different ages. I raised this issue with the original Dore programme: there is no published evidence of the kind of foundational work you'd normally expect for an educational programme. Readers will no doubt be interested to hear that STEP has more exercises than Dore and delivers these in a specific, personalised sequence, but what is missing is a clear rationale explaining how and why specific exercises were developed. It would also be of interest to know how many of Dore's original 100 exercises are incorporated in STEP.
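To illustrate what that foundational work involves, here is a minimal Python sketch of the first step in calibrating items: recording which children of each age pass each item and ordering items from easiest to hardest within each age band. The item names and pass/fail records are entirely hypothetical, and real calibration would need large, representative samples and more sophisticated models; this only shows the basic logic.

```python
from collections import defaultdict

# Hypothetical calibration records: (age_in_years, item_id, passed).
# Invented for illustration only.
records = [
    (6, "item_A", True), (6, "item_A", True), (6, "item_A", False),
    (6, "item_B", True), (6, "item_B", False), (6, "item_B", False),
    (6, "item_C", True), (6, "item_C", True), (6, "item_C", True),
    (9, "item_A", True), (9, "item_A", True), (9, "item_A", True),
    (9, "item_B", True), (9, "item_B", True), (9, "item_B", False),
    (9, "item_C", True), (9, "item_C", True), (9, "item_C", True),
]


def pass_rates_by_age(records):
    """Return {age: {item: proportion of children who passed}}."""
    tallies = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [passed, attempted]
    for age, item, passed in records:
        tally = tallies[age][item]
        tally[0] += int(passed)
        tally[1] += 1
    return {
        age: {item: n_pass / n_tried for item, (n_pass, n_tried) in items.items()}
        for age, items in tallies.items()
    }


def difficulty_order(rates_for_age):
    """Items ordered from easiest (highest pass rate) to hardest for one age band."""
    return sorted(rates_for_age, key=rates_for_age.get, reverse=True)


rates = pass_rates_by_age(records)
for age in sorted(rates):
    print(age, difficulty_order(rates[age]))
```

Even in this toy example the ordering differs between age bands, which is why an adaptive sequence needs age-specific calibration data behind it.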

Difference 2. Dore was an exercise programme completed by adults and children at home supervised by untrained parents. STEP is delivered in schools and overseen by teaching staff trained through industry leader Professor Geraint Jones' teacher training programme. This also includes training on how to assess pupil performance.

Comment: If the intervention is effective, then standardized administration by teachers is a good thing. If it is not effective, then teachers should not be spending time and money being trained. Everything hinges on evidence for effectiveness (see below).

Difference 3. Dore asserted that the programme was a cure for dyslexia and and other learning difficulties. It further claimed to know the cause of these learning difficulties. STEP makes absolutely no assertions about Dyslexia, ADHD or other learning difficulties and absolutely no assertions about the medical cause for these.

Comment: I am sure that there are many people who will be glad to have the clarification that STEP is not designed to treat children with specific learning difficulties or dyslexia, as there appears to be some misunderstanding of this. This may in part be the consequence of Kenny Logan's involvement in promoting STEP. Consider, for instance, this piece in the Daily Mail, which first describes how Kenny's dyslexia was remediated by the Dore programme, and then moves to talk of his worries over his son Reuben, who was having difficulties in school:

"The answer was already staring him in the face, however, and within months, Kenny decided to try putting Reuben through a similar 'brain-training' technique to the one that transformed his own life just 14 years ago. Reuben, it transpired, had mild dyspraxia - a condition affecting mental and physical co-ordination - and the outcome for him has been so successful that Kenny is currently trying to persuade education chiefs to implement the technique in the country's worst-performing state schools, to raise attainment levels."

Another reason for confusion may be that the STEP home page lists the British Dyslexia Association as a partner, and the News section of its website includes features on Dyslexia Awareness Month and on unidentified dyslexia, as well as a case study describing use of STEP with dyslexic children in Mississippi.

The transcript of the debate in the Scottish Parliament (scroll down to the section on Motion debated: That the Parliament is impressed by the STEP physical literacy programme) shows that many of the MSPs who took part in the debate with Liz Smith were under the impression that STEP was a treatment for specific learning disabilities such as dyslexia and ADHD, as evident from these quotes:

Daniel Johnson: 'It is vital that we understand that there is a direct link between physical understanding, learning, knowledge and ability and educational ability. Overall - and specifically - there would be key benefits for people who have conditions such as ADHD and dyslexia... There is a growing body of evidence about the link between spatial awareness and physical ability and dyslexia. Likewise, the improvements on focus and concentration that exercises such as those that are outlined in the STEP programme can have for people with ADHD are clear. Improvements in those areas are linked not only to training the mind to concentrate, but to the impacts on brain chemistry.'

Elaine Smith: With regard to STEP, we have already heard that it is a programme of exercises performed twice a day for 10 minutes and focuses in particular on balance, eye tracking and co-ordination with the aim of making physical activity part of children's everyday learning. Improving physical literacy is particularly advantageous for children and young people who can find it difficult to concentrate, such as those with dyslexia and autism... STEP also has the backing of the British Dyslexia Association, which supported the findings of the pilot study.

Shirley-Anne Somerville: We are aware that the STEP programme has been promoted for children who have dyslexia.

Difference 4. Dore claimed that completing the exercises would repair a damaged or underdeveloped cerebellum. It is known that repetitive physical exercises stimulate the cerebellum but STEP makes no assertions of science that any physiological changes take place. STEP involves using repetitive physical exercises to embed actions and make them automatic.

Comment: It is good to see that some of the more florid claims of Dore are avoided by STEP, but the fact remains that the underlying theory is similar, namely that cerebellar training will improve skills beyond motor skills. The idea that training motor skills will produce effects that generalise to other aspects of development is dubious, because the cerebellum is a complex organ subserving a range of functions, and controlled studies typically find that training effects are task-specific. I discussed these issues in relation to the Dore programme here.

Specific statements about the cerebellum on the STEP website are:

'After going on national television to tell his heart-breaking story about facing up to the frustrations of overcoming a childhood stumbling block bigger than Mount Everest, Kenny (Logan) is determined to highlight the positive effects of using cerebellum specific teaching and learning programmes in primary school settings.'

And on this page of the website we hear: 'In the last century, academics experimenting with balametrics, dance and movement, established that specifically stimulating the cerebellum through exercise improves skill automation. The STEP Programme is built upon this foundation.'

Difference 5. Dore was a "medical" treatment that required participants to regularly visit treatment centres for "medical" evaluations to determine whether their learning difficulty was being cured. STEP is a primary school physical literacy programme delivered by teaching assistants or other teaching staff. It is to date shown to be most impactful on the lower quartile of the classroom in terms of academic improvement.

Comment: This is a rather odd interpretation of the Dore programme, as is perhaps signalled by the use of quotes around 'medical'. I never had the impression it was medical - it was not prescribed or administered by doctors. It is true that Dore did establish centres for assessment, and this proved to be a major reason for its commercial failure: there were substantial costs in premises, staffing and equipment. But there was no necessity to run the intervention that way: some people at the time of the collapse suggested it would be feasible to offer the exercises over the internet at much lower cost.

The second point, that the greatest benefits are for the lower quartile of the classroom, is on the one hand of potential interest, but on the other hand raises the concern that the benefits could be a classic case of regression to the mean. This is one of many ways in which scores can improve on an outcome measure for spurious reasons - which is why you need proper randomised controlled trials. Without these, improvements are largely uninterpretable, because increases in scores can arise from practice, maturation, regression to the mean or placebo effects.
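To see how regression to the mean can mimic a benefit concentrated in the lowest-scoring children, here is a minimal Python simulation under invented assumptions: true ability has mean 100 and SD 15, each test occasion adds independent measurement error with SD 7.5 (so reliability is 0.8), and there is no intervention at all. Children selected because they were in the bottom quartile at time 1 nevertheless 'improve' at time 2, while everyone else drifts slightly downwards.

```python
import random
import statistics

random.seed(1)
N = 10_000
TRUE_SD, ERROR_SD = 15.0, 7.5  # assumed values; reliability = 225 / (225 + 56.25) = 0.8

# Stable ability for each child; nobody receives any intervention.
true_ability = [random.gauss(100, TRUE_SD) for _ in range(N)]
# Two test occasions that differ only in their random measurement error.
time1 = [t + random.gauss(0, ERROR_SD) for t in true_ability]
time2 = [t + random.gauss(0, ERROR_SD) for t in true_ability]

cutoff = statistics.quantiles(time1, n=4)[0]  # 25th percentile of time-1 scores
lower = [i for i in range(N) if time1[i] <= cutoff]
rest = [i for i in range(N) if time1[i] > cutoff]


def mean_gain(indices):
    """Average time-2 minus time-1 change for the selected children."""
    return statistics.fmean(time2[i] - time1[i] for i in indices)


print(f"Lower quartile mean gain: {mean_gain(lower):+.1f}")
print(f"Everyone else mean gain:  {mean_gain(rest):+.1f}")
```

A randomised control group drawn from the same low-scoring children would show the same spurious gain, which is why the comparison has to be made against such a group.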

Difference 6. Dore determined "progress" and "cure" via a series of physical assessments. STEP empirically measures the academic progress of pupils with baseline data and presents reports against actual physical skills developed inviting schools to draw their own conclusions in the context of their school setting.

Comment: I agree that Dore's method of measuring progress and cure was a major problem, because a child could improve on the measures of balance and eye-hand co-ordination and be deemed 'cured' even though their reading had not improved at all. But the account of STEP sounds too vague to evaluate - and the evidence on their website from the pilot study is so underspecified as to be uninterpretable. It is not clear what the measures were, or which children were involved in which measures. I would like to see the full report to have a clearer idea of the methods and results.

Difference 7. Dore claimed that the exercises were developed and delivered in a formulaic manner that was a trade secret. STEP focuses on determining whether a pupils core physical capabilities in balance, eye tracking and coordination. There is no secret formula or claims of one. The genesis of STEP is in balametrics as well as other movement programmes such as Better Movers and Thinkers and Move to Learn

Comment: In STEP, how are the scores on core physical capabilities standardized for age and sex? This refers back to my earlier comment about the development work needed to underpin an effective programme. The impression is that people in this field borrow ideas from previous programmes but there is no serious science behind this.
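For readers unfamiliar with the idea, standardizing for age and sex usually means converting a raw score into a z-score relative to norms for children of the same age band and sex. A minimal Python sketch follows; the normative sample is invented and far too small for real use, and the age bands and scoring are assumptions rather than anything STEP has described.

```python
import statistics
from collections import defaultdict

# Hypothetical normative sample: (age_band, sex, raw_score). Invented values.
normative_sample = [
    ("6-7", "F", 10), ("6-7", "F", 12), ("6-7", "F", 14),
    ("6-7", "M", 11), ("6-7", "M", 13), ("6-7", "M", 15),
    ("8-9", "F", 16), ("8-9", "F", 18), ("8-9", "F", 20),
]


def build_norms(sample):
    """Return {(age_band, sex): (mean, sd)} computed from the normative sample."""
    groups = defaultdict(list)
    for age_band, sex, score in sample:
        groups[(age_band, sex)].append(score)
    return {key: (statistics.fmean(s), statistics.stdev(s)) for key, s in groups.items()}


def z_score(raw, age_band, sex, norms):
    """Express a raw score in SD units relative to children of the same age band and sex."""
    mean, sd = norms[(age_band, sex)]
    return (raw - mean) / sd


norms = build_norms(normative_sample)
for band, sex in [("6-7", "F"), ("6-7", "M"), ("8-9", "F")]:
    print(band, sex, z_score(14, band, sex, norms))
```

The same raw score of 14 is above average for a 6-year-old girl in these invented norms but well below average for an 8-year-old girl; without norms of this kind, it is hard to know what a change in raw score means.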

Difference 8. The Dore Programme cost over £2000 per person and was paid for individually. STEP costs £365 per year per child and is completed over 2 years. It is largely paid for through schools that have the discretion to ask parents to fund the programme if it is an additional intervention being offered. STEP also commits a significant number of places to schools free of charge. The fee includes year round school support

Comment: Good to have the differences in charging methods clarified.

Difference 9. Dore published research based around a single school with hypotheses relating to the cerebellum and dyslexia that could not be substantiated. It used dyslexic tendencies as a measure of improvement and selection. STEP as an organisation is wholly open to independent research and evaluation. Its initial pilot study was designed and led by the IAPS Education Committee and conducted by Innovation Bubble, led by Dr Simon Moore, University of Middlesex and Chartered Psychologist. It was held across 17 schools. Further pilot studies have taken place carried out by education districts in Mississippi and ESCCO as well as independent case studies. These have always been presented openly and in the context they were compiled. STEP believes it has sufficient evidence to warrant a large scale evaluation of the Programme.

Comment: In the context of intervention evaluation, quantity of research does not equate with quality. Here is Wikipedia's definition of a pilot study: 'A small scale preliminary study conducted in order to evaluate feasibility, time, cost, adverse events, and effect size (statistical variability) in an attempt to predict an appropriate sample size and improve upon the study design prior to performance of a full-scale research project.' I agree that a large-scale evaluation of the Programme is warranted. It's a bit odd to say the results have been presented openly while at the same time refusing to send me reports unless I take down my blogpost.
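As that definition notes, one job of a pilot study is to help predict the sample size needed for a full trial. A rough Python sketch of that calculation, using the standard normal approximation for comparing two group means at two-sided alpha = .05 and 80% power; the effect sizes (Cohen's d) are illustrative values, not estimates from any STEP study.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate children needed per arm to detect standardized effect d,
    using n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2 (normal approximation)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


for d in (0.8, 0.5, 0.2):
    print(f"d = {d}: about {n_per_group(d)} children per group")
```

Because effect sizes estimated from small pilots are themselves imprecise, planning a trial around an optimistic pilot estimate risks ending up with an underpowered study.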

It is clear that the MSPs in the debate in the Scottish Parliament were all, without exception, convinced that we already had evidence for the effectiveness of STEP. If they based these impressions on the information on the STEP website (as Liz Smith's initial statement suggests), then this is worrying, because that information came either from the pilot study, whose methods were not clearly described and whose results are unclearly reported and look incomplete, or from uncontrolled case studies.

Here are some of the statements from MSPs:

Liz Smith: As members know, the programme has been used successfully in both England and the United States, and it has been empirically evidenced to reduce the attainment gap in primary school pupils. Pupils who have completed STEP have shown significant improvements academically, behaviourally, physically and socially. A United Kingdom pilot last year compared more than 100 below-attainment primary school pupils who were on the STEP programme to a group of pupils at the same attainment level who were not. The improved learning outcomes that the study showed are extremely impressive: 86 per cent of pupils on the programme moved to on or above target in reading, compared with 56 per cent of the non-STEP group; 70 per cent of STEP pupils met their target for maths, compared with 30 per cent of the non-STEP group; and 75 per cent and 62 per cent of STEP pupils were on or above target for English comprehension and spelling respectively, compared with 43 per cent and 30 per cent of the non-STEP group.
In Mississippi, in the USA, more than 1,000 pupils have completed the programme over the past three years, and it is no coincidence that that state has seen significant improvement in fourth grade - which is the equivalent of P6 - reading and maths, which has resulted in the state being awarded a commendation for educational innovation.

Brian Whittle: The STEP programme is tried and tested, with measured physical, emotional and academic outcomes, especially in the lower percentiles.

Daniel Johnson: Perhaps most impressive is the STEP programme's achievements on academic improvement - it has led to improved English for 76 per cent of participants, and to improved maths, reading and spelling for 70 per cent of participants. The benefits that physical literacy can bring to academic attainment are clear.

Oliver Mundell: the STEP programme has been shown to work and is popular with both the teachers and the pupils who have benefited from it in England and the USA.

Conclusion

This has been a very long postscript, but it seems important to be clear about what the objections to STEP are. I have not claimed that STEP is exactly the same as Dore. My sense of déjà vu arises from the similarities: in the people involved, in the use of cerebellar exercises involving balance and eye-hand coordination delivered in short sessions, and in the successful promotion of the programme to politicians and schools in the absence of adequate peer-reviewed evidence. Given that the basic theory does not have strong scientific plausibility, it is this latter point that is the source of greatest concern. We can agree that we all want children to succeed in school and any method that can help them achieve this is to be welcomed. There is also, however, a need for better education of our politicians, so that they are equipped to evaluate evidence properly. They have a responsibility to ensure we do the best for our children, but this requires a critical mindset.

Saturday, 17 June 2017

Prospecting for kryptonite: the value of null results

This blogpost doesn't say anything new – it just uses a new analogy (at least new to me) to make a point about the value of null results from well-designed studies. I was thinking about this after reading this blogpost by Anne Scheel.

Think of science like prospecting for kryptonite in an enormous desert. There's a huge amount of territory out there, and very little kryptonite. Suppose also that the fate of the human race depends crucially on finding kryptonite deposits.

Most prospectors don't find kryptonite. Not finding kryptonite is disappointing: it feels like a lot of time and energy has been wasted, and the prospector leaves empty-handed. But the failure is nonetheless useful. It means that new prospectors won't waste their time looking for kryptonite in places where it doesn't exist.  If, however, someone finds kryptonite, everyone gets very excited and there is a stampede to rush to the spot where it was discovered.

Contemporary science works a bit like this, except that the whole process is messed up by reporting bias and poor methods which lead to false information.

To take reporting bias first: suppose the prospector who finds nothing doesn't bother to tell anyone. Then others may come back to the same spot and waste time also finding nothing. Of course, some scientists are like prospectors in that they are competitive and would like to prevent other people from getting useful information. Having a competitor bogged down in a blind alley may be just what they want. But where there is an urgent need for new discovery, there needs to be a collaborative rather than competitive approach, to speed up discovery and avoid waste of scarce funds. In this context, null results are very useful.

False information can come from the prospector who declares there is no kryptonite on the basis of a superficial drive through a region. This is like the researcher who does an underpowered study that gets an inconclusive null result. It doesn't allow us to map out the region with kryptonite-rich and kryptonite-empty areas – it just leaves us having to go back and look again more thoroughly. Null results from poorly designed studies are not much use to anyone.
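How superficial is a 'superficial drive'? Here is a sketch in Python of the approximate statistical power of a two-group comparison (normal approximation, two-sided alpha = .05), assuming a genuine moderate effect of d = 0.5. The effect size and sample sizes are illustrative choices.

```python
import math
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate probability that a two-sided, two-group test detects a true
    standardized effect d (ignores the negligible chance of significance in the
    wrong direction)."""
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(d * math.sqrt(n_per_group / 2) - z_crit)


for n in (10, 20, 64, 200):
    print(f"{n} per group: power = {approx_power(0.5, n):.2f}")
```

With 20 per group, a real effect of this size would be missed about two times out of three, so a null result from such a study says little about whether the kryptonite is there.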

But the worst kind of false information is fool's kryptonite: someone declares they have found kryptonite, but they haven't. So everyone rushes off to that spot to try and find their own kryptonite, only to find they have been deceived. So there are a lot of wasted resources and broken hearts. For a prospector who has been misled in this way, this situation is worse than just not finding any kryptonite, because their hopes have been raised and they may have put a disproportionate amount of effort and energy into pursuing the false information.

Pre-registering a study is the equivalent of a prospector declaring publicly that they are doing a comprehensive survey of a specific region, and will declare what they have found, so that the map can gradually be filled in, with no duplication of effort.

Some will say, what about exploratory research? Of course the prospector may hit lucky and find some other useful mineral that nobody had anticipated. If so, that's great, and it may even turn out more important than kryptonite. But the point I want to stress is that the norm for most prospectors is that they won't find kryptonite or anything else. Really exciting findings occur rarely, yet our current incentive structures create the impression that you have to find something amazing to be valued as a scientist.  It would make more sense to reward those who do a good job of prospecting, producing results that add to our knowledge and can be built upon.

I'll leave the last word to Ottoline Leyser, who in an interview for The Life Scientific said: "There's an awful lot of talk about ground-breaking research…. Ground-breaking is what you do when you start a building. You go into a field and you dig a hole in the ground. If you're only rewarded for ground-breaking research, there's going to be a lot of fields with a small hole in, and no buildings."

Sunday, 28 May 2017

Which neuroimaging measures are useful for individual differences research?

The tl;dr version

A neuroimaging measure is potentially useful for individual differences research if variation between people is substantially greater than variation within the same person tested on different occasions. This means that we need to know about the reliability of our measures, before launching into studies of individual differences.
High reliability is not sufficient to ensure a good measure, but it is necessary.
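One common way to quantify this comparison of between-person and within-person variation is an intraclass correlation (ICC). Below is a minimal Python sketch of the one-way random-effects ICC(1), assuming every person was measured the same number of times; other ICC variants exist, and which one is appropriate depends on the design. The example data are invented.

```python
import statistics


def icc_oneway(scores):
    """One-way random-effects intraclass correlation, ICC(1).

    scores: one list per person, each holding the same number (k >= 2) of
    repeated measurements. Values near 1 mean differences between people
    dwarf occasion-to-occasion variation within a person.
    """
    n, k = len(scores), len(scores[0])
    grand_mean = statistics.fmean(x for person in scores for x in person)
    person_means = [statistics.fmean(person) for person in scores]
    # Between-person and within-person sums of squares, as in a one-way ANOVA.
    ss_between = k * sum((m - grand_mean) ** 2 for m in person_means)
    ss_within = sum(
        (x - m) ** 2 for person, m in zip(scores, person_means) for x in person
    )
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Invented data: three people, each measured on two occasions.
stable = [[10.0, 11.0], [20.0, 19.0], [30.0, 31.0]]
unstable = [[10.0, 22.0], [20.0, 14.0], [30.0, 24.0]]
print(f"stable measure:   ICC = {icc_oneway(stable):.2f}")
print(f"unstable measure: ICC = {icc_oneway(unstable):.2f}")
```

Both invented measures rank the three people similarly on average, but only the first would be much use for studying individual differences.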

Individual differences research

Psychologists have used behavioural measures to study individual differences - in cognition and personality - for many years. The goal is complementary to psychological research that looks for universal principles that guide human behaviour: e.g. factors affecting learning or emotional reactions. Individual differences research also often focuses on underlying causes, looking for associations with genetic, experiential and/or neurobiological differences that could lead to individual differences.

Some basic psychometrics

Suppose I set up a study to assess individual differences in children’s vocabulary. I decide to look at three measures.
  • Measure A involves asking children to define a predetermined set of words, ordered in difficulty, and scoring their responses by standard criteria.
  • Measure B involves showing the child pictured objects that have to be named.
  • Measure C involves recording the child talking with another child and measuring how many different words they use.
For each of these measures, we’d expect to see a distribution of scores, so we could potentially rank order children on their vocabulary ability. But are the three measures equally good indicators of individual differences?

We can see immediately one problem with Test B: the distribution of scores is bunched tightly, so it doesn’t capture individual variation very well. Test C, which has the greatest spread of scores, might seem the most suitable for detecting individual variation. But spread of scores, while important, is not the only test attribute to consider. We also need to consider whether the measure assesses a stable individual difference, or whether it is influenced by random or systematic factors that are not part of what we want to measure.

There is a huge literature addressing this issue, starting with Francis Galton in the 19th century, with major statistical advances in the 1950s and 1960s (see review by Wasserman & Bracken, 2003). The classical view treats test scores as a compound, with a ‘true score’ part, plus an ‘error’ part. We want a measure that minimises the impact of random or systematic error.
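In the notation usually used for this classical view (standard textbook symbols, not taken from this post), an observed score X is the sum of a true score T and an error E. Assuming the error is uncorrelated with the true score, reliability is the proportion of observed-score variance that is true-score variance:

```latex
X = T + E, \qquad
\sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```

A test with reliability close to 1 mostly reflects stable differences between people; a test with reliability near 0 mostly reflects error.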

If there is a big influence of random error, then the test score is likely to change from one occasion to the next. Suppose we measure the same children on two occasions a month apart on three new tests (D, E and F), and then plot scores at time 1 against scores at time 2. (To simplify this example, we assume that all three tests have the same normal distribution of scores - the same as for test A in Figure 1 - and that there is an average gain of 10 points from time 1 to time 2.)

Figure 2

We can see that Test F is not very reliable: although there is a significant association between the scores on the two test occasions, individual children can show remarkable changes from one occasion to the next. If our goal is to measure a reasonably stable attribute of the person, then Test F is clearly not suitable.
Just because a test is reliable, it does not mean it is valid. But if it is not reliable, then it won’t be valid. This is illustrated by this nice figure from
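The link between reliability and validity can be stated compactly with a standard classical test theory result, the attenuation bound usually credited to Spearman (textbook notation, not from this post): the correlation between a test X and any criterion Y cannot exceed the square root of the product of their reliabilities.

```latex
|r_{XY}| \le \sqrt{\rho_{XX'}\,\rho_{YY'}}
```

For example, a test with reliability 0.4 cannot correlate above about 0.63 (the square root of 0.4) even with a perfectly reliable criterion.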

What about change scores?

Sometimes we explicitly want to measure change: for instance, we may be more interested in how quickly a child learns vocabulary, rather than how much they know at some specific point in time. Surely, then, we don’t want a stable measure, as it would not identify the change? Wouldn’t test F be better than D or E for this purpose?

Unfortunately, the logic here is flawed. It’s certainly possible that people may vary in how much they change from time to time, but if our interest is in change, then what we want is a reliable measure of change. There has been considerable debate in the psychological literature as to how best to establish the reliability of a change measure, but the key point is that you can find substantial change in test scores that is meaningless, and that the likelihood of it being meaningless is high if the underlying measure is unreliable. The data in Figure 2 were simulated by assuming that all children changed by the same amount from Time 1 to Time 2, but that tests varied in how much random error was incorporated in the test score. If you want to interpret a change score as meaningful, then the onus is on you to convince others that you are not just measuring random error.
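A classical formula makes the point concrete. When the two occasions have equal score variances, the reliability of a change score is ((r11 + r22)/2 - r12) / (1 - r12), where r11 and r22 are the reliabilities at each occasion and r12 is the correlation between occasions. In the Figure 2 scenario, where every child's true change is identical, r12 equals the reliability, so the change score has a reliability of zero however reliable the test itself is. A small Python sketch, with illustrative values:

```python
def change_score_reliability(r11, r22, r12):
    """Classical reliability of a change score (time 2 minus time 1).

    Assumes the two occasions have equal score variances.
    r11, r22: reliability of the test at each occasion.
    r12: observed correlation between the two occasions.
    """
    if r12 >= 1:
        raise ValueError("r12 must be below 1")
    return ((r11 + r22) / 2 - r12) / (1 - r12)


# Figure 2 scenario: identical true change for everyone, so r12 equals reliability.
for rel in (0.95, 0.80, 0.40):
    print(rel, round(change_score_reliability(rel, rel, rel), 3))

# By contrast: a reliable test whose occasions correlate only moderately,
# because children genuinely differ in how much they change.
print(round(change_score_reliability(0.9, 0.9, 0.5), 3))
```

The formula shows that a reliable test is necessary but not sufficient for a reliable change score: there must also be genuine individual differences in change.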

What does this have to do with neuroimaging?

My concern with the neuroimaging literature is that measures from functional or structural imaging are often used to study individual differences, yet the reliability of those measures is rarely mentioned. In most cases, we simply don’t have any data on repeated testing using the same measures, or if we do, the sample is too small, or too selected, to give a meaningful estimate of reliability. Such data as we have don’t inspire confidence that brain measurements achieve the high level of reliability that is aimed for in psychometric tests. This does not mean that these measures are not useful, but it does make them unsuited for the study of individual differences.
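For imaging measures, reliability is often summarised with an intraclass correlation coefficient (ICC) computed over repeated sessions. The following is a minimal Python sketch of two of the forms described by Shrout and Fleiss, assuming a complete subjects × sessions array with no missing data; the function name `icc` and the example scores are hypothetical. ICC(3,1) reflects only consistency of rank order, whereas ICC(2,1) also penalises systematic shifts between sessions, such as the 10-point average gain in the test example above.

```python
import numpy as np


def icc(scores):
    """Return (ICC(2,1), ICC(3,1)) for an n_subjects x k_sessions array."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)  # between sessions
    ss_total = np.sum((x - grand) ** 2)
    ss_error = ss_total - ss_rows - ss_cols              # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Two-way random effects, absolute agreement, single measurement.
    icc2 = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)
    # Two-way mixed effects, consistency, single measurement.
    icc3 = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return icc2, icc3


# Hypothetical scores for three participants over two sessions:
# same rank order, but everyone is 10 points higher in session 2.
icc2, icc3 = icc([[10, 20], [30, 40], [50, 60]])
print(f"ICC(2,1) = {icc2:.2f}, ICC(3,1) = {icc3:.2f}")  # prints ICC(2,1) = 0.89, ICC(3,1) = 1.00
```

Which form is appropriate depends on whether systematic session effects should count as unreliability for the question at hand.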

I hesitated about blogging on this topic, because nothing I am saying here is new: the importance of reliability has been established in the literature on measurement theory since 1950. Yet, when different subject areas evolve independently, it seems that methodological practices that are seen as crucial in one discipline can be overlooked in another that is rediscovering the same issues but with different metrics.

There are signs that things are changing, with a welcome trend for neuroscientists to take reliability seriously. I started thinking about blogging on this topic just a couple of weeks ago after seeing some high-profile papers that exemplified the problems in this area, but in that period there have also been some nice studies that are starting to provide information on the reliability of neuroscience measures. This might seem like relatively dull science to many, but to my mind it is a key step towards incorporating neuroscience in the study of individual differences. As I commented on Twitter recently, my view is that anyone who wants to use a neuroimaging measure as an endophenotype should first be required to establish that it has adequate reliability for that purpose.
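One reason small samples are a problem is that a reliability estimate is itself uncertain. As a rough guide, this minimal Python sketch computes an approximate 95% confidence interval for a test-retest Pearson correlation using Fisher's z transformation; the observed r of 0.70, the sample sizes and the function name `retest_ci` are hypothetical, and ICC estimates would need their own interval methods.

```python
import math


def retest_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a test-retest Pearson r via Fisher's z transform."""
    z = math.atanh(r)            # Fisher's z of the observed correlation
    se = 1 / math.sqrt(n - 3)    # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


for n in (15, 30, 100, 300):
    lo, hi = retest_ci(0.70, n)
    print(f"r = 0.70, n = {n:3d}: 95% CI {lo:.2f} to {hi:.2f}")
```

With 15 participants, an observed r of 0.70 is compatible with true values from roughly 0.29 to 0.89, which says little about whether a measure is fit for studying individual differences.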

Further reading

This review by Dubois and Adolphs (2016) covers the issue of reliability and much more, and is highly recommended.
Other recent papers of relevance:
Geerligs, L., Tsvetanov, K. A., Cam-CAN, & Henson, R. N. (2017). Challenges in measuring individual differences in functional connectivity using fMRI: The case of healthy aging. Human Brain Mapping.
Nord, C. L., Gray, A., Charpentier, C. J., Robinson, O. J., & Roiser, J. P. (2017). Unreliability of putative fMRI biomarkers during emotional face processing. NeuroImage.

Note: Post updated on 17th June 2017 because figures from R Markdown html were not displaying correctly on all platforms.