Language Testing Review of the Year 2017

The end of 2017 already, and the start of a New Year. During 2017 I was on sabbatical. It was one of those rare times in an academic career when there is the time and space to do a lot more reading than is possible when teaching regular courses, and time to follow the news. If you've read any of my previous reflections upon the year just gone, you will know that there is nothing new under heaven. This year is no different.

They Can't Make their Minds Up. One of the biggest stories of the year - at least judging by the column inches - has been the use of IELTS to certify the English proficiency of health professionals who wish to practise in hospitals. This is bizarre. For the past few years in these yearly reviews I've been reporting on the UK government and nurse/doctor groups claiming that the lack of proficiency of international health care professionals is putting patients' lives at risk. Tests were introduced, and the number of recruits fell dramatically. The government and the health service then panicked that there were not enough international nurses to keep the service going. The headlines in 2017 read Government Blames 96% Fall in EU Nurses Registering for UK Work on Language Test, and the nursing unions who had lobbied for the introduction of language tests suddenly supported petitions to get rid of them. When only 3 of 118 Filipino nurses passed the language tests, managers called for easier tests that they could pass.

This clearly illustrates the way in which language testing can be used by policy makers to implement their own agendas. One year they want tough language tests to keep people out. The next they want easy tests so that they can bring them in. But then the discussion turned rather sensible. This took me by surprise, as I'm not used to rational debate about testing and assessment in the media (or indeed in many more academic contexts).

The outbreak of sensibility began with some newspapers - rather surprisingly the Daily Mail - asking whether readers (let alone nurses) could answer some of the questions on the test. One of the test's writing prompts was reproduced by the Mail under the heading Tough language tests blamed for drop in EU nurses: Recruitment firm warn 'inappropriate' exams have led to 96% fall in numbers. Note the use of the word "inappropriate" in this headline. From this point on the media ran two themes that were uncritically held together. One was that "the bar has been set too high"; the other was that the test itself (IELTS) might be replaced by a test that actually had some medical content, such as the Occupational English Test. The simple point is not difficult to grasp: what does being able to produce an answer to this writing prompt (or, for that matter, not being able to) tell us about whether or not a nurse is capable of operating competently in an English-medium hospital? In slightly more technical terminology, is it "content relevant"?

Some time ago Fred Davidson and I pondered these issues - what some people call "test repurposing": using a test that was designed for one purpose for a completely different one. From a general or academic prompt like the one reproduced in the Mail, are we able to draw inferences from test scores to future performance in a particular domain like health care? We theorized this and produced a paper that argued for the centrality of validity, calling for test retrofit and the creation of new validation arguments wherever repurposing is involved. You can download and read this article here. I still find it both frustrating and sad that the decision makers in cases like this have little or no understanding of the issues involved, and fail to use the services of professionals who could help them make much more sensible decisions.

Computer Says No
Continuing on the same theme, there was one story that hit the media this year which excited the discussion lists like no other. This was the story of the Irish vet who failed the speaking component of the Pearson Test of English (PTE).

Born and brought up in Ireland, the vet in question had two degrees, both obtained at an English-medium university, and was fully professionally qualified. She was married to an Australian and needed 5 more points on the speaking component of the PTE Academic to demonstrate the English ability required for a visa to stay with her spouse and work in Australia. The spoken part of this test is assessed by a computer - or "untouched by human hands", as a car manufacturer might claim. The computer is sensitive to things it can "measure" - such as accent, speed of delivery, and length of pauses - all of which it takes to be indicators of "fluency". The "Scottish Elevator" video is a popular take-off of the kinds of problems that can arise when computers do not understand regional accents or dialects.

Pearson was very quick to defend its automated scoring system as "normed on native speakers", and posted its research on the lists, which you can read here. It pointed out that "native speaker speech" is highly variable (quoting my research among others) in order to defend this score, and others like it, that seem somewhat strange. But this is precisely the point - it IS VARIABLE - not only by speaker, but by speaking purpose and context. I have written about this extensively in chapter 3 of my 2015 book Re-Examining Language Testing. To rely on uninterpreted surface-level "fluency markers" is to misunderstand the nature of human communication, and the research that supports the correlation between these features and real-world ability is flawed. I am not opposed to research into computer-aided scoring, but we are a long way from being able to leave a computer algorithm to make decisions about the future of human beings. And when decisions about scoring models are essentially taken on economic grounds relating to test volume growth, validation is completely subverted.

The only thing that surprises me is that the vet in question went away meekly to try to find some other route to a visa. There are clearly grounds here for a legal challenge (see Fulcher, G. (2014). Language testing in the dock. In Kunnan, A. J. (Ed.), The Companion to Language Testing (pp. 1553-1570). London: Wiley-Blackwell). I'm on a roll here! This is the second story in which it is clear that the protagonists needed the advice of language testing specialists, but failed to seek it. It looks like the theme of the year is emerging!
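To make concrete what I mean by uninterpreted surface-level "fluency markers", here is a minimal sketch, assuming word-level timestamps from some forced aligner, of the kind of crude delivery features (speech rate, pause statistics) that a machine can easily compute. The feature names, the 0.5-second pause threshold, and the weights are invented purely for illustration; this is emphatically not Pearson's scoring model, just a toy version of the general approach being criticised.

```python
# Toy illustration of "surface-level fluency markers": given word-level
# timestamps (e.g. from a forced aligner), compute speech rate and pause
# statistics. This is NOT any operational scoring model; the features,
# threshold and weights below are invented for illustration only.

from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds

def fluency_markers(words: list[Word]) -> dict[str, float]:
    total_time = words[-1].end - words[0].start
    speech_rate = len(words) / total_time * 60            # words per minute
    pauses = [b.start - a.end for a, b in zip(words, words[1:])]
    long_pauses = [p for p in pauses if p > 0.5]           # arbitrary cut-off
    mean_pause = sum(pauses) / len(pauses) if pauses else 0.0
    return {
        "speech_rate_wpm": speech_rate,
        "mean_pause_s": mean_pause,
        "long_pause_count": float(len(long_pauses)),
    }

def naive_fluency_score(m: dict[str, float]) -> float:
    # Deliberately simplistic weighting: faster speech and fewer/shorter
    # pauses yield a higher "fluency" number, regardless of accent, dialect,
    # purpose or content - which is exactly the criticism in the text.
    return m["speech_rate_wpm"] / 10 - 2 * m["mean_pause_s"] - m["long_pause_count"]

if __name__ == "__main__":
    sample = [Word("the", 0.0, 0.2), Word("patient", 0.3, 0.8),
              Word("is", 1.6, 1.7), Word("stable", 1.8, 2.3)]
    markers = fluency_markers(sample)
    print(markers, naive_fluency_score(markers))
```

Nothing in a computation like this knows anything about what was actually said, to whom, or for what purpose - which is precisely why such markers, uninterpreted, cannot stand in for communicative ability.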

All Too Difficult? Or Just Inappropriate?
Or perhaps not! This item really brings together the initial two stories. For me, this article from The Conversation was one of the most powerful of the year, and chimes beautifully with my work on test retrofit. Written by Amanda Muller of Flinders University, the article questions the use of IELTS for profession-specific inferencing and migration. She points out, quite rightly, that "The IELTS organisation has not officially disapproved of the use of the test beyond its original purpose. It comments on recommended test scores for study, but is quiet on its use for migration or work purposes. But, at least one of the IELTS original designers has openly objected to it."

Australia has long used language tests to control immigration, as most people know. In 2017 the Australian government planned to make the language requirement for citizenship more stringent - a level 6 on IELTS. Once again, both the difficulty level and the appropriacy of the examination were questioned. And for once. Just once. The media actually turned to a professional for comment. So it was a joy to hear Professor Tim McNamara of the University of Melbourne being interviewed on this subject on the PM programme of the Australian ABC network. Here is a link to the item. And while I personally do not think that the CEFR is anything like an independent "international standard", I agree with the sentiments expressed.

In the United Kingdom too there was concern over language testing and immigration, but here it was all about whether people were cheating. The Home Office enquiry documentation and a video of the committee taking evidence are available here. Once again, no language testing professionals were involved. All very depressing.

It may be appropriate to conclude this section with a reference to the Yahoo Finance article about IELTS reaching 2.9 million test takers in 2017. It appears that this has been achieved through "recognition" of the scores by institutions, some of which are listed, and which are as diverse as immigration authorities, nursing agencies, and the International Monetary Fund. Many more are listed on websites. The question this raises is whether "recognition" has become more important in the testing marketplace than "validation".

Aviation Language Testing. Once again tests for aviation English were in the news, as a study concluded that over 200 incidents in 2017 were linked to language issues. The report, produced by Dr Barbara Clark, is entitled Aviation English Research Project: Data analysis findings and best practice recommendations. Among the findings was the use of non-standard aviation phraseology, which is particularly a failing of pilots whose first language is English, as they tend to use colloquial and slang phrases that others may not understand. If you're about to fly this does not make comfortable reading. But at least the airline industry does not use one of the high-volume academic language tests to certify pilots and air traffic controllers. Perhaps the reason for this is that a false-positive result would have an immediately noticeable impact.

Grading System Reform.
My final selection this year is from the United Kingdom. And why not, as that is where I live? This is a wonderful story. After the depressing items above, it's lovely to have something that brings tears to the eyes. In 2017 the UK Department for Education decided to reform the school examination grading system. As this BBC report informed the incredulous world, we moved from a letter system (A*, A, B, C, D, E, F, G, U) to a number system (9, 8, 7, 6, 5, 4, 3, 2, 1). The question is: why? This is hardly a significant reform of a scoring model, just a bureaucratic change in how we refer to the levels. Perhaps most surprisingly, the new grade 1 is the lowest and grade 9 the highest. Not even intuitively obvious - unless of course you have almost no understanding of educational psychology at all.

And it is always a pleasure when the bizarre activities of educational bureaucrats attract the attention of my favourite satirical show - the BBC News Quiz. And who better to explain this change to us than the great comedian Andy Hamilton? Play this extract. You'll love it.

Of course, lots of other stuff was going on during the year. But I have to make a selection of what I think are the most important and interesting stories for a short review of the year that you will want to read. So I could have included the story about the US poet whose work was included on a literature examination and who couldn't answer the questions on it. Or the ongoing tragedy of fake marriages to "IELTS GIRLS" so that Indian boys who cannot pass the exam have a chance to move to Australia or Canada with their "temporary" spouse. Or the (failed) appeal of Uber against the imposition of language tests for taxi drivers in London. They all show how language testing has become the means by which policy makers control education systems, international mobility, and employment. It is endemic, and mostly badly done. It is time that we insisted on professional representation on decision-making bodies so that we can avoid the worst unintended consequences of poor assessment policy and practice.

That is my hope for the New Year. But I won't hold my breath. I wish you all a very Happy 2018, and I will be back with my next review on this day next year.

Glenn Fulcher
1st January 2018