Language Testing Bytes is a biannual podcast produced for SAGE Publications to accompany the journal Language Testing. In the podcast I discuss issues raised in the journal with authors and key figures in the field. You can download a podcast for your iPod or other device by right-clicking on the download icon, or you can play a podcast directly from this page. Also available on iTunes.
Issue 22: Eunice Jang on Diagnostic Language Testing
Issue 32(2) of Language Testing is a special issue on the current state of Diagnostic Language Testing. While this has traditionally been a neglected use of language tests, there is currently a great surge of interest and research in the field. Eunice Jang from the University of Toronto joins me to discuss current thinking in testing for diagnostic purposes.
The assessment of aviation English has become something of an icon of high-stakes assessment in recent years. In Language Testing 32(2), we publish a paper by Hyejeong Kim and Cathie Elder, both from the University of Melbourne, which examines the construct of aviation English from the perspective of airline professionals in Korea.
In this issue of the podcast Martin East describes an assessment reform project in New Zealand. We're reminded very forcefully that when assessment and testing procedures within educational systems are changed, there are many complex factors to take into account. All stakeholders are going to take a view on the proposed reforms, and they aren't necessarily going to agree.
Issue 19: Fred Davidson and Cary Lin of the University of Illinois at Urbana-Champaign discuss the role of statistics in language testing.
The last issue of volume 31 contains a review of Rita Green's new book on statistics in language testing. We take the opportunity to talk about how the teaching of statistics to students of language testing has changed since Fred Davidson's The language tester's statistical toolbox was published in 2000.
Issue 18: Folkert Kuiken and Ineke Vedder from the University of Amsterdam discuss rater variability in the assessment of speaking and writing in a second language.
The third issue of the journal this year is a special issue on the scoring of performance tests. In this podcast the guest editors talk about some of the issues surrounding the rating of speaking and writing samples.
Issue 17: Ryo Nitta and Fumiyo Nakatsuhara on pre-task planning in paired speaking tests
The authors of our first paper in 31(2) are concerned with a very practical question. What is the effect of giving test-takers planning time prior to a paired-format speaking task? Does it affect what they say? Does it change the scores they get? The answers will inform the design of speaking tests not only in high-stakes assessment contexts, but probably in classrooms as well.
Issue 16: Jodi Tommerdahl and Cynthia Kilpatrick on the reliability of morphological analyses in language samples
How large a language sample do we need in order to draw reliable conclusions about what we wish to assess? In issue 31(1) of Language Testing we are delighted to publish a paper by Jodi Tommerdahl and Cynthia Kilpatrick that addresses this important issue.
Issue 30(4) of the journal contains the first paper to use eye-tracking to investigate the cognitive processes of learners taking reading tests. Stephen Bax joins us to explain the methodology and what it can tell us about how successful readers go about processing items and texts in reading tests.
Issue 30(3) commemorates the 30th anniversary of the founding of the journal. We mark this milestone in the journal's history with a special issue on the topic of Assessment Literacy, guest edited by Ofra Inbar. A concern for the assessment literacy needs of the wide range of stakeholders beyond the experts who use tests and test scores is a sign of a maturing profession. This issue takes the debate forward in new and exciting ways, some of which Ofra Inbar discusses on this podcast.
Issue 13: Paula Winke and Susan Gass on Rater Bias
Rater bias is something that language testers have known about for a long time, and have tried to control through training and the use of rating scales. But investigations into the source and nature of bias are relatively recent. In issue 30(2) of the journal Paula Winke, Susan Gass, and Carol Myford share their research in this field, and the first two authors, from Michigan State University, join us on Language Testing Bytes to discuss rater bias.
Issue 12: Alan Davies on Assessing Academic English
In 2008 Alan Davies' book Assessing Academic English was published by Cambridge University Press. In issue 30(1) of Language Testing it is reviewed by Christine Coombe. With a strong historical narrative, the book raises many of the enduring issues in assessing English for study in English medium institutions. In this podcast we explore some of these with Professor Davies.
Issue 11: Ana Pellicer-Sanchez and Norbert Schmitt on Yes-No Vocabulary Tests
In this issue of the podcast we return to vocabulary testing, after the great introduction provided by John Read in Issue 5. This time, we welcome Ana Pellicer-Sanchez and Norbert Schmitt to talk about the popular Yes-No Vocabulary Test. Their recent research looks at scoring issues and potential solutions to problems that have plagued the test for years. Their paper in issue 29(4) of the journal contains the details, but in the podcast we discuss the key issues for vocabulary assessment.
Issue 10: Kathryn Hill on Classroom Based Assessment
Classroom Based Assessment is an increasingly important topic in language education, and in issue 29(3) of Language Testing we publish a paper by Kathryn Hill and Tim McNamara entitled "Developing a comprehensive, empirically based research framework for classroom-based assessment". The research in this paper is based on the first author's PhD dissertation, and so we asked Kathryn Hill to join us on Language Testing Bytes to talk about developments in the field.
Issue 9: Luke Harding on Accent in Listening Assessment
Issue 29(2) of the journal contains a paper entitled "Accent, listening assessment and the potential for a shared-L1 advantage: A DIF perspective", by Luke Harding. In this podcast we explore why it is that most listening tests use a very narrow range of standard accents, rather than the many varieties that we are likely to encounter in real-world communication.
Issue 8: Tan Jin and Barley Mak on Confidence Scoring
In Issue 29(1) of the journal three authors from the Chinese University of Hong Kong have a paper on the application of fuzzy logic to scoring speaking tests. This is termed 'confidence scoring', and the first two authors join us on Language Testing Bytes to explain a little more about their novel approach.
Mark Wilson delivered the Messick Memorial Lecture at the Language Testing Research Colloquium in Melbourne, 2006, on new developments in measurement models to take into account the complexity of language testing. In Language Testing 28(4) we publish the paper based on this lecture, and Mark joins us on Language Testing Bytes to talk about his work in this area.
Issue 6: Craig Deville and Micheline Chalhoub-Deville on Standards-Based Testing
Standards-Based Testing is highly controversial, both for its social and educational impact on schools and bilingual communities and for its technical aspects, which rely to a significant extent on expert judgment. In this podcast we discuss the issues surrounding Standards-Based Testing in the United States with the guest editors of issue 28(3), a special issue on this topic. The collection of papers they have brought together, along with reviews of recent books on the topic and a test review, constitutes a state-of-the-art volume for the field.
The journal has seen a flurry of articles on vocabulary testing in recent months, and issue 28(2) is no exception, with Marta Fairclough's paper on the lexical recognition task. It seemed like an appropriate moment to consider why vocabulary is receiving so much attention, and so we turned to Professor John Read of the University of Auckland, New Zealand, to give us an overview of current research and activity within the field.
Issue 4: Khaled Barkaoui and Melissa Bowles on Think Aloud Protocols
In Language Testing 28(1), 2011, Khaled Barkaoui has an article on the use of think-alouds to investigate rater processes and decisions as they rate essay samples. The focus is not on the raters, but on whether the research method is a useful tool for the purpose. In this podcast he explains his findings, and their importance. We are then joined by Melissa Bowles who has recently published The Think-Aloud Controversy in Second Language Research, to explain precisely what the problems and possibilities of think-alouds are in language testing research.
Language Testing 27(4), 2010, contains an article by Carol Chapelle and colleagues on testing productive grammatical ability. We thought this would be an excellent opportunity to look at what is going on in the field of assessing grammar, and what issues currently face the field. Jim Purpura agreed to talk to us on Language Testing Bytes.
Language Testing 27(3), 2010, is a special issue guest edited by Xiaoming Xi on the automated scoring of writing and speaking tests. In this podcast she talks about why the automated scoring of speaking and writing tests is such a hot topic, and explains the possibilities, limitations and current research issues in the field.
In Language Testing 27(2), 2010, Mike Kane contributed a response to an article on fairness in language testing. We thought this was an excellent opportunity to ask him about his approach to validation, and how he sees 'fairness' fitting into the picture.
Gaining insights from domain experts into how they view communication in real world settings is recognized as an important authenticity consideration in the development of criteria to assess language proficiency for specific academic or occupational purposes. These "indigenous" criteria represent an articulation of the test construct and should therefore reflect what is germane to the particular domain of language use rather than general language-focused criteria familiar from other language tests. The methodological question of how to elicit such insights is, however, complex and has been addressed by various researchers using different methodological and theoretical frameworks.
The paper draws on data from a larger research project to explore the affordances and constraints of more or less direct approaches to eliciting domain experts’ perspectives on what matters for effective communication in the workplace. The domain experts in this case were physiotherapy educators and supervisors. The study offers a qualitative comparison of expert feedback gathered from three different sites. Two were in the workplace, where the communication skills of physiotherapy students in training were assessed routinely and the feedback given to them was naturally occurring rather than elicited. The third was a more artificial workshop setting in which video-recorded interactions between students and patients or simulated patients (i.e., actors role-playing a patient) were shown to two groups of expert informants, who were then asked by the researcher to comment on the strengths and weaknesses of each performance.
A qualitative analysis revealed that the nature of expert feedback differed markedly across the three sites, with the routinely occurring feedback containing only scant and vague reference to language and communication aspects. The workshop setting, although less authentic, yielded much richer insights into the physiotherapists’ views about workplace communication. The implications of our findings for the development of relevant language test criteria are considered.
The indigenous assessment practices (Jacoby & McNamara, 1999) in selected health professions were investigated to inform a review of the scope of assessment in the speaking sub-test of a specific-purpose English language test for health professionals, the Occupational English Test (OET). The assessment criteria in current use on the test represent a generalized view of language and are concerned with Overall Communicative Effectiveness, Fluency, Intelligibility, Appropriateness of Language, and Resources of Grammar and Expression. The research study focused on healthcare consultations between trainee health professionals and patients. Educators and supervisors observed these interactions and subsequently provided feedback on trainees’ performances. The assumption was that, in their comments, educators would give information pertinent to trainees’ acculturation to the expectations and behaviours of the profession, that is, to "what matters" to practitioners. Thematic analysis was undertaken to establish the aspects of performance that matter to health professionals in these contexts. Data for each profession were coded independently. Clear similarities across the professions became apparent as themes emerged. An exploratory conceptual model of what health professionals value in the consultation was developed, comprising three focal areas: foundation, performance and goals of the consultation. Findings from the analysis provided an empirical basis for the generation and definition of two additional, professionally relevant criteria for use in the OET speaking sub-test – Clinician Engagement and Management of Interaction – and of a checklist of performance indicators to be used to train assessors in applying the new criteria. This process of developing, through close analysis of domain experts’ commentary, test criteria that are potentially more authentic to the target language use situation is novel and may be replicated effectively in other specific-purpose language testing contexts.
Criticism of specific-purpose language (LSP) tests is often directed at their limited ability to represent fully the demands of the target language use situation. Such criticisms extend to the criteria used to assess test performance, which may fail to capture what matters to participants in the domain of interest. This paper reports on the outcomes of an attempt to expand the construct of a specific-purpose test through the inclusion of two new professionally relevant criteria designed to reflect the values of domain experts. The test in question was the speaking component of the Occupational English Test (OET), designed to assess the language proficiency of overseas-trained health professionals applying to practise their profession in Australia.
The criteria were developed from analysis of health professionals’ feedback to trainees, a source that reflected what the professionals value, that is, their indigenous assessment criteria. The criteria considered amenable to inclusion in the OET were as follows: (1) Clinician Engagement with the patient and (2) Management of Interaction in the consultation. Seven OET assessors were trained to apply these professionally relevant criteria at a workshop that introduced a checklist derived from the original data analysis as a tool to aid understanding of the new criteria. Following the workshop, assessors rated a total of 300 pre-recorded OET speaking test performances using both new and existing criteria. Statistical analyses of the ratings indicate the extent to which a) the judgements of the language-trained assessors using the new criteria were consistent and b) the new and existing criteria aligned in terms of the construct(s) they represent. Furthermore, feedback from the assessors on the process shows how comfortable and confident they felt in representing a health professional perspective.
This paper considers how to establish the minimum required level of professionally relevant oral communication ability in the medium of English for health practitioners with English as an additional language (EAL) to gain admission to practice in jurisdictions where English is the dominant language. A theoretical concern is the construct of clinical communicative competence and its separability (or not) from other aspects of professional competence, while a methodological question examines the technical difficulty of determining a defensible minimum standard. The paper reports on a standard-setting study to set a minimum standard of professionally relevant oral competence for three health professions – medicine, nursing, and physiotherapy – as measured by the speaking sub-test of the Occupational English Test, a profession-specific test of clinically related communicative competence. While clinical educators determined the standard, it is to be implemented by raters trained as teachers of EAL; therefore, the commensurability of the views of each group is a central issue. This also relates to where the limits of authenticity lie in the context of testing language for specific purposes: to represent the views of domain experts, a sufficient alignment of their views with scores given by the raters of test performances is vital. The paper considers the construct of clinical communicative competence and describes the standard-setting study, which used the analytical judgement method. The method proved successful in capturing sufficiently consistent judgements to define defensible standards. Findings also indicate that raters can act as proxies for occupational experts, although it remains unclear whether the views of performances held by these two groups are directly comparable. The new minimum standards represented by the cut scores were found to be somewhat harsher than those in current use, particularly in medicine.
This paper explores the views of nursing and medical domain experts in considering the standards for a specific-purpose English language screening test, the Occupational English Test (OET), for professional registration of immigrant health professionals. Since individuals who score performances in the test setting are often language experts rather than domain experts, there are possible tensions between what is being measured by a language test and what is deemed important by domain experts. Another concern is a lack of qualitative research on the standard-setting process. To date, no published qualitative work has been identified about the contributions of domain experts to standard setting for healthcare communication. In this study, a standard-setting exercise was conducted for the speaking component of the OET, using judgements of nursing and medical clinical educators and supervisors. In all, 13 medical and 18 nursing clinical educators and supervisors rated medical and nursing candidate performances respectively. These performances were audio-recorded OET role-plays selected across a range of proficiency levels. Domain experts were invited to comment on the basis for their decisions and the extent of alignment between these decisions and the criteria used to assess performance on the OET. Nursing and medical domain experts showed that they attended to all of the OET criteria in making their decisions about standards. However, the clinical scenario simulation also invited judgements of clinical competence from participants, even where they knew that clinical competence should be excluded from their decision-making. Another concern related to the authenticity limitations of the role-play tasks as evidence of readiness to handle communication in the workplace. Overall, findings support the value of qualitative evidence from standard setting in providing insight into the factors informing and impeding decision-making.
The aim of this paper is to investigate, from a discourse-analytic perspective, task authenticity in the speaking component of the Occupational English Test (OET), an English language screening test for clinicians designed to reflect the language demands of health professional–patient communication. The study compares the OET speaking sub-test role-play performances of 12 doctors who were successful OET candidates with practice Objective Structured Clinical Examination (OSCE) role-play performances of 12 international medical graduates (IMGs) preparing for the Australian Medical Council clinical examination. The premise for the comparison is that the OSCE role-plays can represent communication practices that are valued within the medical profession; therefore a finding of similarity in the discourse structure across the OET and the OSCE role-plays could be taken as supporting the validity of the OET as a tool for eliciting relevant communication skills in the medical profession.
The study draws on genre theory as developed in Systemic Functional Linguistics (SFL) in order to compare the role-play discourse structure and the linguistic realizations of the two tasks. In particular, it examines the role relationships of the participants (i.e. the tenor of the discourse) and the ways in which content is represented (i.e. the field of the discourse) by role-play participants. The findings reveal some key similarities but also important differences. Although both tests inevitably fall short in terms of authentic representation of real-world interactions, the findings suggest that the OET task, for a range of reasons including time allowances, training of test interlocutors, and the limits of contextual information provided to candidates, constrains candidate topic exploration and treatment negotiation compared with the OSCE format. The paper concludes with proposals for mitigating these limitations in the interests of enhancing the OET’s capacity to elicit more professionally relevant language and communication skills.
Objects that sit between intersecting social worlds, such as Language for Specific Purposes (LSP) tests, are boundary objects – dynamic, historically derived mechanisms which maintain coherence between worlds (Star & Griesemer, 1989). They emerge initially from sociopolitical mandates, such as the need to ensure a safe and efficient workforce or to control immigration, and they develop into standards (i.e. stabilized classifying mechanisms). In this article, we explore the concept of LSP test as boundary object through a qualitative case study of the Occupational English Test (OET), a test which assesses the English proficiency of healthcare professionals who wish to practise in English-speaking healthcare contexts. Stakeholders with different types of vested interest in the test were interviewed (practising doctors and nurses who have taken the test, management staff, professional board representatives) to capture multiple perspectives of both the test-taking experience and the relevance of the test to the workplace. The themes arising from the accumulated stakeholder perceptions depict a ‘boundary object’ that encompasses a work-readiness level of language proficiency on the one hand and aspects of communication skills for patient-centred care on the other. We argue that the boundary object metaphor is useful in that it represents a negotiation over the adequacy and effects of a test standard for all vested social worlds. Moreover, the test should benefit the worlds it interconnects, not just in terms of the impact on the learning opportunities it offers candidates, but also the impact such learning carries into key social sites, such as healthcare workplaces.
This commentary argues that the OET research raises inescapable contradictions in trying to separate "language" from "communication" within a weak performance test and advocates for reconceptualizing the legitimate domain of "language" more widely, reclaiming the full potential of the communicative competence framework.