Language Testing Bytes is a podcast to accompany the SAGE journal Language Testing. Three or four times per year, we will release a podcast in which we discuss topics related to a particular issue of the journal. This may be an interview with a contributor to the journal, or another expert in the field. You can download the podcast from this website, from ltj.sagepub.com, or you can subscribe to the podcast through iTunes.
Coming Soon: The next podcast will be released in January 2014. We will release details of the topic and guests as soon as they are available. Watch this space.
The present study examined pedagogic components of Chinese reading literacy in a representative sample of 1164 Chinese students in Grades 7, 9 and 11 (mean age of 15 years) from 11 secondary schools in Hong Kong, with each student tested for about 2.5 hours. Multiple-group confirmatory factor analyses showed that across the three grade levels, the eight reading literacy constructs (Essay Writing, Morphological Compounding, Correction of Characters and Words, Segmentation of Text, Text Comprehension, Copying of Characters and Words, Writing to Dictation and Reading Aloud), each subserved by multiple indicators, had differential concurrent prediction of scaled internal school performance in reading and composing. Writing, reading and their interactive effects were foremost in their predictive power, followed by performance in error correction and writing to dictation, morphological compounding, segmenting text and copying, with reading aloud playing a negligible role. Our battery of tasks, with some refinement, could serve as a screening instrument for secondary Chinese students struggling with Chinese reading literacy.
The research described in this article investigates test takers’ cognitive processing while completing onscreen IELTS (International English Language Testing System) reading test items. The research aims, among other things, to contribute to our ability to evaluate the cognitive validity of reading test items (Glaser, 1991; Field, in press).
The project focused on differences in reading behaviours of successful and unsuccessful candidates while completing IELTS test items. A group of Malaysian undergraduates (n = 71) took an onscreen test consisting of two IELTS reading passages with 11 test items. Eye movements of a random sample of these participants (n = 38) were tracked. Stimulated recall interview data was collected to assist in interpretation of the eye-tracking data.
Findings demonstrated significant differences between successful and unsuccessful test takers on a number of dimensions, including their ability to read expeditiously (Khalifa & Weir, 2009) and their focus on particular aspects of the test items and texts, while for other items no observable differences were noted. This offers new insights into the cognitive processes of candidates during reading tests. Findings will be of value to examination boards preparing reading tests, to teachers and learners, and also to researchers interested in the cognitive processes of readers.
Development and administration of institutional ESL placement tests require a great deal of financial and human resources. Due to a steady increase in the number of international students studying in the United States, some US universities have started to consider using standardized test scores for ESL placement. The English Placement Test (EPT) is a locally administered ESL placement test at the University of Illinois at Urbana-Champaign (UIUC). This study examines the appropriateness of using pre-arrival SAT, ACT, and TOEFL iBT test scores as an alternative to the EPT for placing international undergraduate students into one of the two levels of ESL writing courses at UIUC. Exploratory analysis shows that only the lowest SAT Reading and ACT English scores, and the highest TOEFL iBT total and Writing section scores, can separate students between the two placement courses. However, the number of undergraduate ESL students who scored at the lowest and highest ends of each of these test scales has been very low over the last six years (less than 5%). Thus, setting cutoff scores for such a small fraction of the ESL population may not be very practical. For the majority of the undergraduate ESL population, there is about a 40% chance of misplacement if the placement decision is made solely on the basis of standardized test scores.
When implementing standard setting procedures, there are two major concerns: variance between panelists and efficiency in conducting multiple rounds of judgments. With regard to the former, there is concern over the consistency of the cutoff scores set by different panelists. If the cut scores show an inordinately wide range, then further rounds of group discussion are required to reach consensus, which in turn leads to the latter concern. The Yes/No Angoff procedure is typically implemented across several rounds, with panelists revising their original decisions for each item based on discussion with co-panelists between rounds. The purpose of this paper is to demonstrate a framework for evaluating judgments in the standard setting process. The Multifaceted Rasch model was applied as a tool to evaluate the quality of standard setting in a language assessment context. The results indicate that the Multifaceted Rasch model offers a promising approach to examining variability in standard setting procedures. In addition, the model can identify aberrant decision making by individual panelists, which can be used as feedback for both standard setting designers and panelists.
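As a rough sketch (not taken from the paper itself), a Multifaceted Rasch model for dichotomous Yes/No Angoff judgments, assuming facets for panelist leniency, item difficulty and round severity, can be written as:

\log\left(\frac{P_{nij}}{1 - P_{nij}}\right) = B_n - D_i - C_j

where P_{nij} is the probability that panelist n judges item i as "Yes" in round j, B_n is the leniency of panelist n, D_i is the difficulty of item i, and C_j is the severity of round j. Under such a model, unexpectedly large misfit for a particular panelist is one way that aberrant decision making can be flagged.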
This study examined the influence of prompt characteristics on the averages of all scores given to test taker responses on the TOEFL iBT integrated Read-Listen-Write (RLW) writing tasks for multiple administrations from 2005 to 2009. In the context of TOEFL iBT RLW tasks, the prompt consists of a reading passage and a lecture.
To understand characteristics of individual prompts, 107 previously administered RLW prompts were evaluated by participants on nine measures of perceived task difficulty via a questionnaire. Because some of the RLW prompts were administered more than once, multilevel modeling analyses were conducted to examine the relationship between ratings of the prompt characteristics and the average RLW scores, while taking into account dependency among the observed average RLW scores and controlling for differences in the English ability of the test takers across administrations.
Results showed that some of the variation in the average RLW scores was attributable to differences in the English ability of the test takers that also varied across administrations. Two variables related to perceived task difficulty, distinctness of ideas within the prompt and difficulty of ideas in the passage, were also identified as potential sources of variation in the average RLW scores.
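A minimal sketch of the kind of two-level model described here (hypothetical variable names; administrations at level 1 nested within prompts at level 2, with test-taker ability as a level-1 control and the two perceived-difficulty ratings as level-2 predictors):

\text{Level 1: } \bar{Y}_{ij} = \beta_{0j} + \beta_{1}\,\mathrm{Ability}_{ij} + r_{ij}

\text{Level 2: } \beta_{0j} = \gamma_{00} + \gamma_{01}\,\mathrm{Distinctness}_{j} + \gamma_{02}\,\mathrm{PassageDifficulty}_{j} + u_{0j}

where \bar{Y}_{ij} is the average RLW score for administration i of prompt j, and \gamma_{01} and \gamma_{02} correspond to the prompt-level effects of distinctness of ideas within the prompt and difficulty of ideas in the passage.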
"Vocabulary and structural knowledge" (Grabe, 1991, p. 379) appears to be a key component of reading ability. However, is this component to be taken as a unitary one or is structural knowledge a separate factor that can therefore also be tested in isolation in, say, a test of syntax? If syntax can be singled out (e.g. in order to investigate its contribution to reading ability), this test of syntactic knowledge would require validation. The usefulness and reliability of using expert judgments as a means of analysing the content or difficulty of test items in language assessment has been questioned for more than two decades. Still, groups of expert judges are often called upon as they are perceived to be the only or at least a very convenient way of establishing key features of items. Such judgments, however, are particularly opaque and thus problematic when judges are required to make categorizations where categories are only vaguely defined or are ontologically questionable in themselves. This is, for example, the case when judges are asked to classify the content of test items based on a distinction between lexis and syntax, a dichotomy corpus linguistics has suggested cannot be maintained. The present paper scrutinizes a study by Shiotsu (2010) that employed expert judgments, on the basis of which claims were made about the relative significance of the components ‘syntactic knowledge’ and ‘vocabulary knowledge’ in reading in a second language. By both replicating and partially replicating Shiotsu’s (2010) content analysis study, the paper problematizes not only the issue of the use of expert judgments, but, more importantly, their usefulness in distinguishing between construct components that might, in fact, be difficult to distinguish anyway. This is particularly important for an understanding and diagnosis of learners’ strengths and weaknesses in reading in a second language.
Language Testing is an international peer-reviewed journal that publishes original research on language testing and assessment. Since 1984 it has featured high-impact papers covering theoretical issues, empirical studies, and reviews. The journal's wide scope encompasses first and second language testing and assessment of English and other languages, and the use of tests and assessments as research and evaluation tools. Many articles also contribute to methodological innovation and the practical improvement of testing and assessment internationally. In addition, the journal publishes submissions that deal with policy issues, including the use of language tests and assessments for high-stakes decision making in fields as diverse as education, employment and international mobility. The journal welcomes the submission of papers that deal with ethical and philosophical issues in language testing, as well as technical matters. Also of concern is research into the washback and impact of language test use, and ground-breaking uses of assessments for learning. Additionally, the journal wishes to publish replication studies that help to embed and extend our knowledge of generalisable findings in the field. Language Testing is committed to encouraging interdisciplinary research, and is keen to receive submissions which draw on theory and methodology from different fields of applied linguistics, as well as educational measurement and other relevant disciplines.
How to put the podcast onto your iPod
Decide which of the podcasts below you would like to listen to. Right click on the link, and select 'save target as' to download it into a folder on your computer.
Open iTunes. Click on 'file' and then 'new playlist'. Name your playlist 'Language Testing Bytes'.
Click on the playlist from the iTunes menu.
Open the folder in which you saved the podcast, then drag the podcast from the folder and drop it into the playlist.
Synchronize your iPod.
When you next access your iPod go to the Language Testing Bytes playlist to play the podcast.
Alternatively, just pop it on whichever mp3 player you currently use, or subscribe to the SAGE Podcast on iTunes.
Issue 15: Stephen Bax on Eye Tracking Studies
Issue 30(4) of the journal contains the first paper to use eye tracking to investigate the cognitive processes of learners taking reading tests. Stephen Bax joins us to explain the methodology and what it can tell us about how successful readers go about processing items and texts in reading tests.
Issue 14: Ofra Inbar on Assessment Literacy
Issue 30(3) commemorates the 30th anniversary of the founding of the journal. We mark this milestone in the journal's history with a special issue on the topic of Assessment Literacy, guest edited by Ofra Inbar. A concern for the literacy needs of the wide range of stakeholders who use tests and test scores, beyond the experts, is a sign of a maturing profession. This issue takes the debate forward in new and exciting ways, some of which Ofra Inbar discusses on this podcast.
Issue 13: Paula Winke and Susan Gass on Rater Bias
Rater bias is something that language testers have known about for a long time, and have tried to control through training and the use of rating scales. But investigations into the source and nature of bias are relatively recent. In issue 30(2) of the journal Paula Winke, Susan Gass, and Carol Myford share their research in this field, and the first two authors, from Michigan State University, join us on Language Testing Bytes to discuss rater bias.
Issue 12: Alan Davies on Assessing Academic English
In 2008 Alan Davies' book Assessing Academic English was published by Cambridge University Press. In issue 30(1) of Language Testing it is reviewed by Christine Coombe. With a strong historical narrative, the book raises many of the enduring issues in assessing English for study in English medium institutions. In this podcast we explore some of these with Professor Davies.
Issue 11: Ana Pellicer-Sanchez and Norbert Schmitt on Yes-No Vocabulary Tests
In this issue of the podcast we return to vocabulary testing, after the great introduction provided by John Read in Issue 5. This time, we welcome Ana Pellicer-Sanchez and Norbert Schmitt to talk about the popular Yes-No Vocabulary Test. Their recent research looks at scoring issues and potential solutions to problems that have plagued the test for years. Their paper in issue 29(4) of the journal contains the details, but in the podcast we discuss the key issues for vocabulary assessment.
Issue 10: Kathryn Hill on Classroom Based Assessment
Classroom Based Assessment is an increasingly important topic in language education, and in issue 29(3) of Language Testing we publish a paper by Kathryn Hill and Tim McNamara entitled "Developing a comprehensive, empirically based research framework for classroom-based assessment". The research in this paper is based on the first author's PhD dissertation, and so we asked Kathryn Hill to join us on Language Testing Bytes to talk about developments in the field.
Issue 9: Luke Harding on Accent in Listening Assessment
Issue 29(2) of the journal contains a paper entitled "Accent, listening assessment and the potential for a shared-L1 advantage: A DIF perspective", by Luke Harding. In this podcast we explore why it is that most listening tests use a very narrow range of standard accents, rather than the many varieties that we are likely to encounter in real-world communication.
Issue 8: Tan Jin and Barley Mak on Confidence Scoring
In Issue 29(1) of the journal three authors from the Chinese University of Hong Kong have a paper on the application of fuzzy logic to scoring speaking tests. This is termed 'confidence scoring', and the first two authors join us on Language Testing Bytes to explain a little more about their novel approach.
Issue 7: Mark Wilson on Measurement Models
Mark Wilson delivered the Messick Memorial Lecture at the Language Testing Research Colloquium in Melbourne, 2006, on new developments in measurement models that take into account the complexity of language testing. In Language Testing 28(4) we publish the paper based on this lecture, and Mark joins us on Language Testing Bytes to talk about his work in this area.
Issue 6: Craig Deville and Micheline Chalhoub-Deville on Standards-Based Testing
Standards-based testing is highly controversial, both for its social and educational impact on schools and bilingual communities and for its technical aspects, which rely to a significant extent on expert judgment. In issue 28(3) we discuss the issues surrounding standards-based testing in the United States with the guest editors of a special issue on this topic. The collection of papers that they have brought together, along with reviews of recent books on the topic and a test review, constitutes a state-of-the-art volume for the field.
Issue 5: John Read on Vocabulary Testing
The journal has seen a flurry of articles on vocabulary testing in recent months, and issue 28(2) is no exception, with Marta Fairclough's paper on the lexical recognition task. It seemed like an appropriate moment to consider why vocabulary is receiving so much attention, and so we turned to Professor John Read of the University of Auckland, New Zealand, to give us an overview of current research and activity within the field.
Issue 4: Khaled Barkaoui and Melissa Bowles on Think Aloud Protocols
In Language Testing 28(1), 2011, Khaled Barkaoui has an article on the use of think-alouds to investigate rater processes and decisions as they rate essay samples. The focus is not on the raters, but on whether the research method is a useful tool for the purpose. In this podcast he explains his findings and their importance. We are then joined by Melissa Bowles, who has recently published The Think-Aloud Controversy in Second Language Research, to explain precisely what the problems and possibilities of think-alouds are in language testing research.
Language Testing 27(4), 2010, contains an article by Carol Chapelle and colleagues on testing productive grammatical ability. We thought this would be an excellent opportunity to look at what is going on in the field of assessing grammar, and what issues currently face the field. Jim Purpura agreed to talk to us on Language Testing Bytes.
Language Testing 27(3), 2010, is a special issue guest edited by Xiaoming Xi on the automated scoring of writing and speaking tests. In this podcast she talks about why the automated scoring of speaking and writing tests is such a hot topic, and explains the possibilities, limitations and current research issues in the field.
In Language Testing 27(2), 2010, Mike Kane contributed a response to an article on fairness in language testing. We thought this was an excellent opportunity to ask him about his approach to validation, and how he sees 'fairness' fitting into the picture.