Language Testing Bytes is a podcast to accompany the SAGE journal Language Testing. Three or four times per year, we will release a podcast in which we discuss topics related to a particular issue of the journal. This may be an interview with a contributor to the journal, or another expert in the field. You can download the podcast from this website, from ltj.sagepub.com, or you can subscribe to the podcast through iTunes.
Coming Soon. Issue 3 will be released in October 2010 and features an interview with Jim Purpura of Teachers College, Columbia University, New York, on the testing and assessment of grammar.
How to put the podcast onto your iPod
- Decide which of the podcasts below you would like to listen to. Right click on the link, and select 'save target as' to download it into a folder on your computer.
- Open iTunes. Click on 'file' and then 'new playlist'. Name your playlist 'Language Testing Bytes'.
- Click on the playlist from the iTunes menu.
- Open the folder in which you saved the podcast, then drag the podcast from the folder and drop it into the playlist.
- Synchronize your iPod.
- When you next access your iPod go to the Language Testing Bytes playlist to play the podcast.
Alternatively, just pop it on whichever mp3 player you currently
use, or subscribe to the SAGE Podcast on iTunes.
Language Testing is an international peer-reviewed journal that
publishes original research on language testing and assessment. Since
1984 it has featured high-impact papers covering theoretical issues,
empirical studies, and reviews. The journal's wide scope encompasses
first and second language testing and assessment of English and other
languages, and the use of tests and assessments as research and
evaluation tools. Many articles also contribute to methodological
innovation and the practical improvement of testing and assessment
internationally. In addition, the journal publishes submissions that
deal with policy issues, including the use of language tests and
assessments for high-stakes decision making in fields as diverse as
education, employment and international mobility. The journal welcomes
the submission of papers that deal with ethical and philosophical issues
in language testing, as well as technical matters. Also of concern is
research into the washback and impact of language test use, and
ground-breaking uses of assessments for learning. Additionally, the
journal wishes to publish replication studies that help to embed and
extend our knowledge of generalisable findings in the field. Language
Testing is committed to encouraging interdisciplinary research, and is
keen to receive submissions which draw on theory and methodology from
different fields of applied linguistics, as well as educational
measurement, and other relevant disciplines.
Manuscript Submission Information
Free Sample Copy
Email Alerts
Podcasts
Issue 2: Xiaoming Xi on Automated Scoring
Language Testing 27(3), 2010, is a special issue guest edited by Xiaoming Xi on the automated scoring of writing and speaking tests. In this podcast she talks about why the automated scoring of speaking and writing tests is such a hot topic, and explains the possibilities, limitations and current research issues in the field.
Download:
Xiaoming Xi on Automated Scoring
Issue 1: Mike Kane on Validation
In Language Testing 27(2), 2010, Mike Kane contributed a response to an article on fairness in language testing. We thought this was an excellent opportunity to ask him about his approach to validation, and how he sees 'fairness' fitting into the picture. (Release date: 1st June 2010)
Download:
Mike Kane on Validation
Current Journal Content
Automated scoring and feedback systems: Where are we and where are we heading?
by Xi, X.
The promise of NLP and speech processing technologies in language assessment
by Chapelle, C. A., Chung, Y.-R.
Advances in natural language processing (NLP) and automatic speech recognition and processing technologies offer new opportunities for language testing. Despite their potential uses on a range of language test item types, relatively little work has been done in this area, and it is therefore not well understood by test developers, researchers or users in language assessment. This paper introduces NLP for language assessment as an area of inquiry and practice by describing the historical roots coming from computational linguistics, statistical NLP, speech recognition and processing technologies, language assessment, and computer-assisted language learning. It outlines uses of NLP and speech recognition and processing technologies in language assessment through illustrations of current testing projects, and identifies areas in need of further development.
Complementing human judgment of essays written by English language learners with e-rater® scoring
by Enright, M. K., Quinlan, T.
E-rater® is an automated essay scoring system that uses natural language processing techniques to extract features from essays and to model statistically human holistic ratings. Educational Testing Service has investigated the use of e-rater, in conjunction with human ratings, to score one of the two writing tasks on the TOEFL-iBT® writing section. In this article we describe the TOEFL iBT writing section and an e-rater model proposed to provide one of two ratings for the Independent writing task. We discuss how the evidence for a process that uses both human and e-rater scoring is relevant to four components in a validity argument: (a) Evaluation — observations of performance on the writing task are scored to provide evidence of targeted writing skills; (b) Generalization — scores on the writing task provide estimates of expected scores over relevant parallel versions of the task and across raters; (c) Extrapolation — expected scores on the writing task are consistent with other measures of writing ability; and (d) Utilization — scores on the writing task are useful in educational contexts. Finally, we propose directions for future research that will strengthen the case for using complementary methods of scoring to improve the assessment of EFL writing.
Validation of automated scores of TOEFL iBT tasks against non-test indicators of writing ability
by Cushing Weigle, S.
Automated scoring has the potential to dramatically reduce the time and costs associated with the assessment of complex skills such as writing, but its use must be validated against a variety of criteria for it to be accepted by test users and stakeholders. This study approaches validity by comparing human and automated scores on responses to TOEFL® iBT Independent writing tasks with several non-test indicators of writing ability: student self-assessment, instructor assessment, and independent ratings of non-test writing samples. Automated scores were produced using e-rater®, developed by Educational Testing Service (ETS). Correlations between both human and e-rater scores and non-test indicators were moderate but consistent, providing criterion-related validity evidence for the use of e-rater along with human scores. The implications of the findings for the validity of automated scores are discussed.
Validating automated speaking tests
by Bernstein, J., Van Moere, A., Cheng, J.
This paper presents evidence that supports the valid use of scores from fully automatic tests of spoken language ability to indicate a person’s effectiveness in spoken communication. The paper reviews the constructs, scoring, and the concurrent validity evidence of ‘facility-in-L2’ tests, a family of automated spoken language tests in Spanish, Dutch, Arabic, and English. The facility-in-L2 tests are designed to measure receptive and productive language ability as test-takers engage in a succession of tasks with meaningful language. Concurrent validity studies indicate that scores from the automated tests are strongly correlated with the scores from oral proficiency interviews. In separate studies with learners from each of the four languages the automated tests predict scores from the live interview tests as well as those tests predict themselves in a test-retest protocol (r = 0.77 to 0.92). Although it might be assumed that the interactive nature of the oral interview elicits performances that manifest a distinct construct, the closeness of the results suggests that the constructs underlying the two approaches to oral assessment have a stable relationship across languages.
Conceptual and empirical relationships between temporal measures of fluency and oral English proficiency with implications for automated scoring
by Ginther, A., Dimova, S., Yang, R.
Information provided by examination of the skills that underlie holistic scores can be used not only as supporting evidence for the validity of inferences associated with performance tests but also as a way to improve the scoring rubrics, descriptors, and benchmarks associated with scoring scales. As fluency is considered a critical, perhaps foundational, component of speaking proficiency, temporal measures of fluency are expected to be strongly related to holistic ratings of speech quality. This study examines the relationships among selected temporal measures of fluency and holistic scores on a semi-direct measure of oral English proficiency. The spoken responses of 150 respondents to one item on the Oral English Proficiency Test (OEPT) were analyzed for selected temporal measures of fluency. The examinees represented three first language backgrounds (Chinese, Hindi, and English) and the range of scores on the OEPT scale. While strong and moderate correlations between OEPT scores and speech rate, speech time ratio, mean length of run, and the number and length of silent pauses were found, fluency variables alone did not distinguish adjacent levels of the OEPT scale. Temporal measures of fluency may reasonably be selected for the development of automated scoring systems for speech; however, identification of an examinee’s level remains dependent on aspects of performance only partially represented by fluency measures.
EduSpeak®: A speech recognition and pronunciation scoring toolkit for computer-aided language learning applications
by Franco, H., Bratt, H., Rossier, R., Rao Gadde, V., Shriberg, E., Abrash, V., Precoda, K.
SRI International’s EduSpeak® system is a software development toolkit that enables developers of interactive language education software to use state-of-the-art speech recognition and pronunciation scoring technology. Automatic pronunciation scoring allows the computer to provide feedback on the overall quality of pronunciation and to point to specific production problems. We review our approach to pronunciation scoring, where our aim is to estimate the grade that a human expert would assign to the pronunciation quality of a paragraph or a phrase. Using databases of nonnative speech and corresponding human ratings at the sentence level, we evaluate different machine scores that can be used as predictor variables to estimate pronunciation quality. For more specific feedback on pronunciation, the EduSpeak toolkit supports a phone-level mispronunciation detection functionality that automatically flags specific phone segments that have been mispronounced. Phone-level information makes it possible to provide the student with feedback about specific pronunciation mistakes. Two approaches to mispronunciation detection were evaluated in a phonetically transcribed database of 130,000 phones uttered in continuous speech sentences by 206 nonnative speakers. Results show that classification error of the best system, for the phones that can be reliably transcribed, is only slightly higher than the average pairwise disagreement between the human transcribers.
The utility of article and preposition error correction systems for English language learners: Feedback and assessment
by Chodorow, M., Gamon, M., Tetreault, J.
In this paper, we describe and evaluate two state-of-the-art systems for identifying and correcting writing errors involving English articles and prepositions. Criterion℠, developed by Educational Testing Service, and ESL Assistant, developed by Microsoft Research, both use machine learning techniques to build models of article and preposition usage which enable them to identify errors and suggest corrections to the writer. We evaluated the effects of these systems on users in two studies. In one, Criterion provided feedback about article errors to native and non-native speakers who were writing an essay for a college-level psychology course. The results showed a significant reduction in the number of article errors in the final essays of the non-native speakers. In the second study, ESL Assistant was used by non-native speakers who were composing email messages. The results indicated that users were selective in their choices among the system’s suggested corrections and that, as a result, they were able to increase the proportion of valid corrections by making effective use of feedback.
Book review: Ericsson PF and Haswell R (Eds.) Machine scoring of student essays: Truth and consequences. Logan, UT: Utah University Press, 2006. 274 pp. $24.95. ISBN 978-0-87421-632-5 (paperback)
by Crusan, D.