- Abedi, J. 2004. The No Child Left Behind Act and English Language Learners: Assessment and Accountability Issues.
Educational Researcher 33, 1, 4 - 14.
- Agard, F. B. and Dunkel, H. B. (1948). An Investigation into Second Language Teaching. Boston: Ginn & Company.
- Alderson, J. C. and Banerjee, J. 2001.
Language testing and assessment (Part 1).
Language Teaching, 34, 213 - 236.
- Alderson, J. C. and Banerjee, J. 2002.
Language testing and assessment (Part 2).
Language Teaching, 35, 79 - 113.
- Alderson, J. C. and Hughes, A. (1981).
Issues in Language Testing. ELT Documents 111. London: British Council.
- Amrein, A. L., Berliner, D. C. & Rideau, S. 2010.
Cheating in the first, second, and third degree: Educators' responses to high-stakes testing.
Education Policy Analysis Archives, 18, 14.
- Amrein-Beardsley, A. L. and Berliner, D. C. 2002.
High Stakes Testing, Uncertainty, and Student Learning.
Education Policy Analysis Archives, 10, 18.
- Anonymous. Evaluation and Assessment Primer.
- Anonymous, 2009. Computer-based and paper-pencil test comparability.
Pearson Education: Test, Measurement and Research Services Bulletin 9.
- Assessment Reform Group. (1999). Beyond the Black Box.
- Assessment Reform Group. (2002). Testing, Motivation and Learning. Cambridge: University of Cambridge Faculty of Education.
- Atkinson, T. and Davies, G. 2000.
Computer Aided Assessment and Language Learning. ICT4LT.
- Atkinson, R. C. and Geiser, S. 2010. Reflections on a century of College Admissions Tests. Educational Researcher 38, 9, 665 - 667.
- Au, W. 2007.
High Stakes Testing and Curricular Control.
Educational Researcher, 36, 5.
- Aryadoust, V. 2011.
Validity Arguments of the Speaking and Listening Modules of International English Language Testing System: A Synthesis of Existing Research.
The Asian ESP Journal, 7, 2.
- Campbell, D. T. and Fiske, D. W. 1959. Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix.
Psychological Bulletin 56, 2, 81 - 105.
- Camilli, G. 1996.
Standard Errors in Educational Assessment: A Policy Analysis Perspective.
Education Policy Analysis Archives 4, 4.
- Canagarajah, S. 2006.
Changing Communicative Needs, Revised Assessment Objectives: Testing English as an International Language.
Language Assessment Quarterly, 3, 3, 229 - 242.
- Canale, M. and Swain, M. 1980.
Theoretical Bases of Communicative Approaches to Second Language Teaching and Testing.
Applied Linguistics 1, 1, 1 - 47.
- Carrell, P. L. 2007.
Notetaking strategies and their relationship to performance on listening comprehension and communicative assessment tasks.
TOEFL Monograph No. MS-35. Princeton, NJ: Educational Testing Service.
- Carrell, P. L. and Dunkel, P. A. 2004.
The effects of notetaking on listening comprehension.
Applied Language Learning 14, 1, 83 - 105.
- Carrell, P. L., Dunkel, P. A. and Mollaun, P. 2002.
The effects of notetaking, lecture length, and topic on the listening component of TOEFL 2000.
TOEFL Monograph No. MS-23. Princeton, NJ: Educational Testing Service.
- Celik, M. 1999.
Testing Some Suprasegmental Features of English Speech. The Internet TESL Journal, 5, 8.
- Chalhoub-Deville, M. 1993.
Performance Assessment and the Components of the Oral Construct Across Tasks and Rater Groups. ERIC.
- Chalhoub-Deville, M. 2001.
Language Testing and Technology: Past and Future.
Language Learning and Technology, Vol 5, No. 2, May 2001, 95 - 98.
- Chalhoub-Deville, M. and Fulcher, G. 2003.
The Oral Proficiency Interview: A Research Agenda.
Foreign Language Annals, 36, 4, 498 - 506.
- Chapman, M. 2003.
TOEIC: Tried but Undertested.
JALT Testing and Evaluation SIG Newsletter, 7, 3, 2 - 5.
- Cimbricz, S. 2002.
State-mandated testing and teachers' beliefs and practice.
Education Policy Analysis Archives 10, 2.
- Clapham, C. 2000.
Assessment and Testing.
Annual Review of Applied Linguistics, 20, 147 - 161.
- Cohen, A. D., & Upton, T. A. 2006.
Strategies in responding to new TOEFL reading tasks.
TOEFL Monograph No. MS-33. Princeton, NJ: Educational Testing Service.
- Committee on Assessment and Evaluation in Education. 2005.
The Knowledge Base for Assessment and Evaluation in Education.
Israel Academy of Sciences and Humanities; Ministry of Education, Culture and Sport;
Rothschild Foundation (Yad Hanadiv).
- Coniam, D. and Falvey, P. 1999.
Assessor training in a high-stakes test of speaking: The Hong Kong English language benchmarking initiative.
Melbourne Papers in Language Testing 8, 2.
- Coombe, C. 2002.
Self-assessment in language testing: Reliability and validity issues.
Karen's Linguistics Issues.
- Cronbach, L. J. and Meehl, P. E. 1955.
Construct Validity in Psychological Tests.
Psychological Bulletin, 52, 281 - 302.
- Cumming, A. 1994.
Does Language Assessment Facilitate Recent Immigrants' Participation in Canadian Society?
TESL Canada Journal, 11, 2, 117 - 133.
- Cumming, A., Grant, L., Mulcahy-Ernt, P., & Powers, D. E. 2005.
A teacher-verification study of speaking and writing prototype tasks for a new TOEFL Test.
TOEFL Monograph No. MS-26. Princeton, NJ: Educational Testing Service.
- Cumming, A., Kantor, R., Baba, K., Eouanzoui, K., Erdosy, U., & James, M. 2006.
Analysis of discourse features and verification of scoring levels for independent and integrated prototype written tasks for the new TOEFL.
TOEFL Monograph No. MS-30. Princeton, NJ: Educational Testing Service.
- Cunningham, C. R. 2002.
The TOEIC test and communicative competence: Do test score gains correlate
with increased competence? A preliminary study. University of Birmingham,
UK: MA dissertation.
- Davidson, F. and Fulcher, G. 2007.
Flexibility is proof of a good 'framework'.
Guardian Weekly, 17th November.
- Davidson, F. and Fulcher, G. (2007).
"The Common European Framework of Reference (CEFR) and the design of language tests: A Matter of Effect."
Language Teaching 40, 3, 231 - 241.
- Davies, A. 1984.
Computer Assisted Language Testing.
CALICO Journal 1, 5.
- Davies, A. 1997.
The education (and training) of language testers. Melbourne Papers in Language Testing 6, 1.
- de Jong, J. H. A. L. 1990.
Standardization in Language Testing. AILA Review 7.
The volume contains the following papers:
  - Guest-editor's Preface. John H. A. L. de Jong, 3 - 5.
  - Language Testing in Research and Education: The Need for Standards. Peter J. M. Groot, 6 - 23.
  - The Cambridge-TOEFL Comparability Study: An example of the Cross-National Comparison of Language Tests. Fred Davidson & Lyle Bachman, 24 - 45.
  - The Australian Second Language Proficiency Ratings (ASLPR). David E. Ingram, 46 - 61.
  - Cross-National Standards: A Dutch-Swedish Collaborative Effort in National Standardized Testing. John H. A. L. de Jong & Mats Oscarson, 62 - 78.
  - The Hebrew Speaking Test: An Example of International Cooperation in Test Development and Validation. Elana Shohamy & Charles W. Stansfield, 79 - 90.
  - EUROCERT: An International Standard for Certification of Language Proficiency. Alex Olde Kalter & Paul Vossen, 91 - 105.
  - Response to Alex Olde Kalter and Paul Vossen. John Read, 106 - 107.
- Derwing, T. M., Rossiter, M. J., Munro, M. J. and Thomson, R. I. 2004.
Second Language Fluency: Judgments on Different Tasks. Language Learning, 54, 4, 655 - 679.
- Dikli, S. 2006.
An Overview of Automated Scoring of Essays. Journal of Technology, Learning, and Assessment, 5, 1.
- Dooey, P. 1999.
An investigation into the predictive validity of the IELTS Test as an indicator of future academic success.
In K. Martin, N. Stanley and N. Davison (Eds), Teaching in the Disciplines/ Learning in Context, 114-118.
Proceedings of the 8th Annual Teaching Learning Forum, The University of Western Australia, Perth.
- Dorans, N. J. 2008.
The practice of comparing scores on different tests. R&D Connections 6. Princeton, NJ: Educational Testing Service.
- Dunkel, P. A. 1997.
Computer-Adaptive Testing of Listening Comprehension: A Blueprint for CAT Development.
The Language Teacher Online, 21, 10.
- Dunkel, P. A. 1999.
Considerations in developing or using second/foreign language proficiency computer-adaptive tests.
Language Learning & Technology 2, 2, 77 - 93.
- Dunkin, M. J. 1997.
Assessing Teachers' Effectiveness. Issues in Educational Research, 7(1), 1997, 37-51.
- Dymoke, S. (no date).
Assessing Your Pupils' Poetry. Poetry Class Website Resources.
- Educational Testing Service.
ETS Fairness Review & ETS Standards for Quality and Fairness.
- Elder, C. (1998).
What counts as bias in language testing?
Melbourne Papers in Language Testing 7, 1.
- Embretson, S. 1983.
Construct Validity: Construct Representation Versus Nomothetic Span. Psychological Bulletin, 93, 1, 179 - 197.
- Emmerich, W., Enright, M. K., Rock, D. A. and Tucker, C. 1991.
The Development, Investigation, and Evaluation of New Item Types for the GRE Analytical Measure.
Educational Testing Service, Princeton NJ, ETS Research Report 91-16.
- Ennis, R. H. 1999.
Test Reliability: A Practical Exemplification of Ordinary Language Philosophy. Philosophy of Education.
- Erdosy, M. U. (2004). Exploring Variability in Judging Writing Ability in a Second Language: A Study of Four Experienced Raters of ESL Compositions. TOEFL Research Report 70. Princeton, NJ: Educational Testing Service.
- ETS (2010). Linking TOEFL iBT Scores to IELTS Scores - A Research Report. Princeton, NJ: Educational Testing Service.
Read this in relation to the Score Comparison Tool and the Supplementary Comparison Tables.
- Feast, V. 2002.
The Impact of IELTS scores on performance at university.
International Education Journal, 3, 4, 70 - 85.
- Figueras, N. 2005.
Testing, testing, everywhere, and not a while to think. English Language Teaching Journal 59, 1, 47 - 54.
- Frain, T. J. 2009. A Comparative Study of Korean University Students before and after a Criterion Referenced Test. Unpublished MEd. Thesis, University of Southern Queensland, Australia.
- Frary, R. B. 1996.
Hints for Designing Effective Questionnaires. Practical Assessment, Research and Evaluation, Vol. 11
- Frary, R. B. 1995.
More Multiple Choice Item Writing Do's and Don'ts. ERIC/AE Digest Series EDO-TM-95-4.
- Frary, R. B. 2002.
A Brief Guide to Questionnaire Development.
- Fox, J. and Courchene, R. (2005). "The Canadian Language Benchmarks (CLB): A Critical Appraisal." Contact 31, 2, 7 - 28.
- Fulcher, G. (1987). "Tests of Oral Performance: the need for data-based criteria." English Language Teaching Journal 41, 4, 287 - 291.
- Fulcher, G. (1996). "Invalidating validity claims for the ACTFL Oral Rating Scale." System 24, 2, 163 - 172.
- Fulcher, G. (1996). "Testing tasks: issues in task design and the group oral." Language Testing 13, 1, 23 - 51.
- Fulcher, G. (1996). "Does thick description lead to smart tests? A data-based approach to rating scale construction". Language Testing 13, 2, 208 - 238.
- Fulcher, G. (1997). "An English Language Placement Test: Issues in reliability and validity." Language Testing 14, 2, 113 - 139.
- Fulcher, G. (1998). "Widdowson's model of communicative competence and the testing of reading: An exploratory study." System 26, 3, 281 - 302.
- Fulcher, G. 1999.
Ethics in Language testing. TAE SIG Newsletter - Special Conference Issue, Volume 1, No. 1.
- Fulcher, G. (1999). "Assessment in English for Academic Purposes: Putting content validity in its place." Applied Linguistics 20, 2, 221 - 236.
- Fulcher, G. (1999). "Computerizing an English language placement test." English Language Teaching Journal 53, 4, 289 - 299.
- Fulcher, G. (2000). "Computers in language testing." In Brett P. and Motteram, G. (Eds.) A special interest in computers: Learning and teaching with information and communications technologies. Manchester: IATEFL publications, 93 - 107. Reprinted with the kind permission of IATEFL.
- Fulcher, G. (2000). "The 'communicative' legacy in language testing." System, 28, 483 - 497.
- Fulcher, G. 2001.
Machines get clever at testing. Education Guardian, 17 May.
- Fulcher, G. 2003.
Few ills cured by setting scores. Education Guardian, 17 April.
- Fulcher, G. 2003. Interface design in computer-based language testing. Language Testing 20, 4, 384 - 408.
- Fulcher, G. 2004.
Are Europe's tests being built on an 'unsafe' framework? Education Guardian, 18 March.
Read the response from Brian North.
- Fulcher, G. (2004). "Deluded by artifices? The Common European Framework and harmonization." Language Assessment Quarterly, 1, 4, 253 - 266.
- Fulcher, G. 2008. "Testing Times Ahead?"
Liaison Magazine, Issue 1: July, 20 - 24.
Published by the UK Subject Centre for Languages, Linguistics and Area Studies, University of Southampton.
- Fulcher, G. 2009. Test use and political philosophy.
Annual Review of Applied Linguistics, 29, 3 - 20.
- Fulcher, G. (2011). Cheating gives lie to our test dependence, Guardian Weekly, 11th October 2011.
- Fulcher, G. and Bamford, R. (1996). "I didn't get the grade I need. Where's my solicitor?" System 24, 4, 437 - 448.
- Fulcher, G. and Davidson, F. (2008).
"Tests in Life and Learning: A Deathly Dialogue."
Educational Philosophy and Theory, 40, 3, 407 - 417.
- Fulcher, G. and Davidson, F. (2009). "Test Architecture, Test Retrofit."
Language Testing 26, 1, 123 - 144.
- Fulcher, G. & Marquez Reiter, R. 2003. Task difficulty in speaking tests. Language Testing 20, 3, 321 - 344.
- Gebril, A. and Plakans, L. 2009.
Investigating source use, discourse features, and process in integrated writing tasks.
Spaan Fellow Working Papers in Second or Foreign Language Assessment 7, 47 - 84.
- Geisinger, K. F. and Carlson, J. F. 1995.
Testing Students with Disabilities.
- Gibson, E. J., Brewer, P. W., Dholakia, A., Vouk, M. A., and Bitzer, D. L. 1995.
A Comparative Analysis of Web-Based Testing and Evaluation Systems.
North Carolina State University.
- Gilfert, S. 1996. A Review of TOEIC. The Internet TESL Journal 2, 8.
- Ginther, A. 2001.
Effects of the presence and absence of visuals on performance on TOEFL CBT listening-comprehension stimuli.
TOEFL Research Report 66, Princeton, N.J.: Educational Testing Service.
- Glass, G. V. 1978.
Standards and criteria. Journal of Educational Measurement 15, 4, 237 - 261.
- Godwin-Jones, B. 2001.
Language Testing Tools and Technologies. Language Learning & Technology, Vol. 5, No. 2, May 2001, 8 - 12.
- Goh, C. and Aryadoust, S. V. 2010.
Investigating the Construct Validity of the MELAB Listening Test through the Rasch Analysis and Correlated Uniqueness Modeling.
Spaan Fellow Working Papers in Second or Foreign Language Assessment 8, 31 - 68.
- Gorsuch, G. J. and Cox, T. 2000.
Something Old, Something New, Something Borrowed, Something....: Piloting a Computer Mediated Version of the Michigan Listening Comprehension Test
TESL-EJ 4, 4.
- Grant, S. G. 2000.
Teachers and Tests: Exploring Teachers' Perceptions of Changes in the New York State Testing Program.
Education Policy Analysis Archives, 8, 14.
- Gorin, J. S. 2007.
Reconsidering Issues in Validity Theory. Educational Researcher 36, 8, 456 - 462.
- Grabowski, K. C. 2007.
Reconsidering the measurement of pragmatic knowledge using a reciprocal written task format. Teachers College, Columbia University Working Papers in TESOL and Applied Linguistics, 7, 1.
- Gruba, P. A. 1999.
The role of digital video media in second language listening comprehension. University of Melbourne: Unpublished PhD thesis.
- Gu, L., Drake, S., and Wolfe, E. W. 2006.
Differential Item Functioning of GRE Mathematics Items Across Computerized and Paper-and-Pencil Testing Media. Journal of Technology, Learning, and Assessment 5, 4.
- Haji pour Nezhad, G. R. 2002.
Reading complexity judgments, Episode 1.
JALT Testing and Evaluation SIG Newsletter, 5, 3, 2 - 5.
- Haji pour Nezhad, G. R. 2002.
Reading complexity judgments, Episode 2.
JALT Testing and Evaluation SIG Newsletter, 6, 1, 2 - 5.
- Haji pour Nezhad, G. R. 2002.
Reading complexity judgments, Episode 3.
JALT Testing and Evaluation SIG Newsletter, 6, 2, 2 - 5.
- Haladyna, T. M. and Downing, S. M. (1989).
A Taxonomy of Multiple-Choice Item-Writing Rules. Applied Measurement in Education, 2(1), 37 - 50.
- Hamilton, L. S., Klein, S. P., and Lorie, W. No Date.
Using Web-Based Testing for Large-Scale Assessment.
- Hansen, E. G., Forer, D. C., & Lee, M. J. 2004.
Toward accessible computer-based tests: Prototypes for visual and other disabilities.
TOEFL Research Report RR-78. Princeton, NJ: Educational Testing Service.
- Harding, L. 2008.
Accent and academic listening assessment: A study of test-taker perceptions.
Melbourne Papers in Language Testing 13, 1.
- Harlen, W. H. and Crick, R. D. 2002.
A Systematic Review of the impact of summative assessment and tests on students' motivation for learning.
London: Institute of Education, Evidence for Policy and Practice Information and Co-ordinating Centre.
- Hong, W-P, 2008.
Does high-stakes testing increase cultural capital among low-income and racial minority students?
Education Policy Analysis Archives, 16, 6.
- Nguyen, T. N. H. 2007.
Effects of test preparation on test performance - the case of the IELTS and TOEFL iBT Listening Tests.
Melbourne Papers in Language Testing 12, 1.
- Huitt, B., Hummel, J. and Kaeck, D. 1995. Assessment, Measurement, Evaluation and Research. Valdosta State University.
- Hutchison, D. and Benton, T. 2009.
Parallel Universes and Parallel Measures: Estimating the Reliability of Test Results.
London: OFQUAL and the National Foundation for Educational Research.
- Jacobsen, M., Kremer, R., and Flores, R. 1999.
WebCT in Computer Science. New Currents in Teaching and Learning, 6, 3.
- Jamieson, J., Jones, S., Kirsch, I., Mosenthal, P., and Taylor, C. 2000.
TOEFL 2000 Framework: A Working Paper.
Educational Testing Service, Princeton NJ.
- Jia, Y., and Zhang, W. 2007.
Evaluating the construct validity of an EFL test for PhD candidates: A quantitative analysis of two versions.
Shiken, 11, 1, 2 - 16.
- Joint Committee on Testing Practices. 2004.
Code of Fair Testing Practices in Education.
American Psychological Association.
- Kane, M. 2001.
Current Concerns in Validity Theory.
Journal of Educational Measurement, 38, 4, 319 - 342.
- Kane, M. 2010.
Errors of Measurement, Theory, and Public Policy.
12th Annual William H. Angoff Memorial Lecture. Princeton, NJ: Educational Testing Service.
- Kang, O. 2008.
Ratings of L2 oral performance in English: Relative impact of rater characteristics and acoustic measures of accentedness.
Spaan Fellow Working Papers in Second or Foreign Language Assessment 6, 181 - 205.
- Karavas, E., and Delieza, X. 2009.
On-site observation of KPG oral examiners: Implications for oral examiner training and evaluation.
Journal of Applied Language Studies 3, 1, 51 - 77.
- Kehoe, J. 1995.
Basic Item Analysis for Multiple-Choice Tests.
- Kehoe, J. 1995.
Writing Multiple Choice Test Items.
- Kenworthy, R. 2006.
Timed versus At-home Assessment Tests: Does Time Affect the Quality of Second Language Learners' Written Compositions?
TESL-EJ 10, 1.
- Kenyon, D. M. and Malabonga, V. 2001.
Comparing examinee attitudes toward computer-assisted and other oral proficiency assessments.
Language Learning and Technology, Vol 5, No. 2, May 2001, 60 - 83.
- Kim, H. J. and Shin, H. W. 2006.
A reading and writing placement test: Design, evaluation, and analysis. Teachers College, Columbia University Working Papers in TESOL and Applied Linguistics, 6, 2.
- Kirsch, I., Jamieson, J., Taylor, C., and Eignor, D. 1998.
Computer Familiarity Among TOEFL Examinees.
TOEFL Research Report 59. Princeton, NJ: Educational Testing Service.
- Kitao, S. K. and Kitao, K. 1996.
Testing Communicative Competence. The Internet TESL Journal, 2, 5.
- Kitao, S. K. and Kitao, K. 1996.
Testing Grammar. The Internet TESL Journal, 2, 6.
- Kluitmann, S. (2008).
Testing English as a Foreign Language. Two EFL-Tests used in Germany. Philologische Fakultät, Albert-Ludwigs-Universität Freiburg.
- Kirkpatrick, R. (2011).
The Negative Backwash of Exam-Oriented Education on Chinese High School Students. Language Testing in Asia 1(3), 55 - 71.
- Kitao, S. K. and Kitao, K. 1996.
Testing Listening. The Internet TESL Journal, 2, 7.
- Knoch, U. 2008.
Collaborating with ESP Stakeholders in Rating Scale Validation: The case of the ICAO Rating Scale.
Spaan Fellow Working Papers in Second or Foreign Language Assessment 7, 21 - 46.
- Knoch, U. 2009.
The assessment of academic style in EAP writing: The case of the rating scale.
Melbourne Papers in Language Testing 13, 1.
- Koizumi, R. 2006.
Relationships Between Productive Vocabulary Knowledge and Speaking Performance of Japanese Learners of English at the Novice Level. Unpublished PhD thesis, University of Tsukuba, Japan.
- Koretz, D., Russell, M., Shin, C. D., Horn, C. and Shasby, K. 2002.
Testing and diversity in postsecondary education: The case of California. Education Policy Analysis Archives, 10, 1.
- Kunnan, A. J. 1998.
An introduction to structural equation modelling for language assessment research. Language Testing 15, 3, 295 - 332.
- Kunnan, A. J. 2000.
Fairness and Justice for All. In A. J. Kunnan (Ed.) Fairness and Validation in Language Assessment: Selected papers from the 19th Language Testing Research Colloquium, Orlando, Florida. Studies in Language Testing 9. pp. 1 – 14. Cambridge: Cambridge University Press.
- Kunnan, A. J. 2004.
Test Fairness. In M. Milanovic and C. Weir (Eds.) European Language Testing in a Global Context: proceedings of the ALTE Barcelona conference. pp. 27 - 48. Cambridge: Cambridge University Press.
- Kunnan, A. J. 2005.
Language assessment from a wider context. In Hinkel, E. (Ed.) Handbook of research in second language teaching and learning, 779 - 794.
- Kyllonen, P. C. 2005.
The case for noncognitive assessments. R&D Connections 3. Princeton, NJ: Educational Testing Service.
- Laborda, J. G. 2007.
From Fulcher to PLEVALEX: Issues in Interface design, validity and reliability in Internet based Language Testing. CALL-EJ Online 9, 1.
- Laborda, J. G. 2007.
On the Net: Introducing Standardized EFL/ESL Exams. Language Learning & Technology 11, 2, 3 - 9.
- Laborda, J. G. 2012.
Preliminary Findings of the PAULEX Project: A Proposal for the Internet-based Valencian University Entrance Examination. Journal of Language Teaching and Research 3, 2, 250 - 255.
- Lane, S. 1999.
Validity Evidence for Assessments. Reidy Interactive Lecture Series.
- Lazaraton, A. and Wagner, S. (1996).
The Revised TSE test: Discourse Analysis of Native Speaker and Nonnative Speaker Data. Research Report 96-10. Princeton NJ: Educational Testing Service.
- Lee, Y-W. 2005.
Dependability of Scores for a New ESL Speaking Test: Evaluating Prototype Tasks. TOEFL Monograph Series MS-28. Princeton, NJ: Educational Testing Service.
- Lee, Y.-W., Breland, H., & Muraki, E. 2004.
Comparability of TOEFL CBT writing prompts for different native language groups.
TOEFL Research Report RR-77. Princeton, NJ: Educational Testing Service.
- Lewkowicz, J. A. 2000.
Authenticity in language testing: some outstanding questions. Language Testing 17, 1, 43 - 64.
- Liao, C-W, Hatrak, N. and Yu, G. (2010)
Comparison of Content, Item Statistics and Test-Taker Performance for the Redesigned and Classic TOEIC Listening and Reading Tests. Princeton NJ: Educational Testing Service.
- Lightstone, K. and Smith, S. M. 2009.
Student Choice between Computer and Traditional Paper-and-Pencil University Tests: What Predicts Preference and Performance?
Revue internationale des technologies en pedagogie universitaire / International Journal of Technologies in Higher Education, vol. 6, 1, 2009, p. 30-45.
- Lim, G. S. 2009.
Prompt and Rater Effects in Second Language Writing Performance Assessment. University of Michigan: Unpublished PhD Thesis.
- Lim, G. S. 2010.
Investigating Prompt Effects in Writing Performance Assessment. Spaan Fellow Working Papers in Second or Foreign Language Assessment 8, 95 - 116.
- Linn, R. L. 2003. Performance Standards: Utility for Different Uses of Assessments. Education Policy Analysis Archives Volume 11 Number 31.
- Linn, R. L. 2010. Comments on Atkinson and Geiser: Considerations for College Admissions Tests. Educational Researcher 38, 9, 677 - 679.
- Linn, R. L., Baker, E. L. and Dunbar, S. B. 1991.
Complex, Performance-Based Assessment: Expectations and Validation Criteria. CSE Technical Report 331.
- Livingston, S. A. 2009.
Constructed-response test questions: Why we use them; how we score them. R&D Connections 11. Princeton, NJ: Educational Testing Service.
- Liu, O. L. 2009.
Measuring learning outcomes in higher education. R&D Connections 10. Princeton, NJ: Educational Testing Service.
- Loevinger, J. 1957.
Objective tests as instruments of psychological theory. Psychological Reports 3, 635 - 694. Southern Universities Press, Monograph Supplement 9.
- Loulou, D. 1995.
Making the A: How To Study for Tests.
ERIC/AE Digest Series EDO-TM-95-10.
- Low, G. No date.
Communicative Testing as an Optimistic Activity.
Manuscript from the Language Centre, University of Hong Kong.
- Lynch, B. K. and Davidson, F. 1994.
Criterion-Referenced Language Test Development: Linking Curricula, Teachers and Tests.
TESOL Quarterly 28, 4, 727 - 743.
- Malone, M. 2000.
Simulated Oral Proficiency Interviews: Recent Developments. ERIC Digest.
- May, L. 2006.
An examination of rater orientations on a paired candidate discussion task through stimulated verbal recall.
Melbourne Papers in Language Testing 11, 1.
- May, L. 2010.
Developing speaking assessment tasks to reflect the 'social turn' in language testing. University of Sydney Papers in TESOL 5, 1 - 30.
- McAulay, A. 2002.
Peer and Self-evaluation in Spoken Tests: Tools and Methods. The Internet TESL Journal, September.
- McClellan, C. 2010.
Constructed-Response Scoring - Doing it Right. R&D Connections 13. Princeton, NJ: Educational Testing Service.
- McLean, L., Myers, M., Smillie, C., and Vaillancourt, D. 1997.
Qualitative Research Methods: An essay review.
Education Policy Analysis Archives, 5, 13.
- McNamara, T. (1996). Measuring Second Language Performance. Harlow: Longman/Pearson Education.
  - Second language performance assessment
  - Modelling performance: opening Pandora's Box
  - Designing a performance test: The Occupational English Test
  - Raters and ratings: introduction to multi-faceted measurement
  - Concepts and procedures in Rasch measurement
  - Mapping and reporting abilities and skill levels
  - Using Rasch analysis in research on second language performance assessment
  - Data, models and dimensions
- McNamara, T. (1997). "Problematising content validity: the Occupational English Test (OET) as a measure of medical communication." Melbourne Papers in Language Testing 6(1) 19 - 43.
- Mehrens, W. A. No Date.
Preparing Students to Take Standardized Achievement Tests.
- Messerklinger, J. 1997.
Evaluating Oral Ability. The Language Teacher Online, 21, 11.
- Messick, S. (1988). Consequences of Test Interpretation and Use: The Fusion of Validity and Values in Psychological Assessment. Research Report 48, Princeton NJ: Educational Testing Service.
- Mills, A., Swain, L. and Weschler, R. 1996.
The Implementation of a First Year English Placement System. The Internet TESL Journal, 2, 11.
- Milton, J. 2006.
French as a Foreign Language and the Common European Framework of Reference for Languages.
Proceedings from the Crossing Frontiers: Languages and International Dimension conference, Cardiff University, 6 - 7 July.
- Mislevy, R. J. 1992.
Linking Educational Assessments: Concepts, Issues, Methods, and Prospects. Princeton, NJ: Educational Testing Service.
- Mislevy, R. J., Behrens, J. T., Bennett, R. E., Demark, S. F., Frezzo, D. C., Levy, R., Robinson, D. H., Rutstein, D. W., Shute, V. J., Stanley, K. & Fielding, I. W. 2010.
On the roles of external knowledge representations in assessment design. Journal of Technology, Learning, and Assessment 8, 2.
- Mislevy, R. J., Chapelle, C., Chung, Y-R. and Xu, J. 2008.
Options for Adaptivity in Computer-Assisted Language Learning and Assessment. In Chapelle, C. A., Chung, Y-R., and Xu, J. (Eds.) Towards adaptive CALL: Natural language processing for diagnostic language assessment Ames, IA: Iowa State University, 9 - 24.
- Mislevy, R. J., Steinberg, L. S., and Almond, R. G. (2002).
Design and analysis in task-based language assessment. Language Testing 19, 4, 477 - 496.
- Mislevy, R. J. & Yin, C. 2009.
If Language is a Complex Adaptive System, What is Language Assessment? Paper presented at Language as a Complex Adaptive System conference at the University of Michigan, Ann Arbor, 7 - 9th November, 2008.
- Monaghan, W. 2006.
The facts about subscores. R&D Connections 4. Princeton, NJ: Educational Testing Service.
- Monaghan, W. and Bridgeman, B. 2005.
E-rater as a quality control on human scores. R&D Connections 2. Princeton, NJ: Educational Testing Service.
- Moritoshi, P. 2001.
The Test of English for International Communication (TOEIC): necessity, proficiency levels, test score utilization and accuracy. University of Birmingham, UK: MA assignment.
- Moritoshi, P. 2002.
Validation of the Test of English Conversation Proficiency.
University of Birmingham: MA dissertation.
- Moodie, I. 2008.
Using Pair Work Exams for Testing in the ESL/EFL Conversation Classes.
Internet TESL Journal XIV, 8.
- Mueller, J. 2003.
Authentic Assessment Toolbox. North Central College, Naperville, IL.
- Mousavi, S. A. 2007.
Computer Package for the Assessment of Oral Proficiency of Adult ESL Learners: Implications for Score Comparability. Griffith University: Unpublished PhD Thesis.
- Newfields, T. 2005.
TOEIC Washback Effects on Teachers: A Pilot Study at One University Faculty.
Toyo University Keizai Ronshu, 31, 1, 83 - 106.
- Newfields, T. 2006.
Teacher development and assessment literacy.
Authentic Communication: Proceedings of the 5th Annual JALT Pan-SIG Conference. Shizuoka, Japan: Tokai University College of Marine Sciences, 48 - 73.
- Nichols, S. L. and Glass, G. V. 2006.
High-Stakes Testing and Student Achievement: Does Accountability Pressure Increase Student Learning?
Education Policy Analysis Archives, 14, 1.
- Norris, J. M. 2004.
Assessment in College Foreign Language Programs.
Paper delivered at the Association of Departments of Foreign Languages Summer Seminar West, Albuquerque, New Mexico, 11th June.
- North, B. 2004.
'Europe's framework promotes language discussion, not directives'. Education Guardian, 15 April.
A reply to Glenn Fulcher.
- Norris, J. M. 2001.
Concerns with computerized adaptive oral proficiency assessment.
Language Learning and Technology, Vol 5, No. 2, May 2001, 99 - 105.
- Ohkubo, N. 2009.
Validating the integrated writing task of the TOEFL internet-based test (iBT): Linguistic Analysis of test takers' use of input material.
Melbourne Papers in Language Testing 14, 1.
- O'Loughlin, K. 2006.
Learning about second language assessment: Insights from a postgraduate student on-line subject forum.
University of Sydney Papers in TESOL 1, 71 - 85.
- O'Loughlin, K. 2009.
Does it measure up? Benchmarking the written examination of a university English pathway program.
Melbourne Papers in Language Testing 14, 1.
- O'Neil, H. F. and Schacter, J. 1997.
Test Specifications for Problem-Solving Assessment.
CRESST/University of California, Los Angeles: CSE Technical Report 463.
- O'Sullivan, B. 2007.
Testing Speaking in Larger Classes
Humanising Language Teaching 9, 4.
- O'Sullivan, B., Weir, C. J., and Saville, N. 2002.
Using observation checklists to validate speaking-test tasks. Language Testing 19, 1, 33 - 56.
- Papageorgiou, S. (2007). Relating the Trinity College London GESE and ISE exams to the Common European Framework of Reference: Piloting of the Council of Europe draft Manual. London: Trinity College London.
- Papageorgiou, S. (2010).
Setting cut scores on the Common European Framework of Reference for the Michigan English Test. Ann Arbor: University of Michigan.
- Papajohn, D. 2006.
Standard setting for next generation TOEFL Academic Speaking Test (TAST): Reflections on the ETS Panel of International Teaching Assistant Developers
TESL-EJ 10, 1.
- Park, T. 2004.
An investigation of an ESL placement test of writing using Many-facet Rasch Measurement
Teachers College, Columbia University Papers in TESOL and Applied Linguistics 4, 1.
- Peirce, B. N., and Stewart, G. 1997.
The Development of the Canadian Language Benchmarks Assessment. TESL Canada Journal 14, 2, 17 - 31.
- Penfield, R. D. (2010).
Test-based grade retention: Does it stand up to professional standards for fair and appropriate test use? Educational Researcher, 39, 2, 110 - 119.
- Perea, L. (2010).
Benefits of Teachers' Feedback to Reverse-Engineering Item Language Test Specifications from an Existing Item Bank. Texas Papers in Foreign Language Education 15(1), 30 - 54.
- Phakiti, A. 2006.
Modeling cognitive and metacognitive strategies and their relationship to EFL reading test performance.
Melbourne Papers in Language Testing 11, 1.
- Poole, G. 2003.
Assessing Japan's Institutional Entrance Requirements.
Asian EFL Journal 5, 1.
- Poonpon, K. 2010.
Expanding a Second Language Speaking Rating Scale for Instructional and Assessment Purposes.
Spaan Fellow Working Papers in Second or Foreign Language Assessment 8, 69 - 94.
- Popham, J. W. 2012.
Assessment Bias: How to Banish It. Boston MA: Pearson Education.
- Powers, D. E. 2010.
The case for a comprehensive, four-skills assessment of English-language proficiency. R&D Connections 14. Princeton, NJ: Educational Testing Service.
- Praphal, K. 1990.
The relevance of language testing research in the planning of language programmes.
Thailand: Chulalongkorn University.
- Ranali, J. M. 2002.
Comparing scoring procedures on a cloze test.
University of Birmingham, UK: MA assignment.
- Read, J. 2004.
Second Language Vocabulary Testing: Taking a Broader Perspective. Paper delivered at the International Conference on English Instruction and Assessment.
- Read, J. 2007.
Second language vocabulary assessment: Current practices and new directions. Journal of English Studies, 7, 2, 105 - 125.
- Robb, T. N. & Ercanbrack, J. 1999.
A Study of the Effect of Direct Test Preparation on the TOEIC Scores of Japanese University Students.
TESL-EJ, 3, 4.
- Roever, C. 2001.
Web based language testing.
Language Learning and Technology, Vol 5, No. 2, May 2001, 84 - 94.
- Roever, C. and Powers, D. E. 2005.
Effects of language of administration on a self-assessment of language skills.
TOEFL Monograph No. MS-27. Princeton, NJ: Educational Testing Service.
- Rosenfeld, M., Leung, S., & Oltman, P. K. 2001.
The reading, writing, speaking, and listening tasks important for academic success at the undergraduate and graduate levels.
TOEFL Monograph No. MS-21. Princeton, NJ: Educational Testing Service.
- Rosenshine, B. 2003.
High Stakes Testing: Another analysis.
Education Policy Analysis Archives, 11, 24.
- Ross, J. A. 2006.
The Reliability, Validity, and Utility of Self-Assessment.
Practical Assessment, Research and Evaluation, 11, 10.
- Rudner, L. 1994.
Questions to ask when evaluating tests.
ERIC Clearinghouse on Assessment and Evaluation.
- Rudner, L. 1998.
An Online, Interactive, Computer Adaptive Test Tutorial.
ERIC Clearinghouse on Assessment and Evaluation.
- Rudner, L. 2001.
Reliability. ERIC Clearinghouse on Assessment and Evaluation.
- Rudner, L. 2006.
An evaluation of IntelliMetric Essay Scoring System. Journal of Technology, Learning, and Assessment 4, 4.
- Russell, M. 1999.
Testing On Computers: A Follow-up Study Comparing Performance On Computer and On Paper.
Education Policy Analysis Archives, 7, 20.
- Russell, M. and Haney, W. 1997.
Testing Writing on Computers: An Experiment Comparing Student Performance on Tests Conducted via Computer and via Paper-and-Pencil.
Education Policy Analysis Archives, 5, 3.
- Russell, M. and Haney, W. 2000.
Bridging the Gap between Testing and Technology in Schools.
Education Policy Analysis Archives, 8, 19.
- Sanders, W. and Horn, S. P. 1995.
Educational Assessment Reassessed: The Usefulness of Standardized and Alternative Measures of Student Achievement as Indicators for the Assessment of Educational Outcomes.
Education Policy Analysis Archives, 3, 6.
- Sarle, W. S. 1995.
Measurement theory: Frequently asked questions. From the Disseminations of the International Statistical Applications Institute, 4th edition. Wichita: ACG Press, 61 - 66.
Also available at: ftp://ftp.sas.com/pub/neural/measurement.html
- Sasaki, M., and Hirose, K. 1996.
Explanatory Variables for EFL Students' Expository Writing. Language Learning 46, 1, 137 - 174.
- Sawaki, Y. 2001.
Comparability of Conventional and Computerized Tests of Reading in a Second Language.
Language Learning and Technology, Vol. 5, No. 2, May 2001, 38 - 59.
- Sawaki, Y. and Nissan, S. 2009.
Criterion-related validity of the TOEFL iBT Listening Section. TOEFL Research Report 09-02. Princeton, NJ: Educational Testing Service.
- Scharber, C., Dexter, A. and Riedel, E. 2008.
Students' Experiences with an Automated Essay Scorer. The Journal of Technology, Learning, and Assessment.
- Shaw, S. and Falvey, P. 2008.
The IELTS Writing Assessment Revision Project: Towards a revised rating scale. Cambridge: University of Cambridge ESOL Examinations Research Report 1.
- Sheehan, K. M., Kostin, I., Futagi, Y., & Flor, M. 2010.
Generating Automated Text Complexity Classifications that are Aligned with Targeted Text Complexity Standards. ETS Research Report 10-28. Princeton NJ: Educational Testing Service.
- Shohamy, E. 2007.
Language Tests as Language Policy Tools. Assessment in Education 14, 1, 117 - 130.
- Sireci, S. G. 2007.
On Validity Theory. Educational Researcher 36, 8, 477 - 481.
- Skehan, P. 1990.
Communicative Language Testing. Journal of TESOL France 10, 1, 115 - 127.
- Sokolik, M. and Duber, J. 2002.
Grow Your Own: Online Placement Testing. TESL-EJ, 6, 1.
- Stansfield, C. W. 1992. ACTFL Speaking Proficiency Guidelines. Washington D.C.: ERIC Clearinghouse on Languages and Linguistics.
- Stansfield, C. W. 1996. Content Assessment in the Native Language. Washington D.C.: ERIC Clearinghouse on Languages and Linguistics.
- Stansfield, C. W. & Kenyon, D. 1996. Simulated Oral Proficiency Interviews: An Update. Washington D.C.: ERIC Clearinghouse on Languages and Linguistics.
- State of Illinois. 1995.
Assessment Handbook. A Guide for Developing Assessment Programs in Illinois Schools.
Springfield, IL: Illinois State Board of Education.
- Stricker, L. J. 2002.
The Performance of Native Speakers of English and ESL Speakers on the Computer-Based TOEFL and the GRE General Test.
Princeton NJ: Educational Testing Service, TOEFL Research Report 69.
- Swain, M., Huang, L-S, Barkaoui, K., Brooks, L., and Lapkin, S. 2009.
The Speaking Section of the TOEFL iBT: Test-takers' Reported Strategic Behaviors.
Princeton NJ: Educational Testing Service, TOEFL iBT Research Report 09-30.
- Tannenbaum, J. 1996. Practical Ideas On Alternative Assessment For ESL Students. Washington D.C.: ERIC Clearinghouse on Languages and Linguistics.
- Tannenbaum, R. J. and Wylie, E. C. 2008. Linking English language test scores onto the Common European Framework of Reference: An application of standard setting methodology. TOEFL iBT Report iBT-06. Princeton, NJ: Educational Testing Service.
- Tasdemir, M., Tasdemir, A., and Yildirim, K. (2009)
Influence of Portfolio Evaluation in Cooperative Learning on Student Success.
Journal of Theory and Practice in Education, 5, 1, 53 - 66.
- Taylor, C. S. and Nolan, S. B. 1996.
What does the psychometrician's classroom look like? Reframing assessment concepts in the context of learning.
Education Policy Analysis Archives, 14, 7.
- Taylor, C., Jamieson, J., Eignor, D., & Kirsch, I. 1998.
The relationship between computer familiarity and performance on computer-based TOEFL test tasks.
TOEFL Research Report RR-61. Princeton, NJ: Educational Testing Service.
- Taylor, L. (2009).
Developing Assessment Literacy. Annual Review of Applied Linguistics 29, 21 - 36.
- Templer, B. 2004. High-Stakes Testing at High Fees: Notes and Queries on the International English Proficiency Assessment Market.
Journal for Critical Education Policy Studies, 2, 1.
- Thompson, G. 2009.
Reevaluating the Test Specifications for an Oral Proficiency Test? The Journal of Kanda University of International Studies 21, 233 - 260.
- Tsagari, D. and Papageorgiou, S. (2012).
Language testing and assessment: Issues in the Greek educational context. Hellenic Open University: Research Papers in Language Teaching and Learning 3(1).
- Tsang, S. L., Katz, A. and Stack, J. 2008.
Achieving Testing for English Language Learners, Ready or Not?
Education Policy Analysis Archives, 16, 1.
- Tuzi, F. 1997. Using Microsoft Word to Generate Computerized Tests. The Internet TESL Journal, 3, 11.
- Wagner, A. No Date.
Don't Messick around with Test Validity until you know what you're doing.
- Wagner, E. 2002.
Video listening tests: A pilot study.
Teachers College, Columbia University Working Papers in TESOL and Applied Linguistics, 2, 1.
- Wagner, E. 2007.
Are They Watching? Test-Taker Viewing Behavior During an L2 Video Listening Test.
Language Learning and Technology, 11, 1.
- Walker, M. E. 2007.
Is test score reliability necessary? R&D Connections 5. Princeton, NJ: Educational Testing Service.
- Wall, D., & Horak, T. 2006.
The impact of changes in the TOEFL examination on teaching and learning in central and eastern Europe. Phase I: The baseline study.
TOEFL Monograph No. MS-34. Princeton, NJ: Educational Testing Service.
- Wall, D., & Horak, T. 2008.
The impact of changes in the TOEFL examination on teaching and learning in central and eastern Europe. Phase 2: Coping with change.
TOEFL iBT Report No. iBT-05. Princeton, NJ: Educational Testing Service.
- Wang, J., and Brown, M. S. 2007.
Automated Essay Scoring Versus Human Scoring: A Comparative Study. Journal of Technology, Learning, and Assessment, 6, 2.
- Weideman, A. 2006.
Assessing Academic Literacy in a Task-Based Approach.
Language Matters 37, 1, 81 - 101.
- Wendler, C. and Powers, D. 2009.
What does it mean to repurpose a test? R&D Connections 9. Princeton, NJ: Educational Testing Service.
- Wilson, N. 1998.
Educational Standards and the Problem of Error.
Education Policy Analysis Archives, 6, 10.
- Wolfe, E. W., Matthews, S., and Vickers, D. 2010.
The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment 10, 1.
- Wolfe, E. W. and Manalo, J. R. 2004.
Composition Medium Comparability in a Direct Assessment of Non-native English Speakers.
Language Learning and Technology, 8, 1, 52 - 65.
- Wright, P. W. D. and Wright, P. D. 2004.
Understanding Tests and Measurements for the Parent and Advocate.
- Wylie, E.
An overview of the International Second Language Proficiency Ratings (ISLPR).
Australia: Griffith University Centre for Applied Linguistics and Languages.
- Yen, D. A. and Kuzma, J. No date.
Higher IELTS score, higher academic performance? The validity of IELTS in predicting the academic performance of Chinese students. Mimeo: University of Worcester.
- Yerkes, R. M. (1921).
Psychological Examining in the United States Army. Memoirs of the National Academy of Science, Volume 15.
- Yoff, L. 1997. 'An overview of ACTFL proficiency interviews. A test of speaking ability.' JALT Testing and Evaluation SIG Newsletter, 1, 2, 3 - 9.
- Young, J. W. 2008.
Ensuring valid test content tests for English language learners. R&D Connections 8. Princeton, NJ: Educational Testing Service.
- Young, V. M. and Kim, D. H. 2010.
Using Assessments for Instructional Improvement: A Literature Review. Education Policy Analysis Archives 18, 19.
- Young, J. W. and King, T. C. 2008. Testing Accommodations for English Language Learners: A Review of State and District Policies. New York: College Board.
- Yu, E. 2006. A Comparative Study of the Effects of a Computerized English Oral Proficiency Test Format and a Conventional Speak Test Format. Unpublished PhD Thesis: Ohio State University.
- Zechner, K. and Xi, X. 2008.
Towards automatic scoring of a test of spoken language with heterogeneous task types. Proceedings of the Third ACL Workshop on Innovative Use of NLP for Building Educational Applications. Association for Computational Linguistics, Columbus, Ohio, 98 - 106.
- Zimmerman, D. W. and Zumbo, B. D. 2009.
Hazards in choosing between pooled and separate variances t tests. Psicologica 30, 371 - 390.
- Zumbo, B. D. 2009.
Validity as Contextualized and Pragmatic Explanation, and Its Implications for Validation Practice. In R. W. Lissitz (Ed.), The Concept of Validity: Revisions, New Directions and Applications (pp. 65 - 82). Charlotte, NC: IAP - Information Age Publishing, Inc.