A
- Amrein, A. L. and Berliner, D. C. 2002.
High Stakes Testing, Uncertainty, and Student Learning.
Education Policy Analysis Archives, 10, 18.
- Anonymous. Evaluation and Assessment Primer.
Vanderbilt University.

- Anonymous, 2009. Computer-based and paper-pencil test comparability.
Pearson Education: Test, Measurement and Research Services Bulletin 9

- Assessment Reform Group. (1999). Beyond the Black Box.

- Assessment Reform Group. (2002). Testing, Motivation and Learning. Cambridge: University of Cambridge Faculty of Education.

- Atkinson, T. and Davies, G. 2000.
Computer Aided Assessment and Language Learning. ICLT4LT.
- Au, W. 2007.
High Stakes Testing and Curricular Control.
Educational Researcher, 36, 5.
B
- Bachman, L. 2005.
"Building and supporting a case for test use".
Language Assessment Quarterly 2, 1, 1-34.

- Bae, J. 2001.
Cohesion and coherence in children's written English:
Immersion and English-only classes. Issues in Applied Linguistics (ial),
12, 51-88. Reprinted online with the kind permission of ial.
- Bailey, K. M. 1999. Washback in Language Testing. Research Monograph 99-4, Princeton N.J.: Educational Testing Service
- Barton, P. E. 2009. National Education Standards. Getting beneath the surface. Policy Information Center, Princeton N.J.: Educational Testing Service
- Bayliss, A. and Ingram, D. E. 2006. IELTS as a Predictor of Academic Language Performance. Paper delivered at the Australian International Education Conference.
- Bejar, I. I. 2008.
Standard setting: What is it? Why is it important? R&D Connections 7. Princeton, NJ: Educational Testing Service.
- Bennett, R. E. 2001.
"How the Internet will help large-scale assessment reinvent itself."
Education Policy Analysis Archives, 9, 5.
- Ben-Simon, A., and Bennett, R. E. 2007.
Toward More Substantively Meaningful Automated Essay Scoring. Journal of Technology, Learning, and Assessment, 6, 1.
- Bennett, R. E. and Ben-Simon, A. 2005.
Toward Theoretically Meaningful Automated Essay Scoring.
National Center for Education Statistics, Institute for Education Sciences, US Department of Education.

- Bialy, J. M. 2003.
"IELTS Speaking Test Preparation in the People's Republic of China: Communicative Approaches and Rote-Memorization Compared."
Unpublished MA Dissertation, University of Surrey.

- Biber, D., Conrad, S. M., Reppen, R., Byrd, P., Helt, M., Clark, V., et al. 2004.
Representing language use in the university: Analysis of the TOEFL 2000 spoken and written academic language corpus.
TOEFL Monograph No. MS-25. Princeton, NJ: Educational Testing Service.
- Black, P. and Wiliam, D. 1998.
"Inside the Black Box: Raising Standards through Classroom Assessment".
Phi Delta Kappa International.

- Blake, R., Wilson, N. L., Cetto, M., Pardo-Ballester, C. 2008.
"Measuring oral proficiency in distance, face-to-face, and blended classrooms".
Language Learning and Technology 12, 3, 114 - 127.

- Bodmann, S. M. and Robinson, D. H. 2004.
"Speed and performance differences among computer based and paper-pencil tests".
Journal of Educational Computing Research 31, 1, 51 - 60.

- Body, N. 2001.
The revision of the IELTS speaking test.
JALT Testing and Evaluation Newsletter, 5, 2, 2 - 4.
- Bolon, C. 2000.
School-based Standard Testing.
Education Policy Analysis Archives, 8, 23.
- Bond, L. A. 1996.
Norm- and Criterion-referenced testing.
ERIC Clearinghouse on Assessment and Evaluation Washington DC.
- Borsboom, D., Mellenbergh, G. J., and van Heerden, J. 2004.
The Concept of Validity. Psychological Review 111, 4, 1061 - 1071.
- Braun, H. 2004.
Reconsidering the Impact of High Stakes Testing.
Education Policy Analysis Archives, 12, 1.
- Breland, H., Lee, Y.-W., Najarian, M., & Muraki, E. 2004.
An analysis of TOEFL-CBT writing prompt difficulty and comparability for different gender groups.
TOEFL Research Report RR-76. Princeton, NJ: Educational Testing Service.
- Bridgeman, B., McBride, A., and Monaghan, W. 2004.
Testing and time limits. R&D Connections 1. Princeton, NJ: Educational Testing Service.
- Brindley, G. 1997.
Assessment and the Language Teacher: Trends and Transitions
Language Teacher Online, 21, 9.
- Brown, J. D. 1997.
Skewness and Kurtosis
Shiken 1, 1, 18 - 20.
- Brown, J. D. 1997.
Computers in Language Testing: Present research and some future directions
Language Learning & Technology 1, 1, 44-59.
- Brown, J. D. 2000.
How can we calculate item statistics for weighted items?
JALT Testing and Evaluation SIG Newsletter, 3, 2, 19 - 21.
- Brown, J. D. 2000.
What is Construct Validity?
JALT Testing and Evaluation SIG Newsletter, 4, 2, 7 - 10.
- Brown, J. D. 2002.
Distractor Efficiency Analysis on a Spreadsheet.
JALT Testing and Evaluation SIG Newsletter, 6, 3, 20 - 23.
- Brown, J. D. 2003.
Norm-referenced item analysis (item facility and item discrimination).
JALT Testing and Evaluation SIG Newsletter, 7, 2, 16 - 19.
- Brown, J. D. 2004.
Performance Assessment: Existing Literature and Directions for Research.
Second Language Studies, 22, 2, 91 - 139.

- Brualdi, A. 1999.
Traditional and Modern Concepts of Validity
ERIC Clearinghouse on Assessment and Evaluation Washington DC.
- Byram, M. 2000. Assessing Intercultural Competence in Language Teaching
SprogForum 16, 8, 8 - 13.
C
- Camilli, G. 1996.
Standard Errors in Educational Assessment: A Policy Analysis Perspective
Education Policy Analysis Archives 4, 4.
- Carrell, P. L. 2007.
Notetaking strategies and their relationship to performance on listening comprehension and communicative assessment tasks.
TOEFL Monograph No. MS-35. Princeton, NJ: Educational Testing Service.
- Carrell, P. L., Dunkel, P. A. and Mollaun, P. 2002.
The effects of notetaking, lecture length, and topic on the listening component of TOEFL 2000.
TOEFL Monograph No. MS-23. Princeton, NJ: Educational Testing Service.
- Celik, M. 1999.
Testing Some Suprasegmental Features of English Speech. The Internet TESL Journal, 5, 8.
- Chalhoub-Deville, M. 2001.
Language Testing and Technology: Past and Future
Language Learning and Technology, Vol 5, No. 2, May 2001, 95 - 98.
- Chalhoub-Deville, M. and Fulcher, G. 2003.
The Oral Proficiency Interview: A Research Agenda
Foreign Language Annals, 36, 4, 498 - 506.
- Chapman, M. 2003.
TOEIC: Tried but Undertested.
JALT Testing and Evaluation SIG Newsletter, 7, 3, 2 - 5.
- Cimbricz, S. 2002.
State-mandated testing and teachers' beliefs and practice.
Education Policy Analysis Archives 10, 2.
- Cohen, A. D., & Upton, T. A. 2006.
Strategies in responding to new TOEFL reading tasks.
TOEFL Monograph No. MS-33. Princeton, NJ: Educational Testing Service.
- Comber, J. 1998. Are Test Preparation Programs Really Effective? Evaluating an IELTS Preparation Course?
Unpublished MA dissertation, University of Surrey.

- Committee on Assessment and Evaluation in Education. 2005.
The Knowledge Base for Assessment and Evaluation in Education.
Israel Academy of Sciences and Humanities; Ministry of Education, Culture and Sport;
Rothschild Foundation (Yad Hanadiv).

- Coniam, D. and Falvey, P. 1999.
Assessor training in a high-stakes test of speaking: The Hong Kong English language benchmarking initiative.
Melbourne Papers in Language Testing 8, 2.
- Coombe, C. 2002.
Self-assessment in language testing: Reliability and validity issues.
Karen's Linguistics Issues.
- Cronbach, L. J. and Meehl, P. E. 1955.
Construct Validity in Psychological Tests
Psychological Bulletin, 52, 281 - 302.
- Cumming, A., Grant, L., Mulcahy-Ernt, P., & Powers, D. E. 2005.
A teacher-verification study of speaking and writing prototype tasks for a new TOEFL Test.
TOEFL Monograph No. MS-26. Princeton, NJ: Educational Testing Service.
- Cumming, A., Kantor, R., Baba, K., Eouanzoui, K., Erdosy, U., & James, M. 2006.
Analysis of discourse features and verification of scoring levels for independent and integrated prototype written tasks for the new TOEFL.
TOEFL Monograph No. MS-30. Princeton, NJ: Educational Testing Service.
- Cunningham, C. R. 2002.
The TOEIC test and communicative competence: Do test score gains correlate
with increased competence? A preliminary study. University of Birmingham,
UK: MA dissertation.
D
- Davidson, F. and Fulcher, G. 2007.
Flexibility is proof of a good 'framework'.
Guardian Weekly, 17th November.
- Davies, A. 1984.
Computer Assisted Language Testing.
CALICO Journal 1, 5.
- Davies, A. 1997.
The education (and training) of language testers. Melbourne Papers in Language Testing 6, 1.
- de Jong, H.A.L. 1990.
Standardization in Language Testing. AILA Review 7.
This is the complete text of the edited volume, and contains the following papers:
- Guest-editor's Preface
John H. A. L. DE JONG 3-5
- Language Testing in Research and Education: The Need for Standards
Peter J. M. GROOT 6-23
- The Cambridge-TOEFL Comparability Study : An example of the Cross-National Comparison of Language Tests
Fred DAVIDSON & Lyle BACHMAN 24-45
- The Australian Second Language Proficiency Ratings (ASLPR)
David E. INGRAM 46-61
- Cross-National Standards: A Dutch-Swedish Collaborative Effort in National Standardized Testing
John H.A.L. DE JONG & Mats OSCARSON 62-78
- The Hebrew Speaking Test: An Example of International Cooperation in Test Development and Validation
Elana SHOHAMY & Charles W. STANSFIELD 79-90
- EUROCERT: An International Standard for Certification of Language Proficiency
Alex OLDE KALTER & Paul VOSSEN 91-105
- Response to Alex Olde Kalter and Paul Vossen
John READ 106-107
- Dikli, A. 2006.
An Overview of Automated Scoring of Essays. Journal of Technology, Learning, and Assessment, 5, 1.
- Dooey, P. 1999.
An investigation into the predictive validity of the IELTS Test as an indicator of future academic success.
In K. Martin, N. Stanley and N. Davison (Eds), Teaching in the Disciplines/ Learning in Context, 114-118.
Proceedings of the 8th Annual Teaching Learning Forum, The University of Western Australia, Perth.
- Dorans, N. J. 2008.
The practice of comparing scores on different tests. R&D Connections 6. Princeton, NJ: Educational Testing Service.
- Dunkel, P. A. 1997.
Computer-Adaptive Testing of Listening Comprehension: A Blueprint for CAT Development
The Language Teacher Online, 21, 10.
- Dunkel, P. A. 1999.
Considerations in developing or using
second/foreign language proficiency computer-adaptive tests.
Language Learning & Technology 2, 2, 77-93.
- Dunkin, M. J. 1997.
Assessing Teachers' Effectiveness. Issues in Educational Research, 7(1), 1997, 37-51.
- Dymoke, S. (no date).
Assessing Your Pupils' Poetry. Poetry Class Website Resources.
E
- Educational Testing Service.
ETS Fairness Review & ETS Standards for Quality and Fairness.
- Elder, C. (1998).
What counts as bias in language testing?
Melbourne Papers in Language Testing 7, 1.
- Embretson, S. 1983.
Construct Validity: Construct Representation Versus Nomothetic Span. Psychological Bulletin, 93, 1, 179 - 197.
- Emmerich, W., Enright, M. K., Rock, D. A. and Tucker, C. 1991.
The Development, Investigation, and Evaluation of New Item Types for the GRE Analytical Measure.
Educational Testing Service, Princeton NJ, ETS Research Report 91-16.
- Ennis, R. H. 1999.
Test Reliability: A Practical Exemplification of
Ordinary Language Philosophy. Philosophy of Education
F
- Feast, V. 2002.
The Impact of IELTS scores on performance at university.
International Education Journal, 3, 4, 70 - 85.
- Frary, R. B. 1996.
Hints for Designing Effective Questionnaires. Practical Assessment, Research and Evaluation, Vol. 11.
- Frary, R. B. 1995.
More Multiple Choice Item Writing Do's and Don'ts. ERIC/AE Digest Series EDO-TM-95-4.
- Frary, R. B. 2002.
A Brief Guide to Questionnaire Development.
- Fulcher, G. 1999.
Ethics in Language Testing. TAE SIG Newsletter - Special Conference Issue, Volume 1, No. 1.
- Fulcher, G. 2000.
Computers in Language Testing. In Brett, P. and G. Motteram (Eds) 2000.
A Special Interest in Computers. Manchester:
IATEFL Publications, 93 - 107. Reprinted in electronic format with the kind permission of IATEFL.
- Fulcher, G. 2001.
Machines get clever at testing. Education Guardian, 17 May.
- Fulcher, G. 2003.
Few ills cured by setting scores. Education Guardian, 17 April.
- Fulcher, G. 2004.
Are Europe's tests being built on an 'unsafe' framework? Education Guardian, 18 March.
Read the response from Brian North
- Fulcher, G. 2008. "Testing Times Ahead?"
Liaison Magazine, Issue 1: July, 20 - 24.
Published by the UK Subject Centre for Languages, Linguistics and Area Studies, University of Southampton.

- Fulcher, G. 2009. Test use and political philosophy.
Annual Review of Applied Linguistics, 29, 3 - 20.

G
- Gebril, A. and Plakans, L. 2009.
Investigating source use, discourse features, and process in integrated writing tasks.
Spaan Fellow Working Papers in Second or Foreign Language Assessment 7, 47 - 84.
- Geisinger, K. F. and Carlson, J. F. 1995.
Testing Students with Disabilities.
ERIC Digest.
- Gibson, E. J., Brewer, P. W., Dholakia, A., Vouk, M. A., and Bitzer, D. L. 1995.
A Comparative Analysis of Web-Based Testing and Evaluation Systems.
North Carolina University.
- Gilfert, S. 1996. A Review of TOEIC. The Internet TESL Journal, 2, 8.
- Ginther, A. 2001.
Effects of the presence and absence of visuals on performance on TOEFL CBT listening-comprehension stimuli
TOEFL Research Report 66, Princeton, N.J.: Educational Testing Service.

- Glass, G. V. 1978.
Standards and criteria. Journal of Educational Measurement 15, 4, 237 - 261.
- Godwin-Jones, B. 2001.
Language Testing Tools and Technology. Language Learning & Technology,
Vol. 5, No. 2, May 2001, 8-12.
- Gorsuch, G. J. and Cox, T. 2000.
Something Old, Something New, Something Borrowed, Something....: Piloting a Computer Mediated Version of the Michigan Listening Comprehension Test
TESL-EJ 4, 4.
- Grant, S. G. 2000.
Teachers and Tests: Exploring Teachers' Perceptions of
Changes in the New York State Testing Program. Education Policy Analysis Archives, 8, 14.
- Gorin, J. S. 2007.
Reconsidering Issues in Validity Theory. Educational Researcher 36, 8, 456 - 462.

- Grabowski, K. C. 2007.
Reconsidering the measurement of pragmatic knowledge using a reciprocal written task format. Teachers College, Columbia University Working Papers in TESOL and Applied Linguistics, 7, 1.

- Gruba, P. A. 1999.
The role of digital video media in second language listening comprehension. University of Melbourne: Unpublished PhD thesis.

H
- Haji pour Nezhad, G. R. 2002.
Item complexity and Judgment Revisited.
Unpublished PhD Thesis, Tehran University.
- Haji pour Nezhad, G. R. 2002.
Reading complexity judgments, Episode 1.
JALT Testing and Evaluation SIG Newsletter, 5, 3, 2 - 5.
- Haji pour Nezhad, G. R. 2002.
Reading complexity judgments, Episode 2.
JALT Testing and Evaluation SIG Newsletter, 6, 1, 2 - 5.
- Haji pour Nezhad, G. R. 2002.
Reading complexity judgments, Episode 3.
JALT Testing and Evaluation SIG Newsletter, 6, 2, 2 - 5.
- Hamilton, L. S., Klein, S. P., and Lorie, W. No Date.
Using Web-Based Testing for Large-Scale Assessment
Rand Education.
- Hansen, E. G., Forer, D. C., & Lee, M. J. 2004.
Toward accessible computer-based tests: Prototypes for visual and other disabilities.
TOEFL Research Report RR-78. Princeton, NJ: Educational Testing Service.
- Harding, L. 2008.
Accent and academic listening assessment: A study of test-taker perceptions.
Melbourne Papers in Language Testing 13, 1.
- Harlen, W. H. and Crick, R. D. 2002.
A Systematic Review of the impact of summative assessment and tests on students'
motivation for learning.
London: Institute of Education, Evidence for Policy and Practice Information
and Co-ordinating Centre.
- Hong, W.-P. 2008.
Does high-stakes testing increase cultural capital among low-income and racial minority students?
Education Policy Analysis Archives, 16, 6.
- Nguyen, T. N. H. 2007.
Effects of test preparation on test performance - the case of the IELTS and TOEFL iBT Listening Tests.
Melbourne Papers in Language Testing 12, 1.
- Huitt, B., Hummel, J. and Kaeck, D. 1995.
Assessment, Measurement, Evaluation and Research. Valdosta State University.
- Hutchison, D. and Benton, T. 2009.
Parallel Universes and Parallel Measures: Estimating the Reliability of Test Results.
London: OFQUAL and the National Foundation for Educational Research.
I
J
- Jacobsen, M., Kremer, R., and Flores, R. 1999.
WebCT in Computer Science. New Currents in Teaching and Learning, 6, 3.
- Jamieson, J., Jones, S., Kirsch, I., Mosenthal, P., Taylor, C. 2000.
TOEFL 2000 Framework: A Working Paper.
Educational Testing Service, Princeton NJ.

- Jia, Y., and Zhang, W. 2007.
Evaluating the construct validity of an EFL test for PhD candidates: A quantitative analysis of two versions.
Shiken, 11, 1, 2 - 16.
- Joint Committee on Testing Practices. 2004.
Code of Fair Testing Practices in Education.
American Psychological Association.
K
- Kane, M. 2010.
Errors of Measurement, Theory, and Public Policy.
12th Annual William H. Angoff Memorial Lecture. Princeton, NJ: Educational Testing Service.
- Kang, O. 2008.
Ratings of L2 oral performance in English: Relative impact of rater characteristics and acoustic measures of accentedness.
Spaan Fellow Working Papers in Second or Foreign Language Assessment 6, 181 - 205.
- Karavas, E., and Delieza, X. 2009.
On-site observation of KPG oral examiners: Implications for oral examiner training and evaluation.
Journal of Applied Language Studies 3, 1, 51 - 77.
- Kehoe, J. 1995.
Basic Item Analysis for Multiple-Choice Tests.
ERIC Digest.
- Kehoe, J. 1995.
Writing Multiple Choice Test Items.
ERIC Digest.
- Kenworthy, R. 2006.
Timed versus At-home Assessment Tests: Does Time Affect the Quality of Second Language Learners' Written Compositions?
TESL-EJ 10, 1.
- Kenyon, D. M. and Malabonga, V. 2001.
Comparing examinee attitudes toward computer-assisted and other oral proficiency assessments.
Language Learning and Technology, Vol 5, No. 2, May 2001, 60 - 83.
- Kim, H. J. and Shin, H. W. 2006.
A reading and writing placement test: Design, evaluation, and analysis. Teachers College, Columbia University Working Papers in TESOL and Applied Linguistics, 6, 2.

- Kirsch, I., Jamieson, J., Taylor, C., and Eignor, D. 1998.
Computer Familiarity Among TOEFL Examinees
TOEFL Research Report 59, Educational Testing Service,
Princeton NJ.

- Kitao, S. K. and Kitao, K. 1996.
Testing Communicative Competence. The Internet TESL Journal, 2, 5.
- Kitao, S. K. and Kitao, K. 1996.
Testing Grammar. The Internet TESL Journal, 2, 6.
- Kitao, S. K. and Kitao, K. 1996.
Testing Listening. The Internet TESL Journal, 2, 7.
- Knoch, U. 2008.
Collaborating with ESP Stakeholders in Rating Scale Validation: The case of the ICAO Rating Scale.
Spaan Fellow Working Papers in Second or Foreign Language Assessment 7, 21 - 46.
- Knoch, U. 2009.
The assessment of academic style in EAP writing: The case of the rating scale.
Melbourne Papers in Language Testing 13, 1.
- Koretz, D., Russell, M., Shin, C. D., Horn, C. and Shasby, K. 2002.
Testing and diversity in postsecondary
education: The case of California. Education Policy Analysis Archives, 10, 1.
- Kyllonen, P. C. 2005.
The case for noncognitive assessments. R&D Connections 3. Princeton, NJ: Educational Testing Service.
L
- Laborda, J. G. 2007.
From Fulcher to PLEVALEX: Issues in Interface design, validity and reliability in Internet based Language Testing. CALL-EJ Online 9, 1.
- Lane, S. 1999.
Validity Evidence for Assessments. Reidy Interactive Lecture Series.
- Lazaraton, A. and Wagner, S. (1996).
The Revised TSE test: Discourse Analysis of Native Speaker and Nonnative Speaker Data. Research Report 96-10. Princeton NJ: Educational Testing Service.
- Lee, Y.-W., Breland, H., & Muraki, E. 2004.
Comparability of TOEFL CBT writing prompts for different native language groups.
TOEFL Research Report RR-77. Princeton, NJ: Educational Testing Service.
- Lightstone, K. and Smith, S. M. 2009.
Student Choice between Computer and Traditional Paper-and-Pencil University Tests: What Predicts Preference and Performance?
Revue internationale des technologies en pedagogie universitaire / International Journal of Technologies in Higher Education, vol. 6, 1, 2009, p. 30-45.
- Linn, R. L. 2003.
Performance Standards: Utility for Different Uses of Assessments.
Education Policy Analysis Archives, 11, 31.
- Linn, R. L., Baker, E. L. and Dunbar, S. B. 1991.
Complex, Performance-Based Assessment: Expectations and Validation Criteria. CSE Technical Report 331.
- Livingston, S. A. 2009.
Constructed-response test questions: Why we use them; how we score them. R&D Connections 11. Princeton, NJ: Educational Testing Service.
- Livingston, S. A. and Zieky, M. J. 1982.
Passing Scores: A Manual for Setting Standards of Performance on Educational and Occupational Tests.
Princeton, NJ: Educational Testing Service.
- Liu, O L. 2009.
Measuring learning outcomes in higher education. R&D Connections 10. Princeton, NJ: Educational Testing Service.
- Loevinger, J. 1957.
Objective tests as instruments of psychological theory. Psychological Reports 3, 635 - 694. Southern Universities Press, Monograph Supplement 9.
- Loulou, D. 1995.
Making the A: How To Study for Tests.
ERIC/AE Digest Series EDO-TM-95-10
M
- Malone, M. 2000.
Simulated Oral Proficiency Interviews: Recent Developments. ERIC Digest.
- May, L. 2006.
An examination of rater orientations on a paired candidate discussion task through stimulated verbal recall.
Melbourne Papers in Language Testing 11, 1.
- McAulay, A. 2002.
Peer and Self-evaluation in Spoken Tests: Tools and Methods. The Internet TESL Journal, September.
- McLean, L., Myers, M., Smillie, C., and Vaillancourt, D. 1997.
Qualitative Research Methods: An essay review
Education Policy Analysis Archives, 5, 13.
- McClellan, C. 2010.
Constructed-Response Scoring - Doing it Right. R&D Connections 13. Princeton, NJ: Educational Testing Service.
- Mehrens, A. A. No Date.
Preparing Students to Take Standardized Achievement Tests
ERIC Digest.
- Messerklinger, J. 1997.
Evaluating Oral Ability. The Language Teacher Online, 21, 11.
- Mills, A., Swain, L. and Weschler, R. 1996.
The Implementation of a First Year English Placement System. The Internet TESL Journal, 2, 11.
- Milton, J. 2006.
French as a Foreign Language and the Common European Framework of Reference for Languages.
Proceedings from the Crossing Frontiers: Languages and International Dimension
conference, Cardiff University, 6 - 7 July.

- Monaghan, W. 2006.
The facts about subscores. R&D Connections 4. Princeton, NJ: Educational Testing Service.
- Monaghan, W. and Bridgeman, B. 2005.
E-rater as a quality control on human scores. R&D Connections 2. Princeton, NJ: Educational Testing Service.
- Moritoshi, P. 2001.
The Test of English for International Communication (TOEIC): necessity, proficiency levels,
test score utilization and accuracy. University of Birmingham, UK: MA assignment.
- Moritoshi, P. 2002.
Validation of the Test of English Conversation Proficiency.
University of Birmingham: MA dissertation.
- Moodie, I. 2008.
Using Pair Work Exams for Testing in the ESL/EFL Conversation Classes.
Internet TESL Journal XIV, 8.
- Mueller, J. 2003.
Authentic Assessment Toolbox. North Central College, Naperville, IL.
N
- Newfields, T. 2005.
TOEIC Washback Effects on Teachers: A Pilot Study at One University Faculty.
Toyo University Keizai Ronshu, 31, 1, 83 - 106.
- Nichols, S. L. and Glass, G. V. 2006.
High-Stakes Testing and Student Achievement: Does Accountability Pressure Increase Student Learning?
Education Policy Analysis Archives, 14, 1.
- North, B. 2004.
'Europe's framework promotes language discussion, not directives'. Education Guardian, 15 April.
A reply to Glenn Fulcher
- Norris, J. M. 2001.
Concerns with computerized adaptive oral proficiency assessment.
Language Learning and Technology, Vol 5, No. 2, May 2001, 99 - 105.
O
P
- Papajohn, D. 2006.
Standard setting for next generation TOEFL Academic Speaking Test (TAST): Reflections on the ETS Panel of International Teaching Assistant Developers.
TESL-EJ 10, 1.
- Park, T. 2004.
An investigation of an ESL placement test of writing using Many-facet Rasch Measurement.
Teachers College, Columbia University Working Papers in TESOL and Applied Linguistics 4, 1.
- Phakiti, A. 2006.
Modeling cognitive and metacognitive strategies and their relationship to EFL reading test performance.
Melbourne Papers in Language Testing 11, 1.
- Poole, G. 2003.
Assessing Japan's Institutional Entrance Requirements.
Asian EFL Journal 5, 1.
- Praphal, K. 1990.
The relevance of language testing research in the planning of language programmes.
Thailand: Chulalongkorn University.
Q
R
- Ranali, J. M. 2002.
Comparing scoring procedures on a cloze test.
University of Birmingham, UK: MA assignment.

- Robb, T. N. & Ercanbrack, J. 1999.
A Study of the Effect of Direct Test Preparation on
the TOEIC Scores of Japanese University Students.
TESL-EJ, 3, 4.
- Roever, C. 2001.
Web based language testing.
Language Learning and Technology, Vol 5, No. 2, May 2001, 84 - 94.
- Roever, C. and Powers, D. E. 2005.
Effects of language administration on a self-assessment of language skills.
TOEFL Monograph No. MS-27. Princeton, NJ: Educational Testing Service.
- Rosenfeld, M., Leung, S., & Oltman, P. K. 2001.
The reading, writing, speaking, and listening tasks important for academic success at the undergraduate and graduate levels.
TOEFL Monograph No. MS-21. Princeton, NJ: Educational Testing Service.
- Rosenshine, B. 2003.
High Stakes Testing: Another analysis.
Education Policy Analysis Archives, 11, 24.
- Ross, J. A. 2006.
The Reliability, Validity, and Utility of Self-Assessment.
Practical Assessment, Research and Evaluation, 11, 10.
- Rudner, L. 1994.
Questions to ask when evaluating tests.
ERIC Clearinghouse on Assessment and Evaluation.
- Rudner, L. 1998.
An Online, Interactive, Computer Adaptive Test Tutorial.
ERIC Clearinghouse on Assessment and Evaluation.
- Rudner, L. 2001.
Reliability. ERIC Clearinghouse on Assessment and Evaluation.
- Rudner, L. 2006.
An evaluation of IntelliMetric Essay Scoring System. Journal of Technology, Learning, and Assessment 4, 4.
- Russell, M. 1999.
Testing On Computers: A Follow-up Study Comparing Performance On
Computer and On Paper. Education Policy Analysis Archives, 7, 20.
- Russell, M. and Haney, W. 1997.
Testing Writing on Computers: An Experiment Comparing Student Performance on Tests Conducted
via Computer and via Paper-and-Pencil. Education Policy Analysis Archives, 5, 3.
- Russell, M. and Haney, W. 2000.
Bridging the Gap between Testing and Technology in Schools.
Education Policy Analysis Archives, 8, 19.
S
- Sanders, W. and Horn, S. P. 1995.
Educational Assessment Reassessed: The Usefulness of Standardized and Alternative Measures of Student
Achievement as Indicators for the Assessment of Educational Outcomes. Education Policy Analysis Archives, 3, 6.
- Sarle, W. S. 1995.
Measurement theory: Frequently asked questions. From the Disseminations of the International Statistical Applications Institute, 4th edition, Wichita: ACG Press, 61-66.
Also available at: ftp://ftp.sas.com/pub/neural/measurement.html
- Sawaki, Y. 2001.
Comparability of Conventional and Computerized Tests of Reading in a Second Language. Language Learning and Technology,
Vol. 5, No. 2, May 2001, pp. 38-59.
- Sawaki, Y. and Nissan, S. 2009.
Criterion-related validity of the TOEFL iBT Listening Section. TOEFL Research Report 09-02. Princeton, NJ: Educational Testing Service.
- Scharber, C., Dexter, A. and Riedel, E. 2008.
Students' Experiences with an Automated Essay Scorer. The Journal of Technology, Learning, and Assessment.
- Sireci, S. G. 2007.
On Validity Theory. Educational Researcher 36, 8, 477 - 481.
- Sokolik, M. and Duber, J. 2002.
Grow Your Own: Online Placement Testing. TESL-EJ, 6, 1.
- Stansfield, C. W. 1992. ACTFL Speaking Proficiency Guidelines. Washington D.C.: ERIC Clearinghouse on Languages and Linguistics.
- Stansfield, C. W. 1996. Content Assessment in the Native Language. Washington D.C.: ERIC Clearinghouse on Languages and Linguistics.
- Stansfield, C. W. & Kenyon, D. 1996. Simulated Oral Proficiency Interviews: An Update. Washington D.C.: ERIC Clearinghouse on Languages and Linguistics.
- State of Illinois. 1995.
Assessment Handbook. A Guide for Developing Assessment Programs in Illinois Schools
Springfield, IL: Illinois State Board of Education.
- Stricker, L. J. 2002.
The Performance of Native Speakers of English and ESL Speakers on the Computer-Based TOEFL
and the GRE General Test.
Princeton NJ: Educational Testing Service, TOEFL Research Report 69.

- Swain, M., Huang, L-S, Barkaoui, K., Brooks, L., and Lapkin, S. 2009.
The Speaking Section of the TOEFL iBT: Test-takers' Reported Strategic Behaviors.
Princeton NJ: Educational Testing Service, TOEFL iBT Research Report 09-30.

T
- Tannenbaum, J. 1996. Practical Ideas On Alternative Assessment For ESL Students. Washington D.C.: ERIC Clearinghouse on Languages and Linguistics.
- Tannenbaum, R. J. and Wylie, E. C. 2008. Linking English language test scores onto the Common European Framework of Reference: An application of standard setting methodology. TOEFL iBT Report iBT-06. Princeton, N.J: Educational Testing Service.

- Tasdemir, M., Tasdemir, A., and Yildirim, K. (2009)
Influence of Portfolio Evaluation in Cooperative Learning on Student Success.
Journal of Theory and Practice in Education, 5, 1, 53 - 66.

- Taylor, C. S. and Nolan, S. B. 1996.
What does the psychometrician's classroom look like? Reframing assessment concepts in the context of learning.
Education Policy Analysis Archives, 14, 7.
- Taylor, C., Jamieson, J., Eignor, D., & Kirsch, I. 1998.
The relationship between computer familiarity and performance on computer-based TOEFL test tasks.
TOEFL Research Report RR-61. Princeton, NJ: Educational Testing Service.
- Tsang, S. L., Katz, A. and Stack, J. 2008.
Achievement Testing for English Language Learners, Ready or Not?
Education Policy Analysis Archives, 16, 1.
- Tuzi, F. 1997. Using Microsoft Word to Generate Computerized Tests. The Internet TESL Journal, 3, 11.
U
V
W
- Wagner, A. No Date.
Don't Messick around with Test Validity until you know what you're doing.
- Wagner, E. 2002.
Video listening tests: A pilot study.
Teachers College, Columbia University Working Papers in TESOL and Applied Linguistics, 2, 1.
- Wagner, E. 2007.
Are They Watching? Test-Taker Viewing Behavior During an L2 Video Listening Test.
Language Learning and Technology, 11, 1.
- Walker, M. E. 2007.
Is test score reliability necessary? R&D Connections 5. Princeton, NJ: Educational Testing Service.
- Wall, D., & Horak, T. 2006.
The impact of changes in the TOEFL examination on teaching and learning in central and eastern Europe. Phase I: The baseline study.
TOEFL Monograph No. MS-34. Princeton, NJ: Educational Testing Service.
- Wall, D., & Horak, T. 2008.
The impact of changes in the TOEFL examination on teaching and learning in central and eastern Europe. Phase 2: Coping with change.
TOEFL iBT Report No. iBT-05. Princeton, NJ: Educational Testing Service.
- Wang, J., and Brown, M. S. 2007.
Automated Essay Scoring Versus Human Scoring: A Comparative Study. Journal of Technology, Learning, and Assessment, 6, 2.
- Wendler, C. and Powers, D. 2009.
What does it mean to repurpose a test? R&D Connections 9. Princeton, NJ: Educational Testing Service.
- Wilson, N. 1998.
Educational Standards and the Problem of Error.
Education Policy Analysis Archives, 6, 10.
- Wolfe, E. W. and Manalo, J. R. 2004.
Composition Medium Comparability in a Direct Assessment of Non-native English Speakers.
Language Learning and Technology, 8, 1, 52 - 65.
- Wright, P. W. D. and Wright, P. D. 2004.
Understanding Tests and Measurements for the Parent and Advocate.
LDOnline.
- Wylie, E.
An overview of the International Second Language Proficiency Ratings (ISLPR).
Australia: Griffith University Centre for Applied Linguistics and Languages.
X
Y
- Yen, D. A. and Kuzma, J. No date.
Higher IELTS score, higher academic performance? The validity of IELTS in predicting the academic performance of Chinese students. Mimeo: University of Worcester.
- Yoff, L. 1997. 'An overview of ACTFL proficiency interviews. A test of speaking ability.' JALT Testing and Evaluation SIG Newsletter,
1, 2, 3 - 9.
- Young, J. W. 2008.
Ensuring valid content tests for English language learners. R&D Connections 8. Princeton, NJ: Educational Testing Service.
- Young, J. W. and King, T. C. 2008. Testing Accommodations for English Language Learners: A Review of State and District Policies. New York: College Board.
Z