Context in Language Testing

An Audio Lecture in Five Parts based on Fulcher (2015) - see Further Reading below

Context impacts upon our behaviour. This is a truism. If I take a test in a hot room because the air conditioning has broken down, my score is likely to be lower than could be expected if the temperature were more conducive to concentration. Look at the video on the right. This is a link to a BBC programme on light in schools. There are four different light settings, and it is claimed that the behaviour of the learners changes depending on which setting is in use. Even if, like me, you think there might be some Hawthorne effect in play here, the fact of the matter is that there is evidence that tweaks to any aspect of the context in which we learn or take tests might change our performance. But does this mean that we are unable to get a "good measure" of a "stable trait" that exists within the individual?

The role of context in language testing has been a hot topic ever since Tim McNamara published Modelling Performance: Opening Pandora's Box in 1995. In recent years context has attracted renewed attention in applied linguistics, with the publication of a plethora of papers and books that use Complex Dynamic Systems Theory as the theoretical framework for research or theory building. In the natural sciences this approach has been used to study systems for which reductionist methods have proved wanting. However, in social science disciplines the tendency has been to focus on the complexity of context and our inability to understand or take into account all the variables that might impact on behaviour or performance in any given context. The conclusion is often that we are unable to make predictions from individual studies about other contexts, thus severely limiting the generalisability of findings.

In the following audio lecture I look at the issue of context in language testing. In the introduction I consider why all testing and educational assessment has traditionally treated context as problematic, focusing as it does on the individual test taker from a cognitive perspective. In the next three sections I look at different approaches to context. The first is atomism, the second is neobehaviourism, and the third is interactionism. For each approach I use two criteria to evaluate the role of context: generalisability (the extent to which score meaning can be generalised beyond the context of a particular performance), and provlepsis (the extent to which a score is predictive of future performance). I then conclude by describing what I call the Keverberg Principle, and considering how context might be tamed through simplification.

1. Introduction

Supporting Material

In the introduction I mention the work of J. B. Carroll. You can download the complete text of his work Fundamental Considerations in Testing here.

2. Atomism

Supporting Material

I make reference to the work of Lado for the first time in this section, and return to him again in the conclusion. You may wish to dip into an extract from his book Language Testing here.

Multiple choice was the preferred item type for testers who wished to test elements of the language. For an explanation of why the multiple-choice item has remained so popular for over a hundred years, click here.

3. Neobehaviourism

Supporting Material

Atomism was attacked in the early Communicative Language Teaching Era. For my evaluation of that attack and the value of Lado's work, click here.

The argument over variable competence models and how they relate to the modern monist view of context that obliterates individual competence has been going on for a long time. Click here to read my view, from 1995, on the problems caused for language testing.

4. Interactionism

Supporting Material

There have been many attempts to isolate factors that impact upon construct definition in an interactive model of language performance. One of these is the research reported in this paper, which I published with Rosina Marquez-Reiter.

The introduction of more random variance to control for complexity was the brilliant insight of C. S. Peirce. His paper with one of his students laid the basis for all modern approaches to scientific investigation, and drug trials in particular. Click here to download the original paper.

I compare scores on language performance tests to those awarded when rating wine. Read about wine rating here.

5. Conclusion

Supporting Material

Arguments over authenticity and what should be in a test to best represent the target domain are common in many other areas of life, not just in language testing. In the driving test, for example, there is a constant debate about the nature of the tasks that a learner driver is asked to do and whether these are too abstracted from what they will experience on the road. But as you will learn from playing this news item in which experts disagree, the "real world" contexts that are listed are too numerous to include in any practical test. Listen to the debate about the future of the three-point turn.

The final example I give is that of the Salford Energy House, which simplifies the complexity surrounding heat loss by creating an artificial environment in which variables can be controlled or accounted for. By careful manipulation of relevant variables, much more is achieved than by undertaking naturalistic research. Is this a suitable model for language testing?

Further Reading

Bachman, L. F. (2006). Generalizability: A journey into the nature of empirical research in applied linguistics. In Chalhoub-Deville, M. (Ed.) Inference and Generalizability in Applied Linguistics: Multiple Perspectives (pp. 165-207). Amsterdam: John Benjamins.

Fulcher, G. (2015). Context and Inference in Language Testing. In King, J. (Ed.) The Dynamic Interplay Between Context and the Language Learner (pp. 225-241). Basingstoke: Palgrave Macmillan.

Glenn Fulcher
May 2016