When taking tests it is very rare for candidates to have access to materials of any kind. In language tests the only resource that is sometimes made available is a dictionary, and then only when spelling is not going to be tested. Studies have looked in particular at the impact of dictionary use in writing tests (East, 2008) and in reading tests (Bensoussan & Sim, 1984; Nesi & Meara, 1991). Recently, however, there has been growing interest in open book and 'open web' exams, in which the test takers can have any materials they wish to hand, and can access the internet while the test is in progress. The principal argument in support of open book exams is their authenticity, particularly in a university learning environment (Williams, 2004).
In language testing and assessment we have long been aware of the problems of defining 'authenticity' (Spolsky, 1985; Stevenson, 1985; Lewkowicz, 2000). But in the context of open book/open web exams, the argument goes that in the 'real world' we do not write in isolation. We have access to a wealth of information, not only from books but from the multimedia offered by the internet. When we are asked to write for a purpose we read and listen, process information, decide how to formulate an argument, and select illustrative material to support the views we put forward. And we don't have to remember endless facts. Supporters of open book/open web exams therefore argue that this type of examination is a 'more realistic' learning opportunity, and avoids the necessity for students to remember too much information (Rakes, 2008). In traditional language tests this is acknowledged to some degree in newer integrated item types, such as those introduced into the TOEFL iBT (Zareva, 2005). In order to understand what 'integrated testing' involves, see the video on this topic by Lia Plakans. Providing input material prior to a writing test ensures that the test takers have ideas to write about, and it often means that test designers do not have to fall back on bland prompts chosen so that no test taker is at a significant disadvantage through knowing nothing about the topic.
What you may be surprised to discover is that these very same arguments have been around for well over a hundred years. Latham (1887, 206) writes as follows:
...I would also allow candidates, while writing their essay in the Examination Room, to have access to some standard authorities on their subject. What these should be would rest with the Examiners. This proposal needs some recommendation, because to many it will seem novel. It is a mode of carrying out the principle above stated, of making the candidate write in an Examination under circumstances as little exceptional as possible.
The explicit reason for this proposal is that this is what candidates would do if asked to write in the 'real world'. No one simply writes everything from memory; and being able to refer to sources means that the writer can concentrate on the writing (ibid., 207):
When they noted a valuable fact or observation, they would turn it over to their "Index Memories", and be able to lay their hands on the passage which contained it if they should want it. No man writing a book would be justified in quoting from memory, however confident he may feel of remembering rightly. Authors no doubt did so in old times, when books were harder to come by, and vast trouble has been caused to their editors in consequence. There is now no object in forcing men to carry a number of details in their heads....By this plan, moreover, the Examiner obtains a further advantage....the range of subjects which can be given for essays is very much extended; for there are many points about which candidates could not write, without the help I propose to offer, even if the subjects were taken from their favourite branch of study....
Watch this video from the BBC. The story reports on the introduction of open web language tests in Denmark in 2009, the first country to introduce this alternative approach in its national school tests. As you watch the video, make notes on (a) precisely what the test takers have access to, and (b) which internet resources have been banned. Why do you think these decisions have been taken? What arguments are put forward to support the approach? Do you agree?
Now let's turn to some of the arguments against. Clearly, as we saw in the video, there is the constant worry about cheating, especially when access to the internet is provided. But this is part of a larger concern with the nature of test administration, and the testing 'experience' for the test takers. This is a definition of a 'standardized test':
Tests are standardized when the directions, conditions of administration, and scoring are clearly defined and fixed for all examinees, administrations and forms (Cohen & Wollack, 2006).
There is a reason for this practice: to ensure that each test taker has an equal chance of doing well, and that the score they get on a writing test reflects their writing ability. If we have open book, open web tests, it is arguably the case that each test taker has a different experience. Those who are more efficient users of the resources to hand may get higher writing scores than those who are less efficient. Perhaps test takers who can read and process information more quickly will get a higher writing score? Or perhaps test takers who are better able to create search terms will do better, because they access higher quality information? On the other hand, test takers with weaker internet skills may be tempted to copy from an internet source, and thus score lower because of plagiarism.
There is much room for research here. But the key question remains: are the associated skills required to succeed in open book, open web exams part of what we wish to test in a writing examination, or not? That is, are these 'construct relevant' or 'construct irrelevant' variables, for the purpose of any given test? Anything that we don't want to test shouldn't really be introduced as a variable in the examination. So if we think that 'reading' should not be assessed in a writing test, but we ask the test takers to base their writing on something they have read, we say that their 'reading ability' - which is not of interest - might 'contaminate' the scores. On the other hand, we may wish to claim that we are testing integrated 'reading - writing', in which case the reading construct is relevant to our score interpretation.
These are matters for careful thought, in the context of the purpose of the test and the kinds of decisions for which we wish to use the test scores. And when we have made up our mind what we want to test, we have to be able to score the test, and say what the score means. So, let's say we have five possible scores for the writing test described in the video: A, B, C, D and E. How would you report what a 'B' means for everyone you think belongs at this second level? The report has to encapsulate what we think 'B' students 'can do' in complex tasks that lead to the production of a piece of text. Arriving at such 'descriptors' is a very real challenge for all tests, but much more so for any test that involves complex integrated tasks.
Whichever side of the argument you come down on, notice that the argument takes place within the context of high stakes tests, where the scores are going to lead to certification or some other momentous decision for the test taker. For classroom teaching and assessment there is not the same kind of worry. In fact, it seems highly desirable to teach learners to research and write in this way, as that is arguably what they will have to do in the future.
A Group Discussion Task
Think of a writing test with which you are familiar.
Would you turn it into an 'open book/open web' test?
With your colleagues, articulate an argument for whichever position you take. What are the key reasons for your decision? Are there any threats to score interpretation as a result of your decision?
A Group Activity
Using either the open book/open web test you have considered in the Group Discussion Task, or the Danish writing test, try to create a set of descriptors that would say what a learner could do at each of five levels. Create a table like this one, and complete it. When you have finished, what are the strengths and weaknesses in your descriptors?
Grade A    Descriptor:
Grade B    Descriptor:
Grade C    Descriptor:
Grade D    Descriptor:
Grade E    Descriptor:
If you're not sure what descriptors look like, have a look at the ACTFL descriptors for writing (preliminary proficiency guidelines, 2001) as an example.
Cohen, A. S. and Wollack, J. A. (2006). Test administration, security, scoring and reporting. In Brennan, R. L. (Ed.), Educational Measurement, fourth edition. New York: American Council on Education/Praeger Publishers, 355-386.