I start this chapter with a very simple story about tests told to me by Wen-Hsing, a former student of mine who now teaches at a secondary school back home in Taiwan. Wen-Hsing recently sent me an e-mail about what happened when she and her colleagues were grading the English language component of an entrance exam:

    One section of the English test was: "According to the picture, answer the following five questions." It was a picture of a classroom, where there is a teacher standing and six students seated. It looked like two of the students were talking to each other and the teacher was not happy about it. One of the five questions was "Why is the teacher angry? The teacher is angry because students are __." This blank only allowed one word and the "standard" answer, according to the test-giver, was "talking."

    When we were grading the answer sheet, we found there were a variety of answers and some of them that seemed possible were "playing," "noisy," "bad." Thus, we voted to decide if we would accept these answers. Interestingly, "noisy" and "bad" were accepted but "playing" was rejected. The reason of the majority was that we could not tell from the picture whether these two students were playing or not. Well then, I asked them, "Can you tell from the picture that these two are bad?" The answer I got was "We all agreed not to include 'playing' in the answers. If we reached an agreement, it is fine." Although I would like to give students whose answer was "playing" credit, I couldn't do it and I graded those answer sheets the way I was told to do.

    This case was not unique. It happened every time I graded in the entrance exam. I don't know why some possible answers were accepted but some were not. I think it is good to have students answer questions according to the picture they see, but is it necessary to restrict the number of answers? You know what, I always felt "not so good" after grading because there was always one or two answers that would arise dispute.

The problems faced by Wen-Hsing and her colleagues reveal the profoundly moral nature of assessment in language teaching. These Taiwanese teachers are striving to adjudicate which knowledge is sanctioned and which is not; their deliberations involve drawing lines in the sand where there are few if any objective criteria unambiguously separating right from wrong. Yet the consequences of their decisions will be visited on the children of their classes and, over time, will become part of each child's permanent record.

Of course, one could argue that the item in question is simply badly designed and that what is needed is just a better test composed of less ambiguous questions. Yet I believe that anyone who has tried to write a test, whether a professional test designer or a classroom teacher, will recognize the difficulties the Taiwanese teachers face. With such a phenomenally complex thing as a language, there are limitless problems that arise in determining ways of testing students' knowledge; the more complicated and interesting that knowledge becomes, the harder it is to test (Bachman, 2000). Furthermore, those who are most adept at writing test items—professional testers—are also those farthest removed from the classroom, and thus they lack information about what has been covered in class by particular groups of students. All of us are obliged to make do with faulty tools in the work of evaluating students.

In this chapter I explore the moral dynamics underlying various aspects of testing and evaluation.
During the discussion, I raise many complex moral questions both about traditional forms of evaluation such as standardized tests and examinations, and about alternative approaches to assessment such as portfolios. I argue, however, that two profound moral paradoxes underlie the entire realm of language testing and assessment.
