Or is it the fault of the students who demand a certain kind of teaching? […] Is it perhaps the fault of the teaching institutions, which do not provide any kind of teacher training in TOEFL preparation? […] Or is training in teaching test preparation the responsibility of the textbook writers and publishers? (p. 335)
Hamp-Lyons’ (1998) questions remind me of the social nature of morality. The point of her questions, as I understand them, is not to apportion ultimate blame (“yes, it’s the students’ fault”) but to point up the fact that adjudicating on moral issues is a highly complex process in which many individuals and institutions have a stake. Standardized tests are social phenomena par excellence; any consideration of their moral significance must begin from this starting point.
2 Of course, I am sidestepping the fact that with a so-called communicative test it is still necessary to define what knowledge of language is and to ignore the fact that an examination virtually by definition cannot involve genuine communication and therefore will always be only an indirect and artificial indication of the candidate’s “true” ability, however ability is defined.
The Morality of Testing and Assessment 72
For example, much as I loathe the whole business of tests, when it comes to my own students I can understand why they would want to have extra preparation. Although at one level such an approach demolishes both the illusion of the snapshot-of-ability principle and the principle of equality and fairness, on the other hand, the moral importance of relation creeps in. It is never the case, and I would argue that it never should be, that a teacher’s own students are not more important than some other students in another state or country. We want the best for our own students, even if in general moral terms it could give them an “unfair” advantage.
This is a classic instance of the way in which individual circumstances and specific relations color our approach to moral dilemmas. This brings me to a final point, which Noddings (1984) raised in her discussion of assessment. Contrary to received wisdom regarding the preferability of local, teacher-developed forms of assessment over mass standardized testing, Noddings wrote that she
is “convinced…that grading—summative evaluation of any kind—should not be done by teachers. If it must be done, it should be done by external examiners, persons hired to look at students as objects. Then teacher and students would be recognized as together in the battle against ignorance” (p. 195). Despite my instinctive and growing distaste for standardized tests of all kinds, I find Noddings’ argument curiously persuasive because, like her, I believe that our prime duty as teachers is to focus on the learning of our own students. Turning the problem of evaluation over to outsiders moves it from the immediate, local teacher-student relation, and spares that relation the “grinding” experience mentioned earlier in which the teacher switches caps from advocate to judge. Noddings’ (1984) suggestion does not justify indiscriminate use of testing and it does not offer any justification of current testing practices or excuse test makers from an obligation to continually rethink the format and nature of their tests. It does, however, remind us that we are dealing with issues of immense moral complexity, in which unequivocal good and bad, right and wrong, are terribly hard to pin down.

One candidate from a western European country had an English mother, and her entire application confirmed her own categorization of herself as “virtually bilingual,” yet her TOEFL score was a mere 597, which, the computer-generated form from the university’s Office of International Admissions told us, indicated that she “may need supplementary English training.” A doctoral candidate from an African country in which English is widely spoken had attended an English-language university for his undergraduate and master’s degrees and had sterling references and published academic work, yet his scores on the TOEFL and Graduate Record Examination were both abysmally low. What should we do in such situations? Both candidates were admitted, but in each case the test scores complicated the decision rather than making it easier; in the case of scores on the Graduate Record Examination, for example, we are required by the university’s graduate school to obtain an official exception for any candidate who does not score the required minimum in this test.
71 Values in English Language Teaching
Another moral paradox is the disjuncture between testing and current pedagogical practice in language teaching (Hamp-Lyons, 1998). The communicative model that, in various forms, is widely used across the world encourages students to engage in meaningful interaction using whatever linguistic means they have at their disposal; it specifically downplays the importance of grammatical accuracy over communicative effectiveness.
Although some tests have made more or less successful attempts to integrate communicative competence in their evaluations of students (the Cambridge suite of examinations and the ACTFL’s [American Council for the Teaching of Foreign Languages] proficiency guidelines for foreign languages [Byrnes & Canale, 1987] come to mind; see also Powers, Schedl, & Wilson, 1999), the TOEFL has notably lagged behind in this regard, and many other tests still focus narrowly on grammar and vocabulary.2 One consequence of this disjuncture is the so-called washback effect: the ways in which test format affects teaching. The theory behind the TOEFL, like any test of its kind, is that it is a snapshot of a candidate’s language ability at a given moment in time; thus, it should not be possible to improve one’s performance other than by more study of the language. In reality, of course, TOEFL preparation courses and programs abound. Liz Hamp-Lyons (1998) offered a very thoughtful analysis of some of the ethical (what I would call moral) issues that arise from the “powerful” (p. 331) washback effect of such an influential test as the TOEFL. Drawing on the work of Mehrens and Kaminsky (1989) and Popham (1991), Hamp-Lyons posed the question of what constitutes “ethical test preparation” (p. 334) and argued that existing materials (and hence a great deal of existing test preparation programs across the world) are “educationally indefensible (boosting scores without mastery) and of dubious ethicality (coaching merely for score gain)” (p. 334). She went on to ask a series of provocative and important questions, all of which have a strong moral dimension: Can a test be blamed for the ways in which some teachers teach towards it? […]

This, combined with the strict and ritualistic way in which standardized tests are conducted, gives the knowledge they enshrine a solemn, almost sacred significance. The political, “interested” nature of knowledge featured in chapter 3 is a powerful component here too. I argue that there is deep moral meaning in such an approach to knowledge: By reducing learners to recipients of knowledge rather than creators of it, one is also reducing their capacity for moral agency. There is also a question of honesty here. Because of the veil of objectivity behind which they hide, standardized tests ride roughshod over the unavoidable difficulties of matching score with actual ability. The final score is presented (and in the overwhelming majority of cases is also treated) as an objective measure: The uncertainties and ambiguities that attend test development, and the myriad psychological factors that affect a candidate’s performance on a given day, are invisible. Furthermore, because of the physical and administrative distance between the testers and those tested, appeals are difficult, if not impossible. A teacher might possibly be inclined to be lenient on a student whose grandfather died a few days before the exam, or to give a student who has difficulty writing an extra minute or two at the end of a test. A standardized test can offer neither of these possibilities or anything like them. What is missing here is relation: The human relation between tester and testee, which exists when teachers prepare tests, and which informed the whole of the previous section, is entirely absent in the standardized test. By this account, the moral contours of the test are quite different. The educational process is suddenly deprived of its deepest and most meaningful component. This feature is underlined even more in the current shift to computerized testing in the TOEFL and many other common tests. 
The impersonal nature of such tests, and the impossibility of our understanding the human dimension of the test-taking experience of any specific individual, makes it very difficult for the consumers of test score information to know how to interpret them. As language professionals, we know the complexities I have been discussing here; as a result, reading the scores is very much a matter of interpretation rather than a simple acknowledgment of a score. Just 2 days ago I was reviewing some late admissions for our own master’s and doctoral programs.

Everything I wrote earlier about the value-laden nature of assessment practices—that they are oriented to product rather than process, that they favor certain candidates over others, that they are used for administrative convenience rather than serving the needs of the learners—applies in spades in the case of standardized tests such as the TOEFL. Yet the TOEFL and its ilk also raise an additional set of moral concerns and dilemmas. Elana Shohamy (1998), one of the first people to raise questions about the “ethical” dimensions of language testing, described the widespread use of standardized tests to promote bureaucratic and political agendas. She identified three ethical consequences of such uses of tests:

1. The “institutionalized knowledge” (p. 339) that tests canonize is “narrow, simplistic and often different from experts’ knowledge” (p. 339). The kind of knowledge tested, which often involves single-word answers in multiple-choice formats, “overlooks the complexities of subject matter and is unmeaningful for repair” (p. 339).

2. A “parallel system” (p. 340) is created whereby stated policy is at odds with the “organizational aspirations” reflected in the tests. Shohamy gave the example of Israel, where “both Hebrew and Arabic are official languages, yet, on the high school entrance exam Arabs are tested in Hebrew, while Hebrew speakers are not tested in Arabic” (p. 340).

3. Ethical problems arise when “the test becomes a means through which the policy makers communicate priorities to the system” (p. 340). Shohamy sees this as “undemocratic and unethical” (p. 340) because those affected by the test—the students who take it and the teachers who teach them—have no say in the design and implementation of the test.

This last point deserves further consideration. I would argue that the most serious moral concerns with such tests arise from their impersonal nature. As Shohamy (1998) pointed out, the people affected by the test have no say in its creation; through such procedures it is much easier to maintain the myth of the objective test, because the people who create the questions and assess performance are nowhere around—unlike with a teacher or school department, to which students usually have some kind of access.


Who Is a Good Student?

Throughout this discussion I have deliberately been using the words good and bad. This whole discussion ultimately revolves around a fundamental ambiguity inherent in the phrase good student (Amirault, 1995). On the one hand, a good student is one who does well: learns, passes tests and exams, and so on. These qualities and achievements are moral in nature the way that education in general is moral in nature. It is good to learn, to know more, to have more skills and abilities. Yet even here there is ambiguity. What exactly does it mean to do well, to succeed? Such questions once again go to the heart of our purpose in teaching. In an adult literacy class, for example, is a student successful if he reads a newspaper article? Or passes his GED (the high school equivalency examination)? Or if he gets a job? We might also ask: What of the student who learns well but does not pass the exam? Or what of the EFL student who gets only a C in English yet is promptly hired to teach English in an elementary school? (I have known such teachers myself.) Furthermore, there is a social notion of the “good” student that is also moral in nature, yet in a different way. This notion of the “good student” takes good to mean obedient, pleasant, willing, hard working, conscientious, persistent—all of which, of course, are also morally desirable characteristics, and which, other things being equal, equip students better to benefit from their education. Yet this meaning of “good student” cannot always be reconciled with that mentioned in the preceding paragraph: Some students work hard and are pleasant but do not properly grasp the subject matter; others are sullen and lazy yet smart. What do we—what do you—mean when you use the expression “She’s a good student”? Which of these meanings is more important to you, and to the student concerned? Which meanings are reflected in the system of values underlying the forms of assessment you use?
It is important to emphasize the symbiotic relationship between the moral messages
sent by our assessment practices and our notions of what it is to be a good student. It is
through whatever assessment practices we use that the identity of good or bad student is
encoded in schools; conversely, our idea of the good student affects the kinds of
assessment we select. In either case, multiple powerful and complex moral meanings are
to be found in the kinds of tests and other forms of evaluation that we use in our
classrooms.


Assessment Beyond Language

The value-laden nature of assessment, moreover, goes far beyond the simple matter of how to measure language ability. There are also crucial educational considerations to take into account. A central moral dilemma for many teachers, for example, at least in this country, is the extent to which they should reward effort, or ability, or achievement. Up until this point, I have been assuming that evaluation is intended to measure the student’s ability in English. But in much ESL teaching in the United States and certain other countries, great emphasis is placed on a student’s engagement in, or commitment to, her work. It is thought important to reward effort—the time and energy devoted to an assignment, rather than merely the quality of the finished product, or the willingness to participate in classroom discussion rather than the grammatical correctness of the contributions or the value of their substance. In this there is very clearly an issue of moral judgment: In rewarding “good” behavior, we are standing in judgment over the learner; we are adjudicating “good” and “bad” ways to be as well as knowledge of the subject matter. There is a strong component of moral education in the old-fashioned sense, of instilling and reinforcing desirable behaviors, habits, and attitudes in our students (Jackson, Boostrom & Hansen, 1993). At the same time, another aspect of the moral dimension of power emerges, as we punish those who do not behave in approved ways, for example, giving lower grades to students who do not willingly take part in classroom activities, fail to turn in journals or other written work on time, and so on. What function does this punishment serve? As a warning for the future? As a sign to others? In any case, surely its consequences are not restricted to the moment in which a bad grade is given and received. Let me share an example of the complex issues at play here.
For her doctoral dissertation, Ewald (2001) interviewed university-level students and teachers of Spanish about their attitudes toward group work. One teacher she spoke to, Gonzalo, explained that he graded students on their contributions to small-group work. The students to whom Ewald spoke, however, felt that this was an unfair practice, pointing out that although they accept the usefulness of small-group work, for some students participation in such groups is rendered difficult for nonlinguistic reasons such as shyness. Ewald (2001, p. 166) reported that Gonzalo’s practice is grounded in a belief that evaluation of this aspect of their work in class will motivate students to participate more and help them to see the value of small-group work (and we know that in language classes, the more you speak, the more you learn). He might also have wished to be able to reward the students who contribute more willingly. Yet, as the students’ reaction shows, this practice brings with it several moral dilemmas. First, there is the question of the extent to which personality traits such as shyness should affect one’s grade. Second, Ewald pointed out that the students were already aware of the expectation of participation and did not need to be reminded of it. This becomes a matter of trust (p. 167): That is, the practice of evaluating contributions to group work carries with it the implication that without the pressure of the evaluation students cannot be trusted to participate of their own accord. I would also point out a third issue of measurement: the problem of how to assign scores fairly to something as complex as participation in a small group. The practice of rewarding hard work as well as “objectively” measured ability gives rise to its own moral dilemmas. What do we do with those students who work terribly hard and yet simply do not have the wherewithal to do A-grade work? 
Conversely, what do we do with the bright but disaffected students who are able to speak fluently and write expressively yet will not take part in classroom dialogue and do the minimum to scrape by in their written work? Once, many years ago (when I still gave exams), in an undergraduate class on second language acquisition I had a student who barely came to class at all yet turned up for the midterm exam and did tolerably well. What is one to do in such a situation? What was I to do? At one level, the student had done what she was supposed to: She had learned the material the course covered. At another level, she had flouted the (in this case unwritten) rules of engagement of the academy, which state that good students are expected to do the things that good students do: come to class, take part in discussions, show interest, and so on. A related issue is whether one aims to measure ability or achievement. Often I have had students who come to the class knowing very little about the matter at hand and who learn a lot during the class. Do these students deserve a better grade than those who knew a great deal more at the beginning yet at the end may still know more than their colleagues? Such questions raise the specter of our purpose in teaching in the first place. If indeed we aim merely to transmit information or knowledge, then we should reward the student who has, or has acquired, more information or knowledge. However, throughout this book I have been arguing that teaching cannot and should not be reduced to the transfer of information. It is primarily about the moral relation between teacher and student. This said, however, the teacher, as one-caring (Noddings, 1984), is in a different position than the student, the cared-for. What is the moral responsibility of the latter toward the former? Noddings (1984) suggested that while the teacher’s responsibility is greater, there is still a need for reciprocity (pp. 69–74).
Yet to what extent is it our responsibility to judge the student on matters of character or innate ability? I argued in chapter 3 that there is an element of moral education in adult ESL settings. Yet what of other contexts? How far do our duties go beyond teaching the language and into the territory of character formation? An additional point is the moral dilemma that arises from the fact that students have different levels of ability. The questions I have just raised—whether students should be rewarded for effort or for achievement and whether progress is as important as final achievement—cannot really be answered without referring to differing levels of aptitude. Some students, for whatever reason, are simply good at languages; others, to use a Polish expression, are anti-talents when it comes to language learning. In a sense, this is a matter of “moral luck” (Statman, 1993): Some people are “born better” in one regard or another. At college, my friend Brett would regularly infuriate me by finishing his French essays in a scrawl as we were walking to class together; he invariably got an A. I think most of us have known other Bretts, whether as friends or students of ours. Should he and his kind be rewarded for simply being better and faster? It seems to me that try as we might to evaluate students on language alone, we cannot help but take other, morally charged circumstances into consideration; the question is, are we aware of this? If so, have we thought through the moral consequences of our decision?

Assessing Knowledge of Language

A central question in the assessment of language learning—possibly the most important of all—is: What does it mean to know a language? Anyone designing any kind of evaluation has to answer this question; yet to do so is already to begin to make morally significant judgments. Is language knowing vocabulary? Being able to recite grammar rules? To buy an airplane ticket? To translate sentences? To write a persuasive essay? In choosing between these and a thousand other options, we are making choices that will have significant effects on our students and their performance. Consider the relatively simple case of Wen-Hsing mentioned earlier, where choices of what is and is not acceptable, reached by group consensus, left students who had given grammatically acceptable answers with a worse score. Furthermore, the fact of the matter is that our choices themselves are largely based on what I have called faith, that is, our beliefs about the nature of language, learning language and knowing language that are grounded only partly in logic and can never be fully confirmed or disproved (see chap. 1). Knowing a language is a phenomenally complicated thing; in determining how to test that knowledge we are forced to make choices that oversimplify the picture (McNamara, 1996). Our choices, furthermore, have demonstrable consequences for students. Ania, my elder daughter, who is bilingual in English and Polish, returned to Poland for some of her high school education. In one of her English classes she failed a major exam because she did not “know” the grammar of English and so was unable to understand instructions such as: “Convert the following sentences into the present perfect tense,” even though she was able to use such structures with nativelike ability in her speech. Her teachers had chosen to define knowledge of English as knowledge of grammatical terminology rather than the actual ability to speak the language (which for Ania would not have been a problem). 
Of course, we have some general guidelines—it is good pedagogical practice, for example, to test what has been covered in class and not what has not (Genesee & Upshur, 1996; Herman, Aschbacher, & Winters, 1992)—but this merely begs the question of what should be taught in class. The business of testing is even more complicated because there is only ever an indirect relation between our notion of what it is to know a language and the form of evaluation we devise. Even if we believe that language learning is a matter of vocabulary only, we have to select certain lexical items to be included in the test and exclude others. The situation is, of course, infinitely more complex if we have a more sophisticated understanding of knowledge of language, including areas such as pragmatics and discourse. In parallel fashion, there is only ever an indirect relationship between a student’s performance in a test and her actual knowledge of the language, whether for reasons of nerves or having a good day or bad day, or from the universally acknowledged slippage between competence and performance. All of these factors mean that to devise a test and to assign scores or grades to those who take it is to sail out onto very dark and deep moral waters indeed. Last, another fundamental conundrum is that neither language nor competence in language is naturally measurable. If we are judging how high a person can jump, we can pretty much agree on who jumps higher than others: Height is simple to measure. It is not at all clear, however, how we can objectively measure how well someone speaks another language. We find ourselves resorting to subjective terms such as fluent, hesitant, and difficulty (Richard-Amato, 1996, pp. 99–100), which require constant interpretation, and once more, the more sophisticated our attempts at measurement become, the harder they are to pull together into a cohesive overall assessment.
The fundamental immeasurability of language competence lends a further moral dimension to our work in language assessment; the decisions we are forced to make about how competence will be assessed are always subjective and thus can only be rooted in our beliefs about what is right and good, beliefs which, we must always acknowledge, could be mistaken.