Friday, June 4, 2010

TESTING, ASSESSING, and TEACHING

1.Definition of a Test
A test is a method of measuring a person’s ability, knowledge, or performance in a given domain.

2.Assessment and Teaching
The difference between test and assessment:
•Tests are prepared administrative procedures that occur at identifiable times in a curriculum when learners muster all their faculties to offer peak performance, knowing that their responses are being measured and evaluated.
•Assessment is an ongoing process that encompasses a much wider domain. Teaching sets up the practice of language learning.

3.Informal and Formal Assessment
•Informal assessment can take a number of forms, starting with incidental, unplanned comments and responses, along with coaching and other impromptu feedback to the student.
•Formal assessments are exercises or procedures specifically designed to tap into a storehouse of skills and knowledge. They are systematic, planned sampling techniques constructed to give teacher and student an appraisal of student achievement.

4.Formative and Summative Assessment
•Formative assessment is evaluating students in the process of “forming” their competencies and skills with the goal of helping them to continue that growth process.
•Summative assessment aims to measure, or summarize, what a student has grasped, and typically occurs at the end of a course or unit of instruction.

5.Norm-Referenced and Criterion-Referenced Tests
•Norm-Referenced Tests: each test-taker’s score is interpreted in relation to a mean (average score), median (middle score), standard deviation (extent of variance in scores), and/or percentile rank (see the sketch after this list).
•Criterion-Referenced Tests are designed to give test-takers feedback, usually in the form of grades, on specific course or lesson objectives.
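To make these terms concrete, here is a minimal sketch (in Python, with hypothetical scores and a simple at-or-below definition of percentile rank) of how a norm-referenced score is situated within a cohort:

```python
# Norm-referenced interpretation: situate one score within a cohort.
# The scores and the percentile-rank definition are illustrative only.
from statistics import mean, median, pstdev

cohort = [62, 70, 74, 78, 81, 85, 88, 90, 93, 97]  # hypothetical scores

def percentile_rank(score, scores):
    """Percentage of the cohort scoring at or below the given score."""
    return 100 * sum(s <= score for s in scores) / len(scores)

test_taker = 85
print(f"mean (average score):  {mean(cohort):.1f}")
print(f"median (middle score): {median(cohort):.1f}")
print(f"standard deviation:    {pstdev(cohort):.1f}")
print(f"percentile rank of {test_taker}: {percentile_rank(test_taker, cohort):.0f}")
```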

6.Discrete-Point and Integrative Testing
a.Discrete-Point Tests are constructed on the assumption that language can be broken down into its component parts and that those parts can be tested successfully. These components are the skills of listening, speaking, reading, and writing, and various units of language.

b.Integrative Tests: Two types of tests have been claimed to be examples of integrative tests: the cloze test and dictation (a small generation sketch follows this list).
•Cloze Test: a reading passage (150 to 300 words) in which roughly every sixth or seventh word has been deleted; the test-taker is required to supply words that fit into those blanks.
•Dictation: learners listen to a passage of 100 to 150 words read aloud by an administrator (or audiotape) and write what they hear, using correct spelling.
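As an illustration of the cloze procedure, here is a minimal sketch (the passage and the fixed every-seventh-word rule are simplifications; real cloze passages run 150 to 300 words and may use rational deletion):

```python
# Cloze generation sketch: blank out roughly every seventh word and
# keep the deleted words as an answer key for scoring.
def make_cloze(passage, n=7):
    words = passage.split()
    blanked, answers = [], []
    for i, word in enumerate(words, start=1):
        if i % n == 0:          # delete every nth word
            answers.append(word)
            blanked.append("______")
        else:
            blanked.append(word)
    return " ".join(blanked), answers

text, key = make_cloze(
    "The recognition that communicative competence involves far more "
    "than discrete grammatical knowledge has changed the way language "
    "tests are constructed and scored in many classrooms today.")
print(text)
print(key)
```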

7.Computer-Based Testing.
Computer-based testing offers these advantages:
•Classroom-based testing
•Self-directed testing on various aspects of a language
•Practice for upcoming high-stakes standardized tests
•Some individualization
•Easy administration to thousands of test-takers

Computer-based testing also has disadvantages:
•Lack of security
•Occasional “home-grown” quizzes that appear on unofficial websites may be mistaken for validated assessments
•Potential for flawed item design (especially in the multiple-choice format)
•Open-ended responses are less likely to appear
•The human interactive element (especially in oral production) is absent

(Source: H. Douglas Brown, 2004)

ASSESSING SPEAKING

One of the productive skills in language is speaking. Speaking and listening are almost always closely interrelated, and it is very difficult to isolate oral production tasks that do not directly involve the interaction of aural comprehension. Only in limited contexts of speaking can one assess oral language without the aural participation of an interlocutor. As a productive skill, speaking can be directly and empirically observed. The interaction of speaking and listening challenges the designer of an oral production test to tease apart, as much as possible, the factors accounted for by aural intake. Another challenge is the design of elicitation techniques: because most speaking is the product of creative construction of linguistic strings, the speaker makes choices of lexicon, structure, and discourse.
In receptive performance, the elicitation stimulus can be structured to anticipate predetermined responses and only those responses. In productive performance, however, the oral or written stimulus must be specific enough to elicit output within an expected range of performance, such that scoring or rating procedures apply appropriately. In speaking assessment, each score represents one of several traits, such as pronunciation, fluency, vocabulary use, grammar, and comprehensibility; a minimal rubric sketch follows.
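Brown does not prescribe a single scoring formula here, so the following is only a sketch, assuming a 0-5 scale for each trait and a plain average for the overall rating (weights could equally well differ by trait):

```python
# Analytic scoring sketch for a speaking task: one rating per trait,
# averaged into an overall score. The 0-5 scale is an assumption.
TRAITS = ["pronunciation", "fluency", "vocabulary use",
          "grammar", "comprehensibility"]

def overall_rating(ratings):
    """ratings maps every trait in TRAITS to a 0-5 rater score."""
    missing = [t for t in TRAITS if t not in ratings]
    if missing:
        raise ValueError(f"unrated traits: {missing}")
    return sum(ratings[t] for t in TRAITS) / len(TRAITS)

sample = {"pronunciation": 4, "fluency": 3, "vocabulary use": 4,
          "grammar": 3, "comprehensibility": 4}
print(f"overall: {overall_rating(sample):.1f} / 5")
```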
Speaking has five categories, which are similar to those for listening. First, imitative: the ability to simply parrot back (imitate) a word or phrase or possibly a sentence. The only role of listening here is in the short-term storage of a prompt, just long enough to allow the speaker to retain the short stretch of language that must be imitated. Second, intensive: the production of short stretches of oral language designed to demonstrate competence in a narrow band of grammatical, phrasal, lexical, or phonological relationships, such as prosodic elements (intonation, stress, rhythm, juncture). The speaker must be aware of semantic properties in order to be able to respond, but interaction with an interlocutor or test administrator is minimal at best. Third, responsive: this includes interaction and test comprehension but at the somewhat limited level of very short conversations, standard greetings and small talk, simple requests and comments, and the like. Fourth, interactive: this sometimes includes multiple exchanges and/or multiple participants. Interaction can take the two forms of transactional language, which has the purpose of exchanging specific information, and interpersonal exchanges, which have the purpose of maintaining social relationships. In interpersonal exchanges, oral production can become pragmatically complex with the need to speak in a casual register and use colloquial language, ellipsis, slang, humor, and other sociolinguistic conventions. The last category, extensive, includes speeches, oral presentations, and storytelling, during which the opportunity for oral interaction from listeners is either highly limited or ruled out altogether. Language style is frequently more deliberative and formal for extensive tasks, but certain informal monologues cannot be ruled out.
A task type that is generally used in imitative speaking is word repetition. In a simple repetition task, test-takers repeat the stimulus, whether it is a pair of words, a sentence, or perhaps a question. Another popular test is PhonePass. It is a widely used and commercially available speaking test in many countries. It is remarkable that research on the PhonePass test has supported the construct validity of its repetition tasks not just for a test-taker’s phonological ability but also for discourse and overall oral production ability. Scores for the PhonePass test are calculated by a computerized scoring template and reported back to the test-taker within minutes. The PhonePass findings could signal an increase in the future use of repetition and read-aloud procedures for the assessment of oral production.
The next type is intensive speaking, in which test-takers are prompted to produce short stretches of discourse through which they demonstrate linguistic ability at a specified level of language. One task used at this level is the directed response task, in which the test administrator elicits a particular grammatical form or a transformation of a sentence. Next are read-aloud tasks, which include reading beyond the sentence level, up to a paragraph or two; research results suggest that reading aloud may actually be a surprisingly strong indicator of overall oral production ability. Reading aloud is somewhat inauthentic in that people seldom read anything aloud to someone else in the real world, with the exception of a parent reading to a child, occasionally sharing a written story with someone, or giving a scripted oral presentation. Another technique for targeting intensive aspects of language requires test-takers to read a dialogue in which one speaker’s lines have been omitted. One popular way to elicit oral language performance at both intensive and extensive levels is a picture-cued stimulus that requires a description from the test-taker. Pictures may be very simple, designed to elicit a word or a phrase; somewhat more elaborate and “busy”; or composed of a series that tells a story or incident. Scoring responses on picture-cued intensive speaking tasks varies, depending on the expected performance criteria.
Responsive speaking, the third type, involves brief interactions with an interlocutor; it differs from intensive tasks in the increased creativity given to the test-taker, and from interactive tasks in the somewhat limited length of utterances. Tasks at this level include question and answer (one or two questions from an interviewer, or a portion of a whole battery of questions and prompts in an oral interview), giving instructions and directions, paraphrasing (reading or hearing a limited number of sentences and producing a paraphrase), and the Test of Spoken English (TSE). The tasks on the TSE are designed to elicit oral production in various discourse categories rather than in selected phonological, grammatical, or lexical targets.
The fourth category of oral production assessment is interactive speaking. It includes tasks that involve relatively long stretches of interactive discourse (interviews, role plays, discussions, games) and tasks of equally long duration but that involve less interaction (speeches, telling longer stories, and extended explanations and translations). Interviews can vary in length from perhaps five to forty-five minutes depending on their purpose and context, while role playing is a popular pedagogical activity in communicative language-teaching classes. Role play opens some windows of opportunity for test-takers to use discourse that might otherwise be difficult to elicit. The test administrator must determine the assessment objectives of the role play, then devise a scoring technique that appropriately pinpoints those objectives. Discussions, as formal assessment devices, are difficult to specify and even more difficult to score. They offer a level of authenticity and spontaneity that other assessment techniques may not provide. Discussion scoring should be carefully designed to suit the objectives of the observed discussion. Interactive tasks tend to be what some would describe as interpersonal.
The last one, extensive speaking, involves complex, relatively long stretches of discourse. Extensive tasks are frequently variations on monologues, usually with minimal verbal interaction. Tasks used at this level include oral presentations, picture-cued storytelling, retelling a story or news event, and translation (of extended prose). For oral presentations, a checklist or grid is a common means of scoring or evaluation, while in retelling a story, test-takers hear or read a story that they are asked to retell. In translation, longer texts are presented for the test-taker to read in the native language and then translate into English.
The many and various skills in speaking can be categorized into two groups: microskills and macroskills. The former refer to producing the smaller chunks of language such as phonemes, morphemes, words, collocations, and phrasal units. The latter, on the other hand, imply the speaker’s focus on the larger elements: fluency, discourse, function, style, cohesion, nonverbal communication, and strategic options. The microskills are: produce differences among English phonemes and allophonic variants; produce chunks of language of different lengths; produce English stress patterns, words in stressed and unstressed positions, rhythmic structure, and intonation contours; produce reduced forms of words and phrases; use an adequate number of lexical units to accomplish pragmatic purposes; produce fluent speech at different rates of delivery; monitor one’s own oral production and use various strategic devices; use grammatical word classes, systems, word order, patterns, rules, and elliptical forms; produce speech in natural constituents: in appropriate phrases, pause groups, breath groups, and sentence constituents; express a particular meaning in different grammatical forms; and use cohesive devices in spoken discourse. The macroskills are: appropriately accomplish communicative functions according to situations, participants, and goals; use appropriate styles, registers, implicature, redundancies, pragmatic conventions, conversation rules, floor-keeping and -yielding, interrupting, and other sociolinguistic features in face-to-face conversations; convey links and connections between events, and communicate such relations as focal and peripheral ideas, events and feelings, new information and given information, and generalization and exemplification; convey facial features, kinesics, body language, and other nonverbal cues along with verbal language; and develop and use a battery of speaking strategies.
Three important issues should be considered by test writers in designing tasks: no speaking task is capable of isolating the single skill of oral production; eliciting the specific criterion designated for a task can be tricky because, beyond the word level, spoken language offers a number of productive options to test-takers; and, because of these two characteristics of oral production assessment, it is important to carefully specify scoring procedures for a response so that the rating ultimately achieves as high a reliability index as possible (one assumed way of checking this is sketched below).
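Brown does not say how that reliability index should be computed; one common approach, sketched here with hypothetical ratings, is to correlate two raters’ scores for the same set of test-takers (Pearson’s r, where values near 1.0 indicate strong inter-rater agreement):

```python
# Inter-rater reliability sketch: Pearson correlation between two
# raters' scores for the same six test-takers (scores are made up).
from statistics import mean

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

rater1 = [4, 3, 5, 2, 4, 3]
rater2 = [4, 3, 4, 2, 5, 3]
print(f"inter-rater correlation: {pearson(rater1, rater2):.2f}")
```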

(Source: H. Douglas Brown, 2004)

ASSESSING LISTENING

In language learning, people must first know that there are four basic skills: listening, speaking, reading, and writing. Language users certainly rely on their underlying competence in order to accomplish these performances. When teachers propose to assess someone’s ability in one or a combination of the four skills, they assess that person’s competence, but they observe the person’s performance. Sometimes the performance does not indicate true competence: a bad night’s rest, illness, an emotional distraction, test anxiety, a memory block, or other student-related reliability factors could affect performance, thereby providing an unreliable measure of actual competence. So, the first important principle for assessing a learner’s competence is to consider the fallibility of the results of a single performance. The second is that teachers must rely as much as possible on observable performance in their assessments of students. Observable here means being able to see or hear the performance of the learner (the senses of touch, taste, and smell don’t apply very often to language testing).
Isn’t it interesting that in the case of receptive skills, teachers can observe neither the process of performing nor a product? Teachers are not observing the listening performance; they’re observing the result of the listening. They can no more observe listening than they can see the wind blowing. The process of the listening performance itself is the invisible, inaudible process of internalizing meaning from the auditory signals being transmitted to the ear and brain. Probably people will argue that the product of listening is a spoken or written response from the student that indicates correct or incorrect auditory processing. The receptive skills are clearly the more enigmatic of the two modes of performance. People cannot observe the actual act of listening or reading, nor can they see or hear an actual product. They can observe learners only while they are listening or reading. So, all assessment of receptive performance must be made by inference.
In reality, listening has an important role in language skills. However, it has often played second fiddle to its counterpart, speaking. It is rare to find just a listening test, because listening is often implied as a component of speaking. Moreover, the overtly observable nature of speaking renders it more empirically measurable than listening. A good speaker is often (unwisely) valued more highly than a good listener, yet a teacher of language knows that one’s oral production ability, other than monologues, speeches, reading aloud, and the like, is only as good as one’s listening comprehension ability. Of even further impact is the likelihood that input in the aural-oral mode accounts for a large proportion of successful language acquisition. Whether in the workplace, educational, or home contexts, aural comprehension far outstrips oral production in quantifiable terms of time, number of words, effort, and attention.
Designing appropriate assessment tasks in listening begins with the specification of objectives, or criteria. Those objectives may be classified in terms of several types of listening performance. When people listen, they (1) recognize speech sounds and hold a temporary “imprint” of them in short-term memory; (2) determine the type of speech event (monologue, interpersonal dialogue, transactional dialogue) being processed and attend to its context and the content of the message; (3) use (bottom-up) linguistic decoding skills and/or (top-down) background schemata to bring a plausible interpretation to the message, and assign a literal and intended meaning to the utterance; and (4) delete the exact linguistic form in which the message was originally received in favor of conceptually retaining important or relevant information in long-term memory. In other words, each of these stages represents a potential assessment objective: comprehending surface structure elements, understanding pragmatic context, determining the meaning of auditory input, and developing the gist, a global or comprehensive understanding. From these main activities, people can derive four common types of listening performance: intensive, responsive, selective, and extensive. Intensive is listening for perception of the components (phonemes, words, intonation, discourse markers, etc.) of a larger stretch of language, while responsive is listening to a relatively short stretch of language in order to make an equally short response. The third, selective, is processing stretches of discourse, such as short monologues of several minutes, in order to “scan” for certain information; its purpose is not necessarily to look for global or general meanings, but to be able to comprehend designated information in a context of longer stretches of spoken language. The fourth, extensive, is listening for developing a top-down, global understanding of spoken language. Interactive listening, which includes all four of the above types, occurs as test-takers actively participate in discussions, debates, conversations, role plays, and pair and group work; their listening performance must be intricately integrated with speaking in the authentic give-and-take of communicative interchange.
In line with the performance of listening comprehension, Richards (1983) divided it into two parts: microskills (attending to the smaller bits and chunks of language, in more of a bottom-up process) and macroskills (focusing on the larger elements involved in a top-down approach to a listening task). In more detail, the microskills include: discriminate among the distinctive sounds of English; retain chunks of language of different lengths in short-term memory; recognize English stress patterns, words in stressed and unstressed positions, rhythmic structure, intonation contours, and their role in signaling information; recognize reduced forms of words; distinguish word boundaries, recognize a core of words, and interpret word order patterns and their significance; process speech at different rates of delivery; process speech containing pauses, errors, corrections, and other performance variables; recognize grammatical word classes, systems, patterns, rules, and elliptical forms; detect sentence constituents and distinguish between major and minor constituents; recognize that a particular meaning may be expressed in different grammatical forms; and recognize cohesive devices in spoken discourse. The macroskills include: recognize the communicative functions of utterances, according to situations, participants, and goals; infer situations, participants, and goals using real-world knowledge; from events, ideas, and so on described, predict outcomes, infer links and connections between events, deduce causes and effects, and detect such relations as main idea, supporting idea, new information, given information, generalization, and exemplification; distinguish between literal and implied meanings; use facial, kinesic, body-language, and other nonverbal clues to decipher meanings; and develop and use a battery of listening strategies, such as detecting key words, guessing the meaning of words from context, appealing for help, and signaling comprehension or lack thereof.
Implied in the taxonomy above is a notion of what makes many aspects of listening difficult, or why listening is not simply a linear process of recording strings of language as they are transmitted into our brains. Factors that make listening difficult include: clustering (attending to appropriate “chunks” of language: phrases, clauses, constituents), redundancy (recognizing the kinds of repetitions, rephrasings, elaborations, and insertions that unrehearsed spoken language often contains, and benefiting from that recognition), reduced forms (understanding the reduced forms that may not have been a part of an English learner’s past learning experiences in classes where only formal “textbook” language has been presented), performance variables (being able to “weed out” hesitations, false starts, pauses, and corrections in natural speech), colloquial language (comprehending idioms, slang, reduced forms, and shared cultural knowledge), rate of delivery (keeping up with the speed of delivery, processing automatically as the speaker continues), stress, rhythm, and intonation (correctly understanding the prosodic elements of spoken language), and interaction (managing the interactive flow of language from listening to speaking to listening, etc.).
The language assessment field ideally has a stockpile of listening test types that are cognitively demanding, communicative, and authentic, not to mention interactive by means of an integration with speaking. Buck (2001: 92) stated that every test requires some components of communicative language ability, and no test covers them all. Similarly, with the notion of authenticity, every task shares some characteristics with target language tasks, and no test is completely authentic. There are several possibilities for authentic listening tasks. First, note-taking during classroom lectures by professors, a common feature of a non-native English-user’s academic experience; the scoring system for note-taking covers visual representation, accuracy, and symbols and abbreviations. Second, editing, which provides both a written and a spoken stimulus and requires the test-taker to listen for discrepancies: test-takers read the written stimulus material, hear a spoken version of the stimulus that deviates, in a finite number of facts or opinions, from the original written form, and mark the written stimulus by circling any words, phrases, facts, or opinions that show a discrepancy between the two versions (a small discrepancy-listing sketch follows). Third, interpretive tasks, which extend the stimulus material to a longer stretch of discourse and force the test-taker to infer a response; potential stimuli include song lyrics, recited poetry, radio/television news reports, and an oral account of an experience. Fourth, retelling a story or news event that the test-takers have listened to beforehand.
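For the editing task, the rater ultimately needs an answer key of exactly where the spoken version deviated from the written stimulus. Here is a minimal sketch of producing that key, assuming both versions are available as text (the sentences are hypothetical):

```python
# Editing-task answer key sketch: list the words where the spoken
# version of the stimulus deviates from the written version.
import difflib

written = "The meeting starts at nine and lasts two hours".split()
spoken = "The meeting starts at ten and lasts three hours".split()

matcher = difflib.SequenceMatcher(a=written, b=spoken)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":  # a discrepancy the test-taker should circle
        print(f"written: {' '.join(written[i1:i2])!r} "
              f"-> spoken: {' '.join(spoken[j1:j2])!r}")
```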

(Source: H. Douglas Brown, 2004)