Hello! I am Jay. I'm a grad student getting my Master's Degree in Second Language Acquisition from a USAmerican university. This blog was created for a class on Current Issues in TESOL. It's currently being updated for a course on Assessment in TESOL. I have been teaching since fall 2016, both as an EFL Teacher in Asia and an EAP Teacher in the US. This blog contains a mix of contemporary TESOL research, my personal/critical opinions, and fun linguistics content. Note: if you find this blog and realize you can figure out who I am IRL . . . don't do that? Don't understand all these acronyms? Click here! Want to know more about my personal journey? Click here! Blog Icon from Ben White on Unsplash, Header from 2y.Kang on Unsplash
Text
Duolingo Sucks, Now What?: A Guide
Now that the quality of Duolingo has fallen (even more) due to AI, and people are more willing to make the jump, here are some alternative apps and the languages they offer:
"I just want an identical experience to DL"
Busuu (Languages: Spanish, Japanese, French, English, German, Dutch, Italian, Portuguese, Chinese, Polish, Turkish, Russian, Arabic, Korean)
"I want a good audio-based app"
Language Transfer (Languages: French, Swahili, Italian, Greek, German, Turkish, Arabic, Spanish, English for Spanish Speakers)
"I want a good audio-based app and money's no object"
Pimsleur (Literally so many languages)
Glossika (Also a lot of languages, but minority languages are free)
*anecdote: I borrowed my brother's Japanese Pimsleur CD as a kid and I still remember how to say the weather is nice over a decade later. You can find the CDs at libraries and "other" places I'm sure.
"I have a pretty neat library card"
Mango (Languages: So many and the endangered/Indigenous courses are free even if you don't have a library that has a partnership with Mango)
Transparent Language (Languages: THE MOST! Also the one with the widest variety of African languages! Perhaps the most diverse for ESL and for learning a foreign language through a base language other than English)
"I want SRS flashcards and have an Android"
AnkiDroid: (Theoretically all languages, pre-made decks can be found easily)
"I want SRS flashcards and I have an iPhone"
AnkiApp: It's almost as good as AnkiDroid and free, unlike the official Anki app for iPhone
"I don't mind ads and just want to learn Korean"
Lingory
"I want an app made for Mandarin that's BETTER than DL and has multiple languages to learn Mandarin in"
ChineseSkill (You can use their older version of the course for free)
"I don't like any of these apps you mentioned already, give me one more"
Bunpo: (Languages: Japanese, Spanish, French, German, Korean, and Mandarin)
86K notes
Text
not to be a pain about this but not only should you learn to read scientific papers but you should know how to find out if a journal is reputable and some basic statistics (i promise it's not as scary or hard as it seems). these are key adult life skills, not niche academic pursuits. science journalism is abysmal, and researchers misrepresent their own findings all the time.
8 notes
Text
"contract grading" "only 4 absences or you drop to an F" "in this class we will be teaching about disabilities. attendance is mandatory and i do not accept late work" "please respond to at least two of your peers in this discussion post" "people with autism need time to decompress in a classroom environment. your class is four hours long with a 7 minute break." "we like to let students learn the way THEY want to learn. please buy our 150 dollar textbook."
63K notes
Text
debating if it would be funnier to have a bumper sticker saying "my other ride is a [exact make and model of the car the sticker is on]" or "my other ride is a [equally shitty but different car]"
123K notes
Text

This is the magic lucky word count. Reblog for creativity juice. It might even work, who knows.
56K notes
Text
The research I’m currently doing involves going through articles in an academic journal manually — article by article, issue by issue, in reverse chronological order.
In 2010, an in memoriam article was posted about a notable scholar in the community who was trans. It shared about her life and work, and mentioned her birth name and her chosen name.
Just got to 2005, and she published in issue 2 under her birth name. And then in issue 3, she published under her new name for the first time.
She was in tech, and a lot of what she wrote about is obsolete by now. I have no need to read her publications. But it’s just so touching to see our history laid out like this — an unremarkable name-change in a journal followed by an obituary five years later. It really is never too late.
3 notes
Text
Anybody else got that Ever Given-sized writer's block
201K notes
Text
The almost overnight surge in electricity demand from data centers is now outstripping the available power supply in many parts of the world, according to interviews with data center operators, energy providers and tech executives. That dynamic is leading to years-long waits for businesses to access the grid as well as growing concerns of outages and price increases for those living in the densest data center markets. The dramatic increase in power demands from Silicon Valley’s growth-at-all-costs approach to AI also threatens to upend the energy transition plans of entire nations and the clean energy goals of trillion-dollar tech companies. In some countries, including Saudi Arabia, Ireland and Malaysia, the energy required to run all the data centers they plan to build at full capacity exceeds the available supply of renewable energy, according to a Bloomberg analysis of the latest available data. By one official estimate, Sweden could see power demand from data centers roughly double over the course of this decade — and then double again by 2040. In the UK, AI is expected to suck up 500% more energy over the next decade. And in the US, data centers are projected to use 8% of total power by 2030, up from 3% in 2022, according to Goldman Sachs, which described it as “the kind of electricity growth that hasn’t been seen in a generation.”
21 June 2024
673 notes
Text
Y'all I know that when so-called AI generates ridiculous results it's hilarious and I find it as funny as the next guy but I NEED y'all to remember that every single time an AI answer is generated it uses 5x as much energy as a conventional web search and burns through 10 ml of water. FOR EVERY ANSWER. Each big LLM is equal to 300,000 kilograms of carbon dioxide emissions.
LLMs are killing the environment, and when we generate answers for the lolz we're still contributing to it.
Stop using it. Stop using it for a.n.y.t.h.i.n.g. We need to kill it.
Sources:
64K notes
Text

Took a year to complete this quilt! The pattern, called Cupola View, is by NASA astronaut Karen Nyberg. The fabrics were also designed by Karen, from her collection Earth Views.
50K notes
Photo
my linguistic anthropology textbook getting pretty excited about this Chomsky knitting AU
4K notes
Text
Assessment Terminology
There are many, many words used in discussing assessment in education. Here is a non-exhaustive list of them, with definitions to contextualize them and clarify their meanings. The other glossary is full of other TESOL terms, so if you encounter a term here that you don’t recognize, check there.
General Terms:
The Purpose of The Assessment:
Diagnostic – A diagnostic is an assessment given at the beginning of a course or term that gives specific information about a learner’s abilities or knowledge, which allows an educator to focus on enhancing the student’s strengths and supporting the student’s weaknesses. For example, a diagnostic lets an educator know that a student may be good at the past tense but has difficulty with academic vocabulary.
Formative – Formative assessment is assessment done during the program that influences the program moving forward. Formative assessments during a semester allow an educator to know what the students have mastered, what they are still learning, and what they have not yet learned, which allows the teacher to adapt their lessons to address what the students need more help with. For example, formative assessments about the past tense may highlight which irregular forms should be emphasized in the next unit.
Summative – Summative assessments are conducted at the end of the term, and they are used to show what the student has learned by the end of the program. Final exams are summative assessments.
Performance assessment – a performance assessment asks a student to complete a task of some sort, and their ability to complete that task demonstrates their abilities. Presentations and group projects are examples of performance assessments. These are generally evaluated with a rubric or other rating scale.
Knowledge test – these assessments measure what information a student knows. This information can be facts, models, or processes, usually as learned in content classes.
Skill test – these assessments measure a student’s ability to perform a skill. Second Language acquisition often refers to the Four Skills – Reading, Listening, Speaking, and Writing, so a test of those skills would be a skill test, but an assessment of a specific subskill, like a roleplay that assesses a student’s ability to politely refuse an invitation in the target language, would also be a skill test.
The Contents of the Assessment:
Curriculum-Related Tests:
Admission test – the results of these tests are used to decide if a student is allowed to enroll in a program.
Placement test – these tests are used to determine what course or level a student will be placed into in a program.
Progress test – these tests show how far the student has progressed – how much their knowledge or skills have grown – during their time in the program.
Achievement test – these tests show what the student has accomplished by the end of their time in the program.
Non-Curricular Tests:
Screening test – these tests determine if a student in the US K-12 system is an ELL and if they should be in an ELD program.
Proficiency test – these tests determine how much of the L2 a person understands and can produce. Most people’s levels vary by skill (reading, listening, writing, speaking, grammar, vocabulary), and many people will have small daily variances due to their mood, health, etc. However, a proficiency test should in general explain what the language user can do and what they cannot yet do in the L2.
The Source of the Test:
Classroom-based – assessments in the classroom based on the content, curriculum, and practices of that specific classroom. Not shared between classes or teachers.
Common – assessments used by all teachers in a specific level or program. If all of the instructors of the class use the same assessments, these are common assessments.
Large-scale – these assessments are used in many contexts beyond a single school or course/program. National exams and college entrance exams like the SAT are large-scale.
Teacher-made – these assessments are made by the instructor(s) of a course, usually for that specific class.
Published – these assessments are made by groups or corporations for use by many instructors in many different contexts. Tests that come with a textbook are published.
Standardized – these are large-scale tests which are given to test-takers in settings that are as identical as possible to each other. All test-takers receive the same questions, and the tests are proctored and invigilated in the same ways. The IELTS and the ACT are standardized tests.
Other Important General Terms:
Formal – formal assessments are planned. Tests and homework assignments are formal.
Informal – informal assessments are incidental and unplanned. An impromptu conversation that allows an instructor to gauge the speaking proficiency of a student is an informal assessment.
Low Stakes – these assessments have little to no repercussions or consequences for students. No major decisions are based on low-stakes assessments. An in-class speaking activity or a class exit slip are examples of low-stakes assessments.
Medium Stakes – these assessments may have some consequences for students. Graded assessments, portfolio pieces, or unit-tests are medium-stakes.
High Stakes – these assessments have serious consequences for students. Exit assessments for an ELD program, college entrance exams, and proficiency tests used for employment purposes are high-stakes assessments.
Criterion-referenced – the scores on criterion-referenced assessments are based on lists of criteria, such as Student Learning Outcomes or curriculum standards. Math tests, in which correct processes and answers lead to high scores, are criterion-referenced.
Norm-referenced – the evaluation of a norm-referenced assessment compares the students in a given population. Students’ work is assessed against the work of other students. For example, a norm-referenced college entrance exam would allow a university to only admit the top students in the testing pool but would not give information about how one group of students compares to another group that completed a different test. A norm-referenced test would also allow a program to evenly split students into levels based on relative proficiency, which would be useful if the program had ideal class sizes but a flexible curriculum.
Assessment – assessment encompasses any of the work done to gather and analyze data from students in order to know about and report about their learning. Interviews, surveys, tests, journals, and roleplays are examples of typical language proficiency assessments.
Measurement – this is the work of assigning qualitative or quantitative information to observable phenomena, in order to report about and analyze said phenomena. For example, L2 proficiency is an observable phenomenon, and we measure it by describing standards and then assessing language-users’ attainment of those standards.
Test – a test is a systematic, planned way to gather data about a student and their skills, knowledge, or proficiency. Testing is a very common assessment procedure.
Evaluation – evaluation is the step beyond assessment; evaluation takes the result of assessment and uses it to judge the outcomes of a program. End of Course exams can be used to assess students’ learning, and the results of EOCs can be used to evaluate the effectiveness of teaching methodologies used in the course.
Universal Design – this is a philosophy that is followed in many or most fields that create products. Universal Design promotes designs that take into account any accommodations that may be necessary in order to make a product usable for anyone and incorporates those changes into the original product instead of requiring users to supplement or modify the end product. For example, if a can opener isn’t usable by people with low grip strength, then universal design would modify the product so that it requires less mechanical advantage for all users, rather than requiring that people with low grip strength use after-market modifications in order to use the can opener. For assessment and materials design, universal design principles include incorporating scaffolding activities, glossaries, and models for all students, not just for ELLs.
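To make the criterion- vs. norm-referenced distinction above concrete, here is a minimal Python sketch. The names, scores, cutoff, and "admit the top half" rule are all invented for illustration, not taken from any real assessment:

```python
# Hypothetical scores for five students -- invented data for illustration.
scores = {"Ana": 62, "Bo": 78, "Chen": 85, "Dara": 91, "Eli": 70}

# Criterion-referenced: each student is judged against a fixed standard,
# e.g., a mastery cutoff drawn from curriculum standards or SLOs.
CUTOFF = 75
criterion_results = {name: ("pass" if s >= CUTOFF else "not yet")
                     for name, s in scores.items()}

# Norm-referenced: each student is judged against the rest of the pool.
# Here the (hypothetical) rule is to admit only the top half of test-takers.
ranked = sorted(scores, key=scores.get, reverse=True)
top_half = set(ranked[: len(ranked) // 2])
norm_results = {name: ("admit" if name in top_half else "decline")
                for name in scores}
```

Note how the criterion-referenced result for one student never changes when other students' scores change, while the norm-referenced result depends entirely on the rest of the testing pool.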
Terms about Scoring:
Discrete-point test – each task or test item measures a different skill or piece of knowledge. For example, a task on a speaking test that requires saying a single word in order to assess pronunciation of a single phoneme.
Integrated test – tasks or test items may measure many skills simultaneously. An item on a speaking test that requires a long turn and assesses pronunciation, functional language, vocabulary, and grammar simultaneously is an integrated task.
Selected response – these tasks give students both the “question” and the “answer”, and students need to recognize the correct answer for the questions. Examples include multiple-choice, matching, true-false, and ordering tasks.
Limited production – these tasks provide a prompt that requires students to supply a small amount of information (as little as one letter or morpheme, or as much as a few sentences). Examples include fill-in-the-blank, graphic organizers, sentence combining, and short answer tasks.
Deletion-based – these tasks delete information from a text and ask students to accurately or appropriately replace that information. For example, a cloze task deletes words in a text and students must recognize metalinguistic information such as part of speech, plurality, and register in order to reconstruct what the missing word might be.
Translation – translation requires knowledge of grammar, vocabulary, register, discourse, and pragmatics in both languages in order to accurately and appropriately convey the same meaning. Translating to the L2 typically demonstrates productive proficiency, whereas translating to the L1 typically demonstrates receptive proficiency.
Extended production – these tasks ask students to generate a large amount of language, which can assess higher order language concerns such as discourse-level organizational language (like signposting and transitions) or genre-appropriate conventions (such as using past tenses to set up a scenario and present tenses to describe the action when narrating an anecdote). Other examples of extended production tasks include recall tasks, summaries, dictation, spoken responses, monologues, role plays, and oral interviews.
Rubric types:
Holistic rubric – a holistic rubric is one in which results are given on a single scale – rubrics that simply label a product as “needs improvement/satisfactory/excellent” are holistic. These are less labor intensive for a teacher and clearer to understand for young learners, but they don’t provide a lot of specific feedback for older or more advanced learners.
Analytic rubric – an analytic rubric breaks the assessment down into components and then rates each of those components separately. These rubrics may assess components such as discourse, grammar, vocabulary, function, content, or participation; a student who produces a grammatically correct essay that doesn’t answer the prompt could get a high score for grammatical accuracy and a low score for content, allowing them to understand where their work can improve. Analytic rubrics that are not task-specific can be used for a variety of assessments, which reduces teacher workload and improves student understanding of the rubric.
Task-specific rubric – these rubrics allocate points for the different criteria of a task. For example, building a volcano in science class may be assessed on a rubric that includes creating the proper shape of a volcano or explaining the chemical reaction in the presentation. The rubric can be highly specific and act as a checklist for the students, but it can only be used for a single task.
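The holistic/analytic distinction above can be sketched in code. This is an illustrative Python toy; the component names and the 1–5 scale are my own assumptions, not a standard rubric:

```python
# A hypothetical analytic rubric: each component is rated separately
# on a 1-5 scale, then reported per component (with a total for grading).
RUBRIC_COMPONENTS = ["content", "grammar", "vocabulary", "participation"]

def score_analytic(ratings: dict) -> dict:
    """Return per-component feedback plus a total (illustrative only)."""
    assert set(ratings) == set(RUBRIC_COMPONENTS), "rate every component"
    return {"components": ratings, "total": sum(ratings.values())}

# A grammatically strong essay that misses the prompt still gets
# targeted feedback: high grammar score, low content score.
result = score_analytic(
    {"content": 2, "grammar": 5, "vocabulary": 4, "participation": 3}
)
# A holistic rubric, by contrast, would collapse all of this into a
# single label like "satisfactory", hiding where the work can improve.
```

The design point is that the analytic version preserves the component breakdown, which is exactly the feedback a holistic scale throws away.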
The 6 Principles of Assessment are ways to analyze the efficacy of assessments.
Validity – does the assessment measure what it purports to measure? For example, a grammar test that requires specific content or vocabulary knowledge may not be a valid assessment of grammar.
Reliability – if the assessment is repeated, are the results consistent? For example, if two graders would give wildly different scores to the same product, the assessment lacks inter-rater reliability.
Practicality – are the required resources feasible for the number of and frequency of assessments given? For example, a 1-hour interview may be a very accurate proficiency measure, but if a program has one interviewer for 2000 students who are assessed every month, the assessment isn’t practical.
Equivalency – does the assessment match the curriculum? For example, if students only do multiple choice worksheets in class and the assessment is an essay, the assessment isn’t equivalent.
Washback – do task types on the assessment change teaching practices? Examples of negative washback include increasing multiple-choice worksheets and decreasing group projects just because an EOC is multiple choice; examples of positive washback include increasing focus on academic language because summative assessments include an academic presentation.
Fairness – is the assessment equitable for all students? For example, an assessment that requires knowledge of pop culture may not be fair to a student who doesn’t consume popular media.
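The inter-rater reliability example above can be made concrete with a quick calculation. This Python sketch uses invented ratings and the simplest possible statistic, the exact-agreement rate (real studies often use more robust measures such as Cohen's kappa):

```python
# Two hypothetical raters scoring the same ten essays on a 1-4 holistic scale.
rater_a = [3, 4, 2, 3, 1, 4, 3, 2, 4, 3]
rater_b = [3, 4, 2, 2, 1, 4, 3, 3, 4, 3]

# Exact-agreement rate: the fraction of products where both raters
# assigned the same score.
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
agreement_rate = agreements / len(rater_a)

# Here the raters match on 8 of 10 essays (agreement_rate = 0.8).
# Wildly different scores from the two raters would drive this rate down,
# signaling that the assessment lacks inter-rater reliability.
```

In practice, a program would also want rater training and a shared rubric before trusting a number like this.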
US K-12 School Terms:
US – United States of America.
K-12 – Kindergarten to 12th grade. These are the years that the majority of Americans attend school, with kindergarteners typically 5-6 years old and 12th-graders typically 17-18 years old. Students may go to school before kindergarten (pre-K, preschool, or Head Start, for example), and depending on the state, students may be able to drop out at 16. Education in the US is compulsory; public schooling is free; and home schooling and private schooling exist nationwide but are much less common forms of schooling.
Content Classes – content classes are the traditional school subjects like math, science, or language arts.
Annual Subject Testing – these are the tests that all students take in elementary and middle school. Many states have state-wide tests on subjects like math, science, and language arts. These tests are often NOT part of students’ grades – you don’t fail 6th grade science due to your state-wide annual science test score.
End Of Course exams – these are tests taken by high schoolers at the end of the semester or school year, and they ARE part of the students’ grades. For example, 25% of your Algebra I grade may come from your score on the state-wide EOC.
EL or ELL or EB – English Learner or English Language Learner or Emergent Bilingual. These are students who are in the process of learning English and whose proficiency falls in the range that qualifies them for ELD programs.
ELD programs – English Language Development programs. These are programs with the goal of increasing academic English language proficiency in ELLs, with the goal of them learning subject content in US schools. In non-Bilingual contexts, students in the US study in English, so ELD programs intend to give ELLs the level of English proficiency necessary to learn the required curriculum for their current grade-level.
LUS – Language Use Survey, sometimes HLS (Home Language Survey). This is a survey intended to identify ELLs. It includes questions about parents’ and students’ language use, in order to determine whether a student needs to be screened for English proficiency or qualifies for ELD services.
LIEP – Language Instruction Educational Program. The program(s) that a school district provides for ELLs. In the US, a school district may provide service to a city or a county. Districts are controlled to some extent by the state government, which is controlled to some extent by the federal government, and they in turn control to some extent each of the schools and programs in their district. ELD programs are generally controlled at the district level, but each school may have its own ELD programs and policies – some schools in a district may have dual immersion programs, others may have dedicated ESL classes, others may have ESL Specialist co-teachers. The benefit of a system like this is the flexibility to provide the most appropriate services for the local population – a school where Spanish-L1 students make up 50% of the population has different needs than a school where 7% of the students are ELLs but they all have different L1s. The drawback to a system like this is a lack of uniformity for migrant students and difficulty in evaluating all of the different programs.
Migrant – Migrant students are students whose families work jobs that require them to move often during the school year; often these are workers in the agricultural sector. These students move from school to school and are therefore identified and placed into ELD programs, if they are not L1 English speakers, over and over again. Plans for sharing information between school systems are important, and differences in LIEPs and ELD programs in different areas can make these transitions difficult.
BICS – Basic Interpersonal Communication Skills. These are the language skills needed to hold a conversation, go shopping, use transportation, etc.
CALP – Cognitive Academic Language Proficiency. These are more complex, academic language skills that students need to acquire in order to be successful in content classes.
Are there more terms than this? Of course! But this is a list of terms to get you started.
0 notes
Text
Potential Summer Reading Class Project: Collaborative Rubric Design Project
This project was inspired by Dong-Shin Shin and Tony Cimasko’s presentation at AAAL 2024, “ESL Students’ self-assessment of multimodal writing for metalanguage development”, as well as the ELT Journal article “Implementing Rubric Co-Construction in ESL Writing Teaching” by Tong Zhang and Zhenjie Weng. Both projects, one for international graduate students and one for advanced undergraduate multilingual writers, focused on the benefits of rubric co-construction for student understanding and use of rubrics in the classroom.
In my context, I teach undergraduate and graduate international students in the same classroom. The undergraduates exit one term earlier than the graduate students. I also teach reading instead of writing. However, I do work with rubrics a lot in my classes, as I focus my reading instruction and assessment on critical thinking and demonstration of understanding, as well as active participation and discussions.
One of my assignments, which students complete 2-3 times per course, was adapted from an assignment by Alicia Ambler, an ESL instructor at the University of Iowa. Her “Reading Circles” assignment puts students in charge of discussions of shared readings – they complete the readings outside of class and then are responsible for facilitating their own discussions of the material, minimizing teacher-led and teacher-focused class time.
Here is a version of the rubric I made for this assignment:
The points above the rubric are task-specific, but the rubric itself is analytic. The rubric could be used for any assignment that had both a written component and group discussion component. In fact, I use variations on this rubric and these descriptors for other critical thinking activities in my classes. Although I understand the use of a holistic rubric for lower level and/or younger students, I do not think that a holistic rubric provides enough information about assessment criteria for university-age B1-B2 learners.
The rubric has been adapted over the past few semesters in which I have used this project. In its original form, the scale only contained scores of 1-4, which weighted the participation and critical thinking elements of the project as less important than completing the pre-discussion tasks, which wasn’t my intention. Expanding the scale corrected that imbalance and increased the content validity of the assignment, as well as its equivalency, by assessing students on what I intended to assess and by aligning the focus of the assignment with the goals I express during teaching. As I am the only person grading this project, and the only instructor for this course at this level, I have not established inter-rater reliability for this rubric. I don't think there has been any washback from this assessment, as I've changed the rubric to reflect my intentions instead of changing my teaching to reflect the rubric.
I always share the rubric with my students before the first Reading Circles assignment, but after attending the presentation and reading the journal article, I realized that I don’t know how much my students understand the rubrics or how invested they are in meeting the goals of the assignment as measured by the rubric.
Student self-assessment is a great way for students to more clearly understand what they are being evaluated on, and rubric co-construction is a way for students to have a say in the skills and products on which they will be evaluated. I do not often use self-evaluation in my reading class precisely because it is a reading class – much of self-evaluation is based on comparing a product to a set of standards, but there is rarely a product in reading. (I tend to avoid assigning too many reports or reflections in my class because I don’t want to assess my students on writing – they have a whole other class for that; at the same time, I find reading skill and reading comprehension tests, which do measure reading ability, to lack the kinds of critical thinking and creative thinking that reading should foster at this level. I want my students, who will be entering university after leaving my class, to know that they will probably not simply be asked to read and memorize passages but will instead be asked to demonstrate their reading and understanding in a variety of ways.)
I also usually allow my students to very strongly affect our course, giving them quite a bit of agency. We choose topics and assignments together. We negotiate due dates and tasks. They often have options for what assignments they want to complete and how and when they will complete them. Allowing my students to create their own rubrics would be a fitting and lovely addition to my course design, as I believe that students in my context know their goals and what they need to achieve them, often better that I do as an outsider. Having them decide on the important parts of the assignment and how they should be rated would be another way to increase their participation in the course.
The first part of the collaborative rubric design projects that Shin & Cimasko (2024) and Zhang & Weng (2024) did in their courses was familiarizing their students with the final products of the project and asking them to evaluate those projects. Shin & Cimasko, who were working on multimodal composing, showed examples of infographics and slideshows and asked students to decide which ones were effective or ineffective and why. Zhang & Weng had an annotated bibliography project, and they asked their students to describe what made example annotated bibliographies good or bad. This is one of the drawbacks to potentially implementing this project in my course – the product that students create at the end of the project is only one half of the full Reading Circle, acting as prep work for the discussions. It would be more difficult, and take up more of our limited class time, to evaluate the hypotheticals of a class discussion that the students haven’t yet participated in.
The next steps, collaboratively building the rubric, would be interesting, but again, would take up class time that I already do not have. The Reading Circles discussion projects are based solely on outside-of-class reading and work, wherein only the project presentation and the 30-minute discussions take place during classtime. Adding the rubric creation would add considerably more time that I do not currently have access to.
One part of the project that I haven’t yet implemented in class is the reading summary and reflection, again due to time constraints. However, a summary and a reflection are products that students could evaluate and create rubrics around. If I were to give students a holistic rubric for their first summary and reflection, such as
Then, after they have the experience of their first (marked, but not for class points) summary and reflection assignment, we could transform this holistic rubric into one that adequately and analytically addresses the features that make a good summary and a good reflection. This would allow the students to learn and display their understanding of the genre, and would allow them to self-assess their own work and their peers’ work before submitting their assignments.
Classes start in one week, so I may not have time to fully implement this project for the summer session, but I will strongly think about doing this.
References:
Ambler, A. (2022, September 23-23). Small group reading: An easy method for engagement [Conference presentation]. MIDTESOL 2022, Kansas City, MO, United States.
Shin, D.S., & Cimasko, T. (2024, March 16-19). ESL students’ self-assessment of multimodal writing for metalanguage development [Conference presentation]. AAAL 2024, Houston, TX, United States.
Zhang, T., & Weng, Z. (2024). Implementing rubric co-construction in ESL writing teaching. ELT Journal, 78(1), 32-41. https://doi.org/10.1093/elt/ccad033
0 notes