One Hundred Years of Research on Grading

By Kim Marshall, TIE columnist

This piece is reprinted from The Marshall Memo, Kim Marshall’s weekly summary of current research and best practices in the field of education. Drawing on his experience as a teacher, principal, central office administrator, consultant, and writer, Kim Marshall lightens the load of busy educators by serving as their “designated reader.”
The article: “A Century of Grading Research: Meaning and Value in the Most Common Educational Measure” by Susan Brookhart, Thomas Guskey, Alex Bowers, James McMillan, Jeffrey Smith and Lisa Smith, Michael Stevens, and Megan Welsh in Review of Educational Research, December 2016 (Vol. 86, #4, pp. 803-848).
In this Review of Educational Research article, Susan Brookhart (Duquesne University), Thomas Guskey (University of Kentucky), Alex Bowers (Columbia University), James McMillan (Virginia Commonwealth University), Jeffrey Smith and Lisa Smith (University of Otago), and Michael Stevens and Megan Welsh (University of California, Davis) review a century of research on grading practices. Some key conclusions:
• Grades convey important information. Over the years, researchers and psychometricians have maligned grades as subjective and unreliable measures of student achievement. In fact, grades are useful indicators of things that matter to students, teachers, parents, schools, and communities, and they predict high-school completion and the transition to college more accurately than standardized test scores do. In addition, grades become more reliable as they are aggregated from individual pieces of student work into report-card or course grades and then into GPA. For example, the reliability of overall college grade-point averages is estimated at .93.
• Grades are multidimensional. They often include noncognitive information that teachers value, including effort, motivation, improvement, work habits, attention, engagement, participation, and behavior. That’s probably why grades predict downstream success better than test scores do; it’s now clear that noncognitive factors play an important role. (“Although noncognitive skills may help students develop cognitive skills,” say the authors, “the reverse is not true.”) Teachers typically distinguish between noncognitive factors and academic ability, which they consider legitimate inputs to a grade, and factors they believe should play no role in grading: gender, socioeconomic status, and personality.
• Grades have a subjective element. Each teacher’s values come into play, including a desire to help all students succeed and to be fair, i.e., the feeling that kids who worked hard shouldn’t fail, even if they haven’t learned. “Although measurement experts and professional developers may wish grades were unadulterated measures of what students have learned and are able to do,” say the authors, “strong evidence indicates that they are not.” Over the years, researchers have attributed variations in teachers’ grades to a number of factors: the rigor of the learning task, the actual quality of student work, the grading criteria, the grading scale, how strict or lenient the teacher was, and teacher error.
• Transparency is important. Problems arise when teachers aren’t clear with students, parents, and colleagues about what goes into grades. When that happens, grades can convey inaccurate and misleading information.
• Grading practices have improved. Early researchers found fault with teachers for giving different grades to the same piece of student work. But teachers in these studies were often flying blind; they weren’t given the grading criteria. Recent studies have shown that with clear rubrics and proper training, teachers can achieve an impressive level of inter-rater reliability.
• Grades are only the tip of the iceberg. What could explain why students who tried hard didn’t master the intended learning outcomes? There are several possibilities:
- The learning goals were developmentally inappropriate.
- Students lacked readiness or appropriate prior instruction to master the material.
- The teacher didn’t make clear what students were expected to learn.
- The curriculum materials weren’t appropriate.
- The teacher didn’t instruct students in appropriate ways, including using formative assessments to catch learning problems and help struggling students in real time.
In other words, say the authors, “Research focusing solely on grades typically misses antecedent causes. Future research should make these connections… Investigating grading in the larger context of instruction and assessment will help focus research on important sources and causes of invalid or unreliable grading decisions.”

