My Rubric Rant

I come here to bury rubrics, not to praise them, nor reform them. My colleagues in education are often surprised by my feelings on rubrics, more so than by my feelings on grades. Perhaps it's because the goals for rubrics are laudable. Who can argue with more transparency, clarity, and uniformity in assessment?

I join a long line of critics of rubrics, including Alfie Kohn and Jennifer Hurley. More can be found in a well-researched skeptical survey of critiques of rubrics by Panadero and Jonsson. Here I wish to focus on the ways in which rubrics have failed me, and how the key to a better approach was less, not more.

For a definition of rubrics, I like this one from ASCD:

A rubric is a scoring tool that lists the criteria for a piece of work, or “what counts” (for example, purpose, organization, details, voice, and mechanics are often what count in a piece of writing); it also articulates gradations of quality for each criterion, from excellent to poor. [Goodrich, Understanding Rubrics]

Here's an example rubric for creative writing: 

| Outcome | 5 | 4 | 3 | 2 | 1 |
| --- | --- | --- | --- | --- | --- |
| Students will write well organized, cohesive papers. | Work functions well as a whole. Piece has a clear flow and a sense of purpose. | Response has either a strong lead, developed body, or satisfying conclusion, but not all three. | Uneven. Awkward or missing transitions. Weakly unified. | Wanders. Repetitive. Inconclusive. | Incoherent and fragmentary. Student didn't write enough to judge. |
| Students will use appropriate voice and tone in writing. | Voice is confident and appropriate. Consistently engaging. Active, not passive voice. Natural. A strong sense of both authorship and audience. | The speaker sounds as if he or she cares too little or too much about the topic. Or the voice fades in and out. Occasionally passive. | Tone is okay. But the paper could have been written by anyone. Apathetic or artificial. Overly formal or informal. | "I just want to get this over with." | Mechanical and cognitive problems so basic that tone doesn't even figure in. Student didn't write enough to judge. |
| Students will demonstrate original, creative writing. | Excellent use of imagery; similes; vivid, detailed descriptions; figurative language; puns; wordplay; metaphor; irony. Surprises the reader with unusual associations, breaks conventions, thwarts expectations. | Some startling images, a few stunning associative leaps with a weak conclusion or lesser, more ordinary images and comparisons. Inconsistent. | Sentimental, predictable, or cliché. | Borrows ideas or images from popular culture in an unreflective way. | Cursory response. Obvious lack of motivation and/or poor understanding of the assignment. |

[Loyola Marymount University, Creative Writing Example Rubric]

I like to use this rubric for discussion because it was meant as a model for others to use and nicely demonstrates the core elements in the ASCD definition:

  • The criteria are listed in the first column: organization, voice and tone, and creativity

  • The gradations in quality are articulated in the columns to the right, ordered from excellent to poor

  • The scores for each gradation in quality are given in the top row, from 5 to 1

So, what's the problem? I've had to use rubrics for years, primarily in college entrepreneurship classes where a team of faculty evaluates artifacts such as student proposals for startup businesses. We used rubrics as intended: to provide the faculty team and the students a shared structure for evaluation. But the results were always frustrating for us.

Let's start with scoring. Rubrics are meant to be a more transparent, fairer scoring system. Studies cited in Panadero and Jonsson's rebuttal to critiques of rubrics suggest students appreciate the way rubrics break a grade down along various dimensions. For faculty, though, the problems arise when making the judgment calls that those "gradations of quality" require. Is a piece of writing clichéd (3 points) or does it borrow ideas in an unreflective way (2 points)? How many instances of passive voice are allowed before a piece drops from active voice (5 points) to occasionally passive (4 points)? The problem is that the columns in a rubric create sharp boundaries that don't exist for subjective terms like "strong sense of authorship" in creative writing, or "a clear market is defined" in entrepreneurship proposals.

The next problem arises when we total the scores for the various criteria to get a grade. Two problems occurred repeatedly in our entrepreneurship evaluations. First, most artifacts, despite wide variations along various criteria, ended up getting the same score. This is because summing cancels out the highs and lows across criteria. The creative but disorganized proposal got the same grade as the polished but trite one. The second problem was even more frustrating: the proposals that the faculty liked best rarely got the highest scores. Sometimes the least liked proposal scored best. Why? Because our rubrics would include several criteria that were strongly correlated. These were typically the more easily defined criteria that focused on following some specified process. Such criteria correlate because students who follow one set of steps tend to follow them all. Alongside these process criteria would be one or two rows for creativity and insight. The correlated process criteria dominated the final total. Proposals that checked the process boxes outscored the creative, insightful ones that the faculty liked better. One way to fix this problem is to weight the criteria differently, but then students either pay little attention to meeting the low-weighted criteria, or they complain about getting a low grade when they have 5 out of 5 on so many rows.
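To make the arithmetic concrete, here is a small sketch in Python. Everything in it is an assumption for illustration: the criterion names, the scores, and the weight of 4 are made up and do not come from our actual rubrics or from any student work. It only shows how plain summing can produce the two effects described above, and what reweighting does to them.

```python
# Illustrative scores only; none of these numbers come from real student work.

def total(scores, weights=None):
    """Sum per-criterion scores; criteria without an explicit weight count once."""
    weights = weights or {}
    return sum(score * weights.get(criterion, 1)
               for criterion, score in scores.items())

# Problem 1: highs and lows cancel, so very different work gets the same total.
creative_but_disorganized = {"organization": 2, "voice": 4, "creativity": 5}
polished_but_trite = {"organization": 5, "voice": 4, "creativity": 2}
print(total(creative_but_disorganized), total(polished_but_trite))  # 11 11

# Problem 2: four correlated process rows outvote a single creativity row.
process_follower = {"steps_followed": 5, "template_used": 5,
                    "sections_complete": 5, "format": 5, "creativity": 2}
creative_proposal = {"steps_followed": 3, "template_used": 3,
                     "sections_complete": 3, "format": 3, "creativity": 5}
print(total(process_follower), total(creative_proposal))  # 22 17

# Weighting creativity (hypothetically 4x) flips the ranking, but now a
# proposal with four rows of 5 out of 5 gets the lower grade.
weights = {"creativity": 4}
print(total(process_follower, weights), total(creative_proposal, weights))  # 28 32
```

In this toy case the weighted totals do rank the creative proposal first, but the student who earned 5 out of 5 on four rows now has the lower grade, which is exactly the complaint we kept hearing.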

Can we fix rubrics by eliminating the scores?  This is harder than it looks. It's not enough to remove numbers. I often see rubrics that replace the numbers with "excellent", "well done", "average", "needs work", and so on. Students are not fooled. They know that sooner or later their grade is going to be affected by how many times they get "well done" versus "needs work".

There is a more effective fix: the single-point rubric. This approach drops the scores and the gradations in quality. It keeps only the criteria and the one column that describes what good answers look like, and adds a column or two for customized elaboration. An example I particularly like is this one:

You do not score with a single-point rubric. When evaluating a student artifact, you say whether it has met all the criteria or not. If not, you point out what needs revision. There is no need to weight criteria differently or make subjective decisions about whether something is three points or four. Either all criteria have been met well enough to move on, or not.

I don't hate the single-point rubric. By eliminating scoring and introducing revision, it removes the most critical problems. As with stone soup, though, it is not clear that the final product requires the star ingredient at all. If anything, the single table constrains our ability to give good feedback. When reviewing an essay, we can't just list the criteria that have not yet been met. The single-point rubric depends on the customized feedback to guide a student in finding and fixing the problems. When I give feedback on business plans, product ideas, code, and such, I prefer to mark up the document with questions and suggestions. I want the student to revisit and review the document, not look for feedback crammed into a single cell of an isolated table.

Once we focus solely on feedback and revision, what about grading?  Most of us work in environments where at the very least a final grade is required, and students have a legitimate need to know what that grade might be along the way. For thoughts on that, see Twenty Years Gradeless.

To conclude this essay on stone soup à la rubrics: we've jettisoned the rubric's scores, gradations of quality, and tabular format, and added copious amounts of feedback and opportunity for revision. Our stone soup is ready. Time to remove the stone.


Christopher Riesbeck is an Associate Professor in Electrical Engineering and Computer Science at Northwestern University, with a courtesy appointment in the Learning Sciences. He has a PhD in Computer Science from Stanford University.
