Lessons learned from my first semester of Grading for Equity

This fall, I made my first experiments with Grading for Equity, an approach intended to address inequities in traditional points-based grading schemes. Alongside it, I adopted the supporting practice of specifications grading. Soon it became clear that I was not one instructor acting alone, but part of a movement among CS educators responding to the uncertain conditions of the COVID-19 pandemic.

I’ve been asked to blog on my lessons learned, so I will. But first, to warm up, I’ll tell the story of how I got started.


The story

I’m a regular reader of the SIGCSE-members email list, an active conversation among members of the ACM Special Interest Group on Computer Science Education. Last June, responding to the Black Lives Matter movement, my friend Cory Bart started a conversation with a plea for advice on supporting Black students in CS1. As I was also searching for ways to cope with the uncertainty of the fall semester, a response from J. Philip East at UNI caught my eye:

I strongly endorse “Grading for Equity”.

I believe, however, that it goes way beyond equity and gets to the heart of learning for all our students (which I guess is the heart of equity).  Implementing the ideas starts with identifying desired “outcomes” to be achieved (not content to be covered), building equitable assessments, and using the assessments equitably.  It also means _not_ including attendance, participation, late penalties, group work (mostly), homework (mostly), etc. in grades since they can be inequitably applied and/or do not directly relate to desired learning.

Several of us at UNI have been using some of the ideas for a while and are now explicitly working to totally make our grading equitable.  A colleague and I submitted a position paper on it to SIGCSE but unfortunately it was not accepted.  I’d be happy to share/discuss ideas with anyone who is interested.

I responded:

Greetings from a former Iowan! Would you be willing to share your unpublished position paper on grading for equity? A math colleague [Albert Schueller] and I were already talking about “mastery-based grading” as an approach for managing the likely chaos of this fall with the worsening COVID-19 pandemic.

Philip shared his unpublished position paper with me, and I was inspired. You can ask him for it too. It’s unfortunate that it wasn’t accepted, as I think it may gain historical significance as the first work on the approach in the CS education community, and I hope he has resubmitted it.

I decided I wanted to read Joe Feldman’s 2019 book Grading for Equity. (For those reading along, Chapter 1 is available for free online.) After reading a bit, I began taking notes on my brand new iPad, starting as follows:

Friends, students, colleagues, lend me your ears. I come to bury grades, not to praise them. The evil that grades do lives after them.

Why do I say grades are evil? Traditional grading schemes undermine trust (p. 29)! And I wrote three weeks ago that trust is what teaching is all about.

I’ve learned to use “hacks” (p. 51) to overcome this, but they address the symptoms and not the root cause. We need a radical new approach.

Grades should be (p. 66):
accurate,
bias-resistant,
motivational!

I included a photograph of page 72, which provides a summary of grading practices mapping to these three pillars.

What made this book a priority was that colleague John Stratton and I agreed to read it together. John told me he was inspired to completely rewrite his teaching statement, and we were both inspired to revise our policies for fall classes. John also emailed Albert and a group of other Whitman science colleagues interested in mastery-based grading to recommend the book. John wrote:

[We] found that it made some powerful arguments about how mastery grading is equitable grading.  Here are some of the big points that struck me.

We use grades as behavior modification tools, penalizing late assignments to teach punctuality, grading attendance to incentivize engagement, or grading formative assessments to incentivize practice.  In the end, this means that our grades significantly reflect whether someone meets our potentially biased and inequitable prescriptions of behavior, even if they do manage to learn what we wanted, but late or in spite of not having the time available to do all of the practice we recommended.

The assumption that students won’t do anything unless we put points on it teaches students that success in a class is about accumulating points.

When the focus is put on the external motivation of point-collecting, on deadlines, then students are strongly incentivized to copy or cheat to get the points. We hope that students will learn to meet this external motivator of point-collecting, but that saps the internal motivation that students need to be successful at creative problem solving.

At this point, I was committed to Grading for Equity, and had a good idea what I wanted to do for CS/Math 220, Discrete Mathematics & Functional Programming (though I was still at sea regarding CS 267, Human-Computer Interaction; more on this later).

It was helpful to discuss the approach with colleagues at Whitman. And then John and I both attended the Math-in-CS virtual workshop on Thursday, July 30, organized by Peter-Michael Osera (who replaced me at Grinnell) and others. I ended up co-facilitating an “unconference” session on “Grading philosophy and alternatives,” which mostly focused on Grading for Equity. After this session, Peter-Michael recommended Robert Talbert’s blog post on specifications grading, which helped me fill in the gaps in my approach.

I started writing the syllabus for CS/Math 220, and decided that if I was all in on Grading for Equity in that course, I might as well try it in CS 267 as well.


The implementation

In CS/Math 220, I closely followed Talbert’s approach. I set about 35 Learning Targets, each with a corresponding problem from the textbook. These were grouped into “bundles” for achieving a D, C, or B. To earn an A, students also had to pass the “hurdle” of completing a certain number of Challenge Problems. To avoid the problem Talbert reported, where many students did not even attempt Challenge Problems, I required a small number even to earn a C or B.
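
To make the structure concrete, here is a minimal sketch in Python of how a bundle-and-hurdle scheme like this one might compute final grades. The function and all of its thresholds are hypothetical illustrations; my actual bundle sizes and Challenge Problem counts aren’t spelled out in this post.

    # Hypothetical sketch of a bundle-and-hurdle grading scheme. All thresholds
    # are invented for illustration; the actual bundle sizes and Challenge
    # Problem requirements are not given in this post.

    def course_grade(targets_passed: int, challenges_passed: int) -> str:
        """Map counts of passed Learning Targets and Challenge Problems
        to a letter grade."""
        # Bundles: each higher grade requires passing more Learning Targets.
        # The C and B bundles also require a small number of Challenge
        # Problems, so students attempt them before aiming for an A.
        if targets_passed >= 33 and challenges_passed >= 8:
            return "A"  # the "hurdle": substantially more Challenge Problems
        if targets_passed >= 30 and challenges_passed >= 4:
            return "B"
        if targets_passed >= 25 and challenges_passed >= 2:
            return "C"
        if targets_passed >= 18:
            return "D"
        return "F"

    # Example: 31 Learning Targets and 5 Challenge Problems earn a B.
    print(course_grade(31, 5))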

The final version of the grading scheme was a simplification of my first scheme, in which the criteria for an A would have included a scaffolded programming project and some particular, more challenging proofs. After some students made their project pre-proposals, I realized none of us had the capacity for a project on top of the weekly work and revisions. I was also somewhat disappointed to realize we weren’t going to progress far enough through the material to reach those particular problems.

Each week, I assigned ungraded Practice Problems to introduce new material, as well as Learning Targets and Challenge Problems on the previous week’s material. I graded Challenge Problems on Talbert’s EMRN rubric, and Learning Targets as Pass/Redo. I set soft deadlines for all assignments, with no penalty for late work, and I accepted revisions of all work up to the end of the semester. There were no quizzes or exams.

In CS 267, I took a more holistic approach. I was easily able to come up with 24 learning outcomes for the semester. Unlike the CS/Math 220 Learning Targets, for the most part these did not map 1:1 onto assignments. It was also less clear what would constitute “mastery,” as professionals can spend their whole career mastering design skills and concepts; it was helpful to think about “competence” or “familiarity” instead. I used those newly formulated learning outcomes in rubrics to assess existing assignments (danger, Will Robinson!). Since Grading for Equity insists that learning is the responsibility of the individual, I developed an individual reflection for each team design assignment, and assessed the reflection rather than the team product.


Now what you’ve been waiting for: the lessons learned

Benefits:

  1. Growth mindset. It was so refreshing to give students feedback on how to improve their work, rather than telling them everything they did wrong to justify their score. I saw some students’ work improve tremendously over the course of the semester. The revisions are the point: that’s how you learn!
  2. Grading is less of a chore. It’s much easier to give constructive feedback knowing students will have the opportunity to revise their work, and I’d rather do that than parse out exactly how many points a student earned. Amy Csizmar Dalal wrote more about this in her blog post “5 Lessons from Fall Term” (see “Lesson 3: Specifications grading helped…a lot”).
  3. No high-stakes assessments. There was no stress about writing exams, deciding what’s covered or left out, or figuring out what to do if a problem doesn’t work (since students can revise, I can too!).

Challenges:

  1. Unfamiliarity. I don’t know whether students appreciated the “growth mindset” approach as much as I did. One wrote something like this on the end-of-course evaluation survey: “The teaching was terrible. I never would have passed if not for the opportunity to revise my work.” I’m reminded of students’ protests against active learning because it is not as comfortable as being lectured to, even though students learn more when they are not just passive recipients. Obviously, I have more work to do making the value of the approach apparent to students.
  2. LMS gradebook friction. This might seem like a minor technical point, but it was a big source of stress and extra work early in the semester. I had to figure out how to overcome Canvas’s points-driven assumptions, eventually landing on a dual grading process in which I marked 0-point assignments as Complete/Incomplete for the standard gradebook (and to clear the grading off my queue) and used a rubric to record performance on learning outcomes. It was a pain and a cause of errors, and it was confusing to students. For next time, I should look more carefully at Talbert’s “flowchart” and “scorecard,” and I’ll also think about whether Amy’s approach plays well with Canvas.
  3. Anxiety about student progress. Around the time that mid-semester grade reports were due, Albert and I had a conversation about how worried we were about some of our students who had submitted very little work. After receiving low-grade notices, some responded that they were catching up, or even that they were saving work up to submit in batches (!). All but a few made adequate progress by the end of the semester…albeit on their own schedule. Albert and I agreed that next time, we would firm up deadlines for initial submissions, with some flexibility but not too much.
  4. Too little practice. Although revision was really beneficial, I worried that students who needed more practice on different problems didn’t get enough. Next time I might assign some optional “extra” practice problems, and more explicitly allow for a small grade bump based on consistent participation and completion of practice problems, as Talbert suggests. I would also require that students complete relevant Learning Targets before attempting Challenge Problems, to make sure their effort and mine are not wasted.
  5. Time fragmentation. Even with revisions, the overall amount of grading was manageable thanks to specifications grading. But Canvas puts all submissions (and resubmissions) in need of grading on a helpful “to do” list, and I let my work be driven by that list. With late submissions, revisions, and too many options for Challenge Problems, one grading session might include single resubmissions of a dozen different problems, five or six submissions on each of three recent Learning Targets, and one or two submissions on twice that many recent Challenge Problems.
  6. Subjectivity and bias. There’s always some grey area in deciding what meets (or exceeds) expectations. At times, I found myself torn between generosity (“good enough!”) and toughness (“they can always revise!”) and I know I was not consistent in my leaning. During the semester, I discovered it was helpful to grade initial submissions anonymously. (Note to self: next time tell students I’ll be doing this so they don’t put their names on their papers!) It would also be helpful to write more detailed rubrics for learning outcomes and/or particular assignments. But even with those changes, there is space for bias to emerge when struggling students are permitted to demonstrate learning in an alternative format such as an oral exam, as Feldman recommends. I’m not sure what to do about that.
  7. Team projects. In the interest of grades accurately reflecting individual students’ learning, Feldman discourages summative assessment of group work. He recommends instead that instructors “assess each student individually to determine whether each learned the content or skills the group work was designed to teach” (p. 106). While this seems right, it’s easier said than done! In CS 267, my assessments of individual reflections seemed less rigorous than my assessment of the team product would have been. Or perhaps my rubric-based assessment of the team product was more objective but not more rigorous; I’m not sure. One approach would be to assign a small percentage of the final grade to team product assessments, but this re-introduces the freeloader problem and increases the overall amount of grading relative to grading only individual reflections. Another approach would be to develop more rigorous rubrics for assessing individual reflections. This problem is going to arise again this spring in both my classes (CS 301, ST: Computer Networks, and CS 370, Software Design), so I’d better figure it out.
  8. Bundles vs hurdles; discrete skills vs lifelong practices. I haven’t actually read Linda B. Nilson’s book Specifications Grading: Restoring Rigor, Motivating Students, and Saving Faculty Time, but I learned about bundles and hurdles from finally reading Amy’s August 2020 blog post on rethinking assessment. In a nutshell, students can achieve higher grades by learning more stuff (completing more “bundles”) or by demonstrating greater mastery (“jumping higher hurdles”). In retrospect, it was my intention to use the “bundle” approach in CS/Math 220, and it worked fairly well with that course’s emphasis on discrete skills and knowledge (wordplay unintended!). I had something more like the “hurdles” approach in mind for CS 267, where the skills are easy to learn and hard to master, and both my thinking and my implementation were much mushier. I’m going to need to clarify my approach for this spring. Guess I’d better read the book! In the meantime, I’ll read Nilson’s essay in Inside Higher Ed.
  9. Change takes time. I have no doubt that many of the problems I had with Grading for Equity throughout the semester – and the problems in CS 267 were particularly pernicious – were due to not taking enough time to plan in the month of August. I fear that’s going to happen again for the spring semester. But at least now I know a lot more, so I’m starting from a higher place.

With nine challenges and only three benefits, you might think I’d count my experiment a failure. But I’m planning to press on. As noted above, I think all these challenges can be at least partly overcome. And when I do, I’ll achieve Feldman’s key benefits for students: accuracy, bias-resistance, and motivation. Those benefits clearly outweigh the challenges of doing something new.

I’d love your advice and further resources to address these challenges! Commiseration is always welcome, too.

(Note: Colleagues’ emails are quoted with their permission.)

4 thoughts on “Lessons learned from my first semester of Grading for Equity”

  1. Philip East

    Hi, Janet. Thank you for the nice words. (FYI, a colleague submitted a revision of the position paper; it was also rejected.) Your reaction is the same as ours. It wasn’t perfect the first time, but it revolutionized our thinking and our grading/teaching practice. You seem sold on grading for equity (GfE) and intend to keep working on it even though there were issues. Great!

    I think what you have done/are doing is a wonderful start. I suspect your challenges will decline as time goes by. Working with colleagues helps a lot. Keep in mind that I really like talking about teaching and learning. Let me know and we can email or zoom.

    I’m pretty sure the word will get around and students will get used to GfE. I tried to simplify grading as much as possible. Like you, I tend to think in terms of competency rather than mastery, which means I don’t need to think much about gradations of competence: students showed that they got it or didn’t get it. Occasionally I wasn’t sure, so I asked students to come in and explain their thinking. I never had a student question my evaluation. I found that I could have high expectations for competency while doing this. My final grade was mostly based on the number of outcomes for which competency was demonstrated, perhaps combined with an assessment of putting it all together. Also, in an effort to keep it simple, we tended to prepare study guides for students. Study guides would be used to construct learning activities that would provide practice and to build assessments (often selecting assessment items from the study guide).

    I wonder a bit about your use of bundles and hurdles. That seems to provide an opportunity for students with better background knowledge or academic skills or time, etc. to get better grades and inequity can creep back in. The desired (graded) outcomes should be the same for all students. There is always more stuff that could be learned and students should be commended for learning it but grades should be based on demonstrated learning on one set of outcomes.

    Again, I applaud what you are doing. You’ve done better in your first attempt than I did in mine.

    1. Janet Davis (post author)

      Philip, thanks for your constructive (and encouraging) feedback! And thanks so much for your offer to talk. I will likely take you up on that, after I get a couple of other meetings firmed up.

  2. Amy Csizmar Dalal

    Thanks for this post! I especially love your point about revision working both ways. I felt like I could experiment more, especially with exam questions, and that it was easier to be honest with students when those exam questions didn’t work out. Also, I was able to separate “this question didn’t work the way I intended but your answer demonstrated competency in the intended learning outcome so we’re good” from “this question didn’t work the way I intended but your answer shows deficits in your understanding of the learning outcome, so here’s how you should approach your revision” — and that was really freeing.

    I found it really helpful to continually beat the drum of “revision IS learning” throughout the term, so that it was crystal clear that I *expected* everyone would need to revise *something* over the course of the term. (And that proved to be true!) But I’m still on the fence about grading group projects: there is value in individual assessment, but I still find I’m loath to completely throw out group assessment of team projects.

