Enforced randomization isn't going to fix the problem, it just evenly distribute...

bluGill · on April 20, 2024

> it would mean that the graders are just getting tired/lazy/inattentive the further they get in their stack of papers

Or maybe they are getting better / more picky.

I know that in code reviews I often approve a few, then notice a problem that I realize I also let through in earlier reviews; later reviews that day (or week) won't get away with it.

13of40 · on April 20, 2024

I've participated in day-long and multi-day interview events for job candidates, and I see the same effect. At the beginning you don't have a frame of reference and you're more likely to question your own decision or give someone the benefit of the doubt, but by the end you're far more systematic, plus a little bit numb to the effect your decision is having.

throwaway35777 · on April 20, 2024

> by the end you're far more systematic, plus a little bit numb to the effect your decision is having

Maybe decision fatigue is supposed to bias humans toward the optimal solution for the fiancee problem [1].

[1] https://en.m.wikipedia.org/wiki/Secretary_problem
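For reference, the "optimal solution" in [1] is the classic stopping rule: look at roughly the first n/e candidates without choosing, then take the first one that beats everything seen so far. A minimal Python simulation of that rule might look like the following. It assumes n = 50 candidates with uniformly random scores, and the function names are illustrative:

```python
import math
import random


def secretary_choice(candidates: list[float]) -> float:
    """Observe the first n/e candidates without choosing, then take the first
    candidate better than all of them (or the last one if none is)."""
    n = len(candidates)
    cutoff = int(n / math.e)  # length of the look-but-don't-choose phase
    best_seen = max(candidates[:cutoff], default=float("-inf"))
    for value in candidates[cutoff:]:
        if value > best_seen:
            return value
    return candidates[-1]  # nobody beat the benchmark: stuck with the last one


def success_rate(n: int = 50, trials: int = 20_000, seed: int = 0) -> float:
    """Estimate how often the rule ends up picking the single best candidate."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        candidates = [rng.random() for _ in range(n)]
        if secretary_choice(candidates) == max(candidates):
            wins += 1
    return wins / trials


rate = success_rate()
print(f"picked the best candidate in {rate:.1%} of trials")  # expect a value near 1/e
```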

cyanydeez · on April 20, 2024

For grading, you could probably just add a mediating factor: throw in test cases that calibrate the factor, then curve everyone on that factor.

It'd seemingly be more work, but it would produce averages that are more robust to changes in stress.
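One way to read the "mediating factor" idea: seed the stack with a few calibration papers whose reference scores were agreed on beforehand. Then measure how far the grader drifts from those references as the stack goes on, and subtract that drift from everyone else's score. The sketch below assumes the drift is linear in stack position, which is a simplification. The function names and data layout are hypothetical:

```python
def fit_linear_drift(anchors: list[tuple[int, float, float]]) -> tuple[float, float]:
    """Least-squares fit of (awarded - reference) = intercept + slope * position.

    Each anchor is a calibration paper: (stack position, awarded, reference).
    """
    if len(anchors) < 2:
        raise ValueError("need at least two calibration papers")
    xs = [pos for pos, _, _ in anchors]
    ys = [awarded - reference for _, awarded, reference in anchors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("calibration papers must sit at different positions")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def correct_scores(
    scores: dict[str, tuple[int, float]],
    anchors: list[tuple[int, float, float]],
) -> dict[str, float]:
    """Remove the fitted drift from each student's (position, score) pair."""
    intercept, slope = fit_linear_drift(anchors)
    return {
        student: score - (intercept + slope * position)
        for student, (position, score) in scores.items()
    }


# Calibration papers worth 80 were graded 82, 80 and 78 at positions 0, 10, 20,
# i.e. the grader got about 0.2 points harsher per paper.
anchors = [(0, 82.0, 80.0), (10, 80.0, 80.0), (20, 78.0, 80.0)]
corrected = correct_scores({"alice": (5, 90.0), "bob": (15, 70.0)}, anchors)
print(corrected)
```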

labcomputer · on April 20, 2024

Yes, and:

Additionally, universities (and, by extension, departments) want grades to approximately follow a normal distribution (and yes, you in the back, their actions show they do actually want that, even if they say otherwise).

When you start grading a problem you have some idea what a "good" solution looks like, what an "ok" solution looks like, and same for "bad" solutions... If you award points based on that, the result will be a normal-ish distribution. But your idea of a good/ok/bad solution evolves as you see more papers.

There are two reasons for that:

First, you can't imagine ahead of time all the ways students will invent to fuck up a problem set and find edge cases in your grading rubric that result in unfairly high or low scores. As you gain teaching experience, you will anticipate more of them, but you will never anticipate every one.

Second, the TA/grader wants to be able to stack-rank the papers and have the scores be monotonic. The grader wants this because non-monotonic scoring triggers far more complaining than harsh scoring or picky scoring. When you come across papers that are worse than ones you've already recently graded, you assign even lower scores.

This results in a ratcheting effect, with more extreme scores the closer you get to the bottom of the pile. And since the mean score is usually a B/B-/C+ (~75-85) while scores are limited to the range 0-100, there is far more room for extreme scores below the mean than above it, so papers closer to the bottom will receive statistically lower scores.

Now, you could go back and re-grade the ones you've already done, but:
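The clipping argument can be checked with a toy simulation. This is a sketch under loudly stated assumptions, not a model of any real grader. Scores are drawn symmetrically around a mean of 80, the spread widens linearly with position in the pile (the "ratchet"), and every score is clipped to 0-100. Because the ceiling sits only about 20 points above the mean, the widening gets cut off on the high side but not on the low side, so the average falls:

```python
import random


def simulate_pile(
    n_papers: int = 100,
    trials: int = 2_000,
    mean: float = 80.0,
    base_sd: float = 12.0,
    stretch: float = 1.0,
    seed: int = 0,
) -> list[float]:
    """Average clipped score at each position in the pile.

    Assumed model: score = mean + deviation * widen, where deviation is
    symmetric noise and widen grows from 1.0 (top of the pile) towards
    1.0 + stretch (bottom), then the score is clipped to the 0-100 range.
    """
    rng = random.Random(seed)
    totals = [0.0] * n_papers
    for _ in range(trials):
        for pos in range(n_papers):
            deviation = rng.gauss(0.0, base_sd)
            widen = 1.0 + stretch * pos / n_papers
            score = mean + deviation * widen
            totals[pos] += min(100.0, max(0.0, score))  # clip to 0-100
    return [total / trials for total in totals]


averages = simulate_pile()
top = sum(averages[:10]) / 10
bottom = sum(averages[-10:]) / 10
print(f"top of pile: {top:.1f}, bottom of pile: {bottom:.1f}")
```

Setting `stretch=0.0` removes the ratchet, and the top and bottom averages should then come out about the same.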

1. The university is officially only paying you for 20hrs/week (and requires a signed end-of-semester statement attesting to the same).

2. The assigned workload of teaching and grading doesn't permit a two-pass grading scheme while keeping within 20 hours.

3. If you complain to the graduate ombudsman about the workload needing more than 20 hours, you won't have funding next semester (so you have a prisoner's dilemma among TAs who might want to grade more fairly).

4. If you're grading (say) a final exam for a frosh/soph class, you're probably in a room with 4-8 other graders late into the night. One effective way to make your coworkers hate you is to be that guy who always finishes grading his stack last, when everyone is worried about catching the last train/bus.

Basically, all the incentives are aligned to make this happen.

hilux · on April 21, 2024

That's thought-provoking - thank you.

Essentially, unless it's an old exam where the universe of bad answers is already known, you need two passes - a discovery pass followed by the grading pass.
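Here is a rough sketch of that two-pass scheme. It assumes the grader records each paper's mistakes as short labels during the discovery pass. The labels, point values, and function names are made up for illustration. The key property is that deductions are fixed only after every paper has been seen, so a mistake costs the same whether it shows up on paper 3 or paper 80:

```python
from collections import Counter


def discovery_pass(papers: dict[str, set[str]]) -> Counter:
    """Pass 1: don't score anything, just catalogue which mistakes occur."""
    seen = Counter()
    for mistakes in papers.values():
        seen.update(mistakes)
    return seen


def grading_pass(
    papers: dict[str, set[str]],
    deductions: dict[str, int],
    max_points: int = 10,
) -> dict[str, int]:
    """Pass 2: score every paper against the same, now-complete rubric."""
    all_mistakes = {m for mistakes in papers.values() for m in mistakes}
    unpriced = all_mistakes - set(deductions)
    if unpriced:
        raise ValueError(f"no deduction decided for: {sorted(unpriced)}")
    return {
        student: max(0, max_points - sum(deductions[m] for m in mistakes))
        for student, mistakes in papers.items()
    }


papers = {
    "alice": {"sign error"},
    "bob": set(),
    "carol": {"sign error", "missing units"},
}
print(discovery_pass(papers))  # how often each mistake occurred
# Deductions are chosen only after looking at the full catalogue.
deductions = {"sign error": 2, "missing units": 1}
print(grading_pass(papers, deductions))
```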

bigfudge · on April 20, 2024

In my case, I have to make a conscious effort to remain consistently (in)tolerant of lazy writing. It’s hard to keep on reading between the lines and giving the benefit of the doubt.

rjzzleep · on April 20, 2024

I had the same conclusion. You learn things as you go, including things you don't like.

davrosthedalek · on April 20, 2024

In my experience, it's not tired/lazy/inattentive, but resignation. You normally have some expectation of what students will be able to solve, and typically these expectations are set too high. That's very common, not only for me but for pretty much everyone I know. So over the course of grading, one adjusts expectations down and, for example, gives partial credit earlier.

throwaway35777 · on April 20, 2024

I was a grader once. I guarantee if someone gives a good answer they'll get full marks even near the bottom of the stack. For BS answers I'll admit I got less generous as the hours went on.

No one's getting hurt by this system if it's randomized. It's a matter of graders giving out partial credit for wrong answers, which is discretionary. Rarely, students are granted a small mercy. Seems OK.

dunham · on April 20, 2024

I was one of many TAs for a large math class in college (pre-calc - think high school math for college students). For uniformity, the prof had the partial credit down to a science - specifying points for getting certain aspects of the problem. For the finals, a few TAs would be assigned to a given page, for uniformity.

The fascinating thing was that the distribution of grades was about the same every year.

And I had a math prof for analysis who would give negative points for BS answers. You could say “I need X but don’t know how to prove it” in the middle of a proof, but if you made up something that was incorrect, you’d get negative points.
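Here is a minimal sketch of that kind of scheme. Each rubric aspect earns points, and each step that is asserted but false costs a penalty. An honestly flagged gap simply earns nothing for that aspect. The aspect names and point values are hypothetical. This sketch lets the total go negative, as in the analysis course described above; whether to floor it at zero is a per-course choice:

```python
from dataclasses import dataclass


@dataclass
class Rubric:
    # Points awarded for each aspect of the solution done correctly.
    aspects: dict[str, int]
    # Points subtracted for each incorrect claim presented as fact.
    bluff_penalty: int = 2


def score(rubric: Rubric, achieved: set[str], false_claims: int = 0) -> int:
    """Partial credit for achieved aspects, minus penalties for made-up steps."""
    earned = sum(points for aspect, points in rubric.aspects.items() if aspect in achieved)
    return earned - rubric.bluff_penalty * false_claims


proof = Rubric(aspects={"setup": 2, "key lemma": 4, "conclusion": 2})
# "I need the key lemma but don't know how to prove it": just loses those 4 points.
honest = score(proof, {"setup", "conclusion"})
# Making up a bogus proof of the lemma costs extra on top of that.
bluffed = score(proof, {"setup", "conclusion"}, false_claims=1)
print(honest, bluffed)
```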

hilux · on April 21, 2024

Oh, that brings back memories! "For every epsilon, there is a delta ..."

bumby · on April 20, 2024

>For BS answers I'll admit I got less generous as the hours went on.

What do you think is the cause of this? Do you become more cynical (and less generous) because you’ve seen so many BS answers previously? Is it just that getting fatigued makes you less generous?

ihaveajob · on April 20, 2024

When I was a TA in grad school, I noticed the same. Early on I thought some BS answers were at least kind of funny, and I gave them the benefit of the doubt, maybe giving more attention to the parts that were correct. After I saw similar answers later on, the novelty wore off and I was probably less amused, so the inclination to be lenient disappeared. Sometimes I went back to previous decisions if I remembered them, to be fair, but I don't think I always remembered since the volume could be high (grading 80 exams in a row is TEDIOUS).

BugsJustFindMe · on April 20, 2024

> Enforced randomization isn't going to fix the problem, it just evenly distributes the problem.

Evenly distributing the problem does fix the problem. Proportionality is what matters. Grading being arbitrary is fine if everyone is graded equally.

zeroonetwothree · on April 20, 2024

Random order would still mean that a few students in the class get unlucky and land near the end the majority of the time, although over the course of all their classes it would tend to even out somewhat.

It’s certainly better than fixed order.

BugsJustFindMe · on April 20, 2024

"Randomization" is not the important part here; "evenly distributing" is. It is absolutely possible to reorder the sequence fairly so that your scenario doesn't occur, and the order could even look randomized to a human observer if you want. In a trivial example where the effect was linear, you could just switch the order back and forth, and on average every student would receive the same middle-of-group impact.
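Here is a minimal sketch of the back-and-forth idea. It assumes the position effect is linear, so only a paper's average position matters, and the function names are illustrative. Assignment 1 is graded in list order, assignment 2 in reverse, and so on. Over any even number of assignments, every student's average position lands exactly in the middle:

```python
def alternating_orders(students: list[str], n_assignments: int) -> list[list[str]]:
    """List order for even-numbered (0-based) assignments, reversed for odd ones."""
    return [
        list(students) if k % 2 == 0 else list(reversed(students))
        for k in range(n_assignments)
    ]


def average_positions(orders: list[list[str]]) -> dict[str, float]:
    """Mean 0-based grading position of each student across all assignments."""
    totals: dict[str, int] = {}
    for order in orders:
        for position, student in enumerate(order):
            totals[student] = totals.get(student, 0) + position
    return {student: total / len(orders) for student, total in totals.items()}


students = ["Adams", "Baker", "Chen", "Diaz", "Evans"]
print(average_positions(alternating_orders(students, 4)))
```

With an odd number of assignments, the last one is unpaired, so the balance is only approximate.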

whiterknight · on April 20, 2024

The mistake is assuming grades are an objective measurement, and not gamification to try to help you learn.

BugsJustFindMe · on April 20, 2024

It's a common mistake. So common, in fact, that it has real practical impact on students at the edge who might not otherwise have failed or passed.

skhunted · on April 20, 2024

I grade tests as follows. The stack is created as students turn in the test, and I grade the first page in that order. The stack reverses for the second page, and so on. I teach college math. I just can't imagine a system of grading done in alphabetical order.

falseprofit · on April 20, 2024

Scanning and grading on a computer can alphabetize them.

skhunted · on April 20, 2024

That makes sense. I haven't had people upload assignments for a long time. I'd forgotten that this was a thing.

kurthr · on April 20, 2024

I also came here to say this. My only guess is that the "learning management system" alphabetizes them to make entering the grades into a table "easier", either for the computer or for the person handing back the results. But why is that "easier"? The system doesn't have to order them at all, or it could order them by student number (which has the same issue as alphabetical order) or randomly, which is the other (non-default) option in the "learning management system".

I feel like only the most obsessive-compulsive humans would have this issue (without computer "help"), as the last thing I wanted to do as a TA was to add another step of ordering all the papers before grading them. I also always reviewed the first few papers I graded after grading the rest to make sure I was being fair, because it was obvious to me that until I saw a representative distribution of answers I couldn't do fair grading/marking.

hilux · on April 20, 2024

In the real world, universities are never going to fix the problem of overworked and underpaid grad students getting tired.

furyofantares · on April 20, 2024

It's a 0.6-point gap from top to bottom on a 100-point scale, i.e. plus or minus about a third of a point from the average. Pretty small effect. But it would add up (or, well, persist; it wouldn't get bigger) if it happened to you on every assignment in every class, and that sucks.

If there's more than one assignment you can basically erase it by randomizing each separately.

If you really care beyond that then randomize for one assignment, flip it for the next, then randomize again for the next etc.
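Here is a small sketch of the randomize-then-flip scheme (function name hypothetical). The first, third, fifth, ... assignments each get a fresh shuffle, and each following assignment is graded in exactly the reverse of that shuffle. Within each pair, a student who was k-th from the top is k-th from the bottom the next time, so their two positions always sum to n - 1. The order still looks random from one pair to the next:

```python
import random


def shuffle_then_flip(students: list[str], n_assignments: int, seed: int = 0) -> list[list[str]]:
    """Fresh random order for assignments 1, 3, 5, ...; the reverse of the
    previous order for assignments 2, 4, 6, ..."""
    rng = random.Random(seed)
    orders: list[list[str]] = []
    for k in range(n_assignments):
        if k % 2 == 0:
            order = list(students)
            rng.shuffle(order)
        else:
            order = orders[-1][::-1]  # flip the preceding random order
        orders.append(order)
    return orders


students = ["Adams", "Baker", "Chen", "Diaz", "Evans", "Fox"]
orders = shuffle_then_flip(students, 4, seed=42)
for first, second in zip(orders[::2], orders[1::2]):
    # Each student's positions across the pair sum to len(students) - 1.
    print({s: first.index(s) + second.index(s) for s in students})
```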

WaitWaitWha · on April 20, 2024

> graders are just getting tired/lazy/inattentive the further they get in their stack of papers to grade.

I will admit to this. Initially, my patience and tolerance for errors are significantly higher than toward the end of grading. By the second hour of grading, not only am I mentally exhausted, but my tolerance is significantly lower.

I try to prevent this by creating a very explicit grading rubric and sticking to it as much as possible.

ghaff · on April 20, 2024

Clear rubrics are the way to go where possible, but they aren't possible everywhere. I've been on conference committees where many different factors come into play, including how late in the day it is. In that case, though, a bunch of people are rating and commenting and there's no strict order, so it probably evens out to a reasonable degree.

bumby · on April 20, 2024

As the number of assignments grows, wouldn’t randomization help converge on the more accurate grades (in aggregate)?

falseprofit · on April 20, 2024

It would help, but with only a couple dozen courses and most determined by a couple exams it’s not quite a large number.
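Here is a quick simulation of that point. Its assumptions are borrowed loosely from upthread: the position penalty is linear, from 0 for the first paper graded to -0.6 points for the last, and every assignment is shuffled independently. It reports how much students' average penalties differ from one another after k assignments. The spread shrinks roughly like 1/sqrt(k), so the effect does average out, just slowly when only a few dozen exams count:

```python
import random
import statistics


def penalty_spread(
    n_students: int = 100,
    n_assignments: int = 1,
    gap: float = 0.6,
    trials: int = 200,
    seed: int = 0,
) -> float:
    """Average (over trials) of the population std. dev. across students of
    their mean position penalty after n_assignments independent shuffles.

    Assumed penalty: linear from 0 (graded first) to -gap (graded last).
    """
    rng = random.Random(seed)
    spreads = []
    for _ in range(trials):
        totals = [0.0] * n_students
        for _ in range(n_assignments):
            order = list(range(n_students))
            rng.shuffle(order)
            for position, student in enumerate(order):
                totals[student] -= gap * position / (n_students - 1)
        spreads.append(statistics.pstdev([t / n_assignments for t in totals]))
    return statistics.mean(spreads)


for k in (1, 4, 16):
    print(f"{k:>2} assignments: spread of about {penalty_spread(n_assignments=k):.3f} points")
```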

andix · on April 20, 2024

Even distribution would fix the problem. If grading has a subjective component, there will always be deviations from the "correct" grade. If those patterns are randomly distributed over all students, their grade averages will be comparable again.

cyanydeez · on April 20, 2024

Unfortunately, it's gonna be AI to the "rescue" and the problem is obfuscated.