Enforced randomization isn't going to fix the problem; it just distributes the problem evenly.
Based on these results, it would mean that the graders are just getting tired/lazy/inattentive the further they get in their stack of papers to grade. That's the problem that needs to be fixed, not the order they get graded in. Enforced randomization is simply a short-term alleviation so that no student(s) get disproportionately affected by this phenomenon.
> it would mean that the graders are just getting tired/lazy/inattentive the further they get in their stack of papers
Or maybe they are getting better / more picky.
I know in code reviews I often pass a few and then notice something that I realize was also wrong in previous reviews I allowed, but later reviews that day (week?) will not allow that.
I've participated in day-long and multi-day interview events for job candidates, and I see the same effect. At the beginning you don't have a frame of reference and you're more likely to question your own decision or give someone the benefit of the doubt, but by the end you're far more systematic, plus a little bit numb to the effect your decision is having.
For grading, you could probably just add a correction factor: throw in calibration papers whose proper score is already known, use them to estimate the factor, and then curve everyone by it.
It'd seemingly be more work, but it should result in averages that better account for changes in the grader's stress.
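One possible reading of that calibration idea, as a rough sketch only: seed the pile with a few papers whose reference score is already agreed on, record how far the awarded score drifts from the reference at each pile position, fit a straight line to that drift, and subtract it from every score. The function names, the linear drift model, and the numbers below are all assumptions made up for illustration, not measured data.

```python
def fit_drift(calibration):
    """Least-squares line through (pile position, awarded - reference) pairs.

    Returns (slope, intercept) of the assumed linear drift.
    """
    n = len(calibration)
    mean_x = sum(pos for pos, _ in calibration) / n
    mean_y = sum(diff for _, diff in calibration) / n
    sxx = sum((pos - mean_x) ** 2 for pos, _ in calibration)
    sxy = sum((pos - mean_x) * (diff - mean_y) for pos, diff in calibration)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def corrected_score(score, position, slope, intercept):
    """Remove the estimated drift at this pile position from an awarded score."""
    return score - (slope * position + intercept)


# Hypothetical calibration papers at positions 0, 40 and 80 in the pile,
# marked 0.0, 0.3 and 0.6 points below their reference scores.
calibration = [(0, 0.0), (40, -0.3), (80, -0.6)]
slope, intercept = fit_drift(calibration)

# A paper graded last (position 80) that was awarded 80 points.
print(round(corrected_score(80.0, 80, slope, intercept), 2))
```

This only helps if the drift really is roughly linear in pile position and the calibration papers are graded blind; otherwise the fitted line corrects for the wrong thing.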
Additionally, universities (and, by extension, departments) want grades to approximately follow a normal distribution (and yes, you in the back, their actions show they do actually want that, even if they say otherwise).
When you start grading a problem you have some idea what a "good" solution looks like, what an "ok" solution looks like, and same for "bad" solutions... If you award points based on that, the result will be a normal-ish distribution. But your idea of a good/ok/bad solution evolves as you see more papers.
There are two reasons for that:
First, you can't (ahead of time) imagine all the ways that students will invent to fuck up a problem set, and find edge cases in your grading rubric that result in unfairly-high or -low scores. As you gain experience teaching, you will anticipate more of the ways, but you will never anticipate every way.
Second, the TA/grader wants to be able to stack-rank the papers and have the scores be monotonic. The grader wants this because non-monotonic scoring triggers far more complaining than harsh scoring or picky scoring. When you come across papers that are worse than ones you've already recently graded, you assign even lower scores.
This results in a ratcheting effect with more extreme scores as you get closer to the bottom of the pile. But, since the mean score is usually a B/B-/C+ (~75-85), and since scores are usually limited to the range 0-100, this means that papers closer to the bottom will receive statistically lower scores.
Now, you could go back and re-grade ones you've already done, but:
1. The university is officially only paying you for 20hrs/week (and requires a signed end-of-semester statement attesting to the same).
2. The assigned workload of teaching and grading doesn't permit a two-pass grading scheme while keeping within 20 hours.
3. If you complain to the graduate ombudsman about the workload needing more than 20 hours, you won't have funding next semester (so you have a prisoner's dilemma among TAs who might want to grade more fairly).
4. If you're grading (say) a final exam for a frosh/soph class, you're probably in a room with 4-8 other graders late into the night. One effective way to make your coworkers hate you is to be that guy who always finishes grading his stack last, when everyone is worried about catching the last train/bus.
Basically, all the incentives are aligned to make this happen.
Essentially, unless it's an old exam where the universe of bad answers is already known, you need two passes - a discovery pass followed by the grading pass.
In my case, I have to make a conscious effort to remain consistently (in)tolerant of lazy writing. It’s hard to keep on reading between the lines and giving the benefit of the doubt.
In my experience, it's not tired/lazy/inattentive, but resignation. You normally have some expectation of what students will be able to solve. Typically, these expectations are set too high. That's very common, not only for me, but for pretty much anyone I know. So over the course of grading, one adjusts the expectations down and gives partial credit earlier, for example.
I was a grader once. I guarantee if someone gives a good answer they'll get full marks even near the bottom of the stack. For BS answers I'll admit I got less generous as the hours went on.
No one's getting hurt by this system if it's randomized. It's a matter of graders giving out partial credit for wrong answers, which is discretionary. Once in a while a student is granted a small mercy. Seems OK.
I was one of many TAs for a large math class in college (pre-calc - think high school math for college students). For uniformity, the prof had the partial credit down to a science - specifying points for getting certain aspects of the problem. For the finals, a few TAs would be assigned to a given page, for uniformity.
The fascinating thing was that the distribution of grades was about the same every year.
And I had a math prof for analysis who would give negative points for BS answers. You could say “I need X but don’t know how to prove it” in the middle of a proof, but if you made up something that was incorrect, you’d get negative points.
>For BS answers I'll admit I got less generous as the hours went on.
What do you think is the cause of this? Do you become more cynical (and less generous) because you’ve seen so many BS answers previously? Is it just that getting fatigued makes you less generous?
When I was a TA in grad school, I noticed the same. Early on I thought some BS answers were at least kind of funny, and I gave them the benefit of the doubt, maybe giving more attention to the parts that were correct. After I saw similar answers later on, the novelty wore off and I was probably less amused, so the inclination to be lenient disappeared. Sometimes I went back to previous decisions if I remembered them, to be fair, but I don't think I always remembered since the volume could be high (grading 80 exams in a row is TEDIOUS).
Random order would still mean a few students in the class get unlucky and land near the end the majority of the time, although over the course of all classes it would tend to even out somewhat.
"randomization" is not the important part here. "evenly distributing" is. It is absolutely possible to reorder the sequence fairly such that your scenario doesn't occur. It could even to a human observer look randomized if you want. In a trivial example case where the effect were linear you could just switch the order back and forth, and on average every student would receive the same middle-of-group impact.
I grade tests as follows. The stack is created as students turn in the test. I grade the first page in that order. The stack reverses for the second page. So on and so forth. I teach college math. I just can't imagine a system of grading done in alphabetical order.
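The back-and-forth ordering described above can be checked with a small sketch. It assumes the position effect is linear, and the 0.6-point gap between first and last paper is just an illustrative number; the function names are made up for this example. A "round" here can be a page of a test or a whole assignment.

```python
def position_penalty(position, n_papers, gap=0.6):
    """Assumed linear effect: 0 for the first paper graded, -gap for the last."""
    return -gap * position / (n_papers - 1)


def average_penalty_per_student(n_students, n_rounds):
    """Grade in stack order on even rounds and in reversed order on odd rounds."""
    totals = [0.0] * n_students
    for round_index in range(n_rounds):
        order = list(range(n_students))
        if round_index % 2 == 1:
            order.reverse()
        for position, student in enumerate(order):
            totals[student] += position_penalty(position, n_students)
    return [total / n_rounds for total in totals]


penalties = average_penalty_per_student(n_students=5, n_rounds=2)
print([round(p, 3) for p in penalties])
```

With an even number of rounds every student ends up with the same middle-of-the-pile penalty (about -0.3 here). With an odd number of rounds, or a non-linear effect, the cancellation is only approximate.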
I also came here to say this. My only guess is that the alphabetization (by the "learning management system") is there to make filling the grades into a table "easier" for the computer or for the person handing out the results. But why is that "easier" when the system doesn't have to order them at all, or could do so by student number (same issue as alphabetical order) or by something random, which is the other (non-default) option for the "learning management system"?
I feel like only the most obsessive compulsive humans would have this issue (without computer "help"), as the last thing I wanted to do as a TA was to add another step of ordering all the papers before grading them. I also always reviewed the first few papers I graded after grading the rest to make sure I was being fair, because it was obvious to me that until I saw a representative distribution of answers I couldn't do fair grading/marking.
It's a 0.6 gap from top to bottom out of a score of 100. Plus or minus a third of a percent from average. Pretty small effect. But it would add up (or, well, persist - it wouldn't get bigger) if it happens to you for every assignment for every class and that sucks.
If there's more than one assignment you can basically erase it by randomizing each separately.
If you really care beyond that, then randomize for one assignment, flip that order for the next, randomize again for the one after, and so on.
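A minimal sketch of the two schemes, tracking only each student's average position in the pile (0 = graded first). The function names and class size are made up for illustration, and it says nothing about how large the grading effect is at each position.

```python
import random


def orders_independent(n_students, n_assignments, rng):
    """Shuffle the grading order separately for every assignment."""
    orders = []
    for _ in range(n_assignments):
        order = list(range(n_students))
        rng.shuffle(order)
        orders.append(order)
    return orders


def orders_random_then_flip(n_students, n_assignments, rng):
    """Shuffle for one assignment, reuse that order reversed for the next."""
    orders = []
    for index in range(n_assignments):
        if index % 2 == 0:
            order = list(range(n_students))
            rng.shuffle(order)
        else:
            order = orders[-1][::-1]
        orders.append(order)
    return orders


def mean_positions(orders):
    """Average pile position of each student across all assignments."""
    totals = [0] * len(orders[0])
    for order in orders:
        for position, student in enumerate(order):
            totals[student] += position
    return [total / len(orders) for total in totals]


rng = random.Random(0)  # fixed seed so the run is repeatable
independent = mean_positions(orders_independent(30, 4, rng))
flipped = mean_positions(orders_random_then_flip(30, 4, rng))

print("independent spread:", max(independent) - min(independent))
print("random-then-flip spread:", max(flipped) - min(flipped))
```

With an even number of assignments, the random-then-flip scheme puts every student at exactly the middle position on average, so its spread is zero. Independent shuffles leave some spread that shrinks as the number of assignments grows.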
> graders are just getting tired/lazy/inattentive the further they get in their stack of papers to grade.
I will admit to this. Initially, my patience and tolerance for errors are significantly higher than towards the end of the grading. By the second hour of grading, not only am I mentally exhausted, my tolerance is also significantly lower.
I try to prevent this by creating a very explicit grading rubric and sticking to it as much as possible.
Clear rubrics are the thing where possible. They aren't everywhere though. I've been on conference committees and so many different factors come into play--including how late in the day it is. But, in that case, a bunch of people are rating and commenting and there's no strict order so it probably evens out to a reasonable degree.
Even distribution would fix the problem. If grading has a subjective component, there will always be deviations from the "correct" grade. If those patterns are randomly distributed over all students, their grade averages will be comparable again.