A recent comment brought up the traditional criticism of grading on curves -- what if 80% of a particular class did really well, and deserves As? what if 80% did badly, and deserves Fs? why not grade objectively rather than comparatively? -- so I thought I'd repost my paean to curves from 2002. (Note that this discussion is about curves in large classes, of about 30 students or more; for smaller classes, such as 12-student seminars, the curve is not apt, though of course there's some controversy about where the cutoff size should be.)
Lots of people really oppose curves. Shouldn't people be graded on their own merits, they reason, rather than based on how other students have done? After all, they ask me, don't you know the difference between an A exam and a C exam?
Well, yes, I do -- but I surely do not know the difference between an A- exam and a B+ exam. And this ties in to some of the reasons why grading on a curve is the lesser of evils:
Sometimes I draft a hard exam and sometimes an easy one. I often can't tell which is which, since they're all easy to me -- I know the material, after all! So something might look to me like a C exam not because this student is unusually bad, but because the exam was just harder than ones from previous years.
Even setting the previous factor aside, I've been in teaching for 12 years now -- but many professors are new, and don't even have the data points that I have. In some areas, such as legal writing, the typical teacher has even less experience. (Likewise, in undergraduate institutions, many classes are traditionally taught by relatively inexperienced teachers.) Where are they going to get the distinction between A-s and B+s?
Perhaps the curve is unfair to a class that consists of unusually strong students -- but the absence of a curve is unfair to a class that has an unusually harsh professor. And the variation in class strength, especially in classes of 50-100 students (the size of nearly all my non-seminar classes), is likely to be much less than variation in professor harshness.
The pressures for grade inflation are quite real, and flow from basic human nature: Most people don't like giving students low grades, especially once they've spent many hours with them. When I have small classes that can't be curved as easily (since there are so few data points that there's a higher chance that the class is unusually strong or weak), I feel this pressure myself, even if the class is still blind-graded. And of course if a professor is known for resisting this pressure, then fewer and fewer students will end up taking his class.
There are, I'm sure, many more advantages to the curve; and I think these advantages vastly outweigh the disadvantages. Like democracy, grading on a curve may be the worst possible system -- except for all the alternatives.
If you institute a curve, there will be lots of "C" students competing in the job market with "A" students from other classes/schools.
The reality is that in a tough market for new hires, even at a top ten school, the bottom 70% of the class will have a very hard time getting a job that validates their law school loans.
UCLA is an excellent school, Prof. Volokh, but in a bad year, economically, how many of your students find it difficult to get their first placement? Are they all just picky? And what if they had the benefit of higher grades, like some of their competitors in the job market?
The debate about curves is an interesting abstraction. How do you suppose it meshes with the hard realities of the job market?
Finish higher in the curve and it doesn't matter. This argument seems to me to be that because unqualified law students are getting jobs, Prof. Volokh should inflate his grades so his less qualified students have shots at jobs for which they're not qualified.
Grades don't get you jobs. They might get you interviews with some firms, but you still have to make those interviews to get hired. I finished very high in my class and 3 people (roughly 25%) of those who finished above me are currently unemployed (pre-bar summer). Hell, a lot of the time you have to get plain lucky. How does that mesh with your vision of the job market?
But in any event, if we gave (say) 70% As, why do you think that this would on balance help our students compete with top students from other schools? Why wouldn't it hurt our students on aggregate, and especially our best students, because employers who learn of the system will start being skeptical about any UCLA graduate? And if we just let each teacher do what he pleases, why would this be fair to students who happened to get the tough teachers rather than the softies?
I do think that we shouldn't handicap our students relative to students from other schools, which is why I supported shifting our curve to match those at most other Top 20 schools. But throwing out the curve altogether, or making it much more A-heavy than that at comparable schools, seems to me to be a mistake, not just as an "abstraction" but also as a matter of "the hard realities of the job market."
To me, this says much more about the negatives of letter grades (and especially plusses and minuses) than the positives of grading on a curve. If there truly isn't a discernable difference between two exams, why should we develop fancy statistical models that we have to force the exams into in order to distinguish them?
I've never taught, so perhaps I am suffused with naivete, but it seems to me that if you have categories whose distinctions are so fine that the professor cannot easily place results into the categories, the problem is with the categories, not the distribution of results.
Employers can certainly take a school's curve into account if they want to, adjusting for the fact that school A has a higher curve than school B. But there is no reasonable way for an employer to incorporate the effects of section-assignment.
Although this may seem insignificant, remember that for many law students, the summer associate position is earned when only grades from the first year are in. For many students, those grades are for 8-10 classes, of which only perhaps 2 or 3 are outside the section. For many students, that summer associate position determines the student's employment from law school.
It seems to me that having a particularly intelligent section is likely to depress a student's grades significantly, while having a particularly dull section is likely to inflate them.
Preferred Customer wrote: "To me, this says much more about the negatives of letter grades (and especially plusses and minuses) than the positives of grading on a curve. If there truly isn't a discernable difference between two exams, why should we develop fancy statistical models that we have to force the exams into in order to distinguish them?"
There surely are discernable differences between exams. But the differences don't tell me where exactly to draw the A-/B+ line. The raw scores are usually on a continuum, with no vast gaps.
Any line, whether A-/B+ or A/B in a world with no -s or +s, will be in some sense arbitrary. That's why a rule that the line must be drawn at, say, the 75th percentile, is valuable.
There is a better way to "curve" (actually, assign) grades. Do not apply any statistical manipulation to what you see until you first "plot" (subjectively) what is clearly your As and clearly your Fs (or lowest). Let us call this your "qualitative assessment," and let us refer to each extreme case, the A and F, as your target numbers.
When fixing your targets, you have to fix them at the "breaking points." That is, if one of your Fs is a zero, it can't be used. You are looking for the 59% F+ and the 90% A-. (Picking out your A target seems a lot like picking fruit at the grocery store. Sometimes you'll have to buy a B, but you know an A when you get one).
Once you have these targets, the fitting of the numbers in between to "count" properly from target F to target A is a simple mathematical exercise. It may mean that the scale goes above 100, but if that student's exam is so far above your initial A target, that is a problem of what to do with outliers, not the method of computation.
Once you get your targets, the counting is simple. Just figure out what base level of points each exam must have so that each additional point obtained counts to the desired values. I think this is a linear equation, but I am not sure. I do this on my spreadsheet. What is nice is that, after you make your cuts, you can change them a little if they conflict with your assessment about some of the fruits in between. (This perfects the cuts, because the original assessment is always going to be a bit of a hunch).
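The fitting described above is indeed a linear equation: two target exams pin down a straight line, and every other raw score is read off that line. Here is a minimal sketch in Python, assuming the target F should land on 59 and the target A on 90 as in the comment; the function name and the raw scores are hypothetical and chosen only for illustration.

```python
def make_linear_scale(raw_f, raw_a, scaled_f=59.0, scaled_a=90.0):
    """Return a function mapping raw exam points onto the grading scale.

    raw_f and raw_a are the raw scores of the exams picked (subjectively)
    as the target F and the target A. scaled_f and scaled_a are the values
    those two exams should end up with (59 and 90 in the comment above).
    """
    if raw_a == raw_f:
        raise ValueError("target A and target F need different raw scores")
    # How many scaled points each additional raw point is worth.
    slope = (scaled_a - scaled_f) / (raw_a - raw_f)
    # The "base level" of points every exam starts with.
    base = scaled_f - slope * raw_f

    def scale(raw):
        return base + slope * raw

    return scale


# Hypothetical raw scores, invented for illustration only.
scale = make_linear_scale(raw_f=20, raw_a=82)
scaled = {raw: scale(raw) for raw in (20, 50, 82, 104)}
# The target F maps to 59, the target A to 90, and the outlier at 104 raw
# points lands above 100, which is the outlier problem mentioned above.
```

Moving either target a little and rebuilding the scale is the "change them a little" step: only the slope and base change, and the ordering of the exams never does.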
If you need more info, please feel free to email.
Regards,
Sean
Grading on a curve is cruel, but fair.
I think the fact that UCLA feels that it must inflate along with the rest of the top 20 shows a deep insecurity about its ability to place its graduates based on the quality of the law school-- you need to give everyone A's and A-minuses to make sure they get jobs. (Of course, I say this having gone to law school at USC.)
That's how my school did it as well. But LSAT and GPA are only guesses as to how a student will do in law school. If my LSAT score was a prediction of what my grades would be like then I would have graduated a lot higher in my class. ;-)
Of course you might worry about social effects (some cohorts encourage each other to study) but if these effects exist the situation is already unfair (the people in the cohorts that don't encourage them are getting screwed) and you are just partially correcting the unfairness.
But maybe I just don't understand this system and am missing some key point.
I think what the people in this thread who haven't taught before are missing is the sheer subjectivity of grading. Even if you grade papers blind (no names) and double check yourself it is very hard to be sure external factors aren't influencing your decision. Did you decide to give more students As because the class was really stronger this year or because you were in a good mood the day you decided on the grading scale? There are all sorts of clever studies that show people are very easily influenced by various factors (the product they are looking at is on the right or left) but are very good at coming up with supposed objective justifications for these differences. It is hard enough comparing students in the same class without letting things like handwriting/font influence your judgement, much less comparing them to some 'objective' standard of A and B in your mind. Whatever flaws the curving process may have for large introductory-type classes, they are smaller than those of the alternatives.
On the other hand I do have problems with curving even for classes of 30+ people which are upper division electives. I don't know how this works at law school but in my experience the makeup of these classes can change greatly from year to year depending on what rumors are floating around in the school about the professor and the class (really good/hard or easy).
I am still not convinced of the need to assign arbitrary values beyond those that are immediately discernable to the professor. While five categories (A,B,C,D and F, where D and F aren't the same thing) are surely easier to tell apart than 10 (A+,A,A-,B+,B,B-,C+,C,C-), it sounds from your post that even five categories might be too many to have real significance to the grader.
Reading a set of papers (though, again, I've never done so in the academic world), it's usually pretty easy for me to break them into three categories: those that are bad, those that are about what I expected, and those that really impress me. Beyond that, it's often hard to assign a ranking. Is paper A that did x better than paper B, even though it didn't do y as well?
We count on professors to teach material, and I think we should rely on their judgment to grade students' use of the material. But if a professor cannot tell the difference between two papers, and has to use statistics to draw a line between A and B, or A- and B+, why should we pretend that there IS a difference between those two papers?
FN: Most of what I'm talking about here applies to essay type examinations, rather than exams that consist of a series of questions that have obviously "correct" or "incorrect" answers.
Most law school exams are essays and there are a lot fewer black and white correct answers in the law than there are in medicine. Law professors usually design their questions to fall into the gray area between two rules/statutes/cases. Even when the professor designs a question that has a rule/case/statute directly on point, many legal tests require you to balance multiple different factors, which means that there is rarely one absolutely correct answer. Also, a passing grade in law school is ridiculously low. At least it was in my school. On a 4.0 scale, where 2.8 was a B- and the median, a .6 and below was a failing grade. I only know of one person who failed a class and that was because he wrote a story about himself in his blue book instead of attempting to answer the question.
"Hmm... That course Volokh's teaching this term sounds really interesting. I think I'll sign up for it, how about you?"
"Yeah, now that you point it out... Let's see... I think I'll sign up too."
On the flip side, courses that are considered easy might collect a number of the lesser students.
What I believe Eugene is saying is that while there are obvious differences between all 10 people in the class (they each spotted a differing number of issues), objectively, which of those deserve an A- and which deserve a B+? Is 6 issues good enough for a B+? Or 7? Or 5? At some point you make an arbitrary line. Or, rather, the curve can force you into making an arbitrary line. In my first year, when we had the 20/60/20 curve, person 10 would get an A, person 9 would get an A-, persons 8 and 7 would get a B+, 6 and 5 a B, 4 and 3 a B-, 2 a C+, and 1 a C. Or something like that.
The point is, while there are objective differences between all tests, there is no way to objectively know which grade should go with which raw score. Hence the curve.
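To make the mechanics concrete, here is a rough sketch of a rank-based curve in Python. It assumes the quotas from the ten-person example above (one A, one A-, two B+, two B, two B-, one C+, one C, which is 20/60/20 overall); the function name, the student labels, and the issue counts are hypothetical, and a real policy would need an explicit rule for ties.

```python
def curve_by_rank(raw_scores, quotas):
    """Assign letter grades by rank rather than by absolute score.

    raw_scores: dict mapping student -> raw score (e.g. issues spotted).
    quotas: list of (grade, count) pairs from the best grade to the worst;
            the counts must add up to the class size.
    Ties keep their original dict order here, purely for simplicity.
    """
    if sum(count for _, count in quotas) != len(raw_scores):
        raise ValueError("quota counts must sum to the class size")
    # Highest raw score first.
    ranked = sorted(raw_scores, key=raw_scores.get, reverse=True)
    grades = {}
    position = 0
    for grade, count in quotas:
        for student in ranked[position:position + count]:
            grades[student] = grade
        position += count
    return grades


# Quotas taken from the ten-person 20/60/20 example in the comment above.
quotas = [("A", 1), ("A-", 1), ("B+", 2), ("B", 2), ("B-", 2), ("C+", 1), ("C", 1)]

# Hypothetical raw scores: person N spotted N + 1 issues.
raw_scores = {f"person {n}": n + 1 for n in range(1, 11)}
grades = curve_by_rank(raw_scores, quotas)
```

Nothing in the raw scores says that seven issues is a B+. The line falls where the quota puts it, which is the arbitrariness the comment describes.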
After that, they let me limit my class size to 30 (allowing me to grade however I pleased) and draw a lottery to get in....
True enough, they are guesses in a sense, but the reason that these two are used is that the correlation is greater than with any other predictor (that I know of, though one wonders about "IQ"), and despite the apparently high validity, there will be dots on the map that don't look right, (you apparently being one of them :).
Imperfections aside, the LSAT has a predictive validity, last time I checked, of about .6, which is to say, pretty damn good.
When I finally got the job I now have and love, I got the interview on the strength of the resume, and the job on the strength of the interview. I think that the interview plays a bigger part in getting the summer job than you do, though obviously the summer job performance is what gets you the permanent job.
How in the world can you curve a paper class in undergrad? That's insane (of course, I often wondered how you got an A, B, C or whatever on those too)!!!
I had a prof who led a lab class that had groups. He started out by randomly assigning the groups. He then took the best students throughout the semester and put them together, and took the worst students and put them together. Near the end he told the bottom couple of groups that there was one last chance to escape with a "D." The top students ended up together, and got A's.
Could something like that be done with the sections? (add or subtract 1/2 of a GP from the students in the high or low section?)
Pretty damn likely, actually, especially at the graduate level. Careerist middlegrade students just looking to get their Master's and get out will avoid difficult or 'weird' classes like the plague, while dedicated students will probably stick it out.
These types of class are actually very easy to spot, though - they usually halve in size before the drop point. Some of the more dastardly professors would take advantage of this by assigning absurd homework the first day or two - like reading 100+ pages of some dense 50 year old Sanskrit textbook a day and translating a page that even the TA would have trouble with in the Introduction course - and then being all "Surprise! I was just kidding! ell-oh-ell" to the haggard few that were left.
I hated those professors.
Do you really believe that "plenty" of students in each class year at UCLA should be flunked out due to an arbitrary curve? If you flunk out of a top law school, you cannot get a job as a lawyer, even if the only reason you flunked out was that you happened to be the "least of the best." Conversely, if the same student went to a lower ranked school, he would likely graduate and get a job. ... Something about that just seems wrong. To give a student an F (or otherwise "unsatisfactory" grade) at a top school I think the student should truly have a failing knowledge of the subject in the objective sense. It's one thing to compare students and say "this student is better than that student." But, to say "this student is worse than the other students AND doesn't even deserve to graduate and practice law," that's something else entirely. If you think about it, with a forced curve, the law school would then intentionally be admitting students, taking their $30,000 in tuition, and then kicking them out just because they weren't as good as the rest of the class.
I certainly don't think I will be the smartest guy in the UVA Class of '09. If I'm average or even below average, at least I am confident I could still get a job. There's no way I'd go to UVA or any other law school if I had a 10% chance of spending $30k+ in tuition to be booted out because I was part of that unfortunate "not as good as the rest of the class" group. And what would that say about the admission committee if some fixed percentage of each class wasn't actually good enough to go to that school?
Mike, I think what he was saying is to shuffle the sections so that you're not always compared against the same students when grading. My first year one section had 6 of the top 12 people in it. Maybe shifting sections around would alleviate that, but I don't think it's that big a problem. 6/12 might mean easier grading by that section's professors as much as anything, which is the real reason for the mandatory curve. If Professors A, B, & C each teach Contracts I, and one gives out 12 As, one gives out 6, and one 3, it's obvious that the folks having the third professor are going to get screwed on class rank and GPA. Having the mandatory curve prevents that. Not that I'm bitter some other section had 12 A's in Contracts I. Nope, not me.
I've enjoyed watching my undergrad team regularly beat both UVA and VT in football. :)
Linear Passive Circuits is a great weeder class for any engineering specialty, not because it is much like what civil engineers, chemical engineers, or industrial engineers do, but because it requires engineer-style thinking yet is pretty easy overall. Oftentimes half the class of qualified, ambitious pre-engineers will flunk and switch to some other discipline. It's the only way to really know who should continue.
Usually the folks who can adjust themselves to the weeder course find the rest of the curriculum relatively easy. Whether this is because weeder courses are hard and set a high bar or because the engineering "way of thinking" has been inculcated is a matter of ongoing dispute.
But what about this year's students? Which class gets to be the sacrificial lamb for the benefit of the "long term" UCLA students?
Incidentally, I know people who graduated college with a 'C' average, who are now millionaires, and I know people who graduated college with an 'A' average, who are working at Starbucks or McDonald's or Barnes & Noble. All of which goes to prove a point a writer I know makes well: It isn't about where you went to school, it is about what you do with what you know.
Thank you.
More realistically, what would happen is that students would simply avoid the difficult graders and take the classes taught by easy graders. That wouldn't be a problem in first year where students don't get a choice, but where would it leave the hard graders when no one signs up for their upperclass courses?
What's the curve?
The Law School maintains its transcripts on a 0-100 scale.
The University maintains its transcripts on an ABC scale (A = 4 points). When the Law School sends grades to the University, 92-100 = A, 82-91 = B, etc.
Law School professors rarely gave numeric grades of less than 82, so almost all students carried at least a B average.
The Law School maintains exact class rankings based on the numeric grades.
Professors are constrained in handing out grades by the records of the students who are taking their classes. They cannot bid for popularity by handing out easy grades. The Dean's office looks at the records of the students coming in to their classes and tells them in effect -- you may hand out no more than one 100, two 95s, three 92s, six 90s, etc. The result is that there are no "guts" taken for easy A's and difficult courses are not shunned. Good students have their class ranks and honors and even mediocre students graduate with B averages.
On the Law School Grading Scale the system was designed to force profs not to disrupt the overall class mean, which was about 90 with a standard deviation of about 3. This produced a B average on the University's ABC scale. Students who flunked out last year were removed from the database.
*a man so dull that his students called him "The Chiller." Nonetheless, he deserves a footnote in the history of information science, because he devised the first versions of the database that is now Lexis-Nexis.
Oh, but that's not that far off. In 20 years Moore's Law gives them our raw processing power -- and they are far better organized to utilize it.
Brian mentions EE classes. We had a somewhat novel problem with curves there. I took EE classes at the University of Colorado Denver campus. The profs often taught the same classes both in Boulder and Denver at the same time, and would give almost identical tests (identical questions, except values would be changed - easy to do in many EE classes). The problem was that the Denver classes would tend to score 10 points or so higher, on average, than the Boulder classes, because the Denver students were going to night school, and the Boulder students were drunk at night. This caused the profs some problems. They couldn't give all the A's to Denver students, because, after all, Boulder was the research school, but they couldn't curve the classes totally independently because the Denver students were the better students (mostly, I think, more focussed).
Their solutions were ad hoc, which was very troubling for EE profs. Some took the easy way out, and curved each school independently. (The opposite solution was politically untenable, since the chair was always in Boulder.) But the more ethical profs seemed to nudge the Boulder curves down and the Denver curves up some, so that the overall curve was good, but there were still enough A's in Boulder.
On varying instructor harshness - they're going to face variations among bosses, judges, even significant others. No time like the present to learn you can't win all the time.
Of course, as a lazy underachiever, if they didn't curve in my high school, I'd have been in trouble. Other people's poor exam performance made up for my hatred of homework.
If more than 50% of students got an answer wrong, it was assumed it was a bad question. But if the marks were too high it was assumed the test was not tough enough.
In the 1950's, they flunked out the bottom ten percent automatically...but by the time I went in the 1960's, they merely tossed out those who were well below the bell curve.
But, of course, in Medical school, part of our mark was based on practical experience and oral testing...
What I see in my elite law school classes is that different professors have different ways to tackle the very hard problem of differentiating among highly intelligent students who know the course material and how to apply it. All of the methods the professor might choose will in some way differentiate students by sheer intellect and then also by something else. There will always be a few students that are just a step above everyone else, even at elite schools. Any method chosen will have some variability in its measurement and will want to test for a basic set of knowledge.
What seems to be normally done is to use time pressure to force this differentiation. This time pressure tests who can think quickly under pressure and who writes well under pressure. More time would of course allow students to craft better answers, but would make it harder to differentiate between students. A multiple choice exam (yes, I had one for Taxation I, over 200 questions in 180 minutes) differentiates on pattern recognition while under pressure. The take home exams become exams that differentiate solely on writing skills.
Perhaps the emphasis on writing skill is misplaced, or perhaps since lawyers write so much for a living, it is a good test of necessary skill, but writing skill seems to be the main attribute that differentiates grades among all the bright and well prepared students.
Having gone through more than one oral exam where the format was a several hour technical oral interview conducted by a panel of three experts, I can say that there are other ways to more accurately assess knowledge and ability to reason than a three or four hour written law school essay. However, obviously the cost both to the institution and the student is much higher with different methods.
The law school format seems to be a very cost effective way to force a differentiation, but perhaps students need to understand what is really being used to tell the C exam from the A exam.