February 3, 2010 « Grade Inflation: Eliminate It and Grade Disputes »
Much has been said about the causes and consequences of grade inflation, and about academic cures for it. The purpose of this Executive Briefing is to outline an easily adopted approach that will reduce grade inflation to a level comparable to other statistical noise in the grading system.
The plan outlined below is easily explained and justified, does not trigger issues of academic freedom (among reasonable people), and can be implemented immediately. If implemented, the plan will produce tangible results in the first grading term and will continue to produce incremental improvement for several terms, after which grade inflation will no longer be an issue. Equally important, implementing this plan with the recommended grade dispute policy will virtually eliminate grading disputes.
This Briefing will not revisit causes, consequences, and academic cures underlying grade inflation. It takes a pragmatic, behavioral approach. Examples used in this plan assume that terms are 10 weeks long. You will need to adapt the examples to your institution.
[NB: To keep this Executive Briefing within set limits on length and degree of technicality, I have omitted supporting arguments and facts. Please send me an email if you have questions about any element of this plan. Most of the generalizations rest on 20 years of empirical data and organizational experience in adult-centered programming.]
Apply Frequent Assessments of Similar Levels of Importance
Infrequent assessments and high-stakes assessments are two of the most significant contributors to grade inflation and to grade disputes. An important set of measures should be taken at least weekly, and each set of measures should be of roughly similar importance.
Embed a Variety of Measures in Each Set
Recalling Goethe’s prescription, “One ought, every day at least, to hear a little song, read a good poem, see a fine picture, and if it were possible, to speak a few reasonable words,” each set of measures should reflect what the measurement scientists call “multi-trait/multi-method assessment.” In different ways, each set of measures should assess knowledge and behavioral proficiency, content and form, along with your university’s academic strands or embedded core skills (whatever the rhetoric, you don’t really have such strands if you don’t measure and manage their development across the curriculum).
Employ Simple Metrics
In most settings, we can demonstrate the statistical identity of three, five, and ten point scales. For the frequent, small, and varied assessments recommended here, I also recommend the following three-point scale:
- 1 = Exceptionally Inferior
- 3 = Exceptionally Superior
and (aside from ‘0’ for not showing up)
- 2 = Everything Else
The “2” rating is the default baseline from which a true exception must occur. Critics may suggest that this scale does not recognize the many fine gradations important to their discipline. Statisticians will counter that these gradations are not in evidence as a matter of fact. The statisticians are correct. Further distinctions may make us feel better but they are statistically chimerical if the overall assessment system is well-designed. More important, consistent use of the three-point scale over time, with the above definitions for anchors (criteria for inclusion/exclusion), will result in more valid assessments.
An example will help. Using the three-point scale defined above throughout, assume that each week we assess six dimensions of knowledge, three dimensions of communications and presentation skills, two dimensions of group process skills, and two dimensions of critical thinking skills. Here is what the maximum point distribution will look like at the end of a 10-week term.
| Subject Matter Points: | 180 |
| Communications Points: | 90 |
| Group Process Points: | 60 |
| Critical Thinking Points: | 60 |
| Total Points: | 390 |
| Points per Workshop: | 39 |
When applied in this way, the resultant point distribution will approximate a normal distribution. One important element of latitude, left to the instructor, is where to set the point value breaks with respect to letter grades awarded.
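To illustrate the latitude left to the instructor, here is a minimal Python sketch of setting point value breaks. The cutoffs in `LETTER_BREAKS` and the function name `letter_grade` are hypothetical and chosen only for illustration; they are not part of the plan, and each instructor would substitute their own.

```python
# Hypothetical cutoffs on the 390-point scale; these values are assumptions
# for illustration only. Each instructor would choose their own breaks.
LETTER_BREAKS = [(300, "A"), (270, "B"), (240, "C"), (210, "D")]


def letter_grade(total_points, breaks=LETTER_BREAKS):
    """Map a term point total to a letter using descending minimum cutoffs."""
    for minimum, letter in breaks:
        if total_points >= minimum:
            return letter
    return "F"  # below every cutoff


for points in (310, 255, 150):
    print(points, letter_grade(points))
```

Keeping the breaks in one small table makes it easy to disclose them to students up front and to adjust them between terms without touching the weekly ratings.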
Note that earning a rating of “2” in every metric would result in a total point value of 260 for the course (13 dimensions × 2 points × 10 weeks). Accordingly, the instructor needs to give some thought to what this rating represents in this course; e.g., a “C” might be appropriate in an undergraduate course where a “B” might be appropriate in certain graduate courses.
The above point allocations and ratios between content and form are only examples. Each instructor, team, or department will want to determine their specific metrics under this general methodological architecture.
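For those adapting the example, the following Python sketch recomputes the maximum point allocations from the number of dimensions assessed each week. It assumes the three-point scale and the 10-week term used in this Briefing; the names `DIMENSIONS_PER_WEEK` and `max_points` are hypothetical and exist only for this illustration.

```python
# Dimensions assessed each week in the Briefing's example.
DIMENSIONS_PER_WEEK = {
    "Subject Matter": 6,
    "Communications": 3,
    "Group Process": 2,
    "Critical Thinking": 2,
}
MAX_RATING = 3  # top of the three-point scale
WEEKS = 10      # term length assumed in the examples


def max_points(dimensions, max_rating=MAX_RATING, weeks=WEEKS):
    """Return maximum term points per category and the overall total."""
    per_category = {
        name: count * max_rating * weeks for name, count in dimensions.items()
    }
    return per_category, sum(per_category.values())


per_category, total = max_points(DIMENSIONS_PER_WEEK)
for name, points in per_category.items():
    print(f"{name} Points: {points}")
print(f"Total Points: {total}")
print(f"Points per Workshop: {total // WEEKS}")
```

Changing the dimension counts, the term length, or the scale in one place shows immediately how the ratio between content and form shifts.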
Measure and Report Variance
Some institutions have attempted to deal with grade inflation by measuring and reporting GPA statistics. The most common approach has been to report mean GPA values. This is a sound approach, except that it reports the wrong statistic. The problem with GPA as a metric for a behavioral target is technical in nature. Mean GPA is subject to a variety of intentional and systematic distortions that are eliminated by reporting another distribution statistic.
The correct statistic is variance (the statistical definition) reported by each level of analysis. A few instructors may argue against this. We have heard the arguments expressed in countless ways by hundreds if not thousands of instructors but they do not offset the benefits of reporting variance. Your elevator speech:
If we take independent and valid measures of the knowledge or proficiency of students in any course in this institution, on any relevant attribute, including their knowledge of the subject matter in their course or the immediately preceding course, the distribution of scores produced by those measures will describe a more-or-less normal distribution. It follows, therefore, that the distribution statistic (variance being the single most robust indicator) derived from each course should, on balance, reflect that statistical reality. Yes, we recognize exceptions but, in the aggregate, such exceptions should cancel each other out. If they do not cancel each other out over time, we should suspect a threat to scientific validity such as the “Lake Wobegon effect” or some other effect.
The goal is to achieve high variance.
Zero variance represents instructional behavior in which all grades awarded have the same letter value, whether A, B, or C. High variance reflects distinctions among levels of performance in the classroom, which is a sine qua non of good instruction.
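The contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions: letter grades are mapped to a plain 4.0 scale without plus or minus grades, population variance is used, and both class rosters are invented.

```python
from statistics import mean, pvariance

# Assumed mapping of letters to grade points (plain 4.0 scale).
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def grade_stats(letters):
    """Return (mean GPA, population variance) for a list of letter grades."""
    points = [GRADE_POINTS[letter] for letter in letters]
    return mean(points), pvariance(points)


# Invented rosters: one instructor awards only B grades, another
# distinguishes among levels of performance.
uniform = ["B"] * 20
spread = ["A"] * 3 + ["B"] * 6 + ["C"] * 7 + ["D"] * 3 + ["F"] * 1

uniform_mean, uniform_var = grade_stats(uniform)
spread_mean, spread_var = grade_stats(spread)
print(f"uniform: mean={uniform_mean:.2f} variance={uniform_var:.2f}")
print(f"spread:  mean={spread_mean:.2f} variance={spread_var:.2f}")
```

The all-B roster yields a variance of exactly zero, while the roster with performance distinctions yields a variance above one.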
Notice that the mean GPA might be 2.8 for an instructor who awarded all 'B' grades except for two 'F' grades for individuals who dropped improperly. This mean GPA looks good until you examine the actual distribution. Reporting the variance (low in this case, because nearly every grade is identical) would reflect the behavior that needs to be managed. High variance is achieved by making performance distinctions. Technically, it is possible for an instructor to award a normal distribution of grades that was invented; i.e., not based on making valid performance distinctions. In practice, students (especially adult students) would not permit this behavior to persist. Variance has another benefit in that the statistic is self-limiting on the upper end.
I recommend reporting variance statistics each term, along with GPA statistics if you wish. If your terms exceed six weeks and you have a mid-term, I recommend reporting at mid-term as well. Report privately to the individual instructor, and publicly with respect to academic programs, departments, colleges, etc. Additionally, depict the term-by-term trend line visually.
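As one possible shape for such a report, the Python sketch below groups grade points by department and term, computes the variance for each group, and prints a crude text bar as a stand-in for a trend chart. The records, the department name, the term labels, and the function name `variance_by_term` are all hypothetical; a real report would draw on your institution's grade records and charting tools.

```python
from collections import defaultdict
from statistics import pvariance

# Hypothetical records: (department, term, grade points on a 4.0 scale).
RECORDS = [
    ("Business", "T1", 3.0), ("Business", "T1", 3.0),
    ("Business", "T1", 3.0), ("Business", "T1", 4.0),
    ("Business", "T2", 2.0), ("Business", "T2", 3.0),
    ("Business", "T2", 4.0), ("Business", "T2", 1.0),
]


def variance_by_term(records):
    """Return {(department, term): population variance}, sorted by key."""
    grouped = defaultdict(list)
    for department, term, points in records:
        grouped[(department, term)].append(points)
    return {key: pvariance(values) for key, values in sorted(grouped.items())}


report = variance_by_term(RECORDS)
for (department, term), var in report.items():
    bar = "#" * round(var * 10)  # crude visual cue, not a real chart
    print(f"{department} {term}: variance={var:.2f} {bar}")
```

The same grouping key can be swapped for instructor, program, or college to produce the private and public views described above.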
Implement an Incentive-smart Policy for Grade Disputes
When combined with the above grading plan, a properly designed grade dispute policy, one that is equitable with respect to all stakeholders, will eliminate grade disputes or will result in perhaps two or three per year for the entire institution. Even absent this grading plan, a properly constructed grade dispute policy will reduce disputes to a handful and will reduce the institutional workload for processing the few that do occur.
If you are interested, please provide us with some basic information about your program and we will send you the most appropriate skeleton for a Grade Dispute Policy statement.
The reasoning behind the policy we suggest is clear and sound. For justice to prevail over time, each party to a dispute must have comparable risk in the outcome.
Communicate the Entire Plan Up Front
A key component of making this or any plan work is full disclosure on or before the first workshop. In some cases, I would recommend that students be required to attest to having read the plan (on the Web) or to initial and return a copy of the document if distributed on paper.
What You Can Expect
If implemented with practitioner-adjunct instructors, this grading and grade dispute plan will be welcomed. If implemented with full-time traditional faculty, it will be necessary to explain and justify the plan, recognizing and working through special cases. In all cases, I recommend a workshop to communicate the technical and value issues associated with this change.
If implemented as described, you will get results sufficient to take grade inflation off your executive list of concerns. If the mean undergraduate GPA is currently 3.2, you can expect it to drop to 2.7 in a year or two (depending on term length) assuming normal circumstances. If you need values to drop faster or further, I recommend creating incentives to heighten focus on this issue. Let me know if you are thinking of moving in this direction.
Robert W. Tucker is President and CEO of InterEd, Inc.
He can be reached through this forum.
The expression of other views by leaders in higher education is welcomed.




