Today's Briefing

Wednesday, Mar 3, 2010

End of Course Surveys, Part II: Executive Actions

This multi-part Executive Briefing was prepared exclusively for senior decision-makers. The perspective taken in this Briefing assumes the reader’s progressive organizational experience leading to a senior position in a college or university setting.

Part I examined the typical objections to end-of-course assessments and suggested that valid concerns regarding the end-of-course assessment process can be addressed such that benefits far outweigh risks. Part II identifies specific executive considerations and steps in creating an end-of-course assessment system that benefits your institution in unique ways.

As with other Executive Briefings where multiple issues of some complexity are involved, I will focus on assertions of potential value to you. Additional detail and justification are always available via email or a phone conversation.

Establish Value for the Process

The first and most important step in establishing a valid or useful end-of-course assessment system is creating a positive cultural and organizational context. (The terms 'valid' and 'useful' are closely related in measurement science.)

Many of the failures in implementation of an otherwise good system can be attributed to a lack of executive commitment coupled with a lack of perspective sharing and education for everyone in the value chain. The president and chief academic officer must voice a strong and unambiguous commitment to the process as an important component of the institution’s quality management system. Deans and department heads should follow suit. Instructors and/or proctors should receive training to understand that the usefulness of the process rests on the affirmative commitment they convey to students regarding the process. While I do not believe in scripting, instructors (or proctors) should initiate each end-of-course assessment with a note of seriousness, explaining that the institution looks to thoughtful, objective, and constructive student judgment as one of the most important parts of the university’s quality management processes. Students should be encouraged to reflect carefully on the questions and on constructive responses to them.

If you succeed in establishing and maintaining value for the end-of-course assessment process, other impediments will be easier to resolve or will resolve themselves over time.

Neutralize Myth with Fact

Much like the welfare recipient arriving in a brand new Lexus to collect his check, myth and apocrypha overshadow the legitimate concerns one must address in developing an end-of-course assessment system. Many instructors will assert a deep knowledge of the inner workings of assessment systems. Most, however, base their beliefs on their worm’s eye view of the process, where small and idiosyncratic samples, combined with primacy and recency biases, color and skew their generalizations.

We have heard what must be all of the objections and assertions countless times from countless well-intended instructors. While there is merit to be found in individual perspectives, the foundation for the generalizations and recommendations offered here rests on analyses of more than 10,000,000 assessments, 500,000 instructors, and 3,000,000 classified open-ended comment tokens.

I’ll give you one example of common myth in relation to the objective truth before proceeding.

One-third to one-half of your faculty are likely to believe that the ratings students assign to them on end-of-course assessments correlate highly with the grades they award. High grades, high ratings; low grades, low ratings. Aside from the fundamental disrespect and lack of authenticity created when instructors hold this view, the view is false . . . by a wide margin. The largest study I performed on this relationship involved 85,000 end-of-course assessment records paired with their corresponding grades. The strongest r² we were able to achieve (results vary somewhat based on how the variables are loaded) was 0.22. Interestingly, when we examined the individual records contributing to the 0.22 correlation (i.e., those where the correspondence between grades and instructor ratings was highest), we found a very clear pattern of ineffective teaching and grading behavior. In other words, ineffective teachers who are also ineffective in their role of making valid distinctions among levels of student performance are at higher statistical risk of seeing a correlation between grades and ratings. Even here, however, the r² values were still below 0.50, which demonstrates only a weak relationship. Scientifically speaking, the belief that grades and instructor ratings correlate highly is false. False also is the belief that high ratings can be “purchased” by awarding high grades. When such a relationship is stronger with respect to an individual instructor, one is likely to find an ineffective instructor.
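For readers who want to run the same kind of check on their own records, the following is a minimal sketch of how an r² between grades and ratings might be computed. It is not the analysis used in the study described above; the function names and the toy records are hypothetical and exist only to illustrate the calculation. An r² of 0.22 would mean that roughly 22% of the variance in ratings is linearly associated with grades.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


def r_squared(xs, ys):
    """Share of variance in one variable linearly associated with the other."""
    return pearson_r(xs, ys) ** 2


# Hypothetical toy records (NOT data from the briefing): grade points
# awarded and the instructor rating given by the same student (1-5 scale).
grades = [4.0, 3.7, 3.3, 3.0, 2.7, 2.3, 2.0, 3.0]
ratings = [4.2, 3.1, 4.5, 3.8, 4.4, 3.0, 3.9, 4.0]

print(f"r  = {pearson_r(grades, ratings):.2f}")
print(f"r2 = {r_squared(grades, ratings):.2f}")
```

The same two functions can be applied per instructor to look for the individual cases where the relationship is unusually strong.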

Myth can be incapacitating. An executive who lends credence to myth and unscientific bias surrounding the end-of-course assessment process, however well presented by faculty, will find it difficult to implement a useful end-of-course assessment system.

Allow Sufficient Classroom Time

Many end-of-course assessments are crammed into the last few minutes of the last workshop when students are rushed and attentions lie elsewhere. While the assessment can and generally should be administered in the last workshop, time for the process should be scheduled and aggressively protected by policy. Working adults require no more than 7-10 minutes to complete their assessment, even less in subsequent courses as they become familiar with the instrument. Some instructors point to this fact as evidence of inattentiveness – another myth – when in fact the measures of internal consistency (alpha coefficients for scales and other measures) demonstrate that students pay more attention and are more consistent in their evaluations as they progress through the program, gaining experience with different instructors and topics.
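The alpha coefficient mentioned above is commonly computed as Cronbach's alpha. The sketch below assumes that is the measure intended and uses the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The function name and the sample responses are hypothetical, for illustration only.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one scale.

    responses: list of rows, one per student, each a list of item scores
    for the same k items.
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items in the scale")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer the same items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses from five students to a four-item instruction scale.
scale = [
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
]
print(f"alpha = {cronbach_alpha(scale):.2f}")
```

Computing alpha separately for students early and late in a program is one way to test the claim that consistency improves with experience.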

Require Participation

Within normal bounds, students should not be permitted to opt out of participating in the end-of-course assessment system. The rationale is simple but we can discuss it offline if you have questions. Ninety percent participation rates are the standard in classroom-based end-of-course assessment systems managed by our firm.

We recommend online end-of-course assessment systems even for classroom-based programs because they simplify the process. Online assessments require strong policy and persistent education and persuasion. There is a range of common policy-based requirements, progressing from open to fairly restrictive. Depending on the level you implement, response rates can range from an unacceptable 60% to a virtually perfect 99%; choose the level that is comfortable for your institution. For online courses, I recommend a floor of 85% participation. Again, get in touch with me if you have difficulty constructing an effective policy to support this target.

Secure 95% participation for on-ground courses and 85% for online courses. Take immediate steps to correct any shortfall.
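Monitoring against these targets is simple to automate. The sketch below is one possible approach, not a description of any particular system: the class, field names, and course identifiers are hypothetical, and only the 95% and 85% targets come from the text above.

```python
from dataclasses import dataclass

# Targets stated in the briefing: 95% on-ground, 85% online.
TARGETS = {"on-ground": 0.95, "online": 0.85}


@dataclass
class CourseSection:
    course_id: str   # hypothetical identifier
    modality: str    # "on-ground" or "online"
    enrolled: int
    responded: int


def participation_rate(section):
    """Fraction of enrolled students who completed the assessment."""
    if section.enrolled <= 0:
        raise ValueError("enrolled must be positive")
    return section.responded / section.enrolled


def shortfalls(sections):
    """Return (course_id, rate, target) for sections below their target."""
    flagged = []
    for s in sections:
        target = TARGETS[s.modality]
        rate = participation_rate(s)
        if rate < target:
            flagged.append((s.course_id, rate, target))
    return flagged


# Hypothetical sections for illustration.
sections = [
    CourseSection("MGT-501", "on-ground", enrolled=20, responded=19),
    CourseSection("ACC-310", "on-ground", enrolled=25, responded=22),
    CourseSection("MKT-420", "online", enrolled=30, responded=24),
]

for course_id, rate, target in shortfalls(sections):
    print(f"{course_id}: {rate:.0%} participation (target {target:.0%})")
```

Running a report like this after each course ends makes it possible to act on a shortfall immediately rather than at the end of a term.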

Encourage Comments

As we will discuss later, over time, students’ comments become the most useful component of your end-of-course assessment system. Adequate time to complete the survey, coupled with encouragement, will ensure that students provide actionable information in response to semi-structured open-ended questions (e.g., “What did you find particularly helpful about this instructor’s teaching practices?”). Later, I will provide evidence as to the accuracy and usefulness of aggregated comments.

Construct Instruments to Measure Variance of Interest

Above all, avoid building an end-of-course assessment instrument with a committee or a task force of representative members of the faculty. I have reviewed more than a thousand such instruments in the past 25 years and none so constructed pass standard tests of validity. (Recall that validity is determined in a specific context; it is a property of that context, not a property of the instrument.)

With respect to the scaled questions, there will typically be four sections: instruction (5-7 questions), content/curriculum (3-5 questions), other students/learning environment (2-4 questions), and support services (special structured comment section or pick list on web survey). These designations are for your benefit and are typically not the language used in the surveys.

With respect to instruction, there are five dimensions of measurement: subject matter expertise, management of classroom (time, focus, etc.), ability to show connections between content and workplace or other application target, feedback on student performance (adequacy, volume, constructiveness), and proficiency in assigning grades that reflect achievement. Again, these designations are for your benefit and are typically not the language used in the surveys.

With respect to content and curriculum, the choices rest on your institution’s method for developing and managing content. Send me questions and specifics if you would like some suggestions. Key here is learning how well the curriculum supports application and integrates various levels and types of learning.

With respect to other students and the learning environment, you are looking to see if horizontal learning is in place, tacitly or explicitly, and if the overall environment facilitates or inhibits learning. This is also the place where you will want to assess the online learning platform and its embedded pedagogy. (You do have explicit pedagogical standards embedded, right?)

With respect to support services, there are ways to handle this potentially large field with a comments section that provides training prompts. You may wish to list the major areas of support and ask for constructive comments with respect to any area that was used during the span of this course. Frame the questions, “What did you find especially helpful?” and “In what way would you suggest that this service be improved?”

Employ a short survey, designed by experts to focus on variables known to correlate with an effective learning environment and positive learning outcomes.

Use It or Lose It

Designing survey questions can be fascinating. You may find yourself overwhelmed with offers of assistance from self-styled experts. Among instructors, there is a tendency to invent dozens of questions that sound reasonable on their face but will produce useless or invalid data that no one will use, or should consider using. When it comes to designing instruments, faculties don't know what they don't know and tend to fill the knowledge vacuum with useless questions. (I cannot recall seeing an end-of-course survey that was too short but most are too long.) The most effective way to deal with the "self-styled expert" problem is to require that potential questions be accompanied by detailed plans about how the information they produce will guide important decisions. Your rule: "No plan of use committed to, no question." Few well-designed end-of-course assessment instruments contain more than 16 scaled questions and four comment frames in which students are asked to comment on the thing most helpful and thing most in need of improvement in the areas of instruction, curriculum, environment, and services. It is not the compact size itself that makes the instruments useful; rather, as the number of questions increases, validity can decrease in various ways, including through declining student attention.

Ensure that an explicit plan of use is in place with respect to information gathered by the end-of-course assessment process. Eliminate questions that produce information no one is using.
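If your question bank is kept in electronic form, the "no plan of use, no question" rule can be enforced mechanically. The following is a minimal sketch under that assumption; the class, field names, and sample questions are hypothetical, and the limit of 16 scaled questions is the figure cited above.

```python
from dataclasses import dataclass

MAX_SCALED_QUESTIONS = 16  # upper bound cited in the briefing


@dataclass
class Question:
    text: str
    area: str              # "instruction", "curriculum", "environment", "services"
    plan_of_use: str = ""  # how the answers will guide a specific decision


def apply_plan_of_use_rule(candidates):
    """Keep only questions that come with a non-empty plan of use."""
    return [q for q in candidates if q.plan_of_use.strip()]


def within_size_limit(questions):
    """True if the instrument stays within the scaled-question limit."""
    return len(questions) <= MAX_SCALED_QUESTIONS


# Hypothetical candidate questions for illustration.
candidates = [
    Question("The instructor managed classroom time well.", "instruction",
             plan_of_use="Feeds instructor coaching on time management."),
    Question("Feedback on my work was adequate and constructive.", "instruction",
             plan_of_use="Triggers a review of grading turnaround standards."),
    Question("The instructor was enthusiastic.", "instruction"),  # no plan
]

kept = apply_plan_of_use_rule(candidates)
print(f"kept {len(kept)} of {len(candidates)} questions; "
      f"within limit: {within_size_limit(kept)}")
```

A check like this only confirms that a plan has been written down; judging whether the plan is credible remains an executive decision.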

Part III will continue to identify specific executive considerations and steps in creating an end-of-course assessment system that benefits your institution in unique ways.

Robert W. Tucker is President and CEO of InterEd, Inc.

He can be reached through this forum.

The expression of other views by leaders in higher education is welcomed.

Reader Comments (6)

I especially appreciate the discussion of instrument validity in this briefing.

"Interestingly, when we examined the individual records contributing to the 0.22 correlation (i.e., those where the correspondence between grades and instructor ratings were highest), we found a very clear pattern of ineffective teaching and grading behavior."

This is intuitively true: the instructors that run the course like a beauty contest, but with no ugly students, are also those whose teaching methods are not the best, and/or lack valid assessment methods. The result is that everyone 'wins'.

But this in no way contradicts the "myth" of correspondence of high ratings with high grades, especially when you consider a marginal 2-year institution that garners the majority of local college students leaving high school and frequently fields out-of-field teachers in the classroom. The more teachers that fit this picture at the school, then the more pervasive the correspondence between grades and popularity. Those teachers that break the unspoken compact between teachers and students (that homework will be at a minimum, that classes will be high school, etc.) will quickly learn of student dissatisfaction.

It may be, too, that the adult students know that this survey can hurt or help a teacher keep their job, and Likert accordingly.

Others, in a traditional face-to-face classroom, do not like the teacher, and do not like the grade they received. These experiences will, I would argue, have an impact on survey results.

"With respect to the scaled questions, there will typically be four sections: instruction (5-7 questions), content/curriculum (3-5 questions), other students/learning environment (2-4 questions), and support services (special structured comment section or pick list on web survey). These designations are for your benefit and are typically not the language used in the surveys."

These are excellent suggestions, but I wonder about the extent to which responses are influenced by previous experience and expectation. Since surveys are instruments that intentionally foreground situational elements, what will be pushed into the background -- even if only for the 7-10 minutes that the survey takes to administer?

I look forward to hearing about the ways that survey results can then be incorporated into a quality improvement cycle.

Mar 5, 2010 | Unregistered Commenter Glen S. McGhee

Clarifications:

1. The r² of 0.22 is itself the contradiction of the belief that a strong relationship exists between grades and ratings. Eighty-five thousand records gathered over several years, programs, locations, and states, with no systematic selection bias (essentially all available records), revealed no relationship between grades awarded and instructor ratings. A smaller study, in which both grades awarded and pre-grade student judgments of the grade they believed they would receive were examined, also failed to show a relationship between either the actual or estimated grade and instructor ratings.

2. It may not have been clear but the examination of individual records was just that. It was a case-oriented examination of a biased subset of the database; i.e., cases, statistically few that they were, where the correspondence between grade awarded and instructor evaluation was high. The rules of causality prevent this information from flowing backwards to the statistical finding. It is an interesting observation and nothing else.

3. One of many studies on this topic involved 18,000 students in face-to-face classrooms. The findings were the same: r² in the low to mid 0.20s (I believe 0.28 was the highest); no meaningful relationship between grades awarded and instructor evaluations.

4. One of the larger points of this series is to point out that, when it comes to opinions on the merit of end-of-course surveys, like noses, everyone appears to have one. Over the years, we have tried to hold firmly to our own preconceptions. Unfortunately, some of them have been soundly thrashed by the empirical findings.

5. We feel it is important to keep in mind that InterEd's mission and the guidance provided on this website are focused on adult-centered and professional programs and therefore working adult students. We have nothing to say about youth-centered programming except to note that, increasingly, the 17-year-olds are behaving like the adults, especially with respect to service expectations.

6. This series is designed to place executives on sound conceptual and empirical footing in these matters. More broadly, it attempts to modulate personal beliefs, even personal experiences, with the patterns that emerge when one studies millions of records for decades.

Finally, we cannot help but wonder why there is so much resistance to appreciating students as rational, fair minded individuals. Yes, some students can be churlish, and so can some professors. On balance, however, students and professors alike tell the truth and do the best they can to make sound judgments. They do not want to see themselves as biased or hypocritical and they struggle, as we all do, to make sound judgments.

What drives the all-too-common penchant for disrespecting their integrity by explaining what appears to be fair minded behavior under more sinister motives? Did we behave this way when we went to college or are we suggesting that, somehow, students "aren't who they used to be?" We can recall a few situations where our lowest grades came from respected and still fondly remembered instructors. We would like to think we evaluated them fairly although, to be honest, no one around here can recall those distant details. Here's hoping . . . and here's hoping that we can accord today's students the dignity that InterEd's findings suggest they deserve.

- Staff

Mar 9, 2010 | Registered Commenter InterEd, Inc.

Point 5, I think, was very important for distinguishing the bi-modal norms I found.

Here is the current survey, shorter than before. What do you think?
http://gulfcoast.xitracs.net/sacs/submission/documents/357.pdf

Mar 13, 2010 | Unregistered Commenter Glen McGhee, FHEAP

A few suggestions:

1. For each question, in a group process, ask yourself what a concrete CQI plan would look like if it were to be based only on the answer to that question. Some questions may fall in the "nice to know if you're curious" category.

2. Eliminate Q17 and Q18. Roll-up questions produce spurious results (what does it mean to be an excellent teacher?). You want to ask specific questions about each functional area. For instruction: did he manage classroom time well, did he provide adequate feedback on work products, etc. When you ask the 4-5 critical instruction questions, then ask for a roll-up, you get means for the roll-up that are higher with lower SDs than the aggregate of the individual questions. If you must have a roll-up (I discourage it because it creates a false sense of understanding; there is a strong tendency to focus on it to the exclusion of critical profiling), construct one after the fact in SPSS.

3. The greatest area in need of improvement is the comment section. For each functional area, you need to ask what the student found helpful and, separately, what kinds of improvement might be made in that area. These open-ended responses to bounded question frames need to be profiled. Over time, they are much more valuable than the means, SDs, etc., which are useful only at the first level of reporting (i.e., to that instructor for that course).

- If I give you five years of means and standard deviations for 100 instructors and 5,000 courses, you will not be able to distinguish the best from the worst instructors or how or why any are different.

- If I give you five years' worth of comment profiles, they will be crisp, distinctive, and valuable in terms of coaching, best practice sharing, remediation, and identifying the critical success factors in the learning environment. Every instructor will possess a distinctive profile that reflects who he is in the classroom.

4. Some questions convey a bias for particular levels of learning. Analysis and critical appraisal (Q11) are in the upper third of most learning taxonomies. These activities make sense, for example, in a senior accounting seminar but are inappropriate in a fundamentals of accounting course where the task is to remember and be able to act upon basic definitions. Focusing on them in such a course will get you into trouble and interfere with eventual success. Similar comments can be made about creative capabilities and actions (Q6). There are others. Q10 is patronizing and irritating to a 35-year-old manager whose values are as well positioned as yours or mine. For the 17-year-old, the answer when repeated in various contexts will produce an alpha under 0.40. Why do you want it?

Designed by a committee. Right?

Mar 15, 2010 | Registered Commenter InterEd, Inc.

Not sure where the questions came from.

I especially value the more specific questions regarding the "job" that the teacher has -- planning and execution of classes. This may help to move off the "beauty contest" aspect and away from rating personalities, which is one of the key reasons faculty hate any kind of student survey.

Maybe the differences (see above) in student surveys relate more to the quality of the survey itself, although, again, even here a "rational consumer of educational services" would be best. It is just that a poorly designed survey can produce a seemingly irrational consumer.

May 31, 2010 | Unregistered Commenter Glen S. McGhee

My experience with the comments section was that they were simply passed on to the faculty. The chair might have skimmed through them, but there was no discussion, no need to cull for clues on improvement.

In fact, the situation was such that any suggestions made about improvement would create tension, were not handled well, and probably were wrong-headed to begin with. That is, there was no consensus anywhere on what improvement was (how measured?), or whether it was even desirable. It simply wasn't a priority.

May 31, 2010 | Unregistered Commenter Glen S. McGhee
