Are student evaluations of teaching (SETs) biased?

Around the proverbial water cooler of many colleges and universities, you will often hear faculty discussing their student evaluations of teaching (SETs). Faculty may feel that SETs do not adequately represent the quality of their teaching, and in some aspects, they are correct. Research suggests that SETs are subject to a wide variety of biases, all of which are out of the control of the individual faculty member. Below are eight sources of bias which have been identified in research on the effectiveness of SETs for evaluating the quality of teaching. Some you may find surprising.

  1. Faculty Attractiveness. Faculty attractiveness is correlated with higher student ratings (e.g. higher teacher effectiveness, course quality, likeability, approachability and favorable grading). Why does this occur? Society tends to believe that the world is just, and that people get whatever it is that they most deserve. Thus, if someone is physically beautiful, that beauty must have been earned. The bottom line is that students may be giving attractive faculty more favorable SETs based on a subconscious perception that they’ve earned higher ratings.
  2. The Faculty’s Gender. As a general rule, male professors are rated by students as more effective than female professors, despite the fact that students are more likely to rate female professors as attractive. Female professors receive high overall ratings ONLY if they demonstrate competence as well as warmth, but this expectation does not hold true for male professors. This reveals that traditional gender role expectations have the potential to influence how professors are viewed by their students.
  3. The Faculty’s Race. Although inherently very difficult to study, some research suggests that race can subconsciously affect how students rate teaching effectiveness. Asian Americans can be less expressive with their body language or less assertive during conversations, which some students can interpret as cold, uncaring or shy. The more intense body language of African American professors can be interpreted as hostile. Social conditioning further affects interpretation of faculty behaviors. Perception of one’s facial expression and personality is influenced by the immediate read of that person’s race, gender, age, etc. The difficulties some racial minorities have experienced when attempting to improve student evaluations of teaching have been hypothesized to be related to some of these factors.
  4. The Faculty’s Nonverbal Behavior. In one famous experiment, a trained actor was hired to play a professor (Dr. Fox) who taught a meaningless lecture on “Mathematical Game Theory as Applied to Physical Education.” The actor adopted a lively demeanor and conveyed warmth to his audience while giving the audience contradictory statements, double talk, and made-up words. Three separate audiences of health professionals and graduate students awarded him high evaluation scores and comments full of praise. Speech patterns, facial expressions and humor can have a tremendous impact on student evaluations. Providing good examples of concepts and thoughtful use of repetition are less likely to correlate with overall scores. One could argue that nonverbal behaviors can be modified, and this is partially true for things like gesturing, moving around the classroom and making eye contact with students. But other nonverbal behaviors come from physiology, culture and habit and cannot be easily or meaningfully modified. Speech patterns are solidified by adulthood, so it is potentially problematic when fast speech is associated with competence and soft voices are associated with warmth.
  5. Students’ Grade Expectations. Some hypothesize that students who receive low grades or scores always score faculty lower on SETs. This has been shown to not be the case. The size of the mismatch between the student’s actual vs. expected grade is the true influencer. One study conducted in Korea revealed that the larger the difference between the true grade and the expected grade (e.g. from past performance reflected in their cumulative GPA), the larger the impact on SET scores. Students may be rewarding or penalizing faculty who exceed or fail to meet their grade expectations. Disseminating SETs before final exams or before grades are released could reduce this “punishment and reward” effect, but would presumably reduce how well results of SETs assess the quality of learning.
  6. Students Find the Course Dull or Anxiety-inducing. Students with negative experiences from previous coursework can display disinterest and anxiety surrounding similar courses. This phenomenon with its associated effect on SETs has been identified specifically in quantitative reasoning or math-related courses when compared to courses in humanities. Faculty who teach quantitative courses receive widely variable scores in their SETs regarding stimulating student interest and encouraging class participation. Scores on knowledge of subject matter or ability to effectively present the material do not display this variability. This finding has brought up the need to review scores on individual prompts in SETs to ensure faculty aren’t being unfairly penalized when they teach the “broccoli” courses.
  7. Students Have “Phoned it in”. Completing SETs can be cognitively demanding for students, especially when relying on reconstructive memory. This effect can be amplified in cognitively demanding academic programs (e.g. health professions). One study evaluated whether medical students were putting due diligence into their SETs. The researchers inserted the names of two fictitious gender-ambiguous lecturers into the SETs for two pre-clinical classroom courses, one name per course. Students were given the option to select “N/A” for any of the 26+ faculty included in the evaluations. Two-thirds of students evaluated the teaching of the fictitious lecturer, some even providing comments on the faculty member’s teaching. Students could feel challenged to find something memorable which sets one faculty member apart from another and could develop an apathetic attitude if they fail to perceive personal reward for this cognitively demanding task. It’s also possible that the problem lies with evaluating 26 or more faculty in a single assessment; however, this study sheds some light on the general concerns surrounding how students treat their SETs.
  8. The SETs Contain Problematic Prompts. Many questions or prompts found in student evaluations of teaching focus on the instructor (e.g. “The instructor inspired critical engagement with the topic.”), which reinforces the misperception that teachers are solely responsible for student success or that teachers have control over the full learning experience. For example, what if students were enthusiastic about a topic, but the faculty member did not influence (positively or negatively) that enthusiasm? What if a faculty member encouraged group or individual participation but students did not actually participate? Questions that focus on factors over which faculty have little control (e.g. participation, enthusiasm) should be interpreted with caution due to unmeasurable factors, like student intrinsic motivation.

Contemporary models of learning in higher education focus on de-centering instructor presence and control (e.g. no more “sage on the stage”), but student perceptions of teaching effectiveness may not be up to speed with the concept of faculty member as facilitator. Questions that focus on specific and objective instructor actions (e.g. responding to emails within 2 business days, starting class on time) may be more appropriate.

Despite the many sources of bias identified in SETs, they are an important part of summative (and arguably formative) evaluation of teaching and are likely to continue to be relied on in administrative decision-making in colleges and universities for a long time to come. Regardless of faculty feelings about SETs, staying employed likely means that they should be carefully considered when planning coursework and teaching activities. One should also consider potential benefits of SETs which may not be inherently obvious. They help students feel more engaged with their program of study, for example. They help students learn to reflect on the overall educational process. Specific comments about faculty can also provide a window into how faculty are being received and understood by students. Interestingly, student suggestions for improvement have been shown to align with established learning theory.

In conclusion, at the faculty and course level, SETs are subject to various types of biases, which can be difficult or entirely impossible to change. Administrators should be aware of these potential areas of bias when using them for faculty job performance recommendations or decisions. Before completely disregarding particularly negative (or positive) SETs, faculty should be aware of potential sources of value in teaching improvement while taking personal and less constructive feedback with a grain of salt.

If you would like to contribute to The Faculty Development Blog, please contact Tyler Rose at

Katherine P. Smith, PharmD, FCCP, BCPS
PGY-1 Community Pharmacy Residency Director
Associate Professor of Pharmacy Practice
Roseman University of Health Sciences College of Pharmacy