assessments are defensible for evaluating teacher effectiveness, student test
scores need not be the only measure of teacher quality. Principal and vice
principal evaluations can also help pinpoint good teaching, and policymakers who
face resistance to value-added assessment may want to consider offering to
include supervisor evaluations as well. As a practical matter, however, many of
the same groups that unremittingly point out flaws of value-added measurements
also argue that supervisor evaluations are biased and capricious.
Yet principal or vice principal evaluations are superior to peer evaluations or parent evaluations, which are more likely to suffer from subjectivity.[*] Research findings also suggest that principals are capable of measuring teacher
A recent RAND Corp. working
paper on merit pay by Richard Buddin and colleagues lists some potential
limitations to supervisor evaluations of worker effectiveness.
The researchers explain that it can be difficult to correct for the inherent
subjectivity of any performance evaluation that involves individual supervisor
judgment. They add that problems can also arise when workers perceive favoritism
and that a subordinate’s personality or demographics can interfere with
supervisor objectivity. They also note that supervisors may be hesitant to judge
performance accurately out of fear of reprisals from disgruntled workers.
Finally, they write, "Compression of scores or rankings towards the upper end of
the distribution is likely to occur when evaluations are used as part of a pay
Buddin et al. also refer to a recent study of principals’ ability to evaluate
teacher performance by Brian Jacob of the University of Michigan and Lars
Lefgren of Brigham Young University.
Jacob and Lefgren asked principals in an unidentified Midwestern school district
to rate 202 teachers of core subjects in grades two through six during the
2002-2003 school year. The ratings, on a scale from one to 10, covered a number
of traits traditionally seen as related to teacher effectiveness, such as classroom
Jacob and Lefgren also calculated the student achievement test score gains for
each teacher. Then they compared principals’ ratings of effectiveness to actual
effectiveness as measured by student achievement gains. They found that
principal ratings and value-added calculations were roughly equal in identifying
the most and least effective teachers, but that principals were less able to
differentiate effectiveness in the middle of the teacher quality distribution.
They also examined the extent to which a teacher’s education and experience,
which are the basis of the single salary schedule, are good predictors of
student achievement growth. On this question, they found that education and
experience were inferior predictive measures of teacher quality.
Interestingly, Jacob and Lefgren found that principal evaluations were better
predictors of parent preferences for specific teachers than were the teachers’
value-added achievement measures, years of experience, education or
compensation. While this finding could be taken as a sign that principals and
parents are equally "wrong," the finding probably indicates that principals
perceive teacher characteristics that parents tend to value, even though these
characteristics may not be measured by standardized tests.
Although principal ratings are good indicators of teacher
effectiveness in the classroom, Jacob and Lefgren are careful about recommending
the use of this rating mechanism. They note that their experiment was carried
out in a setting in which principals did not face job pressure to identify
effective teachers. They explain that the effect of a higher-stakes environment
is unclear: While the increased importance of the evaluation might motivate
principals to be even more accurate, it might also make them reluctant to assess
teachers honestly for fear of reprisals.
(Principals’ evaluations were kept confidential and not made available to the
Jacob and Lefgren also found that principals, regardless of their own sex,
routinely discriminated against male and untenured faculty. They wrote:
"Specifically, principals rate both male and untenured teachers roughly 0.3 to
[0.5] standard deviations lower than their female and tenured colleagues with
the same actual proficiency."
They offered a lengthy set of possible explanations for this discrimination
without any firm conclusion, but stated, "Regardless of the cause, however, this
discrimination may place male and untenured teachers at a disadvantage in a
system that relies more heavily on principal assessment."
Ultimately, this and the study’s other findings indicate that although principal
evaluations may have drawbacks, they can help identify good teachers.
Recent research findings by Douglas Harris and Florida State University’s Tim
Sass also suggest that principal evaluations can help identify teacher quality.
In a 2007 study, Harris and Sass compared principals’ private ratings of
teachers in an anonymous Florida school district to value-added calculations of
The 30 principals included in the study spanned elementary, middle and high
school grades. Harris and Sass wrote, "We find a positive and significant
correlation between teacher value-added and principals’ subjective ratings and
that principals’ evaluations are generally, though not always, better predictors
of a teacher’s value-added than traditional approaches to teacher compensation
that focus on experience and formal education."
Like Jacob and Lefgren, Harris and Sass advised caution in using principal
evaluations in teacher accountability or reward systems; they did not
dismiss this possibility, however.
As this research suggests, principals are generally capable of evaluating teacher
effectiveness. Principals’ input can be used as a supplement to value-added
assessment and to help address concerns over value-added measures of teacher
[*] For an argument on site-based management reform, see Angus McBeath, “The Edmonton Public Schools Story: Internationally Renowned Superintendent Angus McBeath Chronicles His District's Successes and Failures” (Mackinac Center for Public Policy, 2007), www.mackinac.org/archives/2007/s2007-13.pdf (accessed May 18, 2008).
[†] In a recent report on teacher evaluation systems, Thomas Toch and Robert Rothman of Education Sector, an education policy think tank in Washington, D.C., raise concerns about the current methods of measuring teacher quality (see Thomas Toch and Robert Rothman, “Rush to Judgment: Teacher Evaluation in Public Education” (Education Sector, 2008),
June 26, 2008)). In particular, Toch and Rothman criticize the common practice
of having a single supervisor assess teacher performance through a single