Using Value-Added Assessment to Define Teacher Quality

The central premise of "value-added assessment" is that it is possible to measure the contributions that a teacher makes to a student’s academic achievement gains. Value-added measurement of teacher quality is not an original idea, even though the statistical tools to perform such calculations have been around for a relatively short time. Many scholars, with varying political perspectives, recognize the benefits of using value-added measures of teacher quality.[123]

Value-added assessment provides us with insights that a simple look at student test scores alone cannot. If, for instance, we were simply to evaluate teachers on the absolute performance levels of students, we would unfairly punish teachers assigned to classrooms with weaker students. For example, using absolute performance levels, we would be forced to say that a teacher with students who score at the 55th percentile on standardized tests is more effective than a teacher with students at the 35th percentile. Through value-added analysis, however, we might show that a teacher who raises average student achievement in her class from the 25th percentile to the 35th percentile is more effective than a teacher whose students perform consistently at the 55th percentile.

The calculations involved in determining the value added by a teacher sift out a variety of factors that contribute to student performance but are unrelated to the teacher’s contributions. For example, it is well established that a student’s individual characteristics, such as family income, demographics and English language proficiency, tend to affect his or her success. As we describe below, value-added statistical models can control for these demographic factors. The models can also control for the influences of a school or of classmates on a student’s performance. The goal is to isolate that part of a student’s performance gains that results from his or her teacher’s skill and effort through the course of a year.[*]
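To make the idea of "sifting out" other factors concrete, the following is a minimal sketch, not the model used in the research cited here. It assumes a single control variable (a student's start-of-year score), fits a one-predictor least-squares line across all students, and treats each teacher's average residual as a rough value-added estimate. The function names and the record layout are hypothetical; the models discussed in the literature add demographic, classroom and school controls.

```python
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """One-predictor ordinary least squares; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def teacher_value_added(records):
    """Estimate value added as each teacher's mean residual.

    records: list of (teacher_id, start_score, end_score) tuples.
    ASSUMPTION: the start-of-year score is the only control variable.
    """
    starts = [r[1] for r in records]
    ends = [r[2] for r in records]
    intercept, slope = fit_line(starts, ends)

    residuals = defaultdict(list)
    for teacher, start, end in records:
        predicted = intercept + slope * start  # score expected for this student
        residuals[teacher].append(end - predicted)
    return {teacher: mean(vals) for teacher, vals in residuals.items()}


# Hypothetical data: both teachers have similar students at the start of the year.
records = [
    ("A", 30, 45), ("A", 50, 65),
    ("B", 30, 35), ("B", 50, 55),
]
estimates = teacher_value_added(records)
```

In this made-up data set, teacher A's students finish above the score predicted from their starting point and teacher B's finish below it, so A receives the higher estimate even though the two teachers' students started at the same levels.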

The National Council on Teacher Quality, a Washington, D.C.-based nonprofit organization that conducts research on teacher quality, lists four ways that using value-added measures can help to promote effective teaching. They note its usefulness in "Identifying professional development needs; Evaluating teachers, provided other criteria are considered as well; Awarding individual bonuses, provided other criteria are considered as well; and Providing the objective data needed for dismissal of an ineffective teacher."[124]

The logical question is, If value-added statistical models are so helpful, why are they not widely known and used? One answer is capacity. As Daniel McCaffrey et al., statisticians at the RAND Corp., note, value-added modeling "requires extensive computing resources and high-quality longitudinal data that many states and districts currently do not have."[125]

Such concerns have led many organizations — including the NCTQ, mentioned above — to include caveats when advocating the use of value-added measures to evaluate teacher effectiveness. Their hesitation is echoed by numerous key teacher quality scholars who are reluctant to say that teachers should be assessed primarily by how their students perform.[†] Referencing Daniel Koretz of Harvard University and McCaffrey et al., Harvard University’s Murnane and Steele note that value-added methodologies have problems with missing data, teachers with small sample sizes of students, an absence of standardized testing in some grades and subjects, a difficulty in separating teacher effects from classroom or school effects, and most of all, a challenge in estimating "what would have happened to the students’ achievement under an alternative scenario."[126] In addition, some researchers (for example, Ballou) raise a legitimate concern about the measurement error, or "statistical noise," that exists with any statistical measurement of student achievement and that becomes worse when more than one exam for a particular student is involved.[127]

It is true that the statistical operations employed to measure teacher contributions to student achievement are not perfect, but it is possible to address many of the concerns about value-added modeling and make useful calculations. Regarding missing data and teachers with small sample sizes, it may be necessary to collect data on some teachers over two or three years to get enough data to make an accurate assessment. Value-added evaluations of new teachers would likewise require waiting a couple of years before assessing the type of teacher a novice will become. (A point of clarification: Value-added assessment typically involves measuring how students improve from the beginning of a single academic year to the end. Including several years of data for a particular teacher does not mean comparing the achievements of one set of a teacher’s students to those of the next, but rather combining several years of data on students’ single-year improvements under that teacher.)
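The pooling described in the parenthetical above can be sketched as follows. This is an illustration only: the function name, the data layout and the `min_students` threshold are assumptions made for the example, not figures drawn from the research cited here.

```python
from statistics import mean


def pooled_mean_gain(gains_by_year, min_students=30):
    """Combine single-year student gains across years for one teacher.

    gains_by_year: {year: [gain for each student taught that year]}
    Each gain is one student's improvement within a single school year.
    Returns None when too few students have been observed to report
    an estimate (ASSUMPTION: the threshold is illustrative).
    """
    pooled = [g for gains in gains_by_year.values() for g in gains]
    if len(pooled) < min_students:
        return None
    return mean(pooled)


# Hypothetical teacher with small classes over three years.
history = {2006: [4, 6], 2007: [5], 2008: [3, 7]}
one_year = pooled_mean_gain({2007: history[2007]}, min_students=5)
three_years = pooled_mean_gain(history, min_students=5)
```

With one year of data, this teacher falls short of the threshold and no estimate is reported. With three years pooled, there are enough single-year gains to compute an average.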

There are also ways to address the fact that students are not tested in every grade and subject either at the state or local level. Under the state’s current testing regime, it would be impossible to evaluate individual teacher contributions in certain grades and subjects, since Michigan tests students annually only in grades three through eight in English and math, and less frequently in science and social studies, as Graphic 10 shows.[128]

[Graphic 10]

Adding annual testing in grades K-2 and 9-10 and in a wider array of subjects would be useful not only for measuring teacher contributions to student learning, but also as a diagnostic tool for improving student achievement.[**] The decision to expand the range of grades and subjects tested can be made at the local district level, as the testing requirements imposed by NCLB and the MDE provide a minimum level of testing, not a maximum.

And there are basic advantages to requiring testing each year. As Robert Gordon, Thomas J. Kane and Douglas O. Staiger write in a report for the Hamilton Project, a program of the nonprofit Washington, D.C.-based Brookings Institution, limiting value-added measurement of teacher performance to only the currently tested subjects could have an unintended consequence: It could "create unhelpful incentives for low-achieving teachers to leave the tested fields and high-achieving teachers to enter them."[129] Although Gordon, Kane and Staiger think this consequence is unlikely given that currently only a few subjects are tested,[130] it cannot be summarily dismissed. Value-added assessment of teacher performance would raise the stakes considerably for teachers. The perverse incentive for poor teachers to avoid teaching subjects where accountability and performance measurement actually matter for their job security becomes a real possibility. Obviously, this negative outcome is not the only reason to consider adding tests in all grades and subjects, but it is yet another important reason to consider doing so.

Hence, it is valuable to consider ways to extend the use of traditional standardized testing to grades and subjects not currently tested. That said, there are some subjects and teaching settings in which the addition of traditional standardized testing is impracticable. Thus, Gordon, Kane and Staiger are right to suggest that teacher evaluation should be expanded to include a high-quality assessment of any teacher in a grade or subject for which traditional standardized testing may prove unworkable. They recommend the use of "Connecticut’s Beginning Educator Support and Training (BEST) program, in which new teachers submit portfolios of their work, including lesson logs, videotaped segments of teaching, examples of student work, and reflective commentaries on the goals during lessons."[131] Perhaps some of the measures included in the BEST system could be helpful — namely principal evaluation — but policymakers should be wary of unintended consequences with these performance measures. It is not particularly clear what can be gained by analyzing lesson logs or reflective commentaries on the goals during lessons.

Researchers do have reasonable concerns about the statistical errors that exist in getting exact point estimates for the value teachers add to student achievement, but the very researchers who point out the flaws in these methodologies have nevertheless continued to use them in their own research. Value-added methodologies are also employed by leading education researchers at Stanford’s Hoover Institution, the Brookings Institution, the Harvard Graduate School of Education and the University of Wisconsin. Just as leading education researchers do not let the limitations of these otherwise powerful methodologies invalidate their research claims, reformers who want to employ a more objective measurement of teacher performance as a way to improve student achievement should look to value-added methodologies as a better way of measuring individual teacher quality.

In fairness to researchers who have raised concerns about value-added methodologies in assessing teachers, it is true that value-added models are usually best at detecting the best and the worst teachers, rather than accurately sorting those teachers whose performance lies in the middle range. But given this concern, one reasonable response would be to sort teachers into three broad categories of value-added achievement and compensate them accordingly.
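One way such a three-category sort might work is sketched below. The 20 percent share assigned to the top and bottom groups is an assumption chosen for illustration; the text above does not specify cutoffs, and the function and tier names are hypothetical.

```python
def three_tiers(estimates, tail_share=0.2):
    """Sort teachers into 'low', 'middle' and 'high' value-added groups.

    estimates: {teacher_id: value-added estimate}
    tail_share: fraction of teachers placed in each of the low and
    high groups (ASSUMPTION: 0.2 is illustrative, not a recommendation).
    """
    ranked = sorted(estimates, key=estimates.get)  # lowest estimate first
    n_tail = int(len(ranked) * tail_share)
    tiers = {}
    for position, teacher in enumerate(ranked):
        if position < n_tail:
            tiers[teacher] = "low"
        elif position >= len(ranked) - n_tail:
            tiers[teacher] = "high"
        else:
            tiers[teacher] = "middle"
    return tiers


# Hypothetical estimates for five teachers.
estimates = {"A": -4.0, "B": -1.0, "C": 0.0, "D": 1.5, "E": 5.0}
tiers = three_tiers(estimates)
```

Because only the extremes are separated out, small differences among the middle group, where the estimates are least reliable, have no effect on the category a teacher receives.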

Currently, the Michigan Department of Education does not calculate teacher value-added measures. The MDE is exploring ways to include some version of this concept in determining "adequate yearly progress" under the requirements of the No Child Left Behind Act, but the models that the department is considering would not allow calculations of individual teacher contributions to student learning. At this point, the MDE does not have identifiers in its accountability database that would connect students to their teachers in a given year. Still, the MDE already collects critical individual student information, including student race, gender, poverty status and English language learner status. The MDE also has information about individual teachers and their years of experience and preparation.

Since the MDE does not currently intend to calculate individual teacher value-added, local districts would have to commit data analysis resources to this, or policymakers would need to adopt legislation to direct the MDE to do so. For state officials to undertake this task, teacher and student identifiers would need to be created to allow for the linking of students to their teachers each year. Depending on the exam chosen, yearly scores might have to be adjusted before they could be used in a value-added model. This is largely a mathematical exercise, however, and neither of these steps would be particularly labor- or resource-intensive. In fact, districts and charter schools could expand the use of standardized assessments, just as many conventional districts, charter schools and independent schools already have. For example, the Northwest Evaluation Association reports that more than 100 Michigan districts, charter schools and private schools already contract with the association for test assessments the schools administer in addition to the state-mandated Michigan Educational Assessment Program exams.[132] As of 2008, according to an NWEA spokesman, the association charges approximately $14 per student for these assessment services,[133] so schools interested in regular assessments should find the costs manageable.
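The two data steps mentioned above, linking students to teachers and adjusting yearly scores, can be illustrated with a short sketch. It assumes hypothetical student and teacher identifiers and uses within-test standardization (z-scores) as one possible adjustment; the appropriate adjustment would depend on the exam chosen.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize(scores):
    """Convert raw scores to z-scores so different tests share a scale.

    scores: {student_id: raw score}
    ASSUMPTION: z-scoring is only one possible adjustment.
    """
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:  # every student has the same score
        return {student: 0.0 for student in scores}
    return {student: (raw - mu) / sd for student, raw in scores.items()}


def link_gains(roster, start_scores, end_scores):
    """Group each student's single-year gain under his or her teacher.

    roster: {student_id: teacher_id} for one school year
    Students missing either score are skipped.
    """
    gains = defaultdict(list)
    for student, teacher in roster.items():
        if student in start_scores and student in end_scores:
            gains[teacher].append(end_scores[student] - start_scores[student])
    return dict(gains)


# Hypothetical identifiers and scores.
roster = {"s1": "T1", "s2": "T1", "s3": "T2", "s4": "T2"}
start = {"s1": 10, "s2": 20, "s3": 30, "s4": 25}
end = {"s1": 22, "s2": 28, "s3": 35}  # s4 has no end-of-year score
gains = link_gains(roster, start, end)
z = standardize({"a": 10, "b": 20, "c": 30})
```

The roster is the piece the state database currently lacks: without a record tying each student to a teacher in a given year, the gains cannot be grouped by teacher.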

Finally, those overseeing a value-added assessment would need to adopt and run the statistical model that would produce estimates of teacher effects. These activities would require expertise and time.

The MDE already has the technical know-how to supervise such a project, though the department might need to hire an additional person to oversee it. Obviously, if value-added assessments were orchestrated through the MDE, the state Legislature would have to pass legislation authorizing the project and its key features.

Alternatively, a school district might implement the project on its own. Individual districts may or may not have the expertise to undertake such statistical modeling, so they might need to hire technically skilled personnel or contract the work to a consulting firm. For medium to large districts, a skilled full-time employee devoted almost exclusively to this project would probably be the more cost-effective option and could likely be hired for around $100,000. Districts can also contract with private research firms that already conduct such assessments. The money to pay for these improvements in teacher and student assessment could be shifted from other less effective teacher quality programs, such as perfunctory professional development activities (see "Limited Role of Professional Development"), without increasing total spending.

The larger obstacle to value-added assessments would be collective bargaining agreements. A number of Michigan districts have acquiesced to contract clauses preventing the districts from using student achievement to help determine teacher compensation.

Inevitably, such clauses help ensure that teacher pay is governed by a single salary schedule and that any increase in compensation will occur as an across-the-board pay hike. Since unions almost always prefer these across-the-board hikes, districts without contract clauses prohibiting the use of student achievement in calculating teacher compensation will probably still face union opposition to the implementation of value-added assessment.

Regardless, as noted earlier (see "Across-the-Board Salary Increases"), across-the-board pay hikes can actually discourage improvements in teacher quality and exacerbate shortages in understaffed subject areas. School boards intent on using value-added assessment to get the best teachers into the classroom may face stiff opposition at the bargaining table, but they will be pursuing a goal that can directly improve how their students learn in the classroom.

[*] Such methods involve statistical regression. For a highly technical discussion of value-added modeling, see Daniel F. McCaffrey et al., “Models for Value-Added Modeling of Teacher Effects,” Journal of Educational and Behavioral Statistics 29, no. 1 (2004). The index of Gordon, Kane and Staiger, “Identifying Effective Teachers Using Performance on the Job” offers a somewhat less-complicated presentation of the statistics involved.

[†] Concerning objections to measuring teachers by student test scores, Helen Ladd writes: “More generally, why does it make sense to try to hold either teachers or schools accountable for the performance of students? Would it not make more sense to try to make the students themselves more accountable for their performance?” She has a point; holding students more accountable might well make sense, perhaps through exit exams and gateway tests. This topic is not the purpose of a primer on teacher quality. Nevertheless, tracking the success of students is important in holding teachers accountable, too, and it is a useful tool in determining teacher quality. See Helen F. Ladd, Holding Schools Accountable: Performance-Based Reform in Education, 11.

[**] Ideally, this testing would occur at the beginning and the end of the school year, though testing just once a year is also possible, with the difference between a student’s scores in two consecutive years representing the gain (or loss).