District size and district per-pupil spending were the key variables in this investigation. Data for these items were obtained from a pair of related data sets published by the National Center for Education Statistics of the U.S. Department of Education. District size was obtained from the NCES Common Core of Data “Local Education Agency Universe Survey Data,” and spending figures were obtained from the NCES CCD “Local Education Agency Finance Survey” (“local education agency” is the NCES term for a school district).

Two per-pupil spending figures are commonly reported by school districts: “total spending” and “current spending.” The current spending figure is equal to the total spending figure minus capital costs, such as construction and debt service. Current spending is generally the preferred dependent variable in regression studies of school district spending, for two reasons. First, current spending is less “lumpy”; that is, it is less prone to variation from year to year due to such factors as cyclical construction projects and their associated costs. Second, it is less affected by historical factors outside the control of current school district officials.[*] For these reasons, this study follows the general pattern among education researchers[†] and uses current spending per pupil as its dependent variable.

To broaden the base of evidence from which to draw conclusions, this study considers the five most recent school years for which both district size and spending data are available: 1999-2000 through 2003-2004. As discussed below, the author kept all spending and revenue data in this study in current (nominal) dollars, dealing with inflationary and other period effects through dummy variables corresponding to the years of observation.
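
As a rough sketch of what year dummies do in a pooled specification, the regression can be written in the generic form below. The notation is an assumption made for illustration and is not the study's actual specification: y_{it} stands for current spending per pupil in district i in year t, x_{it} for district size and the control variables, and D_{s,t} for a dummy equal to 1 when observation year t is year s, with the first year (1999-2000) serving as the omitted baseline.

```latex
y_{it} = \alpha + \mathbf{x}_{it}'\boldsymbol{\beta}
       + \sum_{s=2}^{5} \delta_s D_{s,t} + \varepsilon_{it},
\qquad
D_{s,t} =
\begin{cases}
1 & \text{if } t = s \\
0 & \text{otherwise}
\end{cases}
```

Under this form, each coefficient \delta_s absorbs whatever shifted nominal spending in all districts in year s relative to the baseline year, inflation included.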

Data of this sort, in which the same districts are observed over several years, are often analyzed with a technique known as panel regression, but that technique is not well-suited to the model of per-pupil spending proposed here or to the research question: How do district mergers affect per-pupil spending?[**] The approach taken in this paper is therefore a pooled regression on the data for all districts over all five years, clustering together the observations we have for each district at different times. Clustering the observations by district allows for correlation among a district’s observations, which is necessary because linear regression’s conventional standard errors assume that the observations are independent of one another. This pooling of observations is accomplished using Stata’s “regress, cluster()” command to produce robust Huber/White/sandwich estimates of variance (and thus robust standard errors).[8]
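
To make the idea of a pooled regression with clustered standard errors more concrete, the following is a minimal sketch in Python rather than Stata. It is not the code used in this study. It assumes a single regressor plus a constant, uses invented numbers, and introduces the hypothetical function name `pooled_ols_clustered`. The finite-sample adjustment is our understanding of the one applied to clustered regressions in Stata and should be treated as an assumption.

```python
from math import sqrt


def pooled_ols_clustered(x, y, cluster_ids):
    """Regress y on a constant and x; return estimates and cluster-robust SEs.

    Illustrative only: needs at least two clusters and more than two rows.
    """
    n, k = len(y), 2  # k = number of coefficients (intercept and slope)

    # Build X'X and X'y for the design matrix [1, x].
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    bread = [[sxx / det, -sx / det], [-sx / det, n / det]]  # (X'X)^-1

    # Ordinary least squares point estimates: (X'X)^-1 X'y.
    b0 = bread[0][0] * sy + bread[0][1] * sxy
    b1 = bread[1][0] * sy + bread[1][1] * sxy

    # Sum the scores x_i * u_i within each cluster (here, each district).
    scores = {}
    for xi, yi, cid in zip(x, y, cluster_ids):
        u = yi - b0 - b1 * xi  # residual for this observation
        s = scores.setdefault(cid, [0.0, 0.0])
        s[0] += u
        s[1] += xi * u

    # "Meat" of the sandwich: sum over clusters of s_g s_g'.
    meat = [[sum(s[i] * s[j] for s in scores.values()) for j in range(k)]
            for i in range(k)]

    # Finite-sample adjustment (assumed): G/(G-1) * (N-1)/(N-K).
    n_clusters = len(scores)
    c = (n_clusters / (n_clusters - 1)) * ((n - 1) / (n - k))

    def variance(i):
        # i-th diagonal element of bread * meat * bread, floored at zero
        # to guard against tiny negative values from rounding.
        v = sum(bread[i][a] * meat[a][b] * bread[b][i]
                for a in range(k) for b in range(k))
        return max(c * v, 0.0)

    return {"intercept": b0, "slope": b1,
            "se_intercept": sqrt(variance(0)), "se_slope": sqrt(variance(1))}


# Invented numbers, for illustration only: three districts, two years each.
enrollment = [1.0, 1.1, 2.0, 2.2, 4.0, 4.1]  # thousands of pupils
spending = [9.0, 9.4, 8.1, 8.0, 7.2, 7.5]    # thousands of dollars per pupil
district = ["A", "A", "B", "B", "C", "C"]
result = pooled_ols_clustered(enrollment, spending, district)
print(result)
```

The point estimates are the same as those from an ordinary regression on the pooled data; only the standard errors change, because residuals are summed within each district before being squared.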

Given that we want to isolate the effect of school district size on per-pupil spending, we must control for the impact of other factors that might independently affect spending. Those control variables are described in the sections that follow.


[*] See, for instance, Alan L. Gustman and George B. Pidot Jr., “Interactions between Educational Spending and Student Enrollment,” The Journal of Human Resources, vol. 8, no. 1 (Winter 1973), pp. 3-23. Gustman and Pidot chose current expenditures per pupil, explaining, “We avoid the problems caused by the extreme irregularity of annual capital outlays and omit interest payments which are determined largely by past interest rates and the method of financing construction historically in a particular location.”

[†] See, for instance, Frank Johnson, “Revenues and Expenditures for Public Elementary and Secondary Education: School Year 1998-99,” National Center for Education Statistics, Statistics in Brief, 2001, publication no. NCES 2001-321. http://nces.ed.gov/pubs2001/2001321.pdf. Johnson notes, “Researchers generally use current expenditures instead of total expenditures, when comparing education spending between states or across time.”

[**] There are two main types of panel regression: fixed effects and random effects. A fixed-effects regression would ignore differences between districts and look only at how districts themselves change over the five years for which we have data. In other words, it would explore the effects of pouring more students into an existing district, holding constant that district’s aggregate income (because we have income data for only a single year). But when two or more districts merge in the real world, the higher number of students in the combined district is always accompanied by higher aggregate income as well. Clearly, then, a fixed-effects panel regression on the data we have would answer a different question than the one we wish to investigate.

In theory, it might have been possible to address this problem by using a random-effects panel regression, because the random effects approach takes into account differences between districts as well as differences within districts over time. But a Hausman test indicates that a random-effects model is inappropriate. Hence we are left with the pooled-regression approach chosen for this study.
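
For readers unfamiliar with the Hausman test, its usual textbook form is sketched below. The symbols are assumptions introduced here for illustration and do not come from the study's output: \hat{\beta}_{FE} and \hat{\beta}_{RE} denote the fixed-effects and random-effects estimates of the k coefficients that both models can estimate, and \hat{V}_{FE} and \hat{V}_{RE} denote their estimated variance matrices.

```latex
H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})'
    \left[ \hat{V}_{FE} - \hat{V}_{RE} \right]^{-1}
    (\hat{\beta}_{FE} - \hat{\beta}_{RE})
\;\overset{a}{\sim}\; \chi^2_k
```

Under the null hypothesis that the random-effects estimator is consistent, H is approximately chi-squared with k degrees of freedom. A large value of H means the two sets of estimates differ systematically, which is the sense in which the test can indicate that a random-effects model is inappropriate.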