4 Empirical Strategy

We now turn to estimating the causal impact of the Adams Scholarship on students' college outcomes. Comparing outcomes of those eligible and ineligible for the Adams Scholarship would confound the impact of the scholarship with the fact that eligible students have higher academic skill than ineligible ones. We eliminate this source of omitted variable bias by using a regression discontinuity design that compares students just above and below the eligibility thresholds. Students just above and just below these thresholds should be similar to each other except for receipt of the scholarship. Though the scholarship may incentivize students to raise their test scores and qualify for the aid, there is little scope for manipulation of test scores around eligibility thresholds, for three reasons.[18] First, the earliest cohorts of students took their MCAS exams prior to the announcement of the Adams Scholarship. Second, at the time of test administration, the district-level 75th percentile threshold is impossible for individual students to know precisely. Third, exams are centrally scored, and raw scores are transformed into scaled scores via an algorithm unknown to students, their families, or teachers.

Figure 2 provides a graphical representation of scholarship eligibility in three types of school districts. In each type of district, the straight line with a slope of negative one represents the cutoff that determines whether a student's total MCAS score (math + ELA) places her in the top 25% of her school district. The W-shaped boundary defines the region in which students have scored "advanced" in one subject and "proficient" or "advanced" in the other. In low-performing districts, with 25% cutoff scores of at most 500, that cutoff is so low that passing the proficient/advanced threshold is sufficient (and necessary) to win a scholarship. In medium-scoring districts, with 25% cutoff scores between 502 and 518, that cutoff and the proficient/advanced threshold interact in a complex way. In high-performing districts, with 25% cutoff scores of at least 520, that cutoff is so high that passing it is sufficient to win. Scholarship winners are thus those students whose test scores fall in the shaded region of the graph.

We note here that MCAS scores have risen dramatically since the inception of the program, as shown in Figure A.5. Because so many students pass the proficient/advanced threshold, relatively few districts in our sample are low-performing as defined by Figure 2. In other words, it is the top 25% boundary that is generally of the greatest importance, as can be seen from the fact that a full 25% of students qualify for the scholarship each year.

[18] In Section 6, we describe the average cost trade-off calculations and show that despite initial savings, there are earnings losses even larger than the savings.
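The two-part eligibility rule depicted in Figure 2 can be written down compactly. The following sketch is purely illustrative: the scaled-score thresholds of 240 (proficient) and 260 (advanced) and the district cutoffs are assumptions for exposition, not the values used in the paper's analysis, which works with the actual MCAS rules.

```python
# Illustrative eligibility rule; the thresholds below (240 = proficient,
# 260 = advanced) and the district cutoffs are assumed for illustration.
PROFICIENT, ADVANCED = 240, 260

def eligible(math, ela, district_cutoff):
    """Adams eligibility: total score at or above the district's top-25%
    cutoff AND 'advanced' in one subject with at least 'proficient'
    in the other."""
    top_quartile = (math + ela) >= district_cutoff
    prof_adv = ((math >= ADVANCED and ela >= PROFICIENT) or
                (ela >= ADVANCED and math >= PROFICIENT))
    return top_quartile and prof_adv

# Low-performing district (cutoff 500): the proficient/advanced boundary binds.
print(eligible(260, 240, 500))  # True: clears both conditions
print(eligible(255, 250, 500))  # False: total suffices, but no 'advanced'

# High-performing district (cutoff 530): the top-25% cutoff binds.
print(eligible(260, 260, 530))  # False: both 'advanced', total below cutoff
print(eligible(270, 265, 530))  # True
```

The printed cases mirror the low- and high-performing district regimes described above: in the first, the W-shaped boundary is the binding constraint; in the second, the straight total-score line is.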
There are many strategies for dealing with multidimensional regression discontinuities, as discussed by Reardon and Robinson (2012). Examples of such situations in the economics of education include Papay et al. (2010, 2011a,b). We collapse the discontinuity into a single dimension by defining for each student the distance of her math score from the minimum math score that defines eligibility, given her school district and ELA score. In Figure 2, this can be thought of as the horizontal distance between the point defined by each student's pair of test scores and the dark line defining the eligibility threshold in her school district.[19] We use raw scores rather than scaled scores in defining the running variable for two reasons. First, raw scores are a finer measure of skill than the scaled score bins into which they are collapsed. Second, we observe extreme bunching in values of the scaled scores, particularly around the values that define the proficient and advanced thresholds.
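The running-variable construction described above amounts to scanning math scores for the lowest value that would earn eligibility at the student's ELA score. A minimal sketch, with a hypothetical score grid and a deliberately simplified total-score eligibility rule standing in for the full rule:

```python
def running_gap(math, ela, district_cutoff, is_eligible, grid=range(0, 121)):
    """Horizontal distance from the student's math score to the minimum
    math score that would earn eligibility, holding her ELA score and
    district fixed. Negative below the threshold; zero or positive
    exactly when the student is eligible."""
    qualifying = [m for m in grid if is_eligible(m, ela, district_cutoff)]
    if not qualifying:
        return None  # no math score on the grid qualifies at this ELA score
    return math - min(qualifying)

# Simplified rule for illustration: eligibility is total score >= cutoff.
total_rule = lambda m, e, c: (m + e) >= c

print(running_gap(50, 40, 100, total_rule))  # -10: ten points short
print(running_gap(65, 40, 100, total_rule))  # 5: five points clear
```

Because eligibility is monotone in the math score under this rule, the sign of the gap coincides with eligibility, which is the property the regression discontinuity design exploits.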
This bunching is driven entirely by the way that Massachusetts assigns groups of raw scores into scaled score bins, as the raw scores themselves have the extremely smooth distributions seen in Figures A.6 and A.7.[20] As a result, the density of the running variable shown in Figure 3 looks largely smooth, suggesting little scope for endogenous manipulation that would violate the assumptions underlying the regression discontinuity design (McCrary, 2008). We do, however, see a small spike at zero itself, driven by the fact that a district's 75th percentile threshold is mechanically more likely to fall on test scores that are more common in that district. Figure A.8 is consistent with this fact, showing that no such spike occurs in the low-performing districts for which only the proficient/advanced threshold, and not the 75th percentile threshold, defines the boundary.[21] Though the spike is small and not driven by endogenous manipulation of the running variable itself, we later show that our central results are robust to, and even strengthened by, excluding students directly on the boundary, in a so-called "doughnut hole" regression discontinuity.
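The intuition behind the density check can be illustrated with a crude heuristic (a simplified stand-in, not the McCrary estimator itself): compare the count of students in the gap = 0 bin with the average count in nearby bins.

```python
from collections import Counter

def spike_ratio(gaps, window=5):
    """Ratio of the count at gap == 0 to the average count in the
    surrounding bins; values near 1 suggest a smooth density at the cutoff,
    while values well above 1 flag excess mass at the threshold."""
    counts = Counter(gaps)
    neighbors = [counts[g] for g in range(-window, window + 1) if g != 0]
    avg = sum(neighbors) / len(neighbors)
    return counts[0] / avg if avg else float("inf")

smooth = list(range(-5, 6)) * 10       # flat density: 10 students per bin
print(spike_ratio(smooth))             # 1.0
print(spike_ratio(smooth + [0] * 5))   # 1.5: excess mass at zero
```

A doughnut-hole specification simply drops the gap = 0 observations before estimation, removing the mechanical spike described above.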
To estimate the causal effect of the Adams Scholarship, we use local linear regression to estimate linear probability models of the form:

Y_ijt = β0 + β1 Adams_ijt + β2 Gap_ijt + β3 Gap_ijt × Adams_ijt + ε_ijt,   (1)

where Gap_ijt is the running variable described above and Adams_ijt is an indicator for Adams Scholarship eligibility (Gap_ijt ≥ 0).[22] The causal effect of winning the Adams Scholarship on an outcome, Y_ijt, is estimated by β1 if the usual assumptions underlying the validity of the regression discontinuity design are not violated. Assuming that treatment effects are homogeneous along different parts of the eligibility threshold, this coefficient measures a local average treatment effect for students near the threshold, weighted by the probability of a given student being near the threshold itself (Reardon and Robinson, 2012).

[19] Our results are robust to defining the running variable as the vertical distance, i.e., the distance of each student's ELA score from the minimum ELA score that defines eligibility, given her school district and math score.

[20] Goodman (2008) characterized each student by the minimum of her scaled score distance from the proficient/advanced and top 25% thresholds. Distance to the top 25% threshold is not an easily defined quantity when raw scores are used, because the straight line boundary observed in Figure 2 becomes quite jagged. We therefore prefer the running variable described in the text above. Estimates using the running variable as defined in Goodman (2008) are nonetheless quite similar to those presented here and are available by request from the authors.

[21] Figure A.9 shows very similar patterns for the 2005-08 sample.

[22] We use linear probability models here and in our later IV regressions rather than limited dependent variable models
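Equation (1) is an ordinary least squares fit with an interaction term. A minimal numpy sketch on synthetic data (the data and variable names here are invented for illustration) shows that β1 recovers the jump at the threshold:

```python
import numpy as np

def rd_beta1(y, gap):
    """Fit Y = b0 + b1*Adams + b2*Gap + b3*Gap*Adams by OLS and return b1,
    the estimated jump in the outcome at the eligibility threshold."""
    gap = np.asarray(gap, dtype=float)
    y = np.asarray(y, dtype=float)
    adams = (gap >= 0).astype(float)
    X = np.column_stack([np.ones_like(gap), adams, gap, gap * adams])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# Synthetic check: a built-in jump of 0.15 at the threshold is recovered,
# even though the slope also changes at the cutoff.
gap = np.arange(-12, 13, dtype=float)
y = 0.30 + 0.02 * gap + np.where(gap >= 0, 0.15 + 0.01 * gap, 0.0)
print(round(rd_beta1(y, gap), 6))  # 0.15
```

The interaction term Gap × Adams lets the slope differ on either side of the cutoff, so that β1 is the vertical gap between the two fitted lines at Gap = 0.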
Our preferred implementation uses local linear regression with a triangular kernel that weights points near the threshold more heavily than those far from the threshold. We compute optimal bandwidths following the procedure developed by Imbens and Kalyanaraman (2012), which trades off precision against bias generated by deviations from linearity away from the threshold. Across nearly all of our outcomes and samples, the optimal bandwidth generated by this procedure falls somewhere between 10 and 15 raw score points. For simplicity, and for ease of defining a single sample across outcomes, we choose as our default specification a bandwidth of 12. We then show that our results are quite robust to a wider set of bandwidths, to inclusion of demographic controls, to inclusion of school district by cohort fixed effects, and to use of parametric specifications, including polynomials of various degrees. We cluster standard errors by 12th grade school district in all specifications in order to account for within-district correlation in the error term ε_ijt.

As further reassurance of the validity of the discontinuity design employed here, Table 3 tests whether observed covariates vary discontinuously at the eligibility threshold. The first eight columns test the basic covariates, including gender, race, low income, limited English proficiency, and special education status.
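The triangular-kernel weighting within a bandwidth of 12 can be sketched as a weighted version of the same regression (again on synthetic data; this omits the clustered standard errors and the Imbens-Kalyanaraman bandwidth computation):

```python
import numpy as np

def triangular_weights(gap, bandwidth=12.0):
    """Weight 1 at the threshold, declining linearly to 0 at +/- bandwidth;
    observations outside the bandwidth receive zero weight."""
    return np.clip(1.0 - np.abs(np.asarray(gap, dtype=float)) / bandwidth,
                   0.0, None)

def local_linear_beta1(y, gap, bandwidth=12.0):
    """Equation (1) estimated by weighted least squares with the
    triangular-kernel weights above."""
    gap = np.asarray(gap, dtype=float)
    y = np.asarray(y, dtype=float)
    w = triangular_weights(gap, bandwidth)
    keep = w > 0
    gap, y, w = gap[keep], y[keep], w[keep]
    adams = (gap >= 0).astype(float)
    X = np.column_stack([np.ones_like(gap), adams, gap, gap * adams])
    sw = np.sqrt(w)  # WLS as OLS on sqrt-weight-scaled data
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta[1]

# An exactly linear model with a 0.15 jump is recovered within the bandwidth.
gap = np.arange(-20, 21, dtype=float)
y = 0.30 + 0.02 * gap + np.where(gap >= 0, 0.15 + 0.01 * gap, 0.0)
print(round(local_linear_beta1(y, gap), 6))  # 0.15
```

Relative to the unweighted fit, the kernel downweights observations far from the cutoff, which is what limits the bias from deviations from linearity away from the threshold.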
With the exception of marginally significant but small differences in the probability of being black or "other" race for the 2005-06 sample, none of those covariates shows a statistically significant discontinuity in either the 2005-06 or the 2005-08 sample. The estimates are precise enough to rule out economically significant discontinuities as well. To test whether these covariates are jointly discontinuous, we generate in columns 9 and 10 predicted math and ELA z-scores by regressing scores from the class of 2004 on the demographic controls listed in the previous eight columns. We then use the resulting regression estimates to predict scores for students in subsequent classes. The estimates in columns 9 and 10 suggest no discontinuity in predicted test scores, and they are precise enough to rule out differences around the eligibility threshold of more than 0.02 standard deviations in academic skill. Figure 4 shows graphically the average predicted scores of students in each bin defined by distance from the eligibility threshold, confirming the lack of any clear difference in academic skill between students just above and just below the threshold in the 2005-06 sample.[23]

[22, continued] for the reasons discussed by Angrist (2001). In particular, we are interested in directly interpretable causal effects, not in structural parameters generated by non-linear models. We also note that estimates generated by probit and logit models turn out to be extremely similar to those generated by the linear probability model above.
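The predicted-score balance test can be sketched in the same framework: fit scores on demographics in a baseline cohort, predict scores for later cohorts, and estimate equation (1) with the prediction as the outcome. All data, names, and dimensions below are synthetic stand-ins for the paper's actual variables.

```python
import numpy as np

def predicted_score_jump(z_base, demo_base, demo_later, gap_later):
    """Regress baseline-cohort z-scores on demographics, predict scores for
    a later cohort, and return the estimated discontinuity in predicted
    skill at the eligibility threshold (near zero when demographics are
    balanced across the cutoff)."""
    X0 = np.column_stack([np.ones(len(demo_base)), demo_base])
    coef, *_ = np.linalg.lstsq(X0, np.asarray(z_base, dtype=float), rcond=None)
    pred = np.column_stack([np.ones(len(demo_later)), demo_later]) @ coef
    gap = np.asarray(gap_later, dtype=float)
    adams = (gap >= 0).astype(float)
    X = np.column_stack([np.ones_like(gap), adams, gap, gap * adams])
    beta, *_ = np.linalg.lstsq(X, pred, rcond=None)
    return beta[1]

# Demographics unrelated to the running variable -> no jump in predicted skill.
demo_base = np.array([[0.0], [1.0], [2.0], [3.0]])
z_base = np.array([0.0, 0.5, 1.0, 1.5])   # score = 0.5 * demographic index
gap_later = np.arange(-12, 13, dtype=float)
demo_later = np.full((25, 1), 2.0)        # identical demographics everywhere
print(abs(predicted_score_jump(z_base, demo_base, demo_later, gap_later)) < 1e-8)  # True
```

Using the predicted score as the outcome aggregates the eight individual covariate tests into a single skill-weighted balance check, which is why a tight zero here is informative.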