## Friday, December 28, 2007

### NAEP improvement in nineties (limited data)

In attempting to extend the NAEP analysis beyond this decade in math and reading, I first looked at black performance over the same period already examined for state student bodies as a whole and for whites exclusively.

For black students, there is an interesting inconsistency with the all-student and white results on reading tests: no statistically significant relationship in improvement over time exists (for reading improvement across the '03 and '07 cohorts, the r is .20 with a p-value of .21, compared to .52 for all students and .50 for whites only, both statistically significant). In math, black improvement by state follows the same pattern as it does for the other categories, but the relationship is weaker, at .35.

This comports with an analysis I did on NAEP scores and factors conventionally thought important in ensuring a good education for children. I found that while there are moderate correlations between per-student expenditures (even after cost-of-living adjustments) and performance for whites, Hispanics, and Asians, no such relationship exists for blacks. Of course, it might also be seen as refuting the idea that gauging improvement by looking at the difference in the performance of each state's 4th and 8th graders relative to those in other states is of any value.

To look for consistency over a longer period of time, I trekked back to the nineties. Unfortunately, the available data are pretty sporadic. Analyzing the '92, '96, and '00 results together requires using data from only 28 states and DC. Further adding to my frustration, the math and reading results alternate in two-year intervals. Math numbers are available for '96 and '00, while the reading numbers cover '94 and '98 ('03, '05, and '07 are available for both, and for all states; I used '03 and '07 for the first post attempting to gauge teaching effectiveness by state). Science ('00 and '05) and writing ('02 only) results are even more restrictive.

Using math numbers from fewer than three-fifths of the states, correlating eighth grade improvement (relative to fourth grade performance four years earlier) across consecutive periods still yields results similar to those found using the '03 and '07 data: .50 from '92 to '96 and .70 from '96 to '00. Over the eight-year period from '92 to '00, however (meaning no children show up in both years), the correlation drops to .33.

Improvement is thus somewhat related over time. For example, say a state's score puts it at the 45th percentile in 4th grade and the 48th percentile in 8th grade, a 3-point improvement. Four years later, the state's 4th graders score at the 46th percentile and its 8th graders at the 47th, a 1-point improvement. Doing this for all the states yields an improvement correlation that is fairly strong, fluctuating from .50 to .70 depending on the years used.

We're just looking at what takes place on the edges: a state's absolute score relative to another state's doesn't matter, only how each state's scores change over time compared to its previous and subsequent performance relative to other states.

I knew that, having figured as much in coming up with state IQ estimates, but I didn't realize just how far out on the edges I was snooping around. A state's actual performance--not what I've called "improvement"--relative to other states is incredibly stable. Looking at actual scores of 4th graders in '03 and 8th graders in '07 (mostly the same kids) by state, measured in terms of standard deviations, we get a correlation of .93 for math and .92 for reading. Doing the same for math from '92 to '96 garners an r-value of .97 (with only 28 states considered). For '96 to '00, it's .96.

I should've done this obvious analysis earlier, to give proper perspective on how minimal these improvement variances are, and on why the standard deviations appear so large: because the absolute numbers underlying them are so stable, small differences get magnified. That's also why, in terms of standard deviations, white improvement seems more varied than when all races are considered.

While the varying levels of improvement are fairly consistent going back into the nineties, this improvement (which is only based partly on effective teaching--or ineffective teaching, as suggested by Jason Malloy--and may also be based on children moving, entering public school from private school between 4th and 8th grade, pathogens, parental involvement, the weather, or some other combination of things which includes some amount of noise) is only "explaining" 5%-15% of 8th grade actual performance.

The other 85%-plus is explained by how well the kids perform in 4th grade. Of this 5%-15%, the .50-.70 relationship in improvement over four-year periods (and less than that over the eight years from '92 to '00) suggests that somewhere in the range of 25%-50% (arrived at by squaring the r-value) of that 5%-15% is due to consistent differences (i.e., not inconsistent noise). If pedagogical strategies comprise half of this, and the other potential causes make up the other half, adopting the best teaching practices for DC might narrow the performance gap with Connecticut (among the worst in terms of improvement) by 2%-3%.
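The back-of-the-envelope arithmetic here can be written out explicitly. The fractions below come from the post's own figures, and the 50/50 split between pedagogy and everything else is its stated assumption:

```python
# Decomposition: (share of 8th grade scores explained by improvement)
#              x (fraction of improvement that is consistent, i.e. r squared)
#              x (assumed fraction of consistent improvement due to teaching)
low = 0.05 * (0.50 ** 2) * 0.5    # pessimistic end: 5% share, r = .50
high = 0.15 * (0.70 ** 2) * 0.5   # optimistic end: 15% share, r = .70

print(f"{low:.1%} to {high:.1%}")
```

The 2%-3% figure sits toward the middle of the range this produces.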

To make the US smarter, we need smarter children. The intelligence of children is not crafted in the classroom, but in the womb (and probably to a lesser extent during infancy and at the dinner table). Our immigration policies should be informed by this, as should the birthing incentives and disincentives that exist in the IRC.