Wednesday, September 12, 2007

If measurement could be more precise, the perceived relationship is stronger, not weaker

A favorite Sailer post consists of an amalgamation of reader suggestions on ideas that journalists (and everybody else) should keep in mind.

I have another to add, that I've seen ignored on several occasions. As someone fairly well-versed in statistics, it's obvious. But as someone who frequently struggles to grasp so many things that more informed people find effortlessly intuitive, I'm confident some will find it being explicitly spelled out quite helpful.

In the comments of an Arnold Kling post on the intelligence-livability relationship in the US, a skeptical reader writes:
What I find odd is the acceptance of the state as a boundary -- I've lived primarily in two very heterogenous states, Texas and Washington, and they are not fairly characterized by a sum over the entirety. But that is fairly typical.
He is insinuating that the correlation between IQ and livability is questionable because I've used state-level data to run the regression. But the less precise the state-level data are, the stronger the true relationship between intelligence and livability actually is. The more random noise there is in the measurements, the lower the observed correlations are going to be. As the sample size increases, the likelihood of it being otherwise rapidly approaches zero (an n of 50 is more than sufficient for this). In attempting to undercut the observed relationship, he is actually suggesting that it is more robust than I report.
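
For anyone who would rather see this than take my word for it, here is a minimal simulation sketch. Everything in it is an assumption made for illustration, not actual state data: a true correlation of .80, 50 units (standing in for the 50 states), and purely random measurement noise with the same spread as the underlying variables. The function names `pearson` and `simulate` are mine.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n=50, true_r=0.8, noise_sd=1.0, trials=2000, seed=0):
    """Compare correlations of clean values with those of noisy measurements.

    All parameter values are illustrative assumptions, not real figures.
    Returns (mean clean r, mean noisy r, share of trials where noisy < clean).
    """
    rng = random.Random(seed)
    clean_sum = 0.0
    noisy_sum = 0.0
    lower = 0
    for _ in range(trials):
        # Underlying "true" values, correlated at roughly true_r.
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [true_r * xi + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
             for xi in x]
        # What we actually get to observe: true value plus random error.
        x_obs = [xi + rng.gauss(0, noise_sd) for xi in x]
        y_obs = [yi + rng.gauss(0, noise_sd) for yi in y]
        r_clean = pearson(x, y)
        r_noisy = pearson(x_obs, y_obs)
        clean_sum += r_clean
        noisy_sum += r_noisy
        if r_noisy < r_clean:
            lower += 1
    return clean_sum / trials, noisy_sum / trials, lower / trials


if __name__ == "__main__":
    mean_clean, mean_noisy, share_lower = simulate()
    print(f"mean r, clean values:       {mean_clean:.2f}")
    print(f"mean r, noisy measurements: {mean_noisy:.2f}")
    print(f"share of trials where noise lowered r: {share_lower:.3f}")
```

Under these assumptions the noisy correlation should come out at roughly half the clean one, and in nearly every trial of 50 units the noise lowers the observed correlation rather than raising it. Note the caveat built into the setup: the noise is random. Error that is biased in one particular direction is a different matter.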

If I suspected a relationship between the percentage of land paved with asphalt and population density in the US, my suspicion would be confirmed by relating the two at the state level. Say I find the two correlate at .40 (I'm assuming that this supposed relationship isn't contested even though I'm not using actual figures). New York has a higher proportion of its land devoted to roads and a greater population density than Alaska does.

But Anchorage has more asphalt and neighbors than does the area 50 miles north of Utica. The effect of internal 'inconsistencies' like these is lessened by a similar analysis at the county level. Most US counties have few people and few roads. A relative few have lots of people and lots of roads. The correlation jumps to .80.

To argue that, because the state-level comparison isn't perfect, there is probably little relationship between population density and land devoted to roads is to get it backwards. Because the state-level comparison isn't ideal, a more precise comparison will show that the real relationship is considerably stronger.

That same sort of argument is constantly employed against the utility of IQ data. Because IQ tests are imperfect measurements that only approximate overall intelligence, the argument goes, a strong correlation between IQ and infant mortality does not mean that thus-far unmeasurable intelligence and infant mortality are related in any way. Of course, if IQ, as a sloppy measure of intelligence, is related to infant mortality, a crisper measure of intelligence will, outside of an extreme (randomly occurring) anomaly, be more strongly related to infant mortality than imperfect IQ is.
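
The standard way to put a number on this is Spearman's correction for attenuation: the observed correlation equals the true correlation multiplied by the square root of the product of the two measures' reliabilities. Below is a small sketch of that formula. The helper name `disattenuate` and the example reliabilities of 0.5 are hypothetical, chosen only to illustrate, and the correction assumes the measurement errors are random and unrelated to each other.

```python
import math


def disattenuate(observed_r, reliability_x, reliability_y):
    """Estimate the correlation between two error-free measures.

    Spearman's correction for attenuation:
        true_r = observed_r / sqrt(reliability_x * reliability_y)
    Reliabilities must lie in (0, 1]; 1 means a perfectly precise measure.
    Assumes measurement errors are random and uncorrelated.
    """
    for rel in (reliability_x, reliability_y):
        if not 0 < rel <= 1:
            raise ValueError("reliabilities must be in the interval (0, 1]")
    return observed_r / math.sqrt(reliability_x * reliability_y)


if __name__ == "__main__":
    # Hypothetical inputs: an observed r of .40 with both measures
    # only half reliable.
    print(round(disattenuate(0.40, 0.5, 0.5), 2))
```

Because the reliabilities can never exceed 1, the corrected figure is never smaller in magnitude than the observed one. With perfectly reliable measures the two are identical, and with sloppier measures the gap widens.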

Or consider my state IQ estimates that were followed by VCU Professor Michael McDaniel's estimates. They were arrived at similarly, but McDaniel's numbers were more precise. And, of course, they correlated more strongly with other good-faith IQ estimates than mine did.

Essentially, as in life generally, in measuring statistical relationships, the perfect is not the enemy of the good.

1 comment:

Steve Sailer said...

Right. You see that all the time with criticism of Lynn and Vanhanen's national average IQ numbers. There are perfectly valid criticisms that can be made of their source data, such as how nationally representative it is and the apples-and-oranges aspects of comparing results from different tests, but many people draw exactly the wrong implication from these quibbles: therefore we shouldn't pay attention to the correlations between national average IQ and real-world results like GDP per capita!

Uh, no, it's really the other way around. Unless you can identify non-random biases pointing in a single direction, all the noise in the data means that true correlations are probably higher, not lower.