Thursday, December 07, 2006

Swivel to be YouTube of data

[Spending more time with Swivel, I've already become a bigger fan]

Fat Knowledge recently reported on Swivel.com, a website that is billing itself as a YouTube for data. The site will reportedly allow internet users across the globe to upload data and accompanying graphs to be viewed and used by that same global community. The Swivel preview site tantalizes me with what it might become, although for now it is pretty clunky and not very useful. I can't figure out how you're supposed to obtain, in table form, the data behind the graphs that are viewable. Instead, there are just links to the data sources putatively used, which really isn't any more helpful than a standard search engine.

Hopefully when the full site is launched, data tables formatted for statistical tools like SPSS and Excel will be made available. For scrappy amateurs with limited cognitive firepower like myself, finding and entering data sets is prohibitively time-consuming and exhausting. The stuff I've put out regarding IQ estimates by state consumed tens of hours, mostly in finding numbers and building the regression equations. When afflatus strikes and I think certain variables may be related, I invest an hour or so finding the numbers, entering them, and running them. Not infrequently, no meaningful relationship exists and that time has been squandered. With access to data sets already formatted for whatever anyone who uploads to Swivel can come up with, so much of that tedium will be removed, and curious folks can spend more of their energy finding relationships.
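
For anyone wondering what "running them" amounts to once two columns of numbers are in hand, here is a minimal sketch in Python. It is not the method behind the state IQ estimates; it only computes a plain Pearson correlation, and the two lists are toy placeholder values chosen for illustration, not real data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)

# Toy placeholder values (assumption: stand-ins for two hand-entered variables)
variable_a = [1.0, 2.0, 3.0, 4.0, 5.0]
variable_b = [2.1, 3.9, 6.2, 8.0, 9.8]

r = pearson_r(variable_a, variable_b)
print(f"r = {r:.3f}")
```

A value of r near zero is the "no meaningful relationship" case; the hour spent is mostly in finding and typing in the lists, not in this step.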

Imagine also how much utility this will provide as genetic sequencing becomes progressively more affordable. I envision the frequencies of different haplotypes by location being made available so that a host of social variables can then be correlated with them: the prison population, the Harvard alumni pool, political leanings, and how the genetic makeup of these groups relates to their phenotypes, on and on. The possibilities are endless.

The internet is already smashing myths propagated by the mainstream media and political and academic elites. It is rapidly becoming the empiricist's best friend.

7 comments:

JSBolton said...

Congratulations on your valuable work; I wouldn't have guessed that you had that much time in it.
It must be good though, otherwise how could you have anticipated so closely the results of Dr. McDaniel?
In terms of genetic variables some of these may be accessible now.
There is a 1 million DNA unit haplotype block associated with lactose tolerance in Europeans, and specific to that population, with carriers numbering in the hundreds of millions.
The dairy products consumption per person, or sold at retail, is probably available by state. This would be a proxy for the percentage of the state's population with that huge block of alleles inherited clonally.
Someday, I would guess that surprisingly strong correlations will be found with that block, and all sorts of results not usually considered to have genetic causation of any significance. Politics, the endowment of institutions, crimes of many classes, recruitments into various occupations, such as the military; and many of these showing as being among the highest correlations, if I guess right, based on what I've seen.

crush41 said...

John,

Thanks. I'm glad McDaniel was able to polish and dress it up, and get it out there.

I agree with you regarding the possibilities, and also share your inklings as to what we will find has a genetic component, often in places where conventional wisdom would never have thought such a component existed.

I looked around for dairy consumption but only found national per capita data. It may be difficult to get a handle on, even with state data, because dairy consumption in general has dropped (I would guess due to concerns about too much fat).

TSM said...

Do you know how to graph your data on Swivel? I took some data from Templer and Arikawa's paper about skin color and mean IQ, and uploaded the data (skin color in the left column, mean IQ of the corresponding country in the right one), to do a simple scatterplot or line graph. It's here http://swivel.com/data_sets/show/1001408

crush41 said...

tsm,

I don't. But I tried uploading some data of my own, and Swivel created three graphs for my data (I didn't indicate I wanted any of them made).

Two of them are useless, but one of them is somewhat informative.

Brian Mulloy said...

Thanks for having patience with Swivel. We sat down at our morning meeting today and covered the issues mentioned in this post and comments. Thanks for airing them out. We're tracking them down.

One note about the philosophy of Swivel. Once you upload data we start crawling it. Our crawlers create all the reasonable permutations of the data and graph it. Then we sort the graphs by how interesting they might be. Many of these graphs will be complete garbage, but our approach is that cruising through 50 garbage graphs in order to find 1 killer graph is worth it. Plus computers are good at this kind of mundane looping through permutations.

Thanks for trying out Swivel. We'll be making updates often. Please, stay tuned.

Brian Mulloy
CEO & Cofounder
www.swivel.com
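
The approach Brian describes above (loop over the permutations of an uploaded table, then sort the results by how interesting they might be) can be sketched roughly as follows. This is not Swivel's code, and how Swivel actually scores "interesting" isn't stated here; the sketch assumes the absolute Pearson correlation as a stand-in score, and the function names and toy table are hypothetical.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation coefficient; raises ValueError if undefined."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)

def rank_column_pairs(table):
    """Score every pair of columns and return them, most 'interesting' first.

    table maps a column name to a list of numbers.
    Assumption: 'interesting' is approximated by |r|; a real system
    could use any other scoring rule.
    """
    scored = []
    for a, b in combinations(sorted(table), 2):
        try:
            r = pearson_r(table[a], table[b])
        except ValueError:
            continue  # skip pairs that cannot be scored
        scored.append((abs(r), a, b))
    scored.sort(reverse=True)
    return [(a, b, score) for score, a, b in scored]

# Toy placeholder table, not real data
toy_table = {
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],
    "z": [5, 1, 4, 2],
}

ranked = rank_column_pairs(toy_table)
for a, b, score in ranked:
    print(f"{a} vs {b}: |r| = {score:.3f}")
```

Each ranked pair would correspond to one candidate graph; most will be the "garbage" ones, with the occasional strong pair sorted to the top.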

JSBolton said...

Searching Google for "cows' milk production, utilization" turns up a study with state data from 1954, including an adult male consumption column which varies around twofold between the delta states and the Dakotas.

Fat Knowledge said...

Swivel is a great idea, but I agree with you that they need to work a little more on it. Pretty cool that the CEO drops by and comments on blogs though.