And speaking of genetics, why haven’t we found much of significance in the dozen or so years since we decoded the human genome?
Well, if I generate by simulation a set of 200 variables — completely random and totally unrelated to each other — with about 1,000 data points each, it would be nearly impossible not to find a certain number of “significant” correlations among them. But these correlations would be entirely spurious. And while there are techniques to control for such cherry-picking, like the Bonferroni adjustment, they don’t catch the culprits — much as regulation didn’t stop insiders from gaming the system. You can’t really police researchers, particularly when they are free agents toying with the large datasets available on the web.
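The thought experiment above is easy to run. A minimal sketch (assuming NumPy and SciPy are available): generate 200 independent Gaussian variables with 1,000 observations each, test every pair for correlation, and count how many pairs look “significant” at p < 0.05 before and after a Bonferroni adjustment. Since the variables are pure noise, every “discovery” is spurious by construction.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_vars, n_obs = 200, 1000

# 200 mutually independent variables, 1,000 observations each
X = rng.standard_normal((n_obs, n_vars))

# Correlate every pair of variables (200 choose 2 = 19,900 tests)
R = np.corrcoef(X, rowvar=False)
r = R[np.triu_indices(n_vars, k=1)]

# Two-sided p-value for each correlation via its t-statistic
t = r * np.sqrt((n_obs - 2) / (1.0 - r**2))
p = 2.0 * stats.t.sf(np.abs(t), df=n_obs - 2)

n_tests = p.size
n_naive = int((p < 0.05).sum())           # "significant" with no correction
n_bonf = int((p < 0.05 / n_tests).sum())  # after Bonferroni adjustment

print(f"{n_tests} tests, {n_naive} naive 'discoveries', {n_bonf} after Bonferroni")
```

With 19,900 tests at the 5% level, you should expect on the order of a thousand false “discoveries” from pure noise; the Bonferroni-adjusted threshold (0.05 / 19,900) wipes out essentially all of them — which also illustrates Taleb’s point that the correction only works when you can see, and count, every test that was run.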
I am not saying here that there is no information in big data. There is plenty of information. The problem — the central issue — is that the needle comes in an ever-larger haystack.
via Beware the Big Errors of ‘Big Data’ | Wired Opinion | Wired.com.