This tutorial presents the intuition behind the randomness of sample correlation (spurious correlation) and the methodology behind the derivations.
Some later sections are somewhat technical, as I rederive an old equation with more precise functions (in order to apply it to fat tails) and show the distribution of the maximum of d variables with n points per variable.
This paves the way for real scientific work on random matrix theory under fat tails and the failure of Marchenko-Pastur.
CorrelationDistribution (10/16)
https://dl.dropboxusercontent.com/u/50282823/CorrelationDistribution.nb.pdf
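The core intuition above can be checked by simulation. This is a minimal sketch (not the notebook's code, which is linked above): even when two variables are truly independent, the sample correlation over a small number of points scatters widely around zero, so the maximum over many such pairs can look impressively "significant". The sample size and trial count below are illustrative choices, not taken from the post.

```python
import random
import statistics

def sample_corr(n, rng):
    """Pearson correlation of two INDEPENDENT Gaussian samples of size n.

    The true correlation is exactly 0; whatever comes out is pure noise.
    """
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(42)
n, trials = 20, 5000          # few points per variable, many variable pairs
corrs = [sample_corr(n, rng) for _ in range(trials)]

# The average is near 0, but the spread is large, and the maximum over
# many pairs is far from 0 -- a spurious "discovery" waiting to happen.
print("mean :", statistics.fmean(corrs))
print("max|r|:", max(abs(c) for c in corrs))
```

With only 20 points per variable, the largest absolute correlation among a few thousand independent pairs routinely exceeds 0.6, despite the true value being zero for every pair.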
The N=1 Fallacy, Nassim Taleb – Seth Roberts Tribute
This video was recorded at UC Berkeley on August 10, 2014 as part of a series of talks to honor Seth Roberts.
Hat tip to K Kalidasan
Big Data Caveats, Front and Center – Information Management Blogs Article
Given that backdrop, Taleb’s misgivings on big data and analytics aren’t at all surprising: “We’re more fooled by noise than ever before, and it’s because of a nasty phenomenon called ‘big data.’ With big data, researchers have brought cherry-picking to an industrial level … Modernity provides too many variables, but too little data per variable. So the spurious relationships grow much, much faster than real information … In other words: Big data may mean more information, but it also means more false information … In observational studies, statistical relationships are examined on the researcher’s computer. In double-blind cohort experiments, however, information is extracted in a way that mimics real life … This is not all bad news though: If such studies cannot be used to confirm, they can be effectively used to debunk — to tell us what’s wrong with a theory, not whether a theory is right.”
via Big Data Caveats, Front and Center – Information Management Blogs Article.
An additional argument against BIG DATA…
An additional argument against BIG DATA, which people are starting to discover just now, is that there are about 35,000 different variables in economics and finance. And we usually have far fewer than 10,000 data points per variable. This (plus fat tails) explains why econometrics has not been able to deliver.
(cont) To imagine how bad it can get, consider that anyone with a good computer program has access to at least 700 million correlations among just economic and financial variables.
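The order of magnitude above is easy to verify as a back-of-the-envelope check, assuming only contemporaneous pairwise correlations (lagged and transformed versions would push the count well past this floor):

```python
from math import comb

d = 35_000            # rough count of economic/financial variables, per the post
pairs = comb(d, 2)    # distinct pairwise correlations: d * (d - 1) / 2
print(f"{pairs:,} pairwise correlations")  # 612,482,500

# Even at a strict 0.1% significance threshold, pure noise would still
# "discover" hundreds of thousands of relationships:
print(f"{pairs // 1000:,} expected false positives at p < 0.001")
```

The pairwise count alone is over 600 million; allowing lags, subsamples, and variable transformations easily clears the 700 million figure in the post.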
The problem with Big Data…
The problem with Big Data that all these consultants/proponents are not getting: you cannot separate the statistical problem from the researcher’s incentive (a convex payoff, like an option). Judging from the many angry responses, these big data people seem to be years, many years, behind…
http://www.wired.com/opinion/2013/02/big-data-means-big-errors-people/