Wednesday, March 19, 2014

Which is better, data or theory?

One of the most annoying arguments that I see popping up again and again is the question of "Which is better, theory or data?" (A related bore-fest is "Which is better, induction or deduction?") Actually, you can't have one without the other. In a recent blog post, Paul Krugman points out that you can't have data without theory:
But you can’t be an effective fox just by letting the data speak for itself — because it never does. You use data to inform your analysis, you let it tell you that your pet hypothesis is wrong, but data are never a substitute for hard thinking. If you think the data are speaking for themselves, what you’re really doing is implicit theorizing, which is a really bad idea (because you can’t test your assumptions if you don’t even know what you’re assuming.)
True. Suppose you find a correlation between having an unfulfilling sex life and liking Charlie Kaufman movies. Does that mean that people watch Charlie Kaufman movies to ease the pain of their lame sex lives? Or did the fact that they watch Charlie Kaufman movies actually ruin their sex lives? Or are both the result of some third factor, such as being an insufferable hipster? If you don't pick one, you'll never be able to understand what's really going on. Even if all you care about is predictive power - you want to be able to catch someone watching a Charlie Kaufman movie and say "I bet girls won't touch that guy with a 10 foot pole" - you still need to assume that the correlation is stable over time, and your assumption is a theory.
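
To make the third-factor story concrete, here is a minimal simulation sketch in Python. Every number in it is made up for illustration: the 50/50 split, the 0.7/0.2 and 0.6/0.3 probabilities, and the names `hipster`, `fan`, and `unfulfilled` are all assumptions, not survey data. In this toy world, being a hipster raises both the chance of liking Charlie Kaufman movies and the chance of an unfulfilling sex life, and the movies themselves do nothing.

```python
import random


def rate(flags):
    """Share of True values in a list of booleans."""
    return sum(flags) / len(flags)


def gap(rows):
    """P(unfulfilled | fan) minus P(unfulfilled | not a fan)."""
    fans = [unfulfilled for fan, unfulfilled in rows if fan]
    others = [unfulfilled for fan, unfulfilled in rows if not fan]
    return rate(fans) - rate(others)


rng = random.Random(0)  # fixed seed so the run is repeatable
people = []
for _ in range(50_000):
    hipster = rng.random() < 0.5  # hypothetical third factor
    # Both traits depend only on the third factor, never on each other.
    fan = rng.random() < (0.7 if hipster else 0.2)
    unfulfilled = rng.random() < (0.6 if hipster else 0.3)
    people.append((hipster, fan, unfulfilled))

# Pooled sample: fans vs. non-fans, ignoring the third factor.
overall_gap = gap([(f, u) for _, f, u in people])
# Same comparison, but within each level of the third factor.
hipster_gap = gap([(f, u) for h, f, u in people if h])
other_gap = gap([(f, u) for h, f, u in people if not h])

print(f"pooled gap:        {overall_gap:+.3f}")
print(f"among hipsters:    {hipster_gap:+.3f}")
print(f"among non-hipsters: {other_gap:+.3f}")
```

In the pooled sample the fans look noticeably worse off, but inside each group the gap should shrink to roughly zero. The data alone can't tell you which story you're in; the decision to split on `hipster` is itself a theory.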

It's equally true that you can't actually have theory without data. A theory is always about something that you think is going on in the world, so you can't have something to theorize about without first seeing something happen in the world (i.e. data). For example, suppose my theory - which I deduced from some sort of a priori assumptions - is that watching Charlie Kaufman movies ruins one's sex life. I couldn't have made that theory without observing the existence of Charlie Kaufman movies.

So just as "data with no theory" is really just an implicit vague theory, "theory with no data" is really just sparse, unsystematic data. You can't have one without the other.

But what you can do is be lazy with theory or be lazy with data. You can be an armchair philosopher, dreaming up ideas about how the world works without ever bothering to find out if your ideas are right. Then you get something like this:

[Image: a scanned passage of dense theoretical prose, identified in the comments as a Jacques Derrida quotation]

Or you can be a "regression monkey", sitting there sifting for correlations without having any idea what you're looking at. Then you get something like this:

[Image: a chart laid over a mountain range, discussed in the comments]

Obviously, if you're going to get good results, you shouldn't do either of these.

But which is a bigger menace to society, laziness about data or laziness about theory? Theory-laziness is seductive because it's easy - mining for correlations isn't very mentally taxing. But data-laziness is seductive because it's hard - the more complicated and intricate a theory you make, the smarter it makes you feel, even if the theory sucks.

In the past, data-laziness was probably more of a threat to humanity. Since systematic data was scarce, people had a tendency to sit around and daydream about how stuff might work. But now that Big Data is getting bigger and computing power is cheap, theory-laziness seems to be becoming more of a menace. The lure of Big Data is that we can get all our ideas from mining for patterns, but A) we get a lot of false patterns that way, and B) the patterns insidiously and subtly suggest interpretations for themselves, and those interpretations are often wrong.
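
Point A is easy to demonstrate. The sketch below (Python, standard library only; the function names, sample sizes, and seed are arbitrary choices for illustration) generates one series of pure noise and then searches a pile of other pure-noise series for the one that correlates with it best. Nothing here is related to anything else, so every "pattern" it finds is false by construction.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def strongest_noise_correlation(n_series, n_obs, seed):
    """Largest |correlation| between one noise series and n_series others."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_series):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(pearson(target, candidate)))
    return best


# Same seed for both runs, so the only difference is how hard we search.
few = strongest_noise_correlation(n_series=5, n_obs=20, seed=0)
many = strongest_noise_correlation(n_series=5000, n_obs=20, seed=0)

print(f"best of 5 noise series:    |r| = {few:.2f}")
print(f"best of 5000 noise series: |r| = {many:.2f}")
```

Because both runs share a seed, the first five candidates are identical in each, so searching 5,000 can only match or beat searching 5. With only 20 observations per series, the best of 5,000 will typically look like a strong relationship even though it is nothing but noise.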

So anyway, I hope this post destroys all of those "data vs. theory" arguments forever and ever.

In the arts and humanities, data laziness is still very common, because any attempt to quantify these outputs is considered gauche, insensitive, or ham-fisted (adjectives often found in the first few paragraphs of lit theory papers).

1. New type of spam bot?

2. Anonymous 9:20 AM

Pretty good description of QM for a crazy guy. Sure the opening paragraph is unintelligible, but he clearly knows more about the physics than the writers at, say, New Scientist.

8/10, looking forward to your next post.

2. It would be nice if you sourced some of your excellently chosen illustrations.

I found the mountain range at:

I was looking for some explanation of the diagram, but now I suspect that they just drew it with data only at the start and end points. Otherwise the coincidence would be really startling.

1. Anonymous 9:18 AM

The data might be real. It's the mountains that are cut-and-paste, not like any real mountains that I've ever seen.

3. The problem is right-wing ideology of the Finance Macro Canon variety. They won't alter their theories in the face of evidence that they are wrong. Then they mangle their theories to argue things like "All of these guys are relatively orthodox quantity theory guys--they all expect a tripling of the monetary base to cause 200% inflation. And here they are, all saying that what you need to halt that 200% inflation is for the Fed to offer to pay 0.25%/year on reserves."

"Josh Bivens sends us a link to a House of Representatives hearing he participated in, with a striking discussion that takes place at the 45:00 mark..."

1. The problem is that the evidence is usually unconvincing. The macro-economy is so causally complex that it's almost impossible to isolate variables.

4. It's a tradeoff that really depends on the quality of the data. When there is plenty of high-quality data, data is more important. When data quality is poor, you get more theory. This is why there is tons of (respectable) theory in economics and ecology, and much less in neuroscience (where new data is constantly pouring in).

5. I don't get how data can't exist without a theory.

How about Galileo? He showed that balls roll down with constant acceleration, there was no theory involved.

Even when you describe the motion with a "theory": F=ma and F~const this is not the result of "hard thinking" in the sense that you know this has to work, it is just a mathematical representation that happens to give correct results in some range of inputs, but nobody knows WHY this representation works. Any subsequent generalizations of Newtonian mechanics that reduce in the limit to F=ma are also mathematical representations that explain data, that is all. Nobody will ever know WHY they work, and not some other representations, they must be chosen based on data. So some basic statements must remain axioms, and if the theory is to work, they must be chosen by data, no amount of "hard thinking" will replace it.

1. "How about Galileo? He showed that balls roll down with constant acceleration, there was no theory involved."

Theory:

There is something called acceleration.

That something can be measured by rolling balls down a plane.

That something is stable over time, so that measures of it do not change their meaning from moment to moment.

Galileo had a couple of thousand years of theorizing about force and movement behind him before he performed this experiment, without which he never would have devised it.

2. "There is something called acceleration."

Acceleration is a definition. Objects can have it or not. You can measure it before you have any opinion whether the balls will have it or not.

He just checked how the balls moved, pure data, no theory.

3. Anonymous 4:57 PM

I would say that you can certainly collect data/observe without theory; but if you were to present that data, an unformed theory would be implied. In Galileo's case, the implied theory might be: "Something (don't know what) is making these balls roll down hills."

So, you can collect data without theory, but that data is useless if you can't ascribe any meaning to it (explain it via theory).

4. "Acceleration is a definition."

Yup: a *theoretical* definition.

"He just checked how the balls moved, pure data, no theory."

No, that is complete nonsense. For instance, what in the world would direct him to make THOSE particular measurements if he had no theory: there are a quadrillion things he could have measured! In fact, you need a theory of measurement even to be making measurements!

5. Galileo, in fact, wrote an entire book discussing the theory behind those experiments: http://en.wikipedia.org/wiki/Two_New_Sciences

6. Constant acceleration is the theory.

Galileo is an example of how to do it right. He looked at the data and then made a theory to explain it, and the theory held up well in other contexts.

Now he could have come up with the theory first and the result would have been just as good. That's why the argument over "deduction" vs. "induction" is silly.

7. "Acceleration is a definition."

Yup: a *theoretical* definition.

Lol, are there any data-definitions? A definition is just a name for an object or property. It cannot be theoretical or not. Calling an object or property "chair" or "speed" is not theoretical, it is a name. If it is the name of a property, objects can have this property or not.

Galileo had no theory as to whether the acceleration would be constant before making the experiments, he collected all the data with no theory needed. He was just looking how objects behave.

"Galileo is an example of how to do it right. He looked at the data and then made a theory to explain it, "

Exactly. First data, then theory. He didn't need the theory to collect data and the data existed before the theory existed. Data doesn't need theory to exist or be collected. But of course if you have a theory it tells you where to look and for what, but it is a different story.

Similarly with Linne: he collected reams of data about classifications of organisms but then Darwin could use it much later to support his theory, or even to formulate it. Similarly Darwin had no theory of gene-based heredity, he just assumed something had to happen that organisms inherit features in a binary way from ancestors, he had no clue why and he didn't start from a "theory" that would tell him this must be the case.

8. The concept of chairs isn't totally pretheoretical. It relies on the idea that our senses represent a world of three dimensional objects, and furthermore that this world contains people who sometimes sit on things.

In order to interpret data, you need to have some sort of framework for interpreting the world: a theory. (Even logic itself is a theory.) But then once you get in this data, you need to be flexible and create new theories which better suit the data.

Darwin didn't start out with theory that told him evolution must be true, but he did start out with a theory that animals exist, which was quite necessary for forming new theories about them.

9. "Galileo is an example of how to do it right. He looked at the data and then made a theory to explain it..."

No, this is historically wrong. He DEVISED the experiment in order to demonstrate his already developed theory of force. There is simply no way to devise a sensible experiment in the absence of a theory.

10. PeterP has yet to explain why Galileo was measuring balls rolling down a ramp if he had no theory. Why was he not measuring the length of noses, or the time it took to ride a donkey to Siena, or the temperature of ponds, or indeed, why not measure a mix of all those things? Oh, and "acceleration" is a theoretical term because it only makes sense in the context of a theory of force. "LOL" is not a philosophical argument, you know, and the story of science you got from a 6th-grade textbook is no substitute for actually knowing about what went on.

6. Data that confirms the theory is better than a theory without confirmatory data or data without an underlying theory. The mountain range example you give doesn't work because the mountain is always static. The next step is trying to find if a causal factor could exist. With mathematically based scientific theories this is done with deductive reasoning and axioms. With social sciences this is not possible most of the time.

7. Phil Koop 12:43 PM

Wait ... that "Theory" quotation is an image ... suggesting that you scanned it from paper ... suggesting that you actually possess the source, on your bookshelf. Oh dear.

As Inspector Morse once put it: "*Arturo* Toscanini? I wouldn't have it in the house!"

1. Agreed. I googled the quote and it belongs to Jacques Derrida. That people take him, and that style of philosophy, seriously can make me lose hope at times for humanity. As much as the regression monkeys are bound to be wrong, at least they'll be coherent.

2. Anonymous 2:56 PM

I can see where to the common reader that quotation might appear to be gobbledygook, but it's actually very precise jargon serving as convenient shorthand allowing highly sophisticated and educated parties to efficiently communicate very precise ideas. You wouldn't understand.

3. Anonymous 3:00 PM

So it's better to be coherent and wrong than hard-to-read with something to say. Got it. The writing is dense because the problems he's trying to talk about are with language itself. I suppose you can dismiss decades of research and thought in the humanities without batting an eye like that, but that just puts you squarely into the realm of the econo-engineer-esque "this narrow interpretive frame that I refuse to acknowledge is interpretive is actual reality". Numbers over all, correct or not, right? After all, undergraduate level differential equations must be the underlying description of human behavior. Maybe you can get fancy and throw in a random walk or a stochastic process or two so you can pull in physics literature that barely applies as well. And shit, if you implicitly assume all kinds of things about equilibria and information sharing, you even get to pretend that game theory is a reasonable tool for modeling reality.

4. Anonymous 4:20 PM

I reckon it is pretty cool to see a Derrida quote in a macro-blog.

5. "So it's better to be coherent and wrong than hard-to-read with something to say. Got it. The writing is dense because the problems he's trying to talk about are with language itself."

Derrida isn't "hard-to-read" here, he's literally nonsensical. There is no content in that quote at all. It's indistinguishable from Alan Sokal's rightfully famous trolling of similar work.

6. Anonymous 8:11 PM

The point being made is that you're not qualified to judge its sense or lack thereof. While shared delusions are not entirely unknown, the fact that those who have expended the effort to understand what is being said disagree with you en masse is a red flag.

As for me, I don't claim to be able to adjudicate from my own knowledge. However, the fact that Daniel Davies (D^2), who is more than comfortable with all the math used in economics, and who is rarely wrong, disagrees with you, means to me that you're wrong.

7. Anonymous 12:20 PM

Ah, appeal to authority! You lust for smart people but don't understand the weakness of your own arguments.

"Davies is smart, he must be right about *everything*!"

Are you suggesting it is *impossible* to express the content of that quote in a more easily understandable form without losing conciseness? I'd like to see that proof!

The hive mind strikes all types of groups. Mass delusions are often formed around consumers, whether the object at issue is a movie, a TV show, an author, or a theorist. Some people go in having already made up their mind whether they are going to like or hate the content because of what it will *say about themselves to the group* when they express a commonly held opinion. Some people seem to have no idea they are doing this. They think they are being objective about their undying love or undying hatred of, say, Game of Thrones. But their own arguments for why they love something have obvious internal inconsistencies, which, when pointed out, make them go apoplectic.

Obviously, we live in a world where extremely intelligent people sometimes have diametrically opposed views. You can find this sort of opposition everywhere. Where does that leave us? The answer cannot be that intelligence is the "deciding factor" in why one person is right and the other wrong. The answer must have something to do with one (or both) being subject to cognitive biases.

There has even been a suggestion that the intelligent are *more* susceptible to bias because their superior intellectual machinery can be put to use protecting their predispositions.

Without commenting on the content of the quote itself, I'm highly confident that there are many who just instinctively defend it because of the brand (the author) associated with it. Also, it makes them feel smart. And it's always nice to have that going for you.

But thanks for randomly accusing people you don't know of not being qualified to judge something. It alerted me right quick that you had no idea what you were talking about, and at the same time provided great fodder to go deeper into discussing the contours of this issue.

8. Anonymous 3:25 PM

New poster, here. I studied Derrida as an undergraduate and I understand his philosophy pretty well (to the degree that it can be understood by those not willing to surrender their minds wholly to his worldview). It is fundamentally focused on demolishing the ideas of prior philosophers and providing ammunition for ad hominem attacks against anyone who disagrees with his conclusions (by "deconstructing" the individual doing the questioning and people or texts that they cite, rather than addressing their arguments). And, as this blog posting notes, Derrida's work is devoid of ideas that are falsifiable (i.e., that can be tested). So, his ideas are really philosophical musings more than scientific theories. This is true of many theories in the humanities, but few other theories manage to be as opaque or employ so much circular reasoning. The only reason that I can think of for him to have such stature in academia is that his theories provide ammunition for attacking any professors who don't share his worldview (who dare to provide a classical liberal narrative that is thoughtcrime to marxists such as Derrida). The intolerance of his followers for free thought is why I did not remain in the university beyond the time required to attain my undergraduate degree, and why he is not someone who I spend time studying now that I make my own syllabus for further learning. Random anecdote: my professor who worshipped Derrida once commented that, when in graduate school, she used to eat at restaurants until she ran out of money for the month, and then she would steal food from grocery stores... and this is the person who was supposedly improving my ethical awareness of the world.

9. "my professor who worshipped Derrida once commented that, when in graduate school, she used to eat at restaurants until she ran out of money for the month, and then she would steal food from grocery stores... and this is the person who was supposedly improving my ethical awareness of the world."

Sounds a little like an ad hominem ;-)

8. Phil Koop 12:50 PM

More seriously, much of the development of time series analysis (or "econometrics", if you prefer) can be read as an ignoble attempt to avoid grappling with the need for causal assumptions: an explicit causal model.

As Judea Pearl put it (in paraphrase), just as logicians, including quite prominent ones like Boole and de Morgan, wasted half a century in the forlorn attempt to prove the theorems of first-order logic using the machinery of propositional logic, statisticians have wasted half a century trying to draw causal conclusions without making use of causal calculus.

1. "More seriously, much of the development of time series analysis (or "econometrics", if you prefer) can be read as an ignoble attempt to avoid grappling with the need for causal assumptions: an explicit causal model."

Most econometrics is not time-series.

And most of the econometrics I see involves either instrumental variables or structural estimation, both of which are methods that address causality.

So I'm not sure what you're talking about. SVARs?

9. Anonymous 2:56 PM

My favorite term for theory without data is "mathturbation." It might be fun, but it doesn't actually accomplish anything.

1. Anonymous 4:25 PM

The mathturbation of macro. That might stick.

2. Hey, when string theorists mathturbated, real new math came out.

10. Anonymous 3:20 PM

I'm really lazy and not an economist, so I read economists who are not lazy and write well enough for me to understand. Keynes, Smith, Galbraith, Krugman, among several others. I then check to see if what they explain actually happens the way they explain it. And then I adopt their theories if they do. I hope that's ok.

11. I long had difficulty in my major, philosophy, because it was so much theory and so little connection with data. And of course the 'theory' was nothing like the strict and scientific definition of theory.

Even in the 1960s it was obvious to me that such facts and data suggested by evolution, and to a lesser extent physics, offered clues for the big questions of philosophy. A metaphor I came up with is that philosophy without cognitive science is like medicine before the discovery of germs, viruses, and DNA.

Knowing the intellectual forebears in our various fields is very much part of history, and worthy of study in itself. But figuring out what we mean by will, freedom, causation, etc does not require a lifetime of studying and restudying Plato, Aristotle, Kant, and et cetera.

1. "Even in the 1960s it was obvious to me that such facts and data suggested by evolution, and to a lesser extent physics, offered clues for the big questions of philosophy."

No wonder you were bad at philosophy!

12. Big Data is about multivariate interactions and predictions. How is it that every ******** time, arguments against it present in-sample correlations between two variables?
The crucial points that economists don't get are: A) the theoretical part of machine learning is in the way you extract patterns, not in which specific pattern you want to look for. The parametrization (or lack thereof) of your data-driven algorithms is the theory; e.g. if you look at correlations you are assuming that in your system that concept has the meaning of dependence. B) It is not a reassuring story (or equilibrium model) that will tell you whether one of these correlations is spurious or real; it is the possibility of using this information for predictions on new samples.

1. Ok, but then you have some underlying theory of how to look at data. And that theory better be good or you may think that you are training your neural network to discern camouflaged tanks hiding in the forest from images of your recon planes only to find out that you actually trained your neural network to discern between cloudy and sunny days in photos because that just happened to correlate with your sample of photos of tanks and trees respectively.

So in short - yes, if the day comes when it is easier to program an AI that will do economic analysis than to actually do economic analysis, then you will be right. But then we will just shift our view. The best economist will be the programmer who is best at programming an economic program. To use another example - the best chess player is not Garry Kasparov, but the author of the best chess algorithm. That guy may not be particularly good at chess - he may be beaten by an average chess enthusiast - but he is best at teaching a computer how to play chess. And there is a lot of theory in that too.

13. The problem is not too much theory or too much data. The problem is that, increasingly, the only empirical evidence that seems admissible is numerical and subject to statistical analysis. Hence, detailed historical accounts, case studies, interviews, field studies, etc. are largely ignored by so-called theorists. Similarly, instead of treating models as narrow cases of what Richard Nelson (1998) calls appreciative theorizing, and using them mainly to check for logical consistency, we have turned them into the only admissible method of theorizing, by assuming away any complexities that are too hard to model quantitatively.

1. This is a very good and important point.

2. Thanks. I wonder if the culprit behind this is Friedman, with his essays in positive economics.

3. Anonymous 11:04 AM

Exactly. Krugman says that data is not enough "you need to look through the lens of a model".

That is not enough either. In fact doing that could be even worse.

You need context.

Then you can start to understand what the data means.

4. Anonymous, yes, we agree.

14. Anonymous 6:23 PM

This is all wrong. You can well have theory without data, valid deductive propositions are logically necessary without having to be true. You simply end up with something whose truth content you completely ignore -may be high, may be low. Moreover, you can also have data without theory, because otherwise you're forced into the typical Popperian impasse -well expressed by his postmodern scholars- that the observational propositions needed to test hypotheses are theoretical themselves, thereby leading to an infinite regress with no hope of falsifying even the tiniest piece of theory. For the sake of common sense, therefore, the possibility of induction of observational propositions from perception of simple objects must be allowed -which is what can be plausibly called "data without theory". Otherwise, willy nilly, you end up being a phenomenist -which is also self-contradictory on metaphysical grounds from a traditional Aristotelean-Russellian-realist point of view.

And no, predictive power is no substitute, nor a proxy, for truth. Predictive power may fail to proxy for the truth content of a theory in so vast a number of ways that it is just ludicrous to contemplate an opposite position. Friedman, together with all those inspired by Weber on this point, was completely wrong. Instrumentalism, " 'as if' stories" explanations as Jon Elster calls them, is literally irrelevant for scientific knowledge of the world.

15. Correlations in data (big or not) are 'markers' for underlying causal relationships...that is where the INFORMATION that can bring understanding can be found. If all we do is seek patterns in data according to what we think we know, we ignore what Shannon told us, that Information and uncertainty are inversely related. 'Mathturbation' may be pleasurable and rewarding for practitioners &/or Academics but damaging for the rest. WE know that risk is not the threat and predictions based upon correlations or flawed theories with (risk) management based upon concatenated probabilities plus assumptions ADDS to the reality of uncertainty. FACT is we cannot predict and theories or models that seek data to validate them fail to seek the information of new, emergent, patterns so cannot provide a quantitative basis for preventative or resilient strategies. The future is constantly under construction with events that are not just probable but possible and plausible so we content ourselves with conventional 'wisdom'.

16. Anonymous 8:23 PM

And you didn't even mention the issue of the theory-ladenness of terms. Congratulations on your restraint.

6:23 Deductive propositions that don't connect to data somewhere are not usually called theories.

The status of models that don't claim to describe the real economy, but are nonetheless taken to somehow provide insight or illumination about it, is a principal subject of Philosophy of Economics. POE is now a separate subfield of Philosophy of Science, even having its own journal. Is POE itself theory? What kind of theory? These questions are left as exercises for the student.

17. Applied statisticians succumb to this temptation far less often than outsiders who've merely dabbled in data analysis. Most of us see ourselves as helping subject matter experts design research studies to effectively bring data to bear on their research questions, and then analyzing the resulting data to this end. The entire enterprise is thoroughly collaborative--we defer to the subject matter experts as far as possible to make sure that the study we're planning will be relevant to the question at hand, and they defer to us to make sure that the resulting data yields as much useful information about their question as possible within the constraints they face.

The problem seems to stem from people without much training in statistics learning how to use computers to perform various kinds of statistical analyses. We often have clients come to us and say things like, "is there a list you can give me that tells me which test to perform for which kinds of data?" (Even worse, "which button do I click for repeated measurements?") The answer, of course, is no (and not just because hypothesis testing is only one very limited way of analyzing data). Which analyses make sense to perform, and how to interpret the results, depends both on subtle aspects of statistical models, as well as the theoretical and experimental context yielding the data. You can't expect to come to a sensible answer without having someone in the room with a great deal of expertise on the relevant science, and someone else in the room with a great deal of training in statistics.

Nate Silver is a smart guy, and I think a lot of people in our field appreciate the positive publicity he has brought to statistics, but I wonder how much of the recent criticism he has been facing could have been avoided with more training in statistics. For one thing, he might have stocked his new outfit with more learned subject matter people, instead of thinking his extensive knowledge of data mining techniques alone prepares him to say useful things about anything and everything.

18. Anonymous 6:00 AM

Kristof writes

"Universities have retreated from area studies, so we have specialists in international theory who know little that is practical about the world. After the Arab Spring, a study by the Stimson Center looked back at whether various sectors had foreseen the possibility of upheavals. It found that scholars were among the most oblivious — partly because they relied upon quantitative models or theoretical constructs that had been useless in predicting unrest."

I think that is true in economics too. People seem to look at Japan and China from the perspective of theoretical models, with very little contextual knowledge about how macroeconomic policy in these countries actually works. Both theory and data in this case are not enough. You need context. That comes from specific country knowledge which includes, among other things, linguistic and societal knowledge. OMOs in short term government debt instruments that target the interest rate or bank's central bank deposits? Believe me, there is a lot more to it than that.

19. Anonymous 8:48 AM

Hence the power and value of a control to find causality.

20. Anonymous 9:42 AM

Interesting to read Krugman's latest piece on Goodfriend and inflation prediction. Goodfriend wrote the hubristic Journal of Economic Perspectives 2007 article "How the World Achieved Consensus on Monetary Policy". Thankfully the price stability priority/ DSGE-based consensus did not extend to the policy community and the measures taken the very next year.

21. So right and so wrong at the same time.

What is the objective function? Like all things, it depends. If your objective function is to advocate a policy and raise money from donors, I doubt you will let a little thing like only having 5 data points get in your way. It will be 30 years before you are conclusively proven wrong, and even then, lots will have happened that will allow you to explain away the answer. If your objective is to write a whole bunch of papers and get tenure, I don't see being wrong entering into the objective function anywhere.

Now, which FOMC member, Fed economist, private sector economist gets paid based on the squared distance between their forecast and reality? If I want to know who has a better forecast, maybe I should ask them.

Overall, I do not see a bunch of bad or lazy economists. I see a bunch of economists who are fulfilling the incentives handed to them.

22. In the aggregate, theory-laziness is probably a bigger problem. But within certain subgroups (ahem, libertarianism) data-laziness is endemic.

23. Well, as you just pointed out in a recent post, Keynes answers this very question. Keynes's answer: the data are the slaves of our theories.

Perhaps if economies slow (initially) when debt levels are high, it is only because we believe that the right response is to reduce debt when it is high, which behavior fulfills the theory's prediction. Under a different theory - say, that the best cure for a sluggish economy with high debt is higher debt - the opposite prediction would be reflected in the data.

24. Best post you've written in years.

Seriously, you could make this one into a book, one of those pop culture hits like Taleb's "Black Swan" book. Do it, make a million bucks. Please!