Noahpinion: Why I like Frequentism

Thursday, July 24, 2014

Why I like Frequentism

My post about "Bayesian Superman" wasn't actually intended to be a knock on Bayesianism - it was just about a quirk of rationality. I certainly don't think Bayesianism is a "dangerous religion that harms science"!

But reading this Andrew Gelman essay made me think about Bayesian inference in science, which then got me to thinking about Frequentist inference, and why I think Frequentism is a bit underrated these days.

Frequentist hypothesis testing has come under sustained and vigorous attack in recent years. It's arbitrary, it doesn't obey the likelihood principle, it throws away information, it can lead to silly results. All this is true. But there are a couple of good things about Frequentist hypothesis testing that I haven't seen many people discuss. Both of these have to do not with the formal method itself, but with social conventions associated with the practice. These are:

1. The unwritten rule that you "don't protect the null hypothesis" (or that you "penalize type I errors relative to type II errors"), and

2. The implicit three-valued logic of "hypothesis rejection".

The first of these is about what kind of prior (to use Bayesian language) the scientists should start with. It basically says that you should bias your conclusions against your own hypothesis. This is in contrast to, say, using flat or "uninformative" priors.

The second social convention is more amorphous and difficult to define. It's about what conclusions you draw from the hypothesis test. If you don't protect the null, and you reject the null in favor of your own hypothesis, Frequentism says you've found something interesting, and you should follow up on it. If you fail to reject the null, though, it doesn't mean you believe in the null any more than you did before - it means you shrug and move on. Frequentism implicitly assigns results to one of three categories: "interesting evidence against a hypothesis", "interesting evidence for a hypothesis", and "nothing interesting". It's sort of like three-valued logic, in a way. Compare this to the implicit logic of the likelihood principle, in which you compare alternative hypotheses directly.

Why do I like these social conventions? Two reasons. First, I think they cut down a lot on scientific noise. "Statistical significance" is sort of a first-pass filter that tells you which results are interesting and which ones aren't. Without that automated filter, the entire job of distinguishing interesting results from uninteresting ones falls to the reviewers of a paper, who have to read through the paper much more carefully than if they can just scan for those little asterisks of "significance". Naturally, this filter also has a downside - it creates publication bias against "negative results" that may in fact be interesting. But that may be a small price to pay to avoid the flood of paper submissions that would result if everyone just wrote up and sent in the results of any estimation exercise.
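To make the "first-pass filter" idea concrete, here is a minimal sketch that simulates a pile of studies in which the null is true and counts how many would earn an asterisk. Everything in it is an assumption made for illustration: the function names, the sample sizes, the known standard deviation of 1, and the use of a simple two-sided z-test.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def run_null_studies(n_studies=2000, n_obs=50, alpha=0.05, seed=0):
    """Simulate studies where the true mean is 0 and return the share flagged significant."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(n_studies):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        # z-statistic for the sample mean, assuming a known sigma of 1 (illustrative choice)
        z = (sum(sample) / n_obs) * math.sqrt(n_obs)
        if two_sided_p(z) < alpha:
            flagged += 1
    return flagged / n_studies


share = run_null_studies()
print(f"share of null studies passing the filter: {share:.3f}")
```

Under these assumptions, roughly alpha of the pure-noise studies should get through, which is the sense in which the threshold screens out most uninteresting results while still letting some noise pass.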

Second, the discipline of the Frequentist social conventions acts against scientists' natural tendency to favor and promote and believe in their own theories. It tries to enforce the idea of scientific honesty. Feynman talks about this in his famous speech, "Cargo Cult Science":

It's a kind of scientific integrity, a principle of scientific thought that corresponds to a kind of utter honesty--a kind of leaning over backwards. For example, if you're doing an experiment, you should report everything that you think might make it invalid--not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you've eliminated by some other experiment, and how they worked--to make sure the other fellow can tell they have been eliminated.

Details that could throw doubt on your interpretation must be given, if you know them. You must do the best you can--if you know anything at all wrong, or possibly wrong--to explain it.

The kind of integrity Feynman is talking about concerns systematic error. The Frequentist social conventions are an attempt to do something similar for random error. This provides a natural defense against "scientific trolling", which is a term I just invented to mean "the tendency of unscrupulous researchers to report weak results to advance some ulterior agenda." Scientific trolling means that ulterior-motivated researchers will submit a flood of weak results, while scrupulous, scientifically-motivated researchers will voluntarily restrain themselves from reporting equally weak results that go in the opposite direction. That sort of reporting bias will tend to contaminate the beliefs of a neutral observer. (I can think of at least one very good real-world example of this, but I will be polite and not discuss it on a blog.)

Now, of course, the Frequentist social conventions are weak, inadequate defenses against subjectivism and noise. They have drawbacks, like discouraging the reporting of negative results. And they are subject to being gamed by unscrupulous researchers. But at least they are something.

Bayesian inference seems to me like a perfectly fine and good method of inference. It's more appealing in many ways than classical Frequentism. But I think that Bayesianism might want to get some standardized social conventions similar to (and hopefully superior to) the Frequentist ideas of "not protecting the null" and "statistical significance" (note: it may already have these, and I'm just not aware of them). These conventions would unavoidably be arbitrary, and would throw away some information in many cases. But they would help lean against the natural incentives of the scientific reporting and publishing process. Maybe there could be more than one set of conventions, for use in recognizably different situations.

Classical Frequentist hypothesis testing is probably on its way out in the long term. But the fact that it has survived and dominated scientific publishing as long as it has, in spite of all its well-known problems, might be a testament to the usefulness of the unspoken social conventions associated with it.

33 comments:

ZHD 6:49 AM
This is definitely a good post to get people thinking. I just hope your average readership isn't too entrenched in any one particular camp.

Gelman has written a couple of very inspiring pieces (with total humility) about his interpretation of Bayesianism and how it fits into his personal philosophy. His appreciation for philosophy of science in those pieces is actually what turned me on to Deborah Mayo's (and Aris Spanos') 'Error Statistics' philosophy, which is frequentist-based.

I have a feeling I can predict half of the next ten comments that will appear here. And it's not going to be pretty.

me 7:24 AM
Oh, I very much agree with most of what you say, Noah. Where I disagree is on your prediction about the future of frequentism. If it is on the way out, it is on a very, very slow boat. That is particularly true in the biomedical sciences, my own field, where the depth of the overall statistical illiteracy would shock a lot of people. Researchers tend to feel so overwhelmed by their biological research problems that they don't have the patience to "learn" proper frequentist statistics, much less Bayesian statistics. Yet frequentist tests are embedded in some popular, easy-to-use graphing software packages we use. And though everybody overuses t tests and mangles the P value, it is easily understood as a simple threshold, much like an expiration date on a food package.

Darf Ferrara 9:53 AM
Ronald Fisher (pictured) was not a Bayesian, but he wasn't a frequentist either. He followed what he called the likelihood principle (which seems to me like Bayes without acknowledging the prior). I just wanted to let you know so that Robert Murphy doesn't bite your head off for using the wrong picture again.

Christopher 9:58 AM
A couple of points about Bayesianism, if I may. One scholar in the field emailed me this morning reminding me that a datum does not strengthen a hypothesis if the datum in question is equally consistent with a contrary hypothesis. In the Superman case, a contrary hypothesis might be, "God, who meant to wipe me out for my impudence years ago, is a forgetful Being." Every day of survival supports H2 as fully as it supports H1, so on Bayesian grounds it supports neither.
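For readers who want the point above in symbols, the odds form of Bayes' rule makes it explicit. The notation here (D for the datum, H_1 and H_2 for the two hypotheses) is introduced only for this illustration.

```latex
\frac{P(H_1 \mid D)}{P(H_2 \mid D)}
  = \frac{P(D \mid H_1)}{P(D \mid H_2)} \cdot \frac{P(H_1)}{P(H_2)},
\qquad
P(D \mid H_1) = P(D \mid H_2)
  \;\Longrightarrow\;
\frac{P(H_1 \mid D)}{P(H_2 \mid D)} = \frac{P(H_1)}{P(H_2)} .
```

When both hypotheses predict the datum equally well, the likelihood ratio is 1 and the posterior odds equal the prior odds, so the datum shifts belief toward neither hypothesis relative to the other.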

Another point concerns that Andrew Gelman essay to which you've linked us. For those who haven't followed the link: it is an article that appeared in a scholarly journal called BAYESIAN ANALYSIS in 2008. Gelman considers himself a Bayesian, as the article makes clear, so he doesn't consider the objections he outlines there decisive. The article describes itself as "A Bayesian's attempt to see the other side." I think those who read the quotes you've provided should also ask themselves why those same considerations didn't cause Gelman himself to identify with the "other side," why they left him Bayesian.

Having said that, I'll also pass along a link to something I wrote, posted on an alternative-investment blog yesterday. http://allaboutalpha.com/blog/2014/07/23/a-challenge-to-bayesian-probability/

Student 10:34 AM
I disagree on a couple of levels.
1.) I don't see that the convention of arbitrarily specifying alpha upfront (your point 1) provides any benefit to science.
2.) It is not the case that there are no standards for hypothesis testing in Bayesian inference.
I know you were specifically avoiding the issue of how the two schools interpret probability, but that difference is inseparable from how the social conventions have come to be in the first place. IMHO, there is really no reason anymore to rely on the frequentist approach at all, and it provides no benefit to science.

Look, what you are ultimately trying to do with hypothesis testing is to estimate which hypothesis is more probable, given the data, right?
Technically, frequentist hypothesis tests do test hypotheses, given the data. Rather, they test the data, given the hypothesis. Bayesians, on the other hand, test the hypothesis, given the data. That is simply more appropriate and provides many advantages, such as allowing Bayesians to compare model fit even when the various models rely on different methods (how is that not beneficial to science, given that results are most often based on different approaches?).
The differences between the two schools lead to results that can range from virtually identical to quite different. Since Bayesian inference is coherent even in a frequentist sense while frequentist inference is incoherent in a Bayesian sense, a Bayesian approach is always preferable. Again, there is no good reason to be a frequentist anymore. Due to advances in MCMC methods and computers, it is now possible to estimate solutions to even highly complex problems that do not fit the frequentist interpretation of probability but that fit the Bayesian one.

Anonymous 12:48 PM
Why I don't like frequentism: in almost all scientific applications, frequentist methods are harder to do correctly than Bayesian analyses. In most cases, this leads to systematically incorrect results. An extremely simple example is fitting a straight line to data where the spread (noise) in the x-values is comparable to the width of the real linear signal. In a frequentist approach you cannot get the right answer without marginalizing over the noise distribution of the x-values. In fact you won't even be close; the slope will be wrong by 20% or more unless you do an extremely difficult integral over the distribution that the x-values are drawn from. And this integral is extremely difficult even if you assume only a simple Gaussian prior. If you throw this problem into a Bayesian analysis, it is incredibly simple to marginalize over any arbitrary prior, and get the right answer for the slope. I don't like frequentism because most people do it wrong, and doing it right is harder than doing a Bayesian analysis.
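The straight-line example above can be sketched with a small simulation. This is only an illustration of the bias that appears when a naive least-squares fit ignores noise in the x-values; it does not implement the marginalization the comment describes, and every name and parameter value (`simulate`, `ols_slope`, a true slope of 2, x-noise equal in size to the x-spread) is an assumption chosen for the example.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=20000, true_slope=2.0, x_sd=1.0, x_noise_sd=1.0, y_noise_sd=0.5, seed=1):
    """Return (slope fitted on true x, slope fitted on noisy x) for synthetic data."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, x_sd) for _ in range(n)]
    # What the analyst actually observes: x contaminated by measurement noise
    x_obs = [x + rng.gauss(0.0, x_noise_sd) for x in x_true]
    ys = [true_slope * x + rng.gauss(0.0, y_noise_sd) for x in x_true]
    return ols_slope(x_true, ys), ols_slope(x_obs, ys)


clean_slope, naive_slope = simulate()
print(f"slope using true x:  {clean_slope:.3f}")
print(f"slope using noisy x: {naive_slope:.3f}")
```

With independent, classical measurement error, the naive slope is shrunk by roughly x_sd² / (x_sd² + x_noise_sd²), which is one half under the values assumed here, so the fitted slope lands near 1 rather than 2.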

Student 12:57 PM
First, above I meant "technically, frequentist hypothesis tests do NOT test hypotheses, ..."

I would say the most widely used are Bayes factors, the deviance information criterion, and probability intervals based on the marginal posterior. However, you could conduct hypothesis testing just like frequentists if you wanted by evaluating the posterior a certain way. The only difference would be that frequentists treat the probabilities as fixed long-run constants while Bayesians treat the probabilities as parameters conditional on the data. That aside, you could use non-informative priors, compute pseudo p-values, and conduct hypothesis testing to conform to the frequentist conventions if you really wanted to. I just don't see what's so great about doing that. I mean, significance at the 95% level is OK, but at the 94% level it's a non-finding? I don't see much benefit in that.

Tom Brown 1:17 PM
Thanks Noah: this is all new to me, and I find it very interesting. BTW, yesterday Jason Smith referenced you when he wrote "macro data is uninformative." Can you direct me to where you may have discussed that?

Tom Brown 4:19 PM
Noah, if I rewrite your Feynman quote above slightly:

"It's a kind of integrity amongst economists, a principle of economic thought that corresponds to a kind of utter honesty--a kind of leaning over backwards. For example, if you're evaluating a hypothesis, you should report everything that you think might make it invalid--not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you've eliminated by some other set of results, and how they worked--to make sure the other fellow can tell they have been eliminated.

Details that could throw doubt on your interpretation must be given, if you know them. You must do the best you can--if you know anything at all wrong, or possibly wrong--to explain it."

On a scale of 0 to 10, how well can those modified paragraphs actually be applied to economists? (Give your guess for an average "score" averaged over whatever group you like: economists who write blogs, all economists, whatever.) Now, ideally, over this same group, where should that number be in your opinion? Offhand it seems to me like it should be at 10, but perhaps you've got a good reason why it shouldn't.

Kenneth Thomas 1:13 AM
You may underestimate the problem of publication bias. If you believe the meta-analyses of minimum wage research starting with Card & Krueger's AER article, virtually all the findings of major job loss (3% per 10% minimum wage increase) are GIGO. Another econ area where publication bias seems serious is in the spillover effects from foreign direct investment.

Unknown 1:44 AM
Frequentists' greatest triumph was when they proved that all swans are white.

Unknown 8:09 AM
IMHO the Bayesian prior serves to quantify the researcher's bias, whereas the frequentist may go on acting as if they have none, all the while it lingers under the surface, undetected by the formal calculus. That Bayesianism is subjective is a virtue because it acknowledges the way that most people actually form judgements and pulls all the subjectivity out into the open.

marcel 12:23 PM
"Now, of course, the Frequentist social conventions are weak, inadequate defenses against subjectivism and noise. They have drawbacks, like discouraging the reporting of negative results. And they are subject to being gamed by unscrupulous researchers. But at least they are something."

The last sentence needs tweaking.

"Say what you will about the social conventions of frequentism, dude, at least it's an ethos"

http://www.youtube.com/watch?v=J41iFYO0NQA

Jonathan Goodman 1:46 PM
There are more practical issues pointing toward Bayesian or frequentist approaches. Two negatives for frequentism (?):

1. Frequentist hypothesis testing isn't robust against modeling errors. You ask: "What is the probability of getting this data if H_0 is true?" The answer, if the model is at all complex, is: "almost zero". H_0 is not a single number (mean zero), but a complex model.

2. Frequentism gives little insight into error bars

Pro Bayesian: sampling the posterior gives deep insight into remaining uncertainty, given the data.

Anti Bayesian: the prior is completely made up -- "replacing ignorance with fiction".

JohnR 2:19 PM
That's a useful point, Noah - I look at it that I'm more likely to take seriously people who find results that contradict their expectations. Most of us tend to (consciously or not) weight things that support what we like to believe. Some of us make a real effort to stay neutral, but few of us actually raise the bar on what we want to see. That way the normal human tendency to see what we prefer is often enough to lead us into error. It works the other way 'round, too, of course - many of us simply cannot see what we don't wish to. Supposedly, that's one thing statistics was invented to deal with. Being married to a statistician, however, I'm well aware that the human need to bend evidence to support a desired idea is perfectly able to handle statistics. That's why you need to see the raw data whenever you can. After all, trust, but verify...

EliRabett 5:23 PM
Noah has had this discussion with Gelman before, and Eli riffed off it with Socrates

IEHO you need to have a pretty good idea about the answer to find a useful prior.

David B. Benson 9:42 PM
E.T. Jaynes's "Probability Theory: The Logic of Science" is surely the most entertaining way to learn Bayesian methods.

I prefer using the Bayes factor as a way of comparing two hypotheses, no bias toward either, given the data. One then uses AIC or even BIC to see if the difference between the two hypotheses is large enough. There are again three possible outcomes: H0 is better; H1 is better; equivalent explanatory power.
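As a rough sketch of the three-outcome comparison described above, here is a Bayes factor for a toy coin-flip problem. The setup is assumed purely for illustration: H0 says the coin is fair, H1 puts a uniform prior on the heads probability, and a cutoff of 3 stands in for whatever rule one uses to decide that a difference is "large enough" (the comment mentions AIC or BIC for that step; the fixed cutoff here is a swapped-in simplification). The function names are hypothetical.

```python
from math import comb


def bayes_factor_10(heads, flips):
    """Bayes factor of H1 (p ~ Uniform(0, 1)) over H0 (p = 0.5) for a heads count."""
    # Integrating the binomial likelihood over a uniform prior gives 1 / (flips + 1)
    marginal_h1 = 1.0 / (flips + 1)
    likelihood_h0 = comb(flips, heads) * 0.5 ** flips
    return marginal_h1 / likelihood_h0


def verdict(bf10, threshold=3.0):
    """Map a Bayes factor to one of three outcomes using an assumed cutoff."""
    if bf10 >= threshold:
        return "H1 is better"
    if bf10 <= 1.0 / threshold:
        return "H0 is better"
    return "equivalent explanatory power"


for heads, flips in [(10, 10), (5, 10), (50, 100)]:
    bf = bayes_factor_10(heads, flips)
    print(f"{heads}/{flips} heads: BF10 = {bf:.3f} -> {verdict(bf)}")
```

Under these assumptions, ten heads in ten flips favors H1, five in ten is too little data to separate the two, and fifty in a hundred favors the fair coin, so the same calculation can land in any of the three bins.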