My post about "Bayesian Superman" wasn't actually intended to be a knock on Bayesianism - it was just about a quirk of rationality. I certainly don't think Bayesianism is a "dangerous religion that harms science"!
But reading this Andrew Gelman essay made me think about Bayesian inference in science, which then got me to thinking about Frequentist inference, and why I think Frequentism is a bit underrated these days.
Frequentist hypothesis testing has come under sustained and vigorous attack in recent years. It's arbitrary, it doesn't obey the likelihood principle, it throws away information, it can lead to silly results. All this is true. But there are a couple of good things about Frequentist hypothesis testing that I haven't seen many people discuss. Both of these have to do not with the formal method itself, but with social conventions associated with the practice. These are:
1. The unwritten rule that you "don't protect the null hypothesis" (or that you "penalize type I errors relative to type II errors"), and
2. The implicit three-valued logic of "hypothesis rejection".
The first of these is about what kind of prior (to use Bayesian language) a scientist should start with. It basically says that you should bias your conclusions against your own hypothesis. This is in contrast to, say, using flat or "uninformative" priors.
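To make that concrete, here is a toy sketch in Bayesian terms (my own made-up numbers, nothing rigorous): a flat prior just hands the researcher back the data, while a skeptical prior centered on "no effect" drags the estimate toward zero and makes the researcher less eager to declare victory for their own hypothesis.

```python
# Toy conjugate normal-normal example (illustrative numbers only).
# A flat prior returns the likelihood; a skeptical prior centered on zero
# shrinks the estimate toward "no effect".
import numpy as np
from scipy import stats

effect_hat = 0.4   # hypothetical estimated effect
se = 0.2           # hypothetical standard error of the estimate

# Flat prior: posterior is just the likelihood, N(effect_hat, se^2).
flat_post_mean, flat_post_sd = effect_hat, se

# Skeptical prior centered on "no effect": N(0, tau^2) with small tau.
tau = 0.15
precision = 1 / se**2 + 1 / tau**2
skeptical_post_mean = (effect_hat / se**2) / precision
skeptical_post_sd = np.sqrt(1 / precision)

for label, mu, sd in [("flat prior", flat_post_mean, flat_post_sd),
                      ("skeptical prior", skeptical_post_mean, skeptical_post_sd)]:
    p_positive = 1 - stats.norm.cdf(0, loc=mu, scale=sd)
    print(f"{label}: posterior mean {mu:.2f}, P(effect > 0) = {p_positive:.2f}")
```

With the flat prior the researcher ends up roughly 98% sure the effect is positive; the skeptical prior knocks that down to something like 88%. That gap is the "leaning against your own hypothesis" that the convention is trying to build in.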
The second social convention is more amorphous and difficult to define. It's about what conclusions you draw from the hypothesis test. If you don't protect the null, and you reject the null in favor of your own hypothesis, Frequentism says you've found something interesting, and you should follow up on it. If you fail to reject the null, though, it doesn't mean you believe in the null any more than you did before - it means you shrug and move on. Frequentism implicitly assigns results to one of three categories: "interesting evidence against a hypothesis", "interesting evidence for a hypothesis", and "nothing interesting". It's sort of like three-valued logic. Compare this to the implicit logic of the likelihood principle, in which you compare alternative hypotheses directly.
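If you wanted to spell out that three-valued logic in code, it might look something like this (a toy illustration with the usual arbitrary 5% threshold, assuming your hypothesis predicts a positive effect):

```python
# Toy sketch of the implicit three-valued logic: a two-sided test yields
# "interesting" evidence in one direction or the other, or nothing at all.
# The function and threshold are illustrative, not part of any standard library.
from scipy import stats

def categorize(effect_hat, se, alpha=0.05):
    z = effect_hat / se
    p_value = 2 * stats.norm.sf(abs(z))          # two-sided p-value
    if p_value >= alpha:
        return "nothing interesting"             # fail to reject: shrug and move on
    return ("interesting evidence for the hypothesis" if z > 0
            else "interesting evidence against the hypothesis")

print(categorize(0.5, 0.2))   # clearly rejects the null
print(categorize(0.1, 0.2))   # fails to reject
```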
Why do I like these social conventions? Two reasons. First, I think they cut down a lot on scientific noise. "Statistical significance" is sort of a first-pass filter that tells you which results are interesting and which ones aren't. Without that automated filter, the entire job of distinguishing interesting results from uninteresting ones falls to the reviewers of a paper, who have to read through the paper much more carefully than if they can just scan for those little asterisks of "significance". Naturally, this filter also has a downside - it creates publication bias against "negative results" that may in fact be interesting. But that may be a small price to pay to avoid the flood of paper submissions that would result if everyone just wrote up and sent in the results of any estimation exercise.
Second, the discipline of the Frequentist social conventions acts against scientists' natural tendency to favor and promote and believe in their own theories. It tries to enforce the idea of scientific honesty. Feynman talks about this in his famous speech, "Cargo Cult Science":
It's a kind of scientific integrity, a principle of scientific thought that corresponds to a kind of utter honesty--a kind of leaning over backwards. For example, if you're doing an experiment, you should report everything that you think might make it invalid--not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you've eliminated by some other experiment, and how they worked--to make sure the other fellow can tell they have been eliminated.
Details that could throw doubt on your interpretation must be given, if you know them. You must do the best you can--if you know anything at all wrong, or possibly wrong--to explain it.

The kind of integrity Feynman is talking about concerns systematic error. The Frequentist social conventions are an attempt to do something similar for random error. This provides a natural defense against "scientific trolling", which is a term I just invented to mean "the tendency of unscrupulous researchers to report weak results to advance some ulterior agenda." Scientific trolling means that researchers with ulterior motives will submit a flood of weak results, while scrupulous, scientifically motivated researchers will voluntarily restrain themselves from reporting equally weak results that go in the opposite direction. That sort of reporting bias will tend to contaminate the beliefs of a neutral observer. (I can think of at least one very good real-world example of this, but I will be polite and not discuss it on a blog.)
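Here is a crude simulation of that contamination (my own toy setup, not based on any real case): suppose the true effect is zero, one camp writes up every study whose point estimate happens to go its way, and nobody else bothers to report. An observer who naively averages the published estimates comes away believing in a sizable effect.

```python
# Rough simulation of one-sided selective reporting under a true effect of zero.
# Only studies with favorable point estimates get "published"; averaging the
# published estimates badly biases a neutral observer.
import numpy as np

rng = np.random.default_rng(0)
true_effect = 0.0
n_studies = 1000
estimates = rng.normal(true_effect, 1.0, size=n_studies)  # noisy study-level estimates

published = estimates[estimates > 0]   # only favorable results get written up
print(f"mean of all studies:       {estimates.mean():+.3f}")   # close to zero
print(f"mean of published studies: {published.mean():+.3f}")   # roughly +0.8
```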
Now, of course, the Frequentist social conventions are weak, inadequate defenses against subjectivism and noise. They have drawbacks, like discouraging the reporting of negative results. And they are subject to being gamed by unscrupulous researchers. But at least they are something.
Bayesian inference seems to me like a perfectly fine and good method of inference. It's more appealing in many ways than classical Frequentism. But I think that Bayesianism might want to get some standardized social conventions similar to (and hopefully superior to) the Frequentist ideas of "not protecting the null" and "statistical significance" (note: it may already have these, and I'm just not aware of them). These conventions would unavoidably be arbitrary, and would throw away some information in many cases. But they would help lean against the natural incentives of the scientific reporting and publishing process. Maybe there could be more than one set of conventions, for use in recognizably different situations.
Classical Frequentist hypothesis testing is probably on its way out in the long term. But the fact that it has survived and dominated scientific publishing as long as it has, in spite of all its well-known problems, might be a testament to the usefulness of the unspoken social conventions associated with it.