Comments on Noahpinion: Priors and posteriors

I am not Bayesian fan. I think Bayes wanted to co...

2015-02-05T14:48:49.882-05:00

I am not a Bayesian fan. I think Bayes wanted to convert nonbelievers into believers, and what a way to do that. One can make a guess at P(T) (believe or not believe), but one cannot refute the miracles P(E|T)! And if people keep believing in miracles, one can easily recruit folks to do crazy things, like chopping heads, burning people alive, etc. Once there, it does not matter what the likelihoods are.

Washing out the prior happens when the data are so...

2015-01-31T18:09:40.429-05:00

Washing out the prior happens when the data are so informative that the likelihood function converges to a sharp Gaussian. In this situation, frequentist and Bayesian answers basically always agree. It's the fact that this doesn't always happen that makes the whole question of which approach to use an interesting and important one.

Whenever Frequentists comment on Bayes, it tells y...

2015-01-31T11:31:40.873-05:00

Whenever Frequentists comment on Bayes, it tells you something about Frequentists and nothing about Bayes.

Drop the idea that probabilities are frequencies. Real world frequencies are physical Facts, like the temperature of the sun's surface. They can be predicted, estimated and measured.

Probabilities are tools for modeling uncertainty. If our Evidence doesn't determine some Fact exactly then there's some uncertainty which we model using P(Fact | Evidence). Our evidence may not tell us the exact temp of the sun's surface, but it gives us a range of reasonable values that are consistent with everything we do know about it.

Prob distros model this uncertainty by placing their high probability region (probability mass) over the region of reasonable values. So if the sun's surface temp is reasonably between 5000k and 6000k, then you could use a N(5500, 500) to model that uncertainty. A Bayesian Credibility Interval formed from N(5500, 500) would contain the true temp. The smaller the interval, the less uncertainty naturally.

The goal in modeling uncertainty is to find consequences which are true for almost every reasonable value of the true temp. If you use that N(5500, 500) to determine that some “hypothesis A” has very high probability, then that means A holds true for almost every potential temp in the range 5000k-6000k. Since the true temperature is in that range, then that's strong reason to think A is actually true.

The best way to think of it is as a very intuitive and simple sensitivity analysis. Basically the high probability of A shows that hypothesis A is insensitive to the exact temperature as long as it's in the high probability region of N(5500, 500).
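As a rough sketch of that sensitivity reading (not anything from the original comment's author), the snippet below computes the probability of a made-up "hypothesis A" under N(5500, 500) and then checks whether A holds at each temperature in the 5000-6000 range. The bounds 4000 and 7000 that define A are purely hypothetical, and the second parameter of N(5500, 500) is assumed to be a standard deviation.

```python
from statistics import NormalDist

# Uncertainty about the sun's surface temperature, as in the comment above.
# Assumption: the 500 in N(5500, 500) is a standard deviation.
temp = NormalDist(mu=5500, sigma=500)


def prob_between(lower, upper, dist=temp):
    """Probability that the temperature lies in [lower, upper] under dist."""
    return dist.cdf(upper) - dist.cdf(lower)


# Hypothetical "hypothesis A": the temperature is between 4000 and 7000.
A_LOWER, A_UPPER = 4000, 7000
p_a = prob_between(A_LOWER, A_UPPER)

# Sensitivity check: does A hold at every temperature in the plausible range?
plausible_temps = range(5000, 6001, 100)
holds_everywhere = all(A_LOWER <= t <= A_UPPER for t in plausible_temps)

print(f"P(A) = {p_a:.4f}")
print(f"A holds across 5000-6000: {holds_everywhere}")
```

A high P(A) here goes hand in hand with A being true over the whole plausible range, which is the sense in which the probability acts as a sensitivity analysis.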

So to select prob distributions take what you know and find a distribution that covers the right area. For example, if you're trying to create a prior used to search for a downed aircraft, you could use the max radius of travel based on how much fuel they had and place a uniform distribution over the entire disk. That way the true location is bound to be within that area. That defines your overall search region before considering anything else.
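A minimal sketch of such a prior, assuming a flat search area and a hypothetical maximum range of 800 km (neither figure comes from the comment): draw candidate locations uniformly over the disk. The square root on the radial draw is what keeps the density uniform in area rather than bunched at the centre.

```python
import math
import random


def sample_uniform_disk(max_radius, rng):
    """Draw one (x, y) point uniformly over a disk of radius max_radius."""
    # Taking sqrt of the uniform draw makes the density uniform in area.
    r = max_radius * math.sqrt(rng.random())
    theta = 2 * math.pi * rng.random()
    return r * math.cos(theta), r * math.sin(theta)


rng = random.Random(0)
max_radius_km = 800.0  # hypothetical fuel-limited range
points = [sample_uniform_disk(max_radius_km, rng) for _ in range(10_000)]

# Under a uniform prior, the inner half-radius disk holds about 1/4 of the mass.
inner_fraction = sum(
    math.hypot(x, y) <= max_radius_km / 2 for x, y in points
) / len(points)
print(f"fraction inside half the radius: {inner_fraction:.3f}")
```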

Alternatively, you could ask an expert for their intuition and create a prior out of that. Of course, if the true location isn't in the high probability region of the expert's prior, then the prior is “wrong” and will lead to a failed search effort.

Generally, creating prob distros is a trade-off between two competing goals. You want to make the distribution as “informative” as possible about the true value, but if you shrink its high probability area (region of plausible values) smaller than your evidence allows then you run the serious risk of missing the true value.

Note, there is no difference between sampling distributions and priors. They both model uncertainty in the exact same way. When you use a NIID model for errors you're not saying that future errors will have a NIID frequency histogram. Statisticians almost never have that kind of knowledge and most of the time it's not true or even meaningful.

What you're really doing is defining a region of plausible values for the one set of errors that actually exist in the data you actually took. You're saying the data you just measured had an error vector which resides somewhere in the high probability region of the NIID (an n-sphere with a radius of a few stdevs). As long as that's where those errors actually are, everything is fine; future unrealized errors are irrelevant and can be absolutely anything.
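One hedged way to picture that region: for n independent N(0, σ²) errors, the length of the error vector is typically close to σ√n, so a single realized error vector can be checked by comparing its length with that scale. The sketch below simulates one such vector; the values n = 1000 and σ = 2.0 are arbitrary illustrative choices.

```python
import math
import random


def error_norm_ratio(n, sigma, rng):
    """Length of one simulated NIID error vector, divided by sigma * sqrt(n)."""
    errors = [rng.gauss(0.0, sigma) for _ in range(n)]
    norm = math.sqrt(sum(e * e for e in errors))
    return norm / (sigma * math.sqrt(n))


rng = random.Random(1)
ratio = error_norm_ratio(n=1000, sigma=2.0, rng=rng)

# A ratio near 1 means this one error vector sits in the high probability region.
print(f"norm / (sigma * sqrt(n)) = {ratio:.3f}")
```

Nothing in this check refers to long-run frequencies of future errors; it only asks where the one realized vector lies.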

That's why NIID assumptions work better than they “should”. Errors of real measuring devices almost never have frequencies that look NIID over a long period of time. But they don't have to, because that's not what that assumption means and it's not what determines success!

P.S. I respect Senn, but his description of Jaynes's views in that paper was so wrong he lost all credibility when it comes to identifying real Bayesians.

Let's make a deal. I'll do that if you sto...

2015-01-29T17:28:37.233-05:00

Let's make a deal. I'll do that if you stop constantly showing up here and at BV and posting "race and intelligence" crap! Deal?

Do a post about what happens to money when markets...

2015-01-29T10:29:51.214-05:00

Do a post about what happens to money when markets crash. Does a conservation of wealth exist?

Every few weeks Noah goes back to Bayes! I though...

2015-01-29T09:35:51.427-05:00

Every few weeks Noah goes back to Bayes! I thought my links to Prof. Brown at Trinity University amplified that priors don't matter as long as...

3.2 Washing Out of the Priors
The idea that P(T) could be based on a mere hunch may seem unsettling. After all,
different people may have very different hunches about the truth of a theory, and so may
begin the process with very different values for P(T)! In a way, however, this does not
matter very much. This is because of the phenomenon of the washing out of the priors.
If you and I begin with very different evaluations of T, but we agree on P(E|T) and
P(E|¬T), then our posterior probabilities will get closer and closer to each other the
more evidence we investigate. In the long run, we will end up with the same assessment
of T even if we started out with very different guesses.

The operating constraint is:

...If you and I begin with very different evaluations of T, but we agree on P(E|T) and
P(E|¬T),
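To make the quoted constraint concrete, here is a small sketch in which two people start from very different P(T) but share the same P(E|T) and P(E|¬T). The likelihood values 0.8 and 0.4, the two priors, and the assumption that the same evidence E is observed 20 times in a row are all illustrative, not taken from Prof. Brown's paper.

```python
def update(prior, p_e_given_t, p_e_given_not_t):
    """One application of Bayes' rule after observing evidence E."""
    numerator = p_e_given_t * prior
    return numerator / (numerator + p_e_given_not_t * (1.0 - prior))


def posterior_after(prior, n_observations, p_e_given_t=0.8, p_e_given_not_t=0.4):
    """Posterior P(T) after observing E n_observations times in a row."""
    p = prior
    for _ in range(n_observations):
        p = update(p, p_e_given_t, p_e_given_not_t)
    return p


# Two hypothetical starting points: a believer and a skeptic.
believer_prior, skeptic_prior = 0.9, 0.01
believer = posterior_after(believer_prior, 20)
skeptic = posterior_after(skeptic_prior, 20)

gap_start = abs(believer_prior - skeptic_prior)
gap_end = abs(believer - skeptic)
print(f"gap before: {gap_start:.3f}, gap after 20 observations: {gap_end:.6f}")
```

The gap shrinks only because both parties use the same likelihoods; if they disagreed on P(E|T) or P(E|¬T), the same evidence could leave them apart.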

If Noah searches the web, or the comment sections of his own blog, finds Prof. Brown's article, and sits down in one place and reads Prof. Brown's 9-page paper, he would not have these posterior farts. Better yet, just call up Prof. Brown and learn.

"I don't think Senn's point really ha...

2015-01-29T07:56:04.186-05:00

"I don't think Senn's point really has any implications for study design"

That is obviously not true, because his view is not compatible with the strong likelihood principle. Maybe you never accepted SLP in the first place, but Bayesians of the sort to whom he refers mostly do.

So... is Senn's blog post basically saying tha...

2015-01-29T04:48:27.351-05:00

So... is Senn's blog post basically saying that frequentist approaches are what we use when we can't agree on a Bayesian prior? This makes considerable sense to me, and it does indeed bridge the gap between Bayesian inference and frequentist hypothesis testing quite nicely.

David: If you're looking at approaches to selecting priors, you might be interested in Solomonoff induction. I'm pretty sure that the principle of maximum entropy, or something very similar, could be deduced as a special case of this.

Who is the cutie in the "nested statistician/...

2015-01-29T03:31:04.111-05:00

Who is the cutie in the "nested statistician/ hall of mirrors" image? Is that Stephen Senn? I'm not certain, as the linked post had this rather different 2012 photograph:
https://errorstatistics.files.wordpress.com/2012/01/photo.jpg

Oops, that should be "...why I AM wrong"...

2015-01-29T03:25:06.155-05:00

Oops, that should be "...why I AM wrong" (terribly embarrassing sentence in which to have a typo).

Also, to be on topic, Deborah Mayo has an entire series of posts about that Normal Deviate here:
http://errorstatistics.com/?s=deconstructing+larry+wasserman

Noah, have you come across the [maximum entropy pr...

2015-01-29T02:14:14.338-05:00

Noah, have you come across the [maximum entropy principle](http://en.wikipedia.org/wiki/Principle_of_maximum_entropy) for choosing priors advocated by E.T. Jaynes?

Excellent post! And some radical reconception of t...

2015-01-29T02:02:51.861-05:00

Excellent post! And some radical reconception of the whole field (relativity, quantum mechanics) should force a re-evaluation of priors, not a Bayesian updating! If you look at Bayes's work itself, it is clear he is talking about situations in which we understand the experimental setup well enough to set priors!

I don't understand what Gelman means by this, ...

2015-01-29T01:14:35.950-05:00

I don't understand what Gelman means by this,
"my work on the philosophy of statistics is intended to demonstrate how Bayesian inference can fit into a falsificationist philosophy that I am comfortable with on general grounds."

The term "falsificationist philosophy" has an alarmingly glib connotation. Please tell me why I a wrong, for peace of mind?

3. You could choose a bunch of different priors an...

2015-01-28T21:51:25.216-05:00

3. You could choose a bunch of different priors and see how sensitive the posterior is to the choice of prior.

Bootstrapping?
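For reference, the prior-sensitivity check in the quoted item 3 can be sketched as follows, assuming a simple Beta prior on a success probability with binomial data. The three priors and the data counts are hypothetical. Unlike bootstrapping, the data stay fixed here and only the prior varies.

```python
def beta_posterior_mean(alpha, beta, successes, failures):
    """Posterior mean of a success probability under a Beta(alpha, beta) prior."""
    return (alpha + successes) / (alpha + beta + successes + failures)


# Hypothetical priors, given as Beta(alpha, beta) parameters.
priors = {"flat": (1, 1), "skeptical": (2, 8), "optimistic": (8, 2)}


def posterior_spread(successes, failures):
    """Range of posterior means across the candidate priors."""
    means = [
        beta_posterior_mean(a, b, successes, failures) for a, b in priors.values()
    ]
    return max(means) - min(means)


small_sample_spread = posterior_spread(3, 7)      # 10 observations
large_sample_spread = posterior_spread(300, 700)  # 1000 observations

print(f"spread with 10 observations:   {small_sample_spread:.3f}")
print(f"spread with 1000 observations: {large_sample_spread:.3f}")
```

A large spread signals that the conclusion leans on the choice of prior; a small one signals that the data dominate.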

My alternate answer is that the prior comes from t...

2015-01-28T19:01:39.707-05:00

My alternate answer is that the prior comes from the same place that the likelihood comes from.

Sure you can formulate a prior. Obviously you ha...

2015-01-28T18:28:08.603-05:00

Sure you can formulate *a* prior. Obviously you have to do so in order to carry out a Bayesian estimation. But what Senn and Gelman are saying is that you can't formulate *your own subjective prior*, like a Bayesian agent does.

If you can't formulate a prior, you don't ...

2015-01-28T18:17:21.416-05:00

If you can't formulate a prior, you don't understand the implications of your model, and have no business estimating it.