The argument is politically important, because it tells us how well the Obama administration has been doing. If R&R are right, then Obama has been a good steward of the economy, since America's recovery has slightly outperformed the average of their sample of historical post-crisis recoveries. But if B&H are right, then Obama has done a historically bad job. Thus it is no surprise to find Mitt Romney's economic advisors, in particular John Taylor, hawking the Bordo-Haubrich research and disparaging that of Reinhart and Rogoff.
First of all, do not listen to John Taylor. He is not being a scientist right now, he is being a politician. Paul Krugman is right; this is an example of how politics hurts the academic discipline of economics. But unlike Krugman I think it's inevitable; you can hardly expect John Taylor not to do his job and support his boss. People know to take that into account when reading what he writes, and Taylor knows they take it into account. Are we ever going to get economists to stop advising political candidates? Are we ever going to get political candidates to stop insisting that their advisors support their campaign narrative? To each of these questions I answer: Maybe, but I am not optimistic.
But do pay attention to the academic dispute between R&R and B&H. It's very interesting. How do the two research teams arrive at such different conclusions? Essentially, there are three big differences in the methodologies used by the two teams.
Difference 1: R&R compare recoveries across different countries. B&H only look at the U.S.
Difference 2: R&R define the "strength of a recovery" as the time required to reach the pre-crisis level of GDP per capita; B&H define the "strength of a recovery" as the rate of total GDP growth at a certain time following the trough of the recession.
Difference 3: R&R define a "financial crisis" much more narrowly than B&H.
Let's talk about Difference #1. Because B&H include only the U.S., they ignore episodes like Japan's crisis-and-recovery in the early 1990s. This means that, for one thing, B&H have a much smaller sample than R&R. If you believe that every nation is fundamentally different, this is unavoidable; but if you believe that "financial crises" are a universal phenomenon, then B&H are making a big mistake.
It also means that B&H are comparing across different periods of history. This doesn't seem appropriate to me. For one thing, in its earlier history, the United States was experiencing "catch-up growth", which means that the trend rate of growth was much higher than it is now. For another thing, past eras had considerably higher productivity growth than the current era, which also raised the trend rate of U.S. growth. Finally, as R&R point out in their op-ed, U.S. population growth was higher in the past. B&H, by failing to detrend their GDP series, leave out all of these important facts.
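To make the detrending point concrete, here is a minimal sketch of the arithmetic. The numbers are hypothetical, chosen only for illustration; they are not taken from either paper, and the function name is my own.

```python
def excess_over_trend(raw_growth_pct, trend_growth_pct):
    """Recovery growth minus the era's trend growth, in percentage points."""
    return raw_growth_pct - trend_growth_pct

# Hypothetical figures for illustration only, NOT real data.
episodes = {
    "earlier era": {"raw": 5.0, "trend": 4.0},  # high trend (catch-up, population growth)
    "recent era": {"raw": 3.0, "trend": 2.0},   # lower trend
}

excess = {
    name: excess_over_trend(e["raw"], e["trend"])
    for name, e in episodes.items()
}

for name, value in excess.items():
    print(f"{name}: raw {episodes[name]['raw']:.1f}%, excess over trend {value:.1f} pts")
```

In this made-up case the earlier recovery looks two points stronger in raw terms, but both recoveries beat their own trend by the same margin. Whether something like that holds in the actual data depends on trend estimates that this sketch simply assumes.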
Basically, I think R&R's methodology is much better here. B&H, by refusing to even look at other countries, are potentially throwing away a huge amount of information. Sure, combining samples across countries introduces a lot of omitted variables, but you can always just compare within-country analyses to cross-country analyses and note whether and how the two are different. And you can always just make a list of potential cross-country structural differences. Then you let the reader decide for herself whether cross-country or single-country makes more sense. I think this is much better than simply choosing one specification and sticking with it.
OK, let's talk about Difference #2. This is partly a case of an apples-to-oranges comparison; the two research teams are measuring different things, and their stories are not necessarily incompatible. B&H tell a story of a "string-plucking" effect, where financial crises are followed by very deep recessions, and deeper recessions mean faster, but longer, recoveries. R&R's observation that recoveries from financial crises take longer than others could be consistent with that string-plucking story.
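Here is a rough sketch of how the two yardsticks can disagree about the same episodes. It assumes each series is an index of output windowed around a single downturn, so that the minimum is the trough; the function names and the numbers are hypothetical, not either team's actual procedure or data. (R&R would feed in GDP per capita; B&H would feed in total GDP.)

```python
def periods_to_regain_peak(series):
    """R&R-style yardstick: periods from the pre-crisis peak until the
    series is back at or above that peak. Returns None if it never is."""
    trough_idx = series.index(min(series))
    peak = max(series[: trough_idx + 1])
    peak_idx = series.index(peak)
    for i in range(trough_idx, len(series)):
        if series[i] >= peak:
            return i - peak_idx
    return None


def growth_after_trough(series, horizon):
    """B&H-style yardstick: percent growth over `horizon` periods
    following the trough."""
    trough_idx = series.index(min(series))
    end = trough_idx + horizon
    if end >= len(series):
        raise ValueError("series too short for this horizon")
    return 100.0 * (series[end] / series[trough_idx] - 1.0)


# Hypothetical index levels for illustration only, NOT real data.
deep = [100, 80, 88, 95, 99, 102]      # deep recession, steep rebound
shallow = [100, 97, 99, 101, 102, 103]  # mild recession, gentle rebound

for name, s in (("deep", deep), ("shallow", shallow)):
    print(name, periods_to_regain_peak(s), round(growth_after_trough(s, 2), 2))
```

In this toy example the deep recession has the "stronger" recovery by the growth-after-trough measure, but takes five periods to regain its old peak versus three for the shallow one. Same two episodes, opposite rankings, which is all "apples-to-oranges" means here.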
(The point of contention appears to be over the "shape" of recoveries - R&R contend that financial crises produce L-shaped recoveries, while B&H say there is no conclusive evidence of that. The difference is caused by the difference in the definition of "financial crises", which we'll discuss in a moment.)
Note, by the way, that this second point shows that John Taylor is being a bit disingenuous when he uses B&H's results as a stick with which to beat the Obama Administration. Here, and again here, Taylor agrees with B&H and R&R that "there is no disagreement that recessions associated with financial crises have tended to be deeper than those without financial crises." In the "string-plucking" model proposed in the appendix of B&H's paper, they claim that deeper recessions will be followed by faster recoveries; in this model, one reason for a slower recovery under Obama is that the recession of 2009 was not as deep as recessions during the 1800s. So John Taylor is overlooking the obvious implication of B&H's model - that Obama slowed the recovery by reducing the severity of the recession.
OK, on to Difference #3 - the definition of a "financial crisis". My instincts tell me that B&H's more expansive definition of financial crisis is wrongheaded - after all, they include 1981 as a "financial crisis", even though basically everyone believes that that was a "Fed recession" caused by the Volcker disinflation. Intuition strongly suggests that R&R's restrictive definition of a "financial crisis" is much more credible.
BUT, I don't think we should always trust our intuition. It is certainly possible that R&R constructed their definition of "financial crises" by looking at the data, picking out L-shaped recoveries, noticing that what happened to the financial systems of countries right before those L-shaped recoveries looked different in some respects from what happened prior to V-shaped recoveries, and then defining those observed differences as "financial crises".
Is this a bad or wrong approach? Heck no! It's exactly what I would have done. It's a naturalistic approach. You observe patterns in nature and you write them down. That's how science gets all of its insights.
But it's an incomplete approach. If you observe a pattern and then conclude that the pattern is structural, you are data-mining. Before we believe a theory, we need to use it to make out-of-sample predictions. In this case, what that means is that before we accept R&R's definition of "financial crisis", we really need to wait and watch history unfold, and see if subsequent L-shaped recoveries still correlate with the things R&R define as the essential characteristics of a "financial crisis". That will take a long time.
Alternatively, we could use microfoundations. If we successfully identified the processes by which R&R-defined financial crises affect recoveries (and B&H-defined crises don't), we could conclude in favor of R&R's definition without having to wait for out-of-sample crises to unfold.
But until we do at least one of those things, I am not willing to say with certainty that R&R's definition of crises, intuitive though it may be, is better than B&H's.
So, in conclusion: I like R&R's approach better than B&H's, because it comes at the problem from a wider range of angles. This is how I think the best empirical research is done; you ask a question, and then you attack that question with multiple data sources, multiple alternative assumptions, and multiple models. This is how Justin Wolfers, for example, attacked the question of whether prediction markets or opinion polls do a better job of forecasting election results. B&H don't do this; they throw away the information contained in other countries, and they don't try alternative definitions of "financial crisis". In addition, I think they make a mistake by not adjusting their GDP growth data for long-term trends.
And I think no one should take John Taylor's promotion of B&H's results seriously, since he is part of Team Romney.
However, this does not mean I totally believe the results of Reinhart & Rogoff. The fact that their results ring true to me might just be a function of how long those results have been publicized in the media. The fact is, the data sample they have to work with is small and riddled with all kinds of potential confounding effects and omitted variables. That is what macro has to deal with, folks. It ain't pretty.