Saturday, June 30, 2018

Book Review - "The Space Between Us"


"Hey, there ain't no space between us!"
- a flight attendant who saw me reading this book


This is a very important book about a very important topic (segregation and race relations). It is also a book that strongly agrees with my priors about how the world works. And not just my priors, but with my desires - I want segregation to be a bad thing. So because I'm so biased in favor of this book's thesis, I'm going to try to be especially hard on it in this review. Just realize that that's what I'm doing here. You should absolutely read this book. The research it explains is eye-opening, well-executed, and very important for our national future. And the theory that Enos weaves to explain his observations probably captures important features of reality, and deserves to be a central part of our national discussion.

Having said that, let me proceed to being overly critical.


The Basic Idea

A very simplified version of Enos' basic theory goes like this: Racial conflict is exacerbated by segregation, proximity, and outgroup size. In other words, when you have a bunch of people living very close to you, but who are also kept separated from you, you start to view them as an enemy group, and you vote and behave accordingly.

You can easily imagine a situation like this. Suppose you live in an all-Protestant neighborhood, separated from an all-Catholic neighborhood by a wall. The wall makes it easy to think of them as a hostile enemy tribe. But since the "enemy" is right there, just over the wall, in great numbers, you live in fear of them.

Enos delves into the psychology of why this might happen, but the basic idea is not hard to comprehend. 

One subtle but crucial point is that Enos thinks the impact of geographic segregation is distinct from the impact of contact. In other words, simply interspersing people of different races will reduce tension independently of how they interact with each other, since interspersing people reduces the degree to which they think of each other as belonging to separate groups. 

It is this last part of the theory that, in my opinion, ends up being the weakest link in the chain, with important and unsettling consequences for policy.


Testing the Theory

There's no way to test this theory directly other than to design and populate cities from scratch. Instead, researchers like Enos have to rely on four limited techniques of observation:

1. Correlational studies

2. Lab experiments

3. Natural experiments

4. Randomized controlled trials

Each of these approaches has its limitations. 

Correlational studies are subject to selection problems and lots of other types of confounding effects. What we really want to show is causation.

Lab experiments can demonstrate that a stylized version of a social science theory holds in a laboratory setting. But the real world may be very different than the lab, in a lot of ways that matter. The experiment might just be a bad analogy for the real world - for example, when people claim that a few undergrad students trading in an econ lab for stakes of $10 is not similar to a real-world market with high stakes, repeated interactions, and knowledgeable participants. Also, the real world may simply have so much else going on that an effect identified in a lab, though real, just isn't very important. 

Natural experiments are great (as long as you correctly identify a natural experiment instead of imagining one exists when it didn't really). But the limitation of natural experiments is that they don't measure exactly what you want them to. They are found by accident, so they're never quite what you want. And they can never be precisely replicated in different contexts, like lab experiments can.

And RCTs are limited by size. If you could get funding (and IRB approval) to make whole new cities, it would be easy to test the effects of geography on race relations. But in the real world, you're stuck with small stuff, like sticking a couple of guys on a train platform. These small-scale RCTs don't always scale up, and there's lots of stuff you can't control, and they're expensive to replicate in different contexts.

Of course, researchers know about all of these limitations, and Enos explains them at length in "The Space Between Us". And he does exactly what a researcher ought to do when faced with these limitations - he uses all four methods. 

But even using all four methods doesn't mean you can verify a social science theory as big and sweeping as Enos'. Even the most diligent, careful, brilliant researcher can sometimes seem like a master swordsman hacking away at a boulder.

Despite these limitations, Enos does - in my opinion - convincingly demonstrate two-thirds of his theory. He does show that the size and proximity of an outgroup pretty predictably generate negative feelings toward that outgroup. But the third part of his theory - the idea that geographic segregation plays a big role, above and beyond the impact of human interaction, in determining which groups get defined as an "outgroup" in the first place - is harder to demonstrate. And it's here that I feel Enos' methods, though probably the best available, don't end up being conclusive.

This is due to two interrelated problems: A) the question of contact vs. context, and B) the problem of scale. Both are problems that Enos discusses extensively, but in the end I don't think there's an easy solution. 


Context or Contact?

"Contact" is human interaction. "Context" is the overall situation humans are in - in this case, where people live. There is lots of evidence that extended contact with people of other groups gives people a more positive attitude towards those other groups (though some kinds of contact may be more effective or less effective at this task). Negative contact, meanwhile, can increase prejudice. Enos' theory, however, is about context - it's about living arrangements having the power to change attitudes above and beyond the effect of direct interaction.

The problem is that it's very difficult to separate contact from context, observationally. In a lab, you can control the two - you can have people sit in chairs not talking to each other, or allow them to talk. But in the real world, it's hard to tell who's interacting with whom. If you put Protestants and Catholics next to each other in a city and you see a deterioration in relations between the two, was it due to proximity (Enos' theory) or due to some kind of negative interaction that sprung up between the two? If you see that desegregation leads to an improvement in race relations, was it because people got used to each other after chatting on the street, or because desegregated living arrangements made group differences less salient (Enos' theory)? Hard to tell.

Sometimes context and contact aren't even conceptually distinct. For example, take Enos' famous Boston Train Experiment. In this experiment, Enos sent Spanish-speakers to train stations in Boston, and found that observing Spanish speakers made Anglophone whites more likely to take a hard line against immigration.

Was this an experiment about contact, or context? The title of Enos' paper is "Causal effect of intergroup contact on exclusionary attitudes", which would seem to indicate that it's the former. But in "The Space Between Us", Enos writes that he was "altering both space and contact", "increasing socio-geographic impact", and "moving Boston to the right on the horizontal axis of the plane of context" (p. 110). He thus claims that the Train Experiment altered context - that it didn't just represent an interaction between Anglo white Bostonians and Spanish speakers, but that it actually made those Anglo white Bostonians feel like Spanish speakers had moved in next to them. Enos thus claims this experiment as evidence for the impact of context on racial attitudes.

The truth is, we don't know which it was. It might be that the Anglo white train commuters were annoyed at the experience of hearing a language they didn't understand, and that Enos was therefore measuring a negative contact effect. Or it's possible the Anglo white train commuters really did feel like their neighborhoods were becoming more Hispanic. (Asking additional survey questions might have helped differentiate these two hypotheses, but those responses might not have been completely reliable.)

Enos says that when possible, he attempts to control for intergroup contact when measuring the effect of context. This is the right approach, but the problem is that it's often impossible outside of a lab. The same issue crops up in some of the other studies Enos describes in the book. Personally, I think that Enos' theory describes a real phenomenon - context matters, and probably in the way Enos describes. The lab experiments Enos runs, together with his correlational studies, add to the pile of circumstantial evidence.

But the fact that all the ecological causation studies involve a lot of contact makes it hard to identify and validate Enos' theory. And there's a second problem that directly impacts the theory's potential usefulness: the problem of scale.


Proximity or Segregation?

Enos' theory is that all else equal, proximity increases racial tensions, and segregation increases them as well. But how do you tell the difference between the two? If a black family moves in next door to me, is that decreasing segregation (which Enos thinks should soften racial tensions) or increasing proximity (which Enos thinks should heighten racial tensions)?

In a very nice diagram on p. 26, Enos explains the difference between desegregation and proximity. In one panel, the white and black dots are all clumped together, but the two clumps are very close. In another, the white and black dots are interspersed:


Visualized thus, the distinction seems to make sense. But what if we zoom out? What if each clump becomes a dot, and the clumps become interspersed? A high-segregation, high-proximity situation (bad in Enos' theory) would then become a low-segregation, high-proximity situation (not so bad in Enos' theory), just by zooming out and considering a different scale.

To put this another way, imagine a neighborhood where every block is either all black or all white, but the white and black blocks alternate. Is that integrated or segregated? Suppose you think it's segregated. Now change it so that each block is integrated, but each building is either all black or all white. Is that integrated or segregated? Note that if we're free to keep increasing the resolution of our segregation measures, so that every "desegregation" still results in a "segregated" distribution, then each "desegregation" is just an increase in proximity (bad in Enos' theory).

The lack of any guide to what resolution we should use to measure segregation means that this resolution can be used as a free parameter, to make the overall theory ("proximity bad, desegregation good") fit almost any outcome. Enos is definitely tempted to do this at times. On p. 203 he writes:
[I]n fact, typical measures of segregation probably understate the actual segregation in Los Angeles because much of the separation between Latinos and Blacks happens at a much finer level, alternating from block to block within neighborhoods, and our measures of segregation are not equipped to capture this.
And on p. 223:
As populations became intermixed in closely segregated blocks, proximity between groups increased.
What is desegregation, if not intermixing populations?

And on p. 20:
For my purposes, though, there is no single "right" unit [of geographical area], but rather the psychologically salient local environment of each individual. 
But if the researcher is free to guess what environment is salient, how can the theory be tested?

Throughout the book, Enos is consistently better at measuring the impact of proximity than the impact of segregation. His most eye-opening and well-designed study is a 2015 paper looking at how white people in Chicago changed their voting patterns after nearby mostly-black housing (such as the Cabrini-Green Homes project) was torn down and poor black residents dispersed. Enos finds that white people who lived near the projects voted less, while white voters far away from the projects didn't change. It's a natural experiment, and is thus a powerful demonstration of how the proximity of an outgroup can raise racial threat. Enos measures proximity by physical distance, rather than any predetermined unit of area, which lends credence to his finding.

But while researchers can use distance to measure proximity in a study like this, they can't use it to measure segregation. Segregation, unlike proximity, has no natural units, so to measure it we have to specify a resolution at which to measure the dispersion or concentration of groups of people.
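The resolution-dependence point can be made concrete with a little code. Here's a minimal sketch - my own construction, not anything from the book - using the standard Duncan dissimilarity index on a hypothetical city of alternating one-race blocks. The very same city reads as perfectly segregated when measured block by block, and as perfectly integrated when adjacent blocks are pooled into "neighborhoods":

```python
# Stylized illustration (hypothetical city, not data from the book):
# the dissimilarity index, a standard segregation measure, depends
# entirely on the unit of aggregation you choose.

def dissimilarity(units):
    """Duncan dissimilarity index: 0 = even spread, 1 = total segregation.
    `units` is a list of (black_count, white_count) tuples."""
    B = sum(b for b, w in units)
    W = sum(w for b, w in units)
    return 0.5 * sum(abs(b / B - w / W) for b, w in units)

# Eight alternating one-race blocks of 100 people each
blocks = [(100, 0), (0, 100)] * 4

# Pool adjacent pairs of blocks into four "neighborhoods"
neighborhoods = [(blocks[i][0] + blocks[i + 1][0],
                  blocks[i][1] + blocks[i + 1][1])
                 for i in range(0, len(blocks), 2)]

print(dissimilarity(blocks))         # 1.0 -- perfectly segregated
print(dissimilarity(neighborhoods))  # 0.0 -- perfectly integrated
```

Same people, same addresses, opposite verdicts - which is exactly why the choice of resolution acts as a free parameter in the theory.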

Ideally, that resolution should be included as a parameter in a quantitative model, along with proximity (represented by distance), relative size, and maybe some other variables. The segregation-resolution parameter could be estimated on one dataset (say, Chicago), and then tested on other data sets (say, New York City, Los Angeles, etc.). If the segregation resolution that worked in Chicago also worked to predict racial attitudes in NYC and L.A. and elsewhere, it could be treated as a structural parameter - a more-or-less universal constant of human psychology.

Of course, this is a lot easier said than done. It requires extremely high-resolution datasets on where people of various groups live. AND it requires natural experiments in multiple cities in order to validate the model out-of-sample. Much much easier said than done.

But in the meantime, we're left to wonder...and worry.


The Question of Policy

The overarching question of "The Space Between Us" is whether or not Hispanics and other Americans will experience racial conflict in the years and decades to come - and, even more importantly, how to prevent or reduce this conflict. Should we implement initiatives designed to get Latino and Anglo populations to mix more? Would that exert a psychological effect that would reduce the salience of the difference between the two groups, causing them to start to think of themselves as one single group? Or would it exacerbate the backlash that led to Trump's election?

Take Enos' statement on p. 223, recounting a case in Los Angeles where "as populations became intermixed in closely segregated blocks, proximity between groups increased." According to his theory, that's a recipe for conflict. If block-by-block segregation is even worse for race relations than neighborhood-by-neighborhood segregation (because of higher proximity), what does that say about the prospect for the success of federal housing desegregation initiatives? If they resulted in populations "intermixed in closely segregated blocks," would that backfire and make race relations worse?

It's because of this question - which "The Space Between Us" doesn't answer - that the book ends up having an uncomfortably alt-right sort of undertone. Enos provides lots of evidence about why proximity between racial groups induces conflict - a staple of alt-right thinking - but little evidence that desegregation could be used to reverse the problem, or even what that would entail.

In fact, Enos' otherwise wonderful diagram on p. 26, showing the difference between proximity and segregation, has a very disturbing picture in the lowest panels. When illustrating a "low proximity, low segregation" situation - i.e., what Enos thinks would minimize racial conflict - it displays a dense clump of white dots at the center, surrounded by a far-flung scattering of black dots:


Not exactly what I think of when I think of "desegregation". And not exactly the racial-geographic future I imagine for a tolerant, integrated America.


Back to Contact

Enos' book does offer a ray of hope regarding America's racial future: Tucson, Arizona. In the final chapter, he describes how Tucson has achieved much more harmonious relations between Anglo whites and Hispanics, through long-term positive interaction between the two groups. But he worries that in the rest of America, far-flung suburban development patterns and the increasing social isolation described by Robert Putnam will conspire to prevent this sort of long-term positive contact, leaving Anglos and Hispanics permanently and bitterly divided.

In other words, Enos' good and bad visions for America's future depend not on context, but on contact. He doesn't propose large-scale desegregation initiatives (perhaps because of the measurement difficulties described above). Instead, his vision of racial tolerance relies on something outside the scope of his theory: long-term positive contact.

And in fact, this seems like exactly the right approach. Enos' theory may be right - and in fact, in spite of the measurement difficulties I still think it is right, and that there is some structural psychological scale at which segregation operates. But that doesn't mean it's helpful.

In Enos' theory, there are basically three ways to reduce racial conflict:

1. Reduce proximity between races. This sounds scary and bad.

2. Reduce the size of minority outgroups. This sounds even more scary and bad.

3. Reduce segregation. This is obviously the good option. But measurement difficulties mean that it's hard to know how to do desegregation right.

So instead of trying to use context-based theories to heal racial divides, it seems like we should use contact-based ones - in other words, we should do desegregation in a way that's designed to facilitate positive long-term contact among people of different races.


A Big Complicated World

Fortunately, there are probably additional ways to address the problem of race relations in America. Enos' book, like many books that are centered around a theory, tends to ignore or downplay all the other factors that affect attitudes toward outgroups. For example, in America, black-white relations are deeply affected by the history of slavery, Jim Crow, lynchings, race riots, and other terrible events; that will make Anglo-Latino relations in Phoenix different from black-white relations in Chicago in ways Enos' theory doesn't describe. When measuring general attitudes towards outgroups, relative amounts of wealth and political power - which Enos touches on only lightly - should be taken into account as well.

This isn't a problem with "The Space Between Us", it's just a natural limitation of this sort of book. When reading it, you have to keep in mind that there's a lot of other stuff going on in the world.

But that also offers a reason for hope. There are probably many ways of improving race relations that don't involve the expensive, politically difficult, long-term process of changing living patterns and urban development. Geography is undoubtedly a big factor, but it's not an iron law that governs everything that happens to our society.


Anyway, that's it for my overly critical review. Just remember to put these caveats in context (no pun intended). "The Space Between Us" is definitely a book worth reading - the research it describes is both well executed and eye-opening, and the theory it puts forth probably describes a very real phenomenon. 

Sunday, April 01, 2018

DeLong vs. Krugman on globalization


I'm going to do the inadvisable, and argue with Brad DeLong. Hopefully this will turn out OK, since it's in response to Brad doing the inadvisable and arguing with Paul Krugman (thus breaking at least two of his own rules).

The topic is globalization. Krugman has a new essay in which he lays out what seems to be a rapidly crystallizing conventional wisdom on the recent history of globalization. Some excerpts:
[D]uring the 1990s a number of economists, myself included...tried to assess the role of Stolper-Samuelson-type effects in rising inequality...[these analyses] generally suggested that the effect [of factor price equalization from globalization] was relatively modest, and not the central factor in the widening income gap... 
[T]he basic fact in the mid 1990s was that imports of manufactured goods from developing countries [were only] around 2 percent of GDP....[T]his wasn’t enough to cause more than a few percent change in relative wages... 
In retrospect, however, trade flows in the early 1990s were just the start of something much bigger... 
Until the late 1990s employment in manufacturing, although steadily falling as a share of total employment, had remained more or less flat in absolute terms. But manufacturing employment fell off a cliff after 1997, and this decline corresponded to a sharp increase in the nonoil [trade] deficit, of around 2.5 percent of GDP. 
Does the surge in the trade deficit explain the fall in employment? Yes, to a significant extent...[A] reasonable estimate is that the [trade] deficit surge...explains more than half of the roughly 20 percent decline in manufacturing employment between 1997 and 2005...[S]oaring imports did impose a significant shock on some U.S. workers... 
The 90s consensus, however, focused almost entirely on asking how the growth of trade had affected the incomes of broad labor classes, as opposed to workers in particular industries and communities. This was, I now believe, a major mistake – one in which I shared... 
This is where the now-famous analysis of the “China shock” by Autor, Dorn, and Hanson (2013) comes in...[T]he effects of rapid import growth on local labor markets...were large and persistent...
So does this mean that...trade war would be in the interest of workers hurt by globalization? The answer is, as you might guess, no... rapid change appears to be largely behind us: many indicators suggest that hyperglobalization was a one-time event, and that trade has more or less stabilized relative to world GDP... 
So while the 90s consensus on the effects of globalization hasn’t stood the test of time very well, one can acknowledge that without accepting the case for protectionism now. We might have done things differently if we had known what was coming, but that’s not a good reason to try turning back the clock.
In other words, the new conventional wisdom on trade and globalization can be summed up as:

1. Trade was pretty much good until the late 90s or 00s,

2. The China Shock was unprecedented, and hurt lots of workers in America and other rich countries, and

3. Now the China Shock is over, and a trade war would be bad news.

This is a story I myself have told, in a series of Bloomberg posts.

DeLong is not having it. He has a long essay in which he claims that the supposed negative effects of globalization in the 2000s were, instead, entirely due to bad macroeconomic policy.

I agree with much of what DeLong writes, but I also disagree with some of it. Here's a point-by-point breakdown of the parts that strike me either as questionable or as not completely to-the-point.

DeLong:
I think that from the early 1970s to the mid-1990s international trade, at least working through the Heckscher-Ohlin channels, put less than zero downward pressure on the wages of American "unskilled" and semi-skilled workers...From the early 1970s to the mid-1990s the relative wage levels of the then-current sources of America's manufacturing imports were rising more rapidly than new low-wage sources of manufacturing imports were being added. The typical American manufacturing worker faced less low-wage competition from imports in the mid-1990s than they had faced in the early 1970s.
DeLong thinks this contradicts Krugman, but I don't think it does. Krugman is considering only the latter part - the addition of new low-wage trading partners (and the effect of this, even considered in isolation, was small). I think Krugman would agree with DeLong that erecting trade barriers that prevented the entry of new low-wage trading partners into the global trading system in the 1970s, 1980s, and 1990s would have had net negative effects that far outweighed any positive Stolper-Samuelson effects.

DeLong:
[W]e could have protected Detroit and Pittsburgh from the consequences of their managerial and technological failings—but it would have been at immense cost for the rest of the economy, a very unfavorable benefit-cost tradeoff. 
In fact, the U.S. did quite a lot to try to protect Detroit and Pittsburgh. We jawboned Japan into appreciating the yen and implementing Voluntary Export Restraints, and enacted all kinds of protectionist measures toward European steel. The protectionist measures probably failed to help U.S. steel or auto companies, or their workers, in the long run. But there is the possibility that it was these measures that prompted Japan to start building its car factories in the United States. Most Japanese cars sold in the United States are now also made in the United States, which has sustained quite a number of manufacturing jobs.

Meanwhile, DeLong overlooks the possibility that U.S. research spending, intended as a protectionist industrial policy measure, led to positive externalities that helped the U.S. technology industry become as successful as it is today. We tend to think of manufacturing's importance in terms of the good blue-collar semi-skilled jobs of the 1950s, but I think this perspective is severely limited. There are a number of reasons we might want high-value-added manufacturing to stay in the United States that have nothing to do with factory employment - it generates local multipliers, it creates products that are easy to export, and it may have a beneficial effect on the overall productivity growth of the economy.

DeLong:
[T]he coming of "hyperglobalization" strengthened opportunities for U.S. workers without formal education to find jobs where their skills, experience, and tacit knowledge could be deployed in ways that were highly productive.
For manufacturing workers, this seems to be directly contradicted by Autor et al.'s "China Shock" paper, which shows that workers exposed to Chinese imports tended to experience greatly reduced lifetime incomes. (Autor et al. also claim that the China shock had negative aggregate employment effects, though this claim is heavily model-dependent and the model is kind of iffy.) In any case, DeLong's claim that globalization in the 2000s improved productivity for U.S. workers overall is in need of some empirical support. There are papers that do say that Chinese import competition spurred U.S. innovation, but this doesn't necessarily support a story about beneficial worker reallocation.

DeLong:
What "hyperglobalization" did do was provide the top 1% and the top 0.1% with another lever to break apart the Dunlopian labor relations order, break the Treaty of Detroit, and redistribute the shared joint product from highly productive mass production backed by valuable communities of engineering practice upward in the income distribution. But there were many such levers in the U.S. from the 1970s to today. And "hyperglobalization" was, as I see it, one of the weakest and shortest of them.
This is another claim in need of evidence. It's true that unionization started declining in the U.S. since before globalization or hyperglobalization really got underway. But it's also possible that the U.S. weakened its pro-union laws and law enforcement because of fear that union wage demands would kill American competitiveness in the face of increasing import competition.

More importantly, in absolving globalization of the blame for rising inequality, DeLong ignores the cross-country evidence. Here are some graphs of the Gini coefficient of disposable income in various rich countries.

Graph 1:

Graph 2:

It seems highly unlikely that market fundamentalism and plutocracy were such a potent mind-virus or political movement that they simultaneously prevailed not only in the United States, but in Sweden, Denmark, France, Germany, and Japan.

The global nature of the runup in inequality across countries with very different policy regimes implies that it was something global - some combination of trade and technology - that did the trick. To write trade out of the equation and to blame technology mostly or entirely seems suspect, at least without solid empirical evidence. Having read a lot of papers on this topic, I'd say there's very little consensus.

DeLong:
Moreover, from the perspective of the country as a whole and from the perspective of many of the communities affected, the China shock was not a big deal for local labor markets. Yes, people are no longer buying as many of the products of American factories as Chinese imports flood in. But those selling the imports are turning around and spending their dollars investing in America: financing government purchases, infrastructure, some corporate investment, and housing. The circular flow will out: the dollars are of no use outside the U.S. and so the dollar flow has to go somewhere, and as long as the Federal Reserve does its job and makes Say's Law roughly true in practice, it is a redistribution of demand for labor and not a fall in the demand for labor.
The idea here is that trade deficits involve increased foreign financial investment into the United States, because a trade deficit is matched by a current account deficit. But an increase in foreign portfolio investment does not imply an increase in business or government investment (in things like infrastructure, housing, or whatever). In fact, if a trade deficit corresponds to a decrease in national savings - as it did in the 2000s, during "hyperglobalization" - then U.S. business/government investment goes down, not up. More generally, the idea that real investment is sensitive to the cost of capital is pretty suspect. Some people claim that cost of capital matters a lot, but the evidence for this is very iffy. 
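To see why a bigger foreign inflow needn't mean more investment, it helps to write out the accounting. The numbers below are hypothetical, chosen purely to illustrate the national accounts identity, not estimates of actual U.S. figures:

```python
# Hypothetical numbers (% of GDP), illustrating the identity only:
# S = I + NX  implies  I = S - NX, so a wider trade deficit (a bigger
# foreign capital inflow, -NX) raises investment only if national
# saving doesn't fall by more than the deficit widens.

def investment(saving, net_exports):
    # National accounts identity: S = I + NX  =>  I = S - NX
    return saving - net_exports

before = investment(saving=18.0, net_exports=-1.0)  # inflow of 1% of GDP
after  = investment(saving=13.0, net_exports=-5.0)  # inflow of 5% of GDP

print(before)  # 19.0
print(after)   # 18.0 -- a bigger foreign inflow, yet less investment
```

In this toy case the capital inflow quadruples while saving falls even faster, so investment declines - which is the pattern the 2000s savings decline suggests, not the investment boom DeLong's story implies.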

DeLong:
And here is the kicker, as I see it: the types of people and the types of jobs funded by the imports of the China shock looks very much like the types of people and the types of jobs displaced from the tradeable manufacturing sector. Yes, some local labor markets got a substantial and persistent negative shock to manufacturing, often substantially cushioned by a boost to construction. Other local labor markets got a substantial and persistent positive shock to construction. And on the level of the country as a whole the factor of production that is (truly) semiskilled blue collar labor does not look to me to have been adversely affected.
Again, the idea that displaced manufacturing workers got just-about-as-good jobs in other sectors (like construction) is directly contradicted by the Autor paper. In fact, job displacement of any kind seems to hurt lifetime income.

As for semiskilled blue collar labor being adversely affected or not, it's certainly true that wages and income for the lower quintiles of the distribution stagnated during the 2000s, before taking transfers into account. Autor et al. haven't proven that China was the big culprit behind this stagnation, but others haven't disproven it either. Krugman, for his part, seems only to claim that it was one nontrivial factor. It doesn't make sense to dismiss it quantitatively until better evidence is in. 

DeLong:
And this gets me to my fifth quarrel with Paul Krugman here. As I see it, the most important thing we missed about globalization was how much it required support from stable and continuous full employment.
If globalization increases the costs of fiscal austerity and tight money, that seems to be a mark against it, even if you're totally against austerity and tight money. Policy is stochastic. Bad leaders get elected, foolish officials get appointed, and humans make mistakes. Anything which makes the economy more fragile in the face of random bad policy draws seems like it's imposing a cost on the economy, since we can't always count on getting good draws. 

So I agree with DeLong on a number of issues here. It's important to have good countercyclical fiscal and monetary policy. Deregulation-fueled financial crises, and bad policy responses to these crises, are scarier than globalization. We should think about globalization's hard-to-measure positive effects in addition to its easier-to-see negative effects. 

But in his zeal to defend globalization from the Trumpists, I think Brad has overstated the case against the new conventional wisdom articulated in Krugman's essay, and overstated the case for not worrying about China Shock type cases. And thinking of the alternative to free trade as being base, crude, Trumpist protectionism rather than research-heavy industrial policy aimed at boosting high-value-added exports - certainly not a way of thinking unique to Brad, but rather a false dichotomy that is endemic throughout the economics commentariat - seems like a failure of our collective vision.  

Monday, December 18, 2017

Sheepskin effects - signals without signaling


Bryan Caplan is one of the most enthusiastic proponents of the signaling theory of education, and this theory plays an important role in his new book, "The Case Against Education". But I've always had a number of problems with this theory, and also with its application to education policy issues. Recently, I wrote a Bloomberg View post in response to an essay Bryan wrote in The Atlantic that was adapted from his book. Now, Bryan has responded to my post. He makes many interesting points, but here I'd just like to deal with one issue - the issue of sheepskin effects, and which model they support.


Sheepskin effects

Sheepskin effects are central to my debate with Bryan. In brief, the sheepskin effect is the fact that most of the college wage premium vanishes if you drop out right before finishing (e.g. in the final semester). Bryan, and many proponents of the signaling model, believe that sheepskin effects are solid evidence that college is mostly about signaling. On the other hand, I believe that sheepskin effects are strong evidence against the signaling model, and are consistent with the human capital model of education. 


Why sheepskin effects are evidence against the signaling model

First, why are sheepskin effects evidence against the signaling model? Simple: In the signaling model, the signal must be costly. If signals are not costly, there can be no separating or hybrid equilibrium. Without a separating (or hybrid) equilibrium, there is no return to sending the signal. In the model, low-type agents choose not to send the signal because doing so doesn't pass a cost-benefit test.

In other words, if completing the last semester of college is very hard, it can serve as the type of costly signal that could explain the college wage premium in the signaling model. But if completing one more semester of college isn't very hard, then the signaling model can't describe what's going on.
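To make the cost requirement concrete, here is a minimal numeric sketch of the Spence separating condition. All wages and costs below are made-up illustrative numbers, not estimates of anything:

```python
# Spence-style separating condition with illustrative numbers.
# Employers pay w_signal to anyone who sends the signal, w_none otherwise.
w_signal, w_none = 60_000, 40_000

# The crucial assumption: signaling is MORE costly for low types.
cost_high_type, cost_low_type = 5_000, 25_000

# Separating equilibrium: high types signal, low types rationally abstain.
high_signals = (w_signal - cost_high_type) > w_none   # 55k > 40k
low_abstains = (w_signal - cost_low_type) < w_none    # 35k < 40k
separating = high_signals and low_abstains            # True

# But if the signal is cheap for everyone (e.g. one easy final semester),
# low types send it too, the equilibrium pools, and the signal stops
# carrying any information -- so it can't explain a wage premium.
cheap_cost = 1_000
everyone_signals = (w_signal - cheap_cost) > w_none   # true for both types
```

The sheepskin effect, read as a Spence signal, would require that one last semester be prohibitively costly for exactly the people who just completed seven semesters - which is the implausible assumption at issue.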

How hard is it to finish the last semester of college? For some people it would be very very hard - but these people are unlikely to have completed all the other semesters of college prior to the last one. For someone who just finished 7 or more semesters, one more semester probably is not that hard.

Also, if agents are even close to rational - as the signaling model assumes them to be - then they wouldn't complete 7 semesters of college only to balk at the finish line. That would be very very suboptimal behavior - a waste of years of effort and years of foregone earnings, not to mention tuition. 

Caplan writes:
Noah fail[s] to look at school from the point of view of a weak student.  One more semester may seem like nothing to those of us who readily finish.  But for students who find classes boring and baffling, even the thought of enduring even one more semester of academics is agonizing.  
Agonizing, perhaps, but much more agonizing than the last 7 semesters? It seems highly unlikely. And why would a rational agent endure 7 semesters of agony (and foregone earnings and sky-high tuition) for practically no payoff?

Therefore, sheepskin effects are not consistent with the signaling model.


Why sheepskin effects are consistent with the human capital model

How could the last semester of college be so much more important for the building of human capital than the other 7 semesters combined? It cannot. So how can sheepskin effects be consistent with the human capital model of education? Here's how.

Education, in empirical research jargon, is a "treatment." In the human capital model of college, that treatment has different effects on different people - some study diligently and expand their perspectives greatly and build their networks and learn with an open mind, while others party and slack off and waste time on Twitter and fail to learn. 

Employers try to tell whether the treatment worked. They look at GPA, for example. But if many of the human capital benefits of college don't come from grades, but from social networks, personal growth, etc., GPA doesn't tell you all you need to know about whether the treatment worked. So as an employer, you'd try to look for other clues as to whether college improved a student or not. 

Dropping out of school is one such clue. It could mean that you didn't build human networks valuable enough to keep you hanging around. It could mean that you have some emotional problem, and that college therefore didn't give you the emotional maturity that it tends to give most people. In other words, even if the treatment typically works, dropping out - including dropping out right before the finish line - could indicate that the treatment didn't work for you.

Bryan didn't like the analogy I used to explain this idea in my Bloomberg View post, so here's a better (and more fun) one. Suppose a bunch of people are applying to be S.H.I.E.L.D. agents. To be a S.H.I.E.L.D. agent you have to take a serum that makes you a superhero. But the serum doesn't work on everyone - some people it dramatically weakens due to an allergic reaction. So 20 people take the serum. Nick Fury inspects them, and they all seem fine...until 2 of them fall unconscious. These two, obviously, are not hired as S.H.I.E.L.D. agents, while the other 18 are hired.

In this example, the serum DOES build human capital, and falling unconscious in the inspection line is like dropping out just before finishing college.


Sloppy use of the word "signaling"

"But wait, Noah," you may ask. "Aren't 'clue' and 'sign' just synonyms for 'signal'? Didn't you just describe signaling?"

The confusion here is due to sloppy use of the word "signaling." Are we talking about the Spence signaling model, or are we using "signal" to mean "any piece of information"? I believe that if you want to use the fame and the imprimatur of the Spence signaling model to support your view of college, you should stick to that model. 

Also, the kind of "signal" I described in the previous section is 100% compatible with college's value being 100% human capital. Caplan and other detractors of the college system treat "signaling" and "human capital" as mutually exclusive - but if dropping out is a (costless) signal of whether you derived human capital from college, then the human capital model is right. 

Simply saying "well it's SOME kind of signal", and relying on the multiple uses of that English word to avoid careful consideration of how the models work, is not good economics! Sheepskin effects look much more like a truth-telling equilibrium in a model where students receive private stochastic shocks to their utility functions.


Sheepskin effects and the consumption/sorting model of college

I do not believe that 100% of the college wage premium reflects the return to college - I believe some fraction of it represents ability sorting. Nor do I believe that 100% of the price students pay to go to college represents investment - I believe some fraction of it represents consumption. College is fun. I believe that college does build some human capital, but part of the institution represents super-smart kids paying to party with each other at Harvard while pretty-smart kids pay to party with each other at Ohio State.

This is a third model of college - the consumption/sorting model of college. I believe that together, the human capital model and the consumption/sorting model explain most of both the price of college and the college wage premium.

Sheepskin effects are consistent with the consumption/sorting model. Employers use your college as a proxy for your ability. But if you drop out right before the finish line, that provides employers with additional, more detailed information about your ability. It might indicate that you're a smart person with emotional problems, motivational problems, or trouble with the law. In other words, it's a way that employers can improve the precision of their information about your ability, beyond relying only on the noisy proxies of alma mater and GPA.

Again, this explanation relies on sheepskin effects being a "signal" in the general, English sense, but not in the specific Spence model sense.


In conclusion, sheepskin effects are consistent with the human capital and consumption/sorting models of college, but not consistent with the signaling model.


Update

Bryan responds. He writes:
Plenty of kids slog through two or three years of college, then get so distracted or disgruntled they fail to finish.  Their exasperated parents could reasonably say, "How hard can it possibly be to finish?!"  But social scientists should just work our way backwards from their failure to finish to the subjective difficulty of doing so.
No doubt. Kids often discover their own ability/motivation level by trying college. That process of self-discovery isn't signaling, but it is important for labor markets. The onus is on college's detractors to prove that this is an inefficient mechanism.

Bryan:
Noah's right that conventional signaling models assume everyone's rational.  But they don't need to.  As long as employers are roughly rational, students can act impulsively without changing the main lesson of the model: Education pays you for what you reveal about yourself, rather than what you actually learn along the way.
Obviously, college has an ability-sorting component. But it isn't very costly, and the cost (taking SATs and AP tests and such) is paid in high school when you apply.

We all know that some part of the college wage premium is not a return to college - it's a return to ability, which is indicated by test scores and grades and such. The ability premium is present whether you finish or drop out - "I dropped out from MIT" implies "I got into MIT". That's on top of the return to college (human capital) and the penalty for observable negative characteristics (early dropout).

Signaling just isn't necessary here. Nor does any of this imply that college is economically inefficient.

Bryan:
Forget models and look at actual human beings.  Plenty of people will put up with something unpleasant for years, then snap.  This is especially true for people who are relatively non-conformist.  And as I've repeatedly said, conformity to social norms is one of the main things employers are looking for.
"Forget models and look at actual human beings" is a phrase I expect to hear from anthropologists, not economics professors! But OK. When I forget models and look at human beings, I see college giving people invaluable life perspective and emotional maturity. When I apply formal models - the signaling model that Bryan invokes again and again to support his case - I find that it doesn't make sense as a major reason for the college wage premium. What's left?

Bryan:
There's no confusion on my part.  Yes, you can equate "signaling" with a literal interpretation of Spence's model.  But it's far more enlightening to treat the Spence model as a mathematical parable - then see how much of the real world the parable illuminates.  Anything that raises the conditional probability of X signals X.  If the world happens to reward X, this spurs people who lack X to send misleading signals of X in order to receive those rewards.  These are the Spencean insights that matter - not the details of any specific model.  
I just can't agree here. If a signal isn't costly, the Spence model isn't a good parable for it, because the Spence model crucially relies on the cost of a signal (really, the cost difference between types) to produce a separating equilibrium. Otherwise everyone lies.

Bryan appears to be taking any observation that employers make about prospective employees during their college years, labeling it "signaling", and then concluding that college is wasteful. That is pretty obviously an incorrect inference. 

Wednesday, November 15, 2017

The "cackling cartoon villain" defense of DSGE


I've sworn off macro-bashing. I said what I had to say. And I'm seeing lots of young macro people doing good stuff. And the task of macro-method-criticizing has been taken over by people who are better at it than I am, and who have much better credentials. My macro-bashing days are done.

But sometimes I just have to offer macro folks some marketing advice.

The new defense of DSGE by Christiano, Eichenbaum, and Trabandt is pretty cringe-inducing. Check this out:
People who don’t like dynamic stochastic general equilibrium (DSGE) models are dilettantes. By this we mean they aren’t serious about policy analysis. Why do we say this? Macroeconomic policy questions involve trade-offs between competing forces in the economy. The problem is how to assess the strength of those forces for the particular policy question at hand. One strategy is to perform experiments on actual economies. This strategy is not available to social scientists. As Lucas (1980) pointed out roughly forty years ago, the only place that we can do experiments is in our models. No amount of a priori theorizing or regressions on micro data can be a substitute for those experiments. Dilettantes who only point to the existence of competing forces at work – and informally judge their relative importance via implicit thought experiments – can never give serious policy advice.
That reads like a line from a cackling cartoon villain. "Buahahaha, you pitiful fools" kind of stuff. It's so silly that I almost suspect Christiano et al. of staging a false-flag operation to get more people to hate DSGE modelers.

First, calling DSGE critics "dilettantes" was a bad move. By far the best recent critique of DSGE (in my opinion) was written by Anton Korinek of Johns Hopkins. Korinek is a DSGE macroeconomist. He makes DSGE models for a living. But according to Christiano et al., the fact that he thinks his own field has problems makes him a "dilettante."

OK, but let's be generous and suppose Christiano et al. didn't know about Korinek (or Ricardo Caballero, or Paul Romer, or Paul Pfleiderer, etc.). Let's suppose they were only talking about Joseph Stiglitz, who really is something of a dilettante these days. Or about bloggers like Yours Truly (who are actual dilettantes). Or about the INET folks. Even if so, this sort of dismissive snorting is still a bad look.

Why? Because declaring that outsiders are never qualified to criticize your field makes you look insular and arrogant. Every economist knows about regulatory capture. It's not much of a leap to think that researchers can be captured too -- that if the only people who are allowed to criticize X are people who make a living doing X, then all the potential critics will have a vested interest in preserving the status quo.

In other words, Christiano et al.'s essay looks like a demand for outsiders to shut up and keep mailing the checks.

Second of all, Christiano et al. give ammo to the "econ isn't a science" crowd by using the word "experiments" to refer to model simulations. Brad DeLong already wrote about this unfortunate terminology. Everyone knows that thought experiments aren't experiments, of course - Christiano et al. aren't actually confusing the two. But obstinately insisting on using this word just makes econ look like a pseudoscience to outside observers. It's bad marketing.

Third, Christiano et al. are just incorrect. Their defense of DSGE is, basically, that it's the only game in town - the only way to make quantitative predictions about the effects of policy changes.

That's wrong. There are at least two other approaches that are in common use - sVARs and SEMs. sVARs are often used for policy analysis in academic papers. SEMs are used by central banks to inform policy decisions. Both sVARs and SEMs claim to be structural. Lots of people laugh at those claims. But then again, lots of people laugh at DSGE too.

In fact, you don't always even need a structural model to make quantitative predictions about policy; often, you can do it in reduced form. When policy changes can be treated like natural experiments, their effects - including general equilibrium effects! - can be measured directly instead of inferred from a structural model.

As Justin Wolfers pointed out on Twitter, at least one of the questions that Christiano et al. claim is only answerable by DSGE simulations can actually be answered in reduced form:
Does an increase in unemployment benefits increase unemployment? On the one hand, conventional wisdom argues that higher benefits lead to higher wages and more unemployment. On the other hand, if the nominal interest rate is relatively insensitive to economic conditions, then the rise in wages raises inflation. The resulting decline in the real interest rate leads to higher aggregate demand, a rise in economic activity and lower unemployment. Which of these effects is stronger?
A 2015 paper by John Coglianese addresses this question without using a DSGE model:
I analyze a natural experiment created by a federal UI extension enacted in the United States during the Great Recession and measure the effect on state-level employment. I exploit a feature of this UI extension whereby random sampling error in a national survey altered the duration of unemployment insurance in several states, resulting in random variation in the number of weeks of unemployment insurance available at the state level. 
Christiano et al. totally ignore the existence of natural experiments. They claim that in the absence of laboratory experiments, model simulations are the best we've got. The rapidly rising popularity of the natural experiment approach in economics doesn't even register on their radar screens. That's not a good look.
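The reduced-form logic can be sketched on simulated state-level data. To be clear, the numbers and effect size below are entirely made up - this is not the Coglianese design, just the flavor of a natural-experiment estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states = 50

# Quasi-random variation in available weeks of UI across states,
# as if assigned by sampling error rather than by economic conditions.
ui_weeks = rng.choice([73.0, 86.0, 99.0], size=n_states)

# Hypothetical true effect of one extra week of UI on employment growth.
true_effect = -0.02
emp_growth = 1.0 + true_effect * ui_weeks + rng.normal(0, 0.3, n_states)

# Because assignment is as-good-as-random, a simple OLS slope recovers
# the causal effect -- no structural model required.
beta = np.cov(ui_weeks, emp_growth, ddof=1)[0, 1] / np.var(ui_weeks, ddof=1)
```

With genuinely random assignment, `beta` lands close to the true effect, general-equilibrium spillovers and all - which is exactly the point about measuring rather than simulating.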

Finally, Christiano et al. strike a tone of dismissive arrogance, at a time when the world (including the rest of the econ profession) is rightly calling for greater humility from macroeconomists. The most prominent, common DSGE models - the type created by Christiano and Eichenbaum themselves - failed pretty spectacularly in 2008-12. That's not a record to be arrogant about - it's something to apologize for. Now the profession has patched those models up, adding finance, a zero lower bound, nonlinearity, etc. It remains to be seen how well the new crop of models will do out of sample. Hopefully they'll do better.

But the burden of proof is on the DSGE-makers, not on the critics. Christiano et al. should look around and realize that people outside their small circle of the world aren't buying it. Central banks still use SEMs, human judgment, and lots of other tools. Finance industry people don't use DSGEs at all. Even in academia, use of DSGE models is probably trending downward.


In other words, Christiano et al. and other DSGE champions are still getting paid nice salaries to make DSGE models, but they're not winning the intellectual battle in the wider world. Dismissive rhetoric like this essay will not help their case. Even Miles Kimball, who spent his career making DSGE models, and who made crucial contributions to the models for which Christiano and Eichenbaum got famous, was put off by this essay. 

Look. There are good defenses of modern macro, and of DSGE, to be made. Ricardo Reis made a really good defense earlier this year. Fabio Ghironi made another good one. Their essays are humble and smart. They acknowledge the key importance of empirical evidence and of a diversity of approaches. They also acknowledge that macroeconomists need to do better, and that this will take some time. They focus on the young people doing good work, striving to improve things, and striking out in new directions. These are the macro defenses the profession needs.

The idea of DSGE models is not a bad one. Working on DSGE models isn't necessarily wasted effort. Nor are most DSGE modelers the dismissive, chest-thumping caricature that Christiano et al.'s essay paints them as. People are out there doing good work, trying to improve the performance of macro models. But rhetoric like this ends up hurting, rather than helping, their task. 

Tuesday, October 10, 2017

Defending Thaler from the guerrilla resistance


So, Richard Thaler won the Nobel Prize, which is pretty awesome. If you've read Thaler's memoir, you'll know that it was a long, hard, contentious fight for him to get his ideas accepted by the mainstream. And even though Thaler is now a Nobelist and has been the AEA president - i.e., he has completely convinced the commanding heights of the econ establishment that behavioral econ is a crucial addition to the canon - resistance still pops up with surprising frequency in certain corners of the econ world. It's a sort of ongoing guerrilla resistance.

An example is this blog post by Kevin Bryan of A Fine Theorem. Kevin is one of the best research-explainers in the econ blogosphere, and his Nobel explainer posts are uniformly excellent. This time, however, instead of explaining Thaler's research, Kevin decided to challenge it, in a rather dismissive manner. In fact, his criticisms are pretty classic anti-behavioral stuff - mostly the same arguments Thaler talks about in his memoir.

Anyway, let's go through some of these criticisms, and see why they don't really hit the mark.


1. The invisible hand-wave

First, a random weird thing. Kevin writes:
Much of my skepticism is similar to how Fama thinks about behavioral finance: “I’ve always said they are very good at describing how individual behavior departs from rationality. That branch of it has been incredibly useful. It’s the leap from there to what it implies about market pricing where the claims are not so well-documented in terms of empirical evidence.”
This is Fama, not Kevin, but it's a very odd quote. Behavioral finance has been very good at documenting asset price anomalies - in fact, this is almost all of what it's good at. This is what Shiller got the Nobel for in 2013, and it's what Thaler himself is most famous for within the finance field. Behavioral finance has struggled (though not entirely failed) to explain most of these anomalies in terms of psychology, especially in terms of insights drawn from experimental psychology. But in terms of empirical evidence, behavioral finance is pretty solid.

Anyway, that might be a sidetrack. Back to Kevin:
[S]urely most people are not that informed and not that rational much of the time, but repeated experience, market selection, and other aggregative factors mean that this irrationality may not matter much for the economy at large. 
This is a dismissal that Thaler refers to as "the invisible hand wave". It's basically a claim that markets have emergent properties that make a bunch of not-quite-rational agents behave like a group of completely rational agents. The justifications typically given for this assumption - for example, the idea that irrational people will be competed out of the market - are usually vague and unsupported. In fact, it's not hard at all to write down a model where this doesn't happen - for example, the noise trader model of DeLong et al. But for some reason, some economists have very strong priors that nothing of this sort goes on in the real world, and that the emergent properties of markets approximate individual rationality.
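One channel from the DeLong et al. story can be shown in a toy simulation: if overconfident "noise traders" hold more of the risky asset, they earn higher average returns, so wealth-based selection need not weed them out. All parameters here are made up, and this is a cartoon of one mechanism, not the full model:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100_000  # simulated periods

# Assumed market: the risky asset pays a premium over the safe rate.
risky = rng.normal(0.06, 0.20, T)
safe = 0.02

# Rational traders hold a moderate risky share; overconfident noise
# traders, believing returns are higher than they are, hold more.
rational_share, noise_share = 0.5, 0.9

rational_avg = np.mean(rational_share * risky + (1 - rational_share) * safe)
noise_avg = np.mean(noise_share * risky + (1 - noise_share) * safe)

# Expected returns: 0.5*6% + 0.5*2% = 4% vs 0.9*6% + 0.1*2% = 5.6%.
# The "irrational" traders earn more on average (while bearing more
# risk), so selection doesn't automatically drive them out.
```

The point isn't that overconfidence is good - it's that "the market competes irrationality away" is an assumption that needs defending, not a theorem.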


2. Ethical concerns

Kevin, like many critics of Thalerian behavioral economics, raises ethical concerns about the practice of "nudging":
Let’s discuss ethics first. Simply arguing that organizations “must” make a choice (as Thaler and Sunstein do) is insufficient; we would not say a firm that defaults consumers into an autorenewal for a product they rarely renew when making an active choice is acting “neutrally”. Nudges can be used for “good” or “evil”. Worse, whether a nudge is good or evil depends on the planner’s evaluation of the agent’s “inner rational self”, as Infante and Sugden, among others, have noted many times. That is, claiming paternalism is “only a nudge” does not excuse the paternalist from the usual moral philosophic critiques!...Carroll et al have a very nice theoretical paper trying to untangle exactly what “better” means for behavioral agents, and exactly when the imprecision of nudges or defaults given our imperfect knowledge of individual’s heterogeneous preferences makes attempts at libertarian paternalism worse than laissez faire.
There are, indeed, very real problems with behavioral welfare economics. But the same is true of standard welfare economics. Should we treat utilities as cardinal, and sum them to get our welfare function, when analyzing a typical non-behavioral model? Should we sum the utilities nonlinearly? Should we consider only the worst-off individual in society, as John Rawls might have us do?

Those are nontrivial questions. And they apply to pretty much every economic policy question in existence. But for some reason, Kevin chooses to raise ethical concerns only for behavioral econ. Do we see Kevin worrying about whether efficient contracts will lead to inequality that's unacceptable from a welfare perspective? No. Kevin seems to be very very very worried about paternalism, and generally pretty cavalier about inequality.

Perhaps this reflects Kevin's libertarian values? I actually have no idea what Kevin believes in. But hopefully the Nobel committee tries to make its awards based on positive rather than normative considerations. After all, the physics Nobel often goes to scientists whose discoveries could be used to make weapons, right? I just don't see the need to automatically mix in ethics and values when assessing the importance of behavioral economics.


3. The invisible hand-wave, again

Kevin writes:
Thaler has very convincingly shown that behavioral biases can affect real world behavior, and that understanding those biases means two policies which are identical from the perspective of a homo economicus model can have very different effects. But many economic situations involve players doing things repeatedly with feedback – where heuristics approximated by rationality evolve – or involve players who “perform poorly” being selected out of the game. For example, I can think of many simple nudges to get you or I to play better basketball. But when it comes to Michael Jordan, the first order effects are surely how well he takes cares of his health, the teammates he has around him, and so on. I can think of many heuristics useful for understanding how simply physics will operate, but I don’t think I can find many that would improve Einstein’s understanding of how the world works.
This argument makes little sense to me. Most people aren't Michael Jordan or Einstein. And those people surely didn't compete all the other basketball players and physicists out of the market. Why does the existence of a few perfectly rational people mean that nudges don't matter in aggregate? Also, why should we assume that non-Michael-Jordans can quickly or completely learn heuristics that make nudges unnecessary? If that were true, why would players even have coaches?

It seems like another case of the invisible hand wave.

(Also, when it's used as an object, it's "you and me", not "you and I". This grammar overcorrection is my one weakness. If you ever need to defeat me in battle, just use "X and I" as an object, and I'll fly into an insane rage and walk right into your perfectly executed jujitsu move.)

Kevin continues:
The 401k situation [that Thaler's most famous nudge policy deals with] is unusual because it is a decision with limited short-run feedback, taken by unsophisticated agents who will learn little even with experience. The natural alternative, of course, is to have agents outsource the difficult parts of the decision, to investment managers or the like. And these managers will make money by improving people’s earnings. No surprise that robo-advisors, index funds, and personal banking have all become more important as defined contribution plans have become more common! If we worry about behavioral biases, we ought worry especially about market imperfections that prevent the existence of designated agents who handle the difficult decisions for us.
Assuming that a market for third-party advice will take care of behavioral problems seems like both a big leap and a mistake. First, there's the assumption that someone with nontrivial behavioral biases will be completely rational in her choice of an adviser. Big assumption. Remember that people are typically paying financial advisers a fifth of their life's savings or more. Big price tag. How confident are we that someone who treats opt-in and opt-out pensions differently is going to get good value for that huge and opaque expenditure?

Also, suppose that financial advisers really do earn their keep, i.e. a fifth of your life's savings. If the market for financial advice is efficient, and financial advice is all about countering your own behavioral biases, that means that behavioral biases are so severe that their impact is worth a fifth of your lifetime wealth! If a cheap little nudge could make all of that vast expenditure unnecessary - i.e., if it could get you to do the thing that you'd otherwise pay a financial adviser 20% of your lifetime wealth to do for you - then the nudge seems like a huge efficiency-booster.

So this point of Kevin's also seems to miss the mark.


4. Endowment effects and money pumps

Kevin writes:
Consider Thaler’s famous endowment effect: how much you are willing to pay for, say, a coffee mug or a pen is much less than how much you would accept to have the coffee mug taken away from you. Indeed, it is not unusual in a study to find a ratio of three times or greater between the willingness to pay and willingness to accept amount. But, of course, if these were “preferences”, you could be money pumped (see Yaari, applying a theorem of de Finetti, on the mathematics of the pump). Say you value the mug at ten bucks when you own it and five bucks when you don’t. Do we really think I can regularly get you to pay twice as much by loaning you the mug for free for a month? Do we see car companies letting you take a month-long test drive of a $20,000 car then letting you keep the car only if you pay $40,000, with some consumers accepting? Surely not.
First of all, the endowment effect isn't a money pump if it only works once with each object. It's only a money pump if you can keep loaning and reselling something to someone. Otherwise, people's maximum potential losses from this bias are finite - they're just some percent of their lifetime consumption. Maybe not 300%, but something.
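The one-shot vs. repeated distinction is worth spelling out (with illustrative numbers): the endowment effect itself blocks the cycle a true money pump needs, because once you own the mug, your selling price rises above the pumper's buyback offer.

```python
wtp = 5    # what you'll pay for a mug you don't yet own
wta = 10   # what you'd demand to part with a mug you own

# Step 1: loan you the mug, then sell it to you at just under your WTA.
sale_price = 9
one_shot_profit = sale_price - wtp   # 4: extracted once, and bounded

# Step 2 (the pump): buy the mug back cheaply to repeat the trick...
buyback_offer = wtp
cycle_repeats = buyback_offer >= wta   # False -- the owner won't sell at 5

# With no cycle, losses are capped per object, not unbounded.
```

So the de Finetti-style pump never gets going: the same preference reversal that creates the one-shot profit also kills the buyback leg.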

But anyway, Kevin says that we don't see car companies letting you take a month-long test drive. Hmm. I guess that is true...for cars.



5. External validity of lab effects

Everyone knows external validity of laboratory findings is a big problem for experimental economics (and psychology, and biology...). Also problematic is ecological validity - even if a lab effect consistently exists in the real world, it might not matter quantitatively compared to other stuff. External and ecological validity do present big challenges for behaviorists who want to take insights from the lab and use them to predict real-world outcomes.

But Kevin chooses some highly questionable examples to illustrate the problem. For example:
Even worse are the dictator games introduced in Thaler’s 1986 fairness paper. Students were asked, upon being given $20, whether they wanted to give an anonymous student half of their endowment or 10%. Many of the students gave half! This experiment has been repeated many, many times, with similar effects. Does this mean economists are naive to neglect the social preferences of humans? Of course not! People are endowed with money and gifts all the time. They essentially never give any of it to random strangers – I feel confident assuming you, the reader, have never been handed some bills on the sidewalk by an officeworker who just got a big bonus! Worse, the context of the experiment matters a ton (see John List on this point). Indeed, despite hundreds of lab experiments on dictator games, I feel far more confident predicting real world behavior following windfalls if we use a parsimonious homo economicus model than if we use the results of dictator games.
Does Kevin seriously think that any behaviorist believes that dictator games imply that people walk around giving away half of any gifts they receive? That makes no sense at all. In the dictator game, there's one other person - in the real world, there are effectively infinite other people. What would it even mean for a person on the street to behave analogously to a person in a dictator game? The situations aren't equivalent at all.

As John List says, context matters. Wage negotiations at a company are different than family gift exchanges, which are different from financial windfalls, which are different from randomly being handed money on the street. Norms in these situations are different. If someone gives you a gift, there's probably a norm of not re-gifting it. If someone hands you money in a dictator game, you probably don't treat it as a personal gift. Etc.

To me, this is clearly not a reason to assume that norms and values only matter in the lab, and that real-world people always behave perfectly selfishly. Quite the contrary. It's a reason to pay more attention to norms and values, not less. Why does Bill Gates give away so much of his money? Why do people give money to some beggars and buskers but not to others? Do these behaviors bear any similarity to how people behave when asking for (or handing out) raises in the workplace? Do they bear any similarity to the way people haggle over the price of a car or a house?

These are not trivial questions to be waved away, simply because if you hand someone cash on the street they don't instantly hand half of it to the first person they see.

Kevin follows this up with what seems like another bad example:
To take one final example, consider Thaler’s famous model of “mental accounting”. In many experiments, he shows people have “budgets” set aside for various tasks. I have my “gas budget” and adjust my driving when gas prices change. I only sell stocks when I am up overall on that stock since I want my “mental account” of that particular transaction to be positive. But how important is this in the aggregate? Take the Engel curve. Budget shares devoted to food fall with income. This is widely established historically and in the cross section. Where is the mental account? Farber (2008 AER) even challenges the canonical account of taxi drivers working just enough hours to make their targeted income. As in the dictator game and the endowment effect, there is a gap between what is real, psychologically, and what is consequential enough to be first-order in our economic understanding of the world.
Kevin's argument appears to be that if mental accounting only matters in some domains, it doesn't matter overall. That makes no sense to me. If mental accounting is important for investing and driving, but not for food purchases or taxi jobs, does that mean it's not important "in the aggregate"? Of course not! Gas is a substantial monthly expense for many households. And the compounded rate of return on your stock portfolio can make a huge difference to your lifetime consumption. Even if mental accounting mattered only for these two things, it would matter in the aggregate.
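To see why the portfolio channel alone is big, here's a quick illustration with hypothetical numbers: a small annual return drag (say, from mental-accounting-driven trading like selling winners too early) compounds over a working lifetime.

```python
# Hypothetical numbers: the cumulative cost of a small annual return
# drag caused by a behavioral bias, compounded over 30 years.

def growth(annual_return, years):
    """Growth factor of $1 over the given horizon."""
    return (1 + annual_return) ** years

years = 30
baseline = growth(0.07, years)    # assumed 7% annual return, no bias
with_drag = growth(0.06, years)   # one point lost to the bias each year

ratio = with_drag / baseline
print(f"Final wealth ratio (biased / unbiased): {ratio:.2f}")
```

Under these assumed numbers, a one-percentage-point annual drag costs roughly a quarter of final wealth — hardly a second-order effect.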


So, Kevin's attacks on Thaler's research paradigm pretty much uniformly miss the mark. Because of this, I half suspect that Kevin - usually the most careful and incisive of bloggers - is playing devil's advocate here, taking cheap shots at behaviorism simply because it's fun. This guerrilla resistance is more like paintball.

Wednesday, September 27, 2017

Handwaving on health care


There's a particular style of argument that some conservative economists use to dismiss calls for government intervention in markets:

Step 1: Either assert or assume that free markets work best in general.

Step 2: List the reasons why this particular market might be unusual.

Step 3: Dismiss each reason with a combination of skeptical harrumphing, handwaving, anecdotes, and/or informal evidence.

Step 4: Conclude that this market should be free from government intervention.

In a recent rebuttal to a Greg Mankiw column on health care policy, John Cochrane displays this argumentation style in near-perfect form. It is a master class in harrumphing conservative prior-stating, delivered in the ancient traditional style. Young grasshoppers, take note.

Mankiw's article was basically a rundown of reasons that health care shouldn't be considered like a normal market. He covers externalities, adverse selection, incomplete information, unusually high idiosyncratic risk, and behavioral factors (overconsumption).

Cochrane makes a guess at the motivation of Mankiw's column:
I suspect I know what happened. It sounded like a good column idea, "I'll just run down the econ 101 list of potential problems with health care and insurance and do my job as an economic educator."
That sounds about right. In fact, that actually was the reason for my similar column in Bloomberg a few months ago. Frankly, I think bringing readers up to speed on Arrow's classic piece on health care is a pretty good idea for a column. Mankiw generally did a better job than I did, although he didn't mention norms, which I think are ultimately the most important piece of the puzzle (more on that later).

Anyway, Cochrane wrote a pretty unfair and over-the-top response to that Bloomberg post of mine, which also made a rather unintelligent pun using my first name (there's an extra syllable in there, dude!). His response to Mankiw has more meat to it and less dudgeon, but is still rather acerbic. Cochrane writes:
I am surprised that Greg, usually a good free marketer, would stoop to the noblesse oblige, the cute little peasants are too dumb to know what's good for them argument... 
[I]s this a case of two year old with hammer?... 
I suspect I know what happened. It sounded like a good column idea, "I'll just run down the econ 101 list of potential problems with health care and insurance and do my job as an economic educator." If so, Greg failed his job of public intellectual... 
The last section of After the ACA goes through all these arguments and more, and is better written. I hope blog regulars will forgive the self-promotion, but if Greg hasn't read it, perhaps some of you haven't read it either.
Grumpy indeed!

So, Cochrane's post consists of him hand-waving away the notion that externalities, high idiosyncratic risk, and adverse selection might matter enough in health care markets to justify large-scale government intervention. 

To summarize Cochrane's points about externalities:

  • Health externalities affect only a small subset of the things that Obamacare deals with.
  • Lots of other markets have externalities. 

To summarize Cochrane's point about high idiosyncratic risk:

  • That's what insurance markets are for, duh!

To summarize Cochrane's points about adverse selection:

  • Doctors know more about your health than you do.
  • Adverse selection assumes rational patients, while behavioral effects assume irrational patients.
  • The government forces insurers not to charge people different prices based on their health status.
  • Other insurance markets, like car insurance, function without breaking down due to adverse selection.
  • Services to mitigate adverse selection exist in other insurance markets.
  • Most health expenses are predictable, and thus not subject to adverse selection. 

So, to rebut these, I could go through each point one by one and do counter-hand-waving. For example:

  • The idea that doctors know more about your health than you do assumes that you've already bought health care and are already receiving examinations. Prior to buying, you know your health better. 
  • People can be irrational in some ways (or in some situations) and rational in others, obviously.
  • The fact that the government forces insurers to charge everyone the same price is part of the policy that's intended to mitigate adverse selection, and therefore can't be used as proof that adverse selection doesn't exist in the absence of government intervention.
  • Markets might have different amounts of adverse selection. For example, insurers might be able to tell that I'm a bad driver, but not that I just found a potentially cancerous lump in my testicle.
  • Adverse selection mitigation services are socially costly, and Carmax for health care might work much worse than Carmax for cars.
...and so on.

But who would be right? It really comes down to your priors. Priors about how irrational people are. Priors about how much asymmetric information exists and how much it matters in various markets. Priors about how costly and feasible Carmax for health care would be. Priors about how reputational effects work in health care markets. Priors about how efficient government is at fixing market failures. And so on. Priors, priors, priors.

Reiteration of priors can get tiresome.

Instead, here is a novel idea: We could look at the evidence. Instead of thinking a priori about how important we think adverse selection is in health care markets, we could think "Hey, some smart and careful economist or ten has probably done serious, careful empirical work on this topic!" And then we could fire up Google Scholar and look for papers, or perhaps go ask a friend who works in applied microeconomics or the economics of health care. 

In his health care article, "After the ACA", Cochrane cites a wide variety of sources, including New Yorker and Wall Street Journal and New York Times and Washington Post articles, a JAMA article and a NEJM article, some law articles, a number of blog posts, a JEP article and a JEL article, some conservative think tank reports, Akerlof's "Market for Lemons" article, the comments section of his own blog, and a YouTube video entitled "If Air Travel Worked Like Health Care". (This last one is particularly funny, given that Cochrane excoriated me for claiming that he compared the health insurance industry to the food industry. As if he would ever imply such a thing!)

As silly as a couple of these sources may be, overall this is a fine list - it's good to cite and to have read a breadth of sources, especially on an issue as complex and multifaceted as health care. I certainly cannot claim to have read anywhere near as deeply on the subject. 

But as far as I can see, Cochrane does not engage with the empirical literature on adverse selection in health insurance markets. He may have read it, but he does not cite it or engage with it in this blog post, or in his "After the ACA" piece, or anywhere I can find.

This is a shame, because when he bothers to read the literature, Cochrane is quite formidable. When he engaged with Robert Shiller's evidence on excess volatility in financial markets, and when he engaged with New Keynesian theory, Cochrane taught us new and interesting things about both of these issues. In both of these cases, Cochrane approached the issue from a perspective of free-market orthodoxy, and advanced the free-market (or efficient-market) case like a lawyer. But in both cases, he did so in a brilliant way that respected his opponents' arguments and evidence, and ultimately yielded new insight. 

But in the case of adverse selection in health insurance, Cochrane does not engage with the literature. And although I haven't read much of that literature, I know it exists, because I've read this 2000 literature review by David Cutler and Richard Zeckhauser. Starting on page 606, Cutler and Zeckhauser first present the basic theory of adverse selection, and then proceed to discuss a large number of studies that use a large and diverse array of techniques to measure the presence of adverse selection in health insurance. They write:
A substantial literature has examined adverse selection in insurance markets. Table 9 summarizes this literature, breaking selection into three categories: traditional insurance versus managed care; overall levels of insurance coverage; and high versus low option coverage.  
Most empirical work on adverse selection involves data from employers who allow choices of different health insurance plans of varying generosity; a minority of studies look at the Medicare market, where choices are also given. Within these contexts, adverse selection can be quantified in a variety of fashions. Some authors report the difference in premiums or claims generated by adverse selection after controlling for other relevant factors [for example, Price and Mays (1985), Brown et al. (1993)]. Other papers examine the likelihood of enrollment in a generous plan conditional on expected health status [for example, Cutler and Reber (1998)]. A third group measure the predominance of known risk factors among enrollees of more generous health plans compared to those in less generous plans [for example, Ellis (1989)].  
Regardless of the exact measurement strategy, however, the data nearly uniformly suggest that adverse selection is quantitatively large. Adverse selection is present in the choice between fee-for-service and managed care plans (8 out of 12 studies, with 2 findings of favorable selection and 3 studies ambiguous), in the choice between being insured and being uninsured (3 out of 4 studies, with 1 ambiguous finding), and in the choice between high-option and low-option plans within a given type (14 out of 14 studies). 
They proceed to list the studies in a table, along with brief summaries of the methods and the results.

Have I read any of these studies? In fact, I have read only one of them - a 1998 study of some government and university employees, also by Cutler and Zeckhauser. They document a market breakdown - the disappearance of high-coverage health plans. And they present evidence that this breakdown was due to the so-called "adverse selection death spiral", in which healthy people leave high-coverage plans until the plans can no longer be offered. And they show that a similar thing was starting to happen to the Group Insurance Commission of Massachusetts, before major reforms were made to the system that prevented the death spiral. 
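The unraveling mechanism is simple enough to simulate. Here's a toy model (my own construction, not Cutler and Zeckhauser's, with made-up numbers): the insurer must charge everyone the pool's average expected cost, people buy only if coverage is worth more to them than the premium, and the healthiest enrollees keep dropping out until the pool stabilizes or vanishes.

```python
# Toy model of an adverse selection death spiral (illustrative only).
# Each person knows their own expected annual health cost; the insurer
# must charge everyone the same premium, equal to the pool's average cost.

RISK_AVERSION = 1.2  # assumed: coverage is worth 20% more than expected cost

def death_spiral(costs):
    """Iterate premium-setting until enrollment stabilizes (or hits zero)."""
    pool = list(costs)
    while pool:
        premium = sum(pool) / len(pool)
        # Only people whose value of coverage exceeds the premium stay in.
        stayers = [c for c in pool if RISK_AVERSION * c >= premium]
        if len(stayers) == len(pool):
            return premium, len(pool)  # stable pool
        pool = stayers
    return None, 0  # total market breakdown

# Hypothetical expected costs, from healthiest to sickest enrollee
costs = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]
premium, enrolled = death_spiral(costs)
print(f"Final premium: {premium}, enrollees remaining: {enrolled}")
```

With these assumed numbers, the pool unravels from ten enrollees down to just the three sickest, at a premium far above the original average cost; set RISK_AVERSION below 1 and the market disappears entirely, Akerlof-style.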

So there is some evidence that adverse selection not only exists and creates costs in (at least some!) health insurance markets, but is so severe that it can cause market breakdown of the classic Akerlofian type. 

If I were setting out to dismiss the possibility of this sort of major adverse selection, I would read a number of these papers, or at least skim their results. I would also look for more recent work on the subject. 

I would also read the literature on adverse selection in other insurance markets, to see whether there's a noticeable difference between types of insurance. I'd read this Chiappori and Salanie paper on auto insurance, for example (which I had to study in grad school), which finds no evidence of adverse selection in car insurance. That would make me think "Hmm, maybe car insurance and health insurance are two very different markets."

I am not setting out to dismiss adverse selection, however. Nor am I setting out to claim that it's a big enough problem that it requires major government regulation of the health insurance market. Nor am I claiming that Obamacare passes a cost-benefit test as a remedy for adverse selection. In fact, I don't even think that adverse selection is the main reason we regulate health care! I think it's kind of a sideshow - an annoyance that we have to deal with, but not the central issue. I think the central issue of health care regulation is just a social norm - the widespread belief that everyone ought to have health care, and that the cost of health care ought to depend only on your ability to pay. Those norms, I believe, are why people embrace universal health care, and why they are now coming to embrace the radical solution of single-payer health care.

But that's just me. Cochrane thinks adverse selection is the big issue, so he goes after it, but without standing on the shoulders of the giants who have investigated the matter before. Instead, he waves the problem away. Unlike me, who am but a lowly journalist, Cochrane is a celebrated professional economist. He has done much better in the past, and he could do better now if he chose.