Tuesday, July 28, 2015

Big Data vs. Big Gladwell


Here is a news article about a Malcolm Gladwell speech. This news article is of great interest to me, since it suggests that it's not actually very hard to build a lucrative career going around and giving knowledgeable-sounding speeches about concepts, technologies, or companies that are in the news. I could do that job. Dear readers, you know I could do that job. 

A more minor reason that this article is of interest to me is that it gives me a chance to do a snarky point-by-point refutation, which is something I have to do periodically or else go (more) insane. So let's go through and count some of the silly things that Malcolm Gladwell is quoted in this article as having said.
Last night futurist, journalist, prognosticator, and author Malcolm Gladwell told pretty much the most data-driven marketing technologist crowd imaginable that data is not their salvation. 
In fact, it could be their curse.
Bump-bump-BUMMMMM!!!


So how is data our curse?
“More data increases our confidence, not our accuracy,” [Gladwell] said at mobile marketing analytics provider Tune’s Postback 2015 event in Seattle. “I want to puncture marketers’ confidence and show you where data can’t help us.”
Except sometimes more data does increase our accuracy. For example, you can have an estimator that is asymptotically unbiased, but biased in small samples. So Gladwell is totally wrong about that.
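
To make that concrete, here is a minimal sketch of one textbook case: the variance estimator that divides by n instead of n - 1. It assumes i.i.d. normal draws, and the function names are my own hypothetical choices, not anything from the article or the speech. The estimator's expected value is (n - 1)/n times the true variance, so the bias is noticeable at n = 5 and nearly gone at n = 500.

```python
import random


def mle_variance(sample):
    """Variance estimate that divides by n (biased in small samples)."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / len(sample)


def expected_mle_variance(n, true_variance):
    """Exact expectation of the divide-by-n estimator for i.i.d. draws."""
    return (n - 1) / n * true_variance


def simulated_mean_estimate(n, true_sd=1.0, trials=20000, seed=0):
    """Average the estimator over many simulated samples of size n.

    Assumption: draws are i.i.d. normal with mean 0 and sd true_sd.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += mle_variance([rng.gauss(0.0, true_sd) for _ in range(n)])
    return total / trials


if __name__ == "__main__":
    # Compare the exact expectation with a Monte Carlo average as n grows.
    for n in (5, 50, 500):
        print(n, expected_mle_variance(n, 1.0), simulated_mean_estimate(n, trials=2000))
```

With a true variance of 1, the exact expectation moves from 0.8 at n = 5 toward 1 as n grows, which is the sense in which more data buys accuracy here and not just confidence.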

Next, Gladwell tells us about the "Snapchat Problem":
The average person under 25 is texting more each day than the average person over 55 texts each year, Gladwell says. That’s what the data can tell us. 
What it can’t tell us is why. 
“The data can’t tell us the nature of the behavior,” Gladwell said. “Maybe it’s developmental … or maybe it’s generational.”
Well that particular piece of data won't tell you, but maybe others could. For example, you could use regional/national variation in the time that countries got smartphone service, and compare Snapchat uptake among age-matched cohorts. 

Of course, that is a different piece of data than the one Gladwell cited. Does Gladwell think it is a significant, penetrating insight to point out that for different questions, you may need different data sets? When Gladwell calls data a "curse", is he using the word "curse" to mean "something that you might need more than one of in order to be omniscient"?

Anyway:
Developmental change, in Gladwell’s story, is behavior that occurs as people age...Generational change, on the other hand, is different. That’s behavior that belongs to a generation, a cohort that grows up and continues the behavior...The question is whether Snapchat-style behavior is developmental or [generational]. 
“In the answer to that question is the answer to whether Snapchat will be around in 10 years,” Gladwell said.
No, that will most certainly not tell us whether Snapchat will be around in 10 years. For example, suppose Snapchat is "developmental", so that young people like it more than old people. Well, there is a constant new supply of young people. But suppose instead that Snapchat is "generational", so that people who grow up with it like it. Well, why wouldn't new generations like growing up with it just as much as old generations did? So even if we answer Gladwell's question, it does not, in fact, tell us much about the future of Snapchat.
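
Here is a toy sketch of that argument. Every number in it is a made-up assumption for illustration (one person per birth year, users aged 13 and up, an 80-year lifespan, 1990 as the first "Snapchat generation" birth year), and the function names are hypothetical. It is not data about Snapchat.

```python
def developmental_users(year, min_age=13, max_age=24):
    """Users are whoever is currently young, regardless of birth year.

    With one person per birth year, the count is the same in every year,
    so the year argument does not affect the result.
    """
    return max_age - min_age + 1


def generational_users(year, first_birth_year=1990, min_age=13, max_lifespan=80):
    """Users are everyone born in or after first_birth_year who is old enough."""
    youngest_birth_year = year - min_age
    # People older than max_lifespan - 1 have dropped out of the population.
    oldest_birth_year = max(first_birth_year, year - max_lifespan + 1)
    return max(0, youngest_birth_year - oldest_birth_year + 1)


if __name__ == "__main__":
    for year in (2015, 2025):
        print(year, developmental_users(year), generational_users(year))
```

Under these assumptions the developmental user base stays flat and the generational one grows, but neither goes to zero ten years out. Knowing which story is true changes the shape of the curve, not whether the product survives.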

Next, Gladwell tells us about the "Facebook Problem":
“Facebook is at the stage that the telephone was at when they thought the phone was not for gossiping — it’s in its infancy,” Gladwell said... 
The diffusion of new technologies always takes longer than we would assume, Gladwell said. The first telephone exchange was launched in 1878, but only took off in the 1920s. The VCR was created in the 1960s in England, but didn’t reach its tipping point until the 1980s... 
Technologies that are both innovative and complicated, like Facebook, take even longer to really emerge.
Except that this doesn't apply to Facebook, because everyone already uses Facebook. Yes, there was a period in time when social networks - Friendster, Myspace - were not widely used. That era is now in the past. People may find new ways to use Facebook, but it's not in its infancy - it has already experienced near-universal uptake. Discussing when Facebook might "really emerge" is like discussing when television might "really emerge".

Finally, Gladwell tells us about the "Airbnb Problem":
The sharing economy, featuring companies like Airbnb, Uber/Lyft, even eBay, rely on trust... 
And yet, if you look at recent polls of trust and trustworthiness, people’s — and especially millennials — trust is at an all-time low. Out of ten American “institutions,” including church, Congress, the presidency, and others, millennials only trust two: the military and science... 
That’s conflicting data. And what the data can’t tell us is how both can be true, Gladwell said...“So which is right? Do people not trust others, as the polls say … or are they lying to the surveys?”
So is it a contradiction if people trust the clocks on their cell phones but distrust Vladimir Putin? Is it a contradiction if people trust their neighbors but distrust the mafia? Are data contradictory whenever they show differing levels of aggregate trust in different people, institutions, or objects? And in general, why should trust in institutions be correlated with trust in other individuals?

What really startles me is that people trust Malcolm Gladwell to say useful things at marketing conferences. 

Anyway, generating such jaw-dropping nonsense must get tiring, so Gladwell falls back on some good old tried-and-true incorrect facts:
[Gladwell said there has been] a massive shift in American society over the past few decades: a huge reduction in violent crime. For example, New York City had over 2,000 murders in 1990. Last year it was 300. In the same time frame, the overall violent crime index has gone down from 2,500 per 100,000 people to 500. 
“That means that there is an entire generation of people growing up today not just with Internet and mobile phones … but also growing up who have never known on a personal, visceral level what crime is,” Gladwell said. 
Baby boomers, who had very personal experiences of crime, were given powerful evidence that they should not trust.
Except here is a chart of U.S. homicide rates:


You'll see that when Baby Boomers were young (under 20), there was even less homicide (and other crime) than when Millennials were under 20. Oops.

Also, Gladwell's statement that young people don't know "what crime is" ignores the fact that U.S. crime rates are still many times what they are in other countries. It's just an obviously false statement.

Also, just to be complete I should note that if Gladwell were right, regions that experienced much less of a crime spike in the 70s and 80s should have higher Airbnb use among Baby Boomers. But I think we've seen very high uptake in, say, Northern California and the Pacific Northwest, where the crime boom was much less severe. However, rigorous analysis (with yes...gasp...DATA!) would be able to answer this question more definitely.

Folks, there are many important cautions to be made about the use of Big Data. These are not they.

And now, finally, just for fun, we have the Coup de Gladwell:
“I think millennials are very trusting,” Gladwell said. “And when they say they’re not...they’re bullshitting.”
And there you have it, folks. Who needs data when you have Gladwellian Pronouncements? The future is not the era of Big Data...it is the era of Big Gladwell. 

Now if only we could put Gladwell's insight in an app and sell it...

47 comments:

  1. "People may find new ways to use Facebook, but it's not in its infancy - it has already experienced near-universal uptake. Discussing when Facebook might "really emerge" is like discussing when television might "really emerge"."
    -Yup. Facebook has been established technology for more than half a decade.
    “I think millennials are very trusting,” Gladwell said. “And when they say they’re not...they’re bullshitting.”
    -What about Albanians, Mexicans, and other people living in low-trust societies.
    "For example, suppose Snapchat is "developmental", so that young people like it more than old people. Well, there is a constant new supply of young people. But suppose instead that Snapchat is "generational", so that people who grow up with it like it. Well, why wouldn't new generations like growing up with it just as much as old generations did? So even if we answer Gladwell's question, it does not, in fact, tell us much about the future of Snapchat."
    -Zing.
    "When Gladwell calls data a "curse", is he using the word "curse" to mean "something that you might need more than one of in order to be omniscient"?"
    -Another excellent zinger.

    What sort of speech would you give to replace Gladwell's, Noah?

    1. Forgot question mark after "societies".

    2. Anonymous 5:45 PM

      You're weird

  2. Anonymous 1:49 AM

    If he wants to be a Big Data contrarian, he can make the correct, but popularly unappreciated, observation that a query does not an inference make, even when N is large. A lot of the people celebrating Big Data under the name "Big Data" present descriptive statistics and data visualization without performing the appropriate statistical analysis. Big Data often requires more complex methods of analysis, since the data generating process is poorly modeled as IID sampling, data sets are high-dimensional, the signal-to-noise ratio is low, etc. People with training in statistics and related fields, as well as people whose job it is to work with Big Data, generally understand and try to address these points in their work, but these points are rarely emphasized in popular discussions. Gladwell could make these points in his non-technical style using nice anecdotes, and then he'd seem a contrarian in public, while experts would approve. Instead he seems to have just phoned this one in.

    1. Anonymous 4:27 PM

      "Big Data often requires more complex methods of analysis, since the data generating process is poorly modeled as IID sampling, data sets are high-dimensional, the signal-to-noise ratio is low, etc. People with training in statistics and related fields, as well as people whose job it is to work with Big Data, generally understand and try to address these points in their work, but these points are rarely emphasized in popular discussions. "

      Non-IID and low signal-to-noise data are far from new; statisticians and others have been dealing with that for well over a century. The dimensionality and granularity have exploded.

      -Barry

    2. Anonymous 1:25 AM

      This is just to say that Big Data doesn't represent anything fundamentally new. Which is correct, except that it has changed priorities a lot. Certain sorts of data more commonly feature certain sorts of issues, and so the explosion of certain kinds of data has led to renewed attention to only certain issues at the expense of others. Yet many of the popularizations neglect this, as if the only problem with Small Data was that it was Small.

  3. Anonymous 6:50 AM

    You'll see that when Baby Boomers were young (under 20), there was even less homicide (and other crime) than when Millennials were under 20. Oops.

    Baby boomers are defined as those born between 1946 and 1964, so lots of them experienced ultra high crime rates in their youth. Oops.

    Also, Gladwell's statement that young people don't know "what crime is" ignores the fact that U.S. crime rates are still many times what they are in other countries. It's just an obviously false statement.

    Not really true. Except for homicide, the US now has less property and violent crime than Europe. See here. Mass incarceration in the US has paid off.

    1. >>Except for homicide, the US now has less property and violent crime

      Other than that, Mrs. Lincoln, how was the play?

    2. Anonymous 7:16 AM

      Homicide is a rare crime. Other, more common types of crime have a much bigger effect on how safe people perceive their society to be.

    3. MaxUtil 7:41 AM

      I'd really like to see the definition of "Europe" here. I'm guessing that adding or removing a few specific European countries in these stats would radically change the findings.

    4. MaxUtil 8:02 AM

      "Mass incarceration in the US has paid off"
      Mass incarceration might explain the large drop in crime, however it doesn't explain the large previous rise. We didn't stop jailing people in the 60's & 70's.

      Most common explanations of the steep crime drop (abortion, community policing, etc.) also fail to explain the sudden rise. Interestingly, lead exposure is one theory that does fit both the rise and fall and also has good data supporting it from other countries.

  4. MaxUtil 7:44 AM

    Gladwell is the thinking man's Tom Friedman. More subtle and complex, yet ultimately flirting with similar vacuousness.

    1. Gladwell was the thinking man's Friedman--a smart communicator who could digest a lot of expert opinion and academic literature and express it in terms lay people could understand. Since he got success with Tipping Point and Blink, Gladwell isn't the thinking man's anything. He's all about his brand now. It's made his articles practically unreadable, since he doesn't just write about interesting stuff anymore: every article reads like a book proposal for yet another book about how truths we take for granted are actually wrong.

      The result is that Gladwell now spends his time hurling vast amounts of "big idea" declarations at the wall, in the hope that something appropriately Gladwellian sticks. "Facebook is in its infancy!" "Steve Jobs wasn't an innovator, he was a tinkerer!" "Millennials are trusting, despite the masses of data that claim they aren't!"

    2. Actually, Tom Friedman may have more solid substance to his cogitations than does Gladwell, even if Gladwell supposedly deals with more esoteric ideas.

    3. "even if"??

      The fact that Friedman has "more solid substance to his cogitations" is why so many of us find him even more unbearable than Gladwell. Someone who farts in the parlor is bad enough; what you really don't want is "solid substance"...

  5. Phil Koop 8:00 AM

    Dear readers, you know I could do that job.

    Sure you could, but then you'd have to snarkily refute yourself, or else go (more) insane.

  6. Noah = The Fox
    Gladwell's $50-80k speaking fees = Sour Grapes

    You'll get there Noah. Give it time. Then some wise-ass blogger will be ripping you apart on their way up the ladder.

  7. "So Gladwell is totally wrong about that."

    Um, no, per your own critique, he made a true generalization that is wrong in one specific instance. "So Gladwell is generally correct but missed an important exception" seems more like it.

    1. What is a "true generalization"? Gladwell said there is something that you can't do. But often - usually, in fact, in the econometrics world - you *can* do it.

  8. Anonymous 10:33 AM

    I think a lot of us have fantasized about becoming a Gladwell/Friedman type vacuous talking head, whether it's for money to burn on hedonistic frivolity or to dramatically blow it all by dropping the mask and saying truly radical and dangerous things when nobody expects it. But I suspect it requires a certain sort of -- and I use the word loosely -- "talent" to achieve the information density equivalent of aerogel.

    1. I definitely think I could muster the requisite pompous vacuousness. Maybe I'm just being egotistical here...

    2. Well the ego is a good start. Ego provides the impudence to pass off vacuousness as profundity.

  9. "Facebook in its infancy." --> I think you're not giving enough credit to Gladwell here. Facebook is still a growth company, adding new features all the time, e.g. just in the last year it made it possible to start an ecommerce store on Facebook. So it may still be even more deeply integrated into our lives.

    I don't know anything about the early adoption of TVs, but didn't it take a while for programming to mature? Like, I wonder if it took a while for networks to develop sitcoms and cop shows and drop blatant native advertising even after everyone had a TV. Or, early movies were heavily influenced by theatre performances and it took a while to grow out into its own format even after many people started going to them. I'm not sure if these analogies apply, but I do think it's possible that we're not Facebooking properly yet.

    Time to disclose I'm not on Facebook. I acknowledge that I may totally just be talking out of my ass.

    1. Gladwell used the example that the telephone, whose first exchange launched in 1878, took off only in the 1920s. If this is the infancy->maturity scale, then Facebook may be dying of old age already.

      Also, TV introduces "new" features even now, such as Smart TVs that you can operate by voice commands, or curved TV screens. But no one can argue that TV is in its infancy just because of that.

  10. No need to get excited. Gladwell (like the majority of academic economists :) is an entertainer. He gets paid to talk.

    1. Gladwell is a journalist; he gets paid to bring research from academia to the public's attention. The fact that he packages it in an entertaining format is just the best way to pique people's interest.

    2. What research did he bring forth in this speech?

  11. Aw! What happened to the "Big Data v. Big Hair" title?

    1. I realized that he no longer has big hair!

  12. Small thing: America's youth do not face much higher crime than Europe's. White/Asian neighborhoods have a homicide rate only 2 times higher than Western Europe's (despite all the guns), and assault rates are much lower.

  13. Such a classic write up of "that's art??!?! I could have done that!" But you didn't.

  14. Isn't Gladwell just basically arguing that data without theory has its problems, which is quite a widely held view?

    I happen to agree. Einstein didn't gain his insights from studying reams of data, but from working from first principles.

    1. But some theory only develops after studying the data. Of course data without theory has problems, but that doesn't mean big data can't be used to develop good theory.

  15. Doc at the Radar Station 9:10 PM

    Holy smokes, what happened in 1904?

    1. Probably improvements in data collection.

  16. Anonymous 11:42 PM

    In which Noah Smith trolls famous people in a rather transparent attempt to get them to comment on his blog.

    1. It's happened more than once, hasn't it? Hmmm

  17. Noah! I am touched that I finally made it onto your radar screen! But you realize (I hope) that you weren't responding to my speech, right? I spoke for 45 minutes and gave multiple examples and qualifications, and what you read was a much abbreviated and not entirely accurate version of it. For example, the speech had nothing to do with Big Data. I never even mentioned the term. I started with Stuart Oskamp's famous study just as a way to introduce the topic of the talk, which is the need for data to have adequate theoretical scaffolding. The generational/development paradigm was just a way to make (the obvious) point that the same behavior can be interpreted a number of different ways depending on the theoretical model one uses. The telephone/Facebook example wasn't about the diffusion of an innovation, as you suggest. It was about how long it takes for us to figure out what a complex innovation is best used for AFTER it diffuses. The Airbnb example (and the gay marriage example that went with it) were about the way our understanding of data is enhanced by considering the broader social context in which that behavior is expressed. None of these are particularly counterintuitive notions--and none of the arguments, I'm afraid, bear much resemblance to the way that you characterized them. All of which is a useful reminder of the importance of going directly to the source (a lesson that I also violate all too often). My email, by the way, is gladwell@gmail. In the future, anytime you'd like to know more about anything I've said or written, I'd be happy to chat. I'm a fan.

    1. Haha nooooooo, you're going to make me feel super guilty about snarking!!

      OK, I'll email. :-)

    2. Excellent response.

    3. BTW, Malcolm, do you have a link to a full transcript of the speech?

    4. Anonymous 5:38 PM

      Going by the first paragraph and last sentence of Noah's piece, it is clear that Noah is contemplating his career options post the demise of RE. (Sorry, can't help myself dragging this in.)

      Whatever happened to "I shall save this planet, never fear."? :-)


      Henry

    5. Oh man, that's great! I looked today specifically to see if MG responded.

  18. Gladwell is great in an "as good coming up as going down" way. Take a subject that you know nothing about and Gladwell knows little about and read what he thinks. His writing makes for wonderfully engaging edutainment. Then go learn about the subject. Now Gladwell entertains you again in a Dan Brown way.

  19. Maybe I'm missing the terms here, but in relation to Snapchat, I think the question isn't really whether young people use it because a) they are young, or b) they are part of Gen X (which is currently young). I think the real question being asked is more whether young people use it more because a) it appeals to a youthful outlook, or b) they were acclimated when young. The impacts of these two are quite different, in that a) keeps a mostly constant user base, while b) predicts a progressively growing user base until all generations are using it simultaneously. I don't think the data to answer this is available yet because it's basically in the future.
