Noahpinion: In the MISO soup

Saturday, October 17, 2015


Robin Hanson declares that thanks to Big Data, we will soon discover the SUPER FACTORS that drive all of human differences:

In a factor analysis, one takes a large high-dimensional dataset and finds a low dimensional set of variables that can explain as much as possible of the total variation in that dataset. A big advantage of factor analysis is that it doesn’t require much theoretical knowledge about the nature of the variables in the data or their relations – factors are mostly determined directly by the data...

[P]eople vary in far more ways than intelligence, ideology, and personality, and factor analyses have been applied to many of these other human feature categories. For example, there have been factors analyses of jobs, brands, faces, body shape, gait, accent, diet, leisure behavior, friendship networks, physical health, mortality, demography, national cultures, and zip codes.

[F]actors found in different feature categories are often substantially correlated with one another. This suggests that if we put together a huge super-dataset describing many individual people in as many ways as possible, a factor analysis of this dataset may find important new super-factors that span many of these features domains. Such super-factors would be promising candidates to use in a wide range of social research, and social policy...

I’d guess that the super-factors found in a super dataset of human details will instead be revolutionary. We will afterward see uncovering them as a seminal milestone in our progress in understanding human variation. A Nobel prize worthy level of seminality, or more. All it will take is lots of tedious work to collect a super dataset, and then do some straightforward number crunching.

Here's an object lesson in the perils of analyzing data without theory to guide you! Yes, it's easy to do a principal component analysis on a multidimensional data set and find some relatively small set of "factors" that "explain" most of the data. If we do what Robin says and throw everything we know about human characteristics into one massive data set and hit the PCA button, the STATA of the future will pop out our "super-factors" in short order.

One of the biggest super-factors will be income.

See, factor analysis doesn't tell you whether the factors cause all the other stuff, or are effects of the other stuff. In the world, there can be effects with multiple causes, and causes with multiple effects. In signals theory (a very different kind of signaling than the kind Robin is used to thinking about!), these might be called Multiple-Input-Single-Output and Single-Input-Multiple-Output, or MISO and SIMO.

An example of SIMO would be anxiety disorder. A penchant for severe anxiety is going to affect your working life, your interpersonal life, your hobbies, etc. in statistically predictable ways. One cause, many effects.

An example of MISO would be income. Our marvelous market economy allows people to make money using a dizzying myriad of talents, skills, and resources. Some people make money by hitting a ball with a stick and running around a field. Some people make money by making big macro bets in financial markets, getting the first one right by luck, and then taking in billions of dollars in management fees. Some people make money by being friends with the right politician. Some people make money by inventing new kinds of semiconductors. And so on, and so on. One effect, many causes.

Since money can buy a ton of stuff, everyone wants money. And since money can buy a ton of stuff, almost anything valuable can be sold for money. So if income is among the set of characteristics in Robin's ultimate data set, it will undoubtedly emerge as one of the most important factors.
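
To make the MISO point concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: ten mutually independent "traits", an "income" column that is just their sum plus noise, and the function name `first_component`. It is a toy, not a model of real data. By construction, the traits have nothing in common except that each one feeds into income, yet the first principal component of the correlation matrix still loads most heavily on income.

```python
import numpy as np

def first_component(n_people=5000, n_traits=10, noise_sd=1.0, seed=0):
    """Simulate a MISO process and return the top eigenvalue and |loadings|.

    Assumption (hypothetical toy setup): traits are independent standard
    normals, and income is their sum plus Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    traits = rng.standard_normal((n_people, n_traits))
    income = traits.sum(axis=1) + noise_sd * rng.standard_normal(n_people)

    # Put income in the last column, then do PCA on the correlation matrix
    # (equivalent to PCA on standardized variables).
    data = np.column_stack([traits, income])
    corr = np.corrcoef(data, rowvar=False)

    # eigh returns eigenvalues in ascending order, so the last one is largest.
    eigvals, eigvecs = np.linalg.eigh(corr)
    return eigvals[-1], np.abs(eigvecs[:, -1])

top_eigval, loadings = first_component()
income_index = len(loadings) - 1

print("top eigenvalue:", round(float(top_eigval), 3))
print("loading on income:", round(float(loadings[income_index]), 3))
print("largest loading on any trait:", round(float(loadings[:-1].max()), 3))
```

In this setup income is the only column correlated with everything else, so it dominates the first component even though no single underlying trait drives the others. The number crunching alone can't tell you that income is an output here and not a cause.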

You can already see evidence of this in the media. Barely a day goes by without an announcement by Quartz or the Huffington Post that income differences predict differences in...you name it. School success, romance, self-confidence, frequency of weird eyebrow twitches. The assumption, of course, is that wealth privileges people in innumerable ways - i.e., that income is a SIMO kind of thing. But whether or not that's true, it's also likely true that income is a MISO kind of thing, where almost any positive or desirable trait can be leveraged - or is correlated with something that can be leveraged - to produce income. That, really, is why income is going to be correlated with almost any desirable human trait, no matter how little "privilege" remains in society.

So Robin's "super-factors" are quite possibly going to be very mundane things. MISO processes will cause a few desirable goals to be highly correlated with a large number of human traits that are useful in obtaining those goals.

Interesting, but hardly worthy of a Nobel. And a reminder that pure statistical analysis, without explicit theory to guide it, will be guided by implicit, simplistic theories.

P.S. - One thing Robin wrote that I didn't understand was the following:

As many people know, intelligence is the main factor explaining variation in cognitive test performance, ideology is the main factor explaining variations in political positions, and personality types explain much of the variation in stable attitudes and temperament.

Aren't these basically just labels? "Intelligence" is our word for cognitive test performance. "Personality type" is our word for stable attitudes and temperament. Seems to me that simply isolating a principal component and labeling it is a far cry from actually understanding what you're looking at.

19 comments:

pithom 1:12 AM
And this is why I sometimes accept that Noah Smith can be a good writer and economist.
"The assumption, of course, is that wealth privileges people in innumerable ways - i.e., that income is a SIMO kind of thing. But whether that's true, it's also likely true that income is a MISO kind of thing, where almost any positive or desirable trait can be leveraged - or is correlated with something that can be leveraged - to produce income. That, really, is why income is going to be correlated with almost any desirable human trait, no matter how little "privilege" remains in society."
-This is truly the heart of the post and it's awesome. You won't hear that from your typical critic of The Bell Curve voluntarily.

"Aren't these basically just labels? "Intelligence" is our word for cognitive test performance. "Personality type" is our word for stable attitudes and temperament. Seems to me that simply isolating a principal component and labeling it is a far cry from actually understanding what you're looking at."

-No, not just that, Noah. General intelligence is about serial correlations between multiple types of test scores (e.g., math scores and reading scores). Ideology is about serial correlations between support for dozens of individual political positions (e.g., support for full abortion rights and support for confiscatory taxes on the rich). And personality type is about correlations between multiple personality traits in individuals (these I don't know much about, because I haven't researched them). It's about the correlations. Although these could, as you've pointed out, be either MISO or SIMO, I suspect they're most likely to be the latter. There's no good MISO reason I can see why being good at playing ball would have any correlation with playing in the stock market (and it probably doesn't). Likewise, there's no good MISO reason I can see why being good at math would have any correlation with being good at reading (but these do correlate!). And there's no good MISO reason I can see why being in favor of unrestricted abortion would have anything to do with support for confiscatory taxes on the rich (I'm guessing these do correlate).
Nathan Taylor 12:35 PM
I took Robin's post to be alluding in particular to collecting massive amounts of biological data. Think fitbit on steroids. Or super-augmented-Apple Watch. Something to track everything inside and outside your body in your daily life: heartbeat, gut microbiome, muscle reactions, eye responsiveness, blood chemistry, recording and transcribing all verbal communications, everything you see and do, etc.

This is an unexplored data set. Not just stuff we already collect like your test scores or income.

Your point is solid about PCA: we can name a component without understanding its underlying biology and mechanism. But... if this new massive biological data set finds really interesting things, such as (deliberately picking randomly) eye saccades and particular gut microbiomes correlating with voting Obama, well then we take that correlation and run with it. The PCA by itself doesn't make us understand, but finding correlations from new large data sets may head us down a very novel path about what makes people tick. Super factors as an impetus to investigation rather than understanding per se. By analogy, consider investigating why IQ tests are predictive of life outcomes. Yes, we don't understand the underlying biology now, but this correlation is pointing some people down particular paths of investigation.
Anonymous 1:28 PM
Some statistical notes: PCA is not the same as FA. PCA just finds linear combinations of your variables that define a new coordinate space in which the new variables / coordinate axes are orthogonal and ordered by decreasing variance. There is no causal assumption. PCA is not 'for' variable reduction, although it is often used for that. FA is often similar in its output, but is not identical. It is based on the assumption that the underlying latent factors cause the manifest variables (which are a mix of the factors and measurement error). You should not use them interchangeably. EFA is never truly theoretically neutral (despite people treating it that way), because choosing to enter a variable into an EFA is an implicitly theoretical decision.

There also seems to be some confusion between doing an EFA with a lot of variables and hierarchical EFA.

For what it's worth, the idea of doing EFA with a lot of variables isn't that novel. In psychometrics (where EFA comes from), that research scheme lies behind the discovery of the "big five" trait theory in personality, and Spearman's g in intelligence.
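
The PCA properties described above (orthogonal axes, ordered by decreasing variance, no causal assumption) can be checked with a minimal sketch. The data here is hypothetical random data, and PCA is done by hand via NumPy's SVD, with no particular statistics package assumed:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 500 observations of 4 correlated variables,
# built by mixing independent normals through a random matrix.
mixing = rng.standard_normal((4, 4))
x = rng.standard_normal((500, 4)) @ mixing
x_centered = x - x.mean(axis=0)

# PCA via SVD: the rows of vt are the new coordinate axes.
_, singular_values, vt = np.linalg.svd(x_centered, full_matrices=False)

# Project the data onto the new axes and measure variance along each one.
scores = x_centered @ vt.T
variances = scores.var(axis=0, ddof=1)

# The axes are orthonormal, and the variances come out in decreasing order.
axes_orthogonal = np.allclose(vt @ vt.T, np.eye(4))
variances_sorted = bool(np.all(np.diff(variances) <= 0))

print("axes orthogonal:", axes_orthogonal)
print("variances in decreasing order:", variances_sorted)
```

Nothing in this procedure says which variables cause which. It is only a rotation of the coordinate system, which is why PCA output should not be read as a set of latent causes the way an FA model is meant to be.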
Anonymous 3:56 PM
Claims to be a sci fi nerd. Shits all over a guy basically trying to smuggle psychohistory into economics. But hey, you got the valuable endorsement of 'biological realist' pithom so that's something, warming up for your first zerohedge column I guess. Remember, if every third word isn't in bold, you can't stop the gold hating bankers and their zionist-reptilian allies.