Monday, October 14, 2013
Lars Peter Hansen explained. Kind of.
The entire econo-blogosphere has its usual pieces up explaining the work of two of this year’s Nobel(ish) laureates in economics, Gene Fama and Bob Shiller. Most of them just handwave when it comes to Lars Peter Hansen’s contributions.
Disclaimer: Skip this post if you already know all about GMM and can spout out the Hansen (1982) results without even looking. Also, I am not an econometrician and I learned this stuff years ago. Substantive corrections and amplifications are very welcome in the comments. I will try not to assume that the reader knows everything in finance and econometrics except GMM. I will fail.
(Haters: Yes, yes, I’m sure Noah would never do an entire post just on the incredibly banal concept of GMM. Too bad, he is away this fall. Except of course when he’s not. Also to haters: My putting out crappy posts is an incentive for him to come back sooner.)
Generalized Method of Moments, or GMM, is a method for estimating statistical models. It allows you to write down models, estimate parameters, and test hypotheses (restrictions) on those models. It can also provide an overarching framework for econometrics. Fumio Hayashi’s textbook, which old timers shake their heads at, uses GMM as the organizing framework of his treatment, and derives many classical results as special cases of GMM.
Since the claim is that GMM is particularly useful for financial data, let’s motivate it with a financial model based on Hansen and Singleton (1983). It may seem preposterously unrealistic, but this is what asset pricing folks do, and you can always consider it a starting point for better models, as we do with Modigliani-Miller. Suppose the economy consists of a single, infinitely-lived representative consumer, whose von Neumann-Morgenstern utility function exhibits constant relative risk aversion, \begin{equation} U(c_t) = \frac{c_t^\gamma}{\gamma}, \quad \gamma < 1, \end{equation} where \(c_t\) is consumption in period t and the coefficient of relative risk aversion, \( -c\,U''(c)/U'(c) \), works out to \(1 - \gamma\). She maximizes her expected utility \begin{equation} E_0 \left[ \sum_{t=0}^\infty \beta^t U(c_t) \right], \quad 0 < \beta < 1, \end{equation} where \(E_0\) denotes the expectation conditional on information available at the beginning of the problem and \(\beta\) is a discount factor representing pure time preference. This utility function implies that the representative consumer prefers more consumption, other things being equal, with more weight on consumption that happens sooner rather than later, but also wants to avoid a future consumption path that is too risky. She will invest in risky assets, but exactly how risky?
If we introduce multiple assets and then try to solve this model by differentiating expected utility with respect to the holdings of each asset and setting the derivatives equal to zero, we get the first order conditions \begin{equation} \label{returns-moment} E_t \left[ \beta \left( \frac{c_{t+1}}{c_t} \right) ^\alpha r_{i,t+1} \right] = 1, \quad i = 1, \ldots, N, \end{equation} where \( \alpha \equiv \gamma - 1 \) and \(r_{i,t+1}\) is the gross return on asset i from time t to time t+1. This approach is the basis of the entire edifice of “consumption based asset pricing” and it provides a theory for asset returns: they should be related to consumption growth, and in particular, assets whose returns are highly correlated with consumption growth (assets with a high “consumption beta”) should have higher expected returns because they provide less insurance against consumption risk.
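In case the “differentiate and set to zero” step feels like a leap, here is the usual perturbation argument, sketched informally: at the optimum, giving up a marginal unit of consumption at time t, investing it in asset i, and consuming the payoff at t+1 cannot change expected utility, so \[ U'(c_t) = E_t \left[ \beta \, U'(c_{t+1}) \, r_{i,t+1} \right]. \] With \( U'(c) = c^{\gamma - 1} = c^\alpha \), dividing both sides by \( U'(c_t) = c_t^\alpha \) gives exactly \( \eqref{returns-moment} \).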
Equation \( \eqref{returns-moment} \) contains some variables, such as \( c_t \) and \( r_{i,t+1} \), that we should hopefully be able to observe in the data. It also contains parameters \( \beta \) and \( \alpha \) (or, if you prefer, \( \gamma \)) that we would like to estimate, and then judge whether the estimates are realistic. We would also like to test whether \( \eqref{returns-moment} \) provides a good description of the consumption and returns data, or in other words, whether this is a good model.
The traditional organizing method of statistics is maximum likelihood. To apply it to our model, we would have to add an error term \( \varepsilon_t \) that represents noise and unobserved variables, specify a full probability distribution for it, and then find the parameters \( \beta \) and \( \alpha \) that maximize the likelihood (which is kind of like a probability) that the model generates the data we actually have. We then have several ways to test the hypothesis that this model describes the data well.
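If you have never seen that recipe in action, here is a toy sketch in Python, deliberately unrelated to the asset pricing model: pretend the data are i.i.d. normal draws and find the mean and standard deviation that maximize the likelihood numerically. The data are simulated placeholders, and parameterizing sigma through its log is just a trick to keep the optimizer away from negative values.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
data = rng.normal(loc=2.0, scale=1.5, size=500)   # pretend this is observed data

def neg_log_likelihood(params):
    """Negative normal log-likelihood; minimizing it maximizes the likelihood."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)                      # keep sigma strictly positive
    return -np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                   - 0.5 * ((data - mu) / sigma) ** 2)

res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)   # close to the sample mean and (biased) sample std
```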
The problem with maximum likelihood methods is that we have to specify a full probability distribution for the data. It’s common to assume a normal distribution for \( \varepsilon_t \). Sometimes you can assume normality without actually imposing too many restrictions on the model, but some people always like to complain whenever normal distributions are brought up, and with financial returns, which are notorious for fat tails, they have a point.
Hansen’s insight, building on earlier work, was that we could write down the sample analog of \( \eqref{returns-moment} \), \begin{equation} \label{sample-analog} \frac 1 T \sum_{t=1}^T \beta \left( \frac{c_{t+1}}{c_t} \right)^\alpha r_{i,t+1} = 1, \end{equation} where instead of an abstract expected value we have an actual sample mean. Equation \( \eqref{sample-analog} \) can be filled in with observed values of consumption growth and stock returns, and then solved for \( \beta \) and \( \alpha \). Note that with N assets we have N moment conditions but only two parameters, so in general no choice of \( \beta \) and \( \alpha \) satisfies all of them exactly; GMM picks the parameters that bring the sample moments as close to their postulated values as possible, in a weighted quadratic sense, and the leftover discrepancy is what the J-test of overidentifying restrictions is built on. Hansen worked out the exact conditions under which all of this is statistically valid, derived the asymptotic properties of the resulting estimators, and showed how to test restrictions on the model, so we can test whether the restriction represented by \( \eqref{returns-moment} \) is supported by the data.
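To make this concrete, here is a minimal sketch in Python of what estimating \( (\beta, \alpha) \) from \( \eqref{sample-analog} \) might look like. The consumption growth and return series are simulated placeholders you would replace with real data, I use an identity weighting matrix for a first step rather than Hansen’s efficient weighting, and I skip the standard-error formulas, so treat it as an illustration of the mechanics rather than a full implementation.

```python
import numpy as np
from scipy.optimize import minimize

# Placeholder data: T observations of gross consumption growth c_{t+1}/c_t and
# gross returns on N assets. Replace these simulated series with real data.
rng = np.random.default_rng(0)
T, N = 200, 3
cons_growth = 1.02 + 0.02 * rng.standard_normal(T)        # c_{t+1}/c_t
returns = 1.05 + 0.15 * rng.standard_normal((T, N))       # r_{i,t+1}, gross

def moments(params):
    """Sample version of E[beta*(c_{t+1}/c_t)^alpha * r_{i,t+1}] - 1, one entry per asset."""
    beta, alpha = params
    sdf = beta * cons_growth ** alpha                      # stochastic discount factor
    return (sdf[:, None] * returns - 1.0).mean(axis=0)     # length-N vector, ~0 if the model fits

def gmm_objective(params, W):
    """Quadratic form g' W g that GMM minimizes."""
    g = moments(params)
    return g @ W @ g

# First step: identity weighting matrix; a second step would re-weight with the
# inverse of the estimated covariance of the moments (Hansen's efficient choice).
W = np.eye(N)
result = minimize(gmm_objective, x0=np.array([0.95, -1.0]), args=(W,), method="Nelder-Mead")
beta_hat, alpha_hat = result.x
print("beta:", beta_hat, "alpha:", alpha_hat)
```

With the efficient weighting matrix, T times the minimized objective is Hansen’s J-statistic, which is what you would use to test the overidentifying restrictions.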
One big puzzle in consumption based asset pricing is that observed consumption is far smoother than the theory says it should be, given how volatile stock returns are and how large average excess returns on stocks have been (I haven’t derived that, but manipulate \( \eqref{returns-moment} \) a little and you will see it); one of my favorite papers in this literature uses garbage as a proxy for consumption.
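For the curious, here is roughly the manipulation, done informally (I am blurring conditional and unconditional expectations and assuming a riskless asset exists). Write \( \eqref{returns-moment} \) as \( 1 = E_t[m_{t+1} r_{i,t+1}] \) with \( m_{t+1} \equiv \beta (c_{t+1}/c_t)^\alpha \). Expanding the expectation of the product, \[ 1 = E_t[m_{t+1}] \, E_t[r_{i,t+1}] + \operatorname{Cov}_t(m_{t+1}, r_{i,t+1}), \] and since a riskless asset satisfies \( r_{f,t+1} = 1 / E_t[m_{t+1}] \), \[ E_t[r_{i,t+1}] - r_{f,t+1} = - r_{f,t+1} \operatorname{Cov}_t(m_{t+1}, r_{i,t+1}). \] The discount factor \( m_{t+1} \) moves only with consumption growth, which in the data barely moves at all, so for moderate values of \( \alpha \) the covariance on the right is tiny; matching the large average excess returns on stocks then requires an implausibly large \( |\alpha| \), that is, implausibly high risk aversion.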
How does GMM relate to other methods? It turns out that you can view maximum likelihood estimation as a special case of GMM. Maximum likelihood involves maximizing the likelihood function (hence the name), which implies taking the derivative of the (log) likelihood, called the score function in this world, and setting it equal to zero. Well, that’s just GMM with the moment condition that the score is zero. Similarly, Hayashi lays out how various other classical methods in econometrics, such as OLS, 2SLS and SUR, can be viewed as special cases of GMM.
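To make the OLS case concrete, here is a tiny sketch (my own illustration, with made-up data): the OLS moment conditions say the regressors are orthogonal to the error, \( E[x_t (y_t - x_t' b)] = 0 \). Set the sample version to zero and solve, and you get exactly the usual least squares estimator.

```python
import numpy as np

# Made-up regression data, purely for illustration.
rng = np.random.default_rng(1)
T, K = 500, 3
X = rng.standard_normal((T, K))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(T)

# GMM route: solve the K sample moment conditions (1/T) X'(y - Xb) = 0 for b.
b_gmm = np.linalg.solve(X.T @ X / T, X.T @ y / T)

# Textbook least squares, for comparison.
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(b_gmm, b_ols))   # True: identical estimator
```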
People who are not expert theoretical econometricians often have to derive their own estimators for some new-fangled model they have come up with. In many contexts it is simply more natural, and easier, to use moment conditions as a starting point than to try to specify the entire (parameterized) probability distribution of errors.
One paper that I find quite neat is Richardson and Smith (1993), who propose a multivariate normality test based on GMM. For stock returns, skewness and excess kurtosis are particularly relevant, and normality implies that they are both zero. Since skewness and excess kurtosis are moments, it is natural to specify as moment conditions that they are zero, estimate using GMM, and then use the J-test to see if the moment conditions hold.
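A hedged sketch of the flavor of that test: if you take sample skewness and excess kurtosis as your moment conditions and use the fact that, under normality, they are asymptotically independent with variances 6/T and 24/T, the resulting J-type statistic is essentially the familiar Jarque–Bera statistic, distributed \( \chi^2(2) \) under the null. Richardson and Smith’s actual paper does this jointly across assets and treats the estimated mean and variance more carefully; the code below is just the univariate skeleton, with simulated data.

```python
import numpy as np
from scipy.stats import chi2

def normality_j_stat(x):
    """J-type test of the moment conditions 'skewness = 0, excess kurtosis = 0'."""
    x = np.asarray(x, dtype=float)
    T = x.size
    z = (x - x.mean()) / x.std()              # standardize with sample mean and std
    skew = np.mean(z ** 3)
    exkurt = np.mean(z ** 4) - 3.0
    # Under normality the two moments are asymptotically independent with
    # variances 6/T and 24/T, so the quadratic form collapses to:
    J = T * (skew ** 2 / 6.0 + exkurt ** 2 / 24.0)
    pval = chi2.sf(J, df=2)
    return J, pval

rng = np.random.default_rng(2)
print(normality_j_stat(rng.standard_normal(1000)))        # normal data: should not reject
print(normality_j_stat(rng.standard_t(df=3, size=1000)))  # fat tails: usually rejects
```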
PS. Noah will tell me I am racist for getting my Japanese names confused. I was going to add that in addition to econometrics, Hayashi is also known for his work on the economy of Ancient Greece. That’s actually Takeshi Amemiya, whose Advanced Econometrics is a good overview of the field as it stood right before the “GMM revolution”.
Great explanation. I think one of the great things about GMM is that it allows us to estimate a single equation from a model without assuming that the entire model is "true." Additionally, an underappreciated aspect of GMM is that it shows how silly the structural vs. reduced form debate is. Viewed through the lens of GMM (which as you note is a generalization of OLS), this debate reduces to a difference in functional form.
Nice explanation, but let's not forget that the J test is really just Sargan's old test of over-identification, re-visited.
Thanks for the helpful primer! Quick q, shouldn't equations 2, 3 and 4 have beta^t so that later periods are discounted more heavily?
Good catch, I have fixed that. Equation 3 shouldn’t because that is a view of the optimization problem over a single period from time t to time t + 1.
The Nobel Prize committee honored Lars Peter Hansen for his work in developing a statistical method for testing rational theories of asset price movements. The statistical method Hansen developed is Generalized Method of Moments (GMM). The fact that Hansen won the Nobel Prize for his “empirical analysis of asset prices” caught me off guard as I did not realize this was the original application of GMM.
GMM is used in the estimation of the New Keynesian Phillips Curve. The New Keynesian Phillips Curve includes expectations of future inflation as an independent variable. Since inflation expectations cannot really be observed, GMM offers a way around this difficulty.
The New Keynesian Phillips Curve, which was developed in 1995, is integral to most DSGE models, on which central banks across the globe are increasingly dependent. Thus it’s hard to imagine modern central banking without Hansen’s contributions to econometrics. So for Hansen to have won the prize for his empirical analysis of asset prices strikes me as somewhat ironic.
Yes, Fama and Shiller made perfect sense as Nobel winners, and Hansen made perfect sense as a Nobel winner, but the three of them together was puzzling to me until I read your explanation. But even so, I think Hansen should've won a separate Nobel, or one in conjunction with other econometricians, rather than being lumped in with finance guys.
The coefficient of relative risk aversion (-cu''/u') isn't gamma in your example, it's 1-gamma.
I wouldn't say Hansen's insight was that moments could be used for estimation. As the link indicates, that was known for a while. It was about how one could use the 'extra' moment conditions that often crop up.
Shouldn't (4) have 1/beta instead of beta^t if it's taking a sample average?
I added 1/T in front, is that what you meant?
That, and no t superscript on beta.
Aaaaah right. Many thanks.
Good to see one of the backup team interacting with the prizes and trying to provide some background. Extra kudos for taking on the hardest one. A bit disappointing that with 8 authors on the list to the right of the page, and 3 winners, we only got one post on the prize today.
This is very interesting. I'm a statistician, and basically all statistical inference that is taught in statistics is maximum likelihood. You mention that a weakness of maximum likelihood is the need to make parametric assumptions. Fitting non-parametric models is usually done with spline methods in statistics.
I've never seen GMM taught before. So this post makes me wonder why this isn't taught in statistics, given that GMM is also a generalization of maximum likelihood according to this post. Any thoughts?
Also, how do you include LaTeX in blog posts?
I haven’t touched on the computational aspects of GMM. As you can imagine, for linear (and possibly other) models, there are closed form solutions. (Obviously, in the case of models that reduce to OLS and such.) Otherwise it’s a matter of numerically minimizing the GMM objective, or numerical root finding in the exactly identified case.
GMM has its origins in asset pricing. Every econometrics sequence teaches some GMM, but not all professors make it the unifying framework in the way that Hayashi does, and in some departments that approach would be regarded as a little eccentric.
I use MathJax for math in blog posts.
Seems like much to-do about nothing. A sort of least squares fit algorithm for crappy parametric models with sparse data. This is worthy of a Nobel? even more scary, central bank models depend on this? E-gad, we're in worse shape than I thought...
Why do the old timers shake their heads at Hayashi?
They don’t mind GMM and spending a week or two teaching it, but they tend to think that making GMM the basis of all econometrics is putting the cart before the horse.