Short Note on Ridge Regression
By Bas Machielsen
November 23, 2022
Introduction
I am doing a couple of assignments involving penalized estimators such as Ridge regression, and I wanted to do a short derivation of its asymptotic covariance. In the existing resources I could find, some details are left out, which I wanted to recapitulate more clearly. I’ll also contrast the variance of the Ridge estimator with the variance of the OLS estimator, illustrating a fact that also surfaces in many other resources, namely that the variance of the Ridge estimator is smaller than that of the OLS estimator.
Setting
I assume non-stochastic regressors $X$ (an $n \times p$ matrix of full column rank), and a model $y = X\beta + \epsilon$, with $\mathbb{E}[\epsilon] = 0$ and $\text{Var}(\epsilon) = \sigma^2 I$.
The Ridge estimator can be expressed as:

$$\hat{\beta}_R = (X'X + \lambda I)^{-1} X'y$$
It is easy to show that the Ridge estimator is biased for $\beta$ by evaluating its expected value:

$$\mathbb{E}[\hat{\beta}_R] = (X'X + \lambda I)^{-1} X'X \, \beta \neq \beta \quad \text{for } \lambda > 0$$
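As a quick numerical illustration, here is a minimal sketch in Python/NumPy of the closed-form estimator and its bias (the design matrix, $\beta$, $\sigma$, $\lambda$, and the helper function `ridge` are arbitrary choices of mine, not part of the original derivation):

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, lam, sigma = 200, 3, 5.0, 1.0
X = rng.normal(size=(n, p))            # fixed (non-stochastic) regressors
beta = np.array([1.0, -2.0, 0.5])

def ridge(X, y, lam):
    """Closed-form Ridge estimator: (X'X + lambda*I)^{-1} X'y."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# Monte Carlo average of the Ridge estimator across fresh error draws
draws = np.array([ridge(X, X @ beta + sigma * rng.normal(size=n), lam)
                  for _ in range(5000)])
print("true beta:             ", beta)
print("average Ridge estimate:", draws.mean(axis=0).round(3))
print("theoretical mean:      ", np.linalg.solve(X.T @ X + lam * np.eye(p),
                                                 X.T @ X @ beta).round(3))
```

The simulated average matches $(X'X + \lambda I)^{-1} X'X \beta$ rather than $\beta$, which is the bias described above.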
Consistency of the Ridge Estimator
Doing so also allows us to express $\hat{\beta}_R$ as:

$$\hat{\beta}_R = (X'X + \lambda I)^{-1} X'X \, \beta + (X'X + \lambda I)^{-1} X'\epsilon$$
Taking the plim of this expression and applying Slutsky’s theorem then gives:

$$\text{plim} \; \hat{\beta}_R = \text{plim} \left[ \left( \frac{X'X}{n} + \frac{\lambda}{n} I \right)^{-1} \frac{X'X}{n} \right] \beta + \text{plim} \left[ \left( \frac{X'X}{n} + \frac{\lambda}{n} I \right)^{-1} \right] \text{plim} \left[ \frac{X'\epsilon}{n} \right]$$
After realizing that

$$\text{plim} \; \frac{\lambda}{n} I = 0 \quad \text{and} \quad \text{plim} \; \frac{X'\epsilon}{n} = 0,$$
the above expression simplifies to $\text{plim} \; \hat{\beta}_R = \beta$, thus showing consistency.
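The consistency argument can be checked numerically. Below is a minimal sketch (Python/NumPy, with an arbitrary simulated design, true $\beta$, and a fixed $\lambda$ of my choosing) showing the Ridge estimate approaching the true $\beta$ as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(1)
beta, lam, sigma = np.array([1.0, -2.0, 0.5]), 5.0, 1.0

# With lambda fixed, the penalty term lambda/n vanishes as n grows
for n in [100, 1_000, 10_000, 100_000]:
    X = rng.normal(size=(n, 3))
    y = X @ beta + sigma * rng.normal(size=n)
    b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
    print(f"n = {n:>7}: ridge estimate = {np.round(b_ridge, 4)}")
```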
Asymptotic Variance of the Ridge Estimator
The asymptotic variance of the Ridge estimator around its plim can be obtained by rewriting the estimator in the following form:

$$\hat{\beta}_R = \left( \frac{X'X}{n} + \frac{\lambda}{n} I \right)^{-1} \frac{X'X}{n} \, \beta + \left( \frac{X'X}{n} + \frac{\lambda}{n} I \right)^{-1} \frac{X'\epsilon}{n},$$
which by the CLT converges to a normal distribution around its plim. The variance is then determined by the second part, since the first part is non-stochastic.
- First, then, according to a CLT, $\frac{1}{\sqrt{n}} X'\epsilon$ converges to a normal distribution around its expectation, which is zero by assumption, with a variance equal to $\sigma^2 \lim_{n \to \infty} \frac{X'X}{n}$. Then, by the product limit normal rule (Cameron & Trivedi, 2005, Theorem A.17), the variance of $\hat{\beta}_R$ is equal to:

$$\text{Var}(\hat{\beta}_R) = \frac{\sigma^2}{n} \left( \frac{X'X}{n} + \frac{\lambda}{n} I \right)^{-1} \frac{X'X}{n} \left( \frac{X'X}{n} + \frac{\lambda}{n} I \right)^{-1},$$
which can also be expressed as:

$$\text{Var}(\hat{\beta}_R) = \sigma^2 (X'X + \lambda I)^{-1} X'X (X'X + \lambda I)^{-1}$$
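To make this formula concrete, here is a minimal Monte Carlo sketch (Python/NumPy; the simulated design and parameter values are arbitrary assumptions) comparing the sampling covariance of the Ridge estimator with $\sigma^2 (X'X + \lambda I)^{-1} X'X (X'X + \lambda I)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, lam, sigma = 100, 2, 10.0, 1.0
X = rng.normal(size=(n, p))                       # fixed regressors
beta = np.array([1.0, -1.0])

A_inv = np.linalg.inv(X.T @ X + lam * np.eye(p))
# sigma^2 (X'X + lam I)^{-1} X'X (X'X + lam I)^{-1}
var_theory = sigma**2 * A_inv @ X.T @ X @ A_inv

# Monte Carlo covariance of the Ridge estimator over repeated error draws
draws = np.array([A_inv @ X.T @ (X @ beta + sigma * rng.normal(size=n))
                  for _ in range(20_000)])
print("theoretical variance:\n", var_theory.round(5))
print("Monte Carlo variance:\n", np.cov(draws, rowvar=False).round(5))
```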
Comparison of Variance with OLS Estimator
Now, I show the positive semi-definiteness of the matrix $\text{Var}(\hat{\beta}_{OLS}) - \text{Var}(\hat{\beta}_R)$:
- Taking the previous expression, and defining $W = (X'X + \lambda I)^{-1}$, we can rewrite $\text{Var}(\hat{\beta}_R)$ in a simple form:

$$\text{Var}(\hat{\beta}_R) = \sigma^2 \, W X'X W$$
The difference $\text{Var}(\hat{\beta}_{OLS}) - \text{Var}(\hat{\beta}_R)$ is then:

$$\sigma^2 (X'X)^{-1} - \sigma^2 \, W X'X W = \sigma^2 \, W \left[ W^{-1} (X'X)^{-1} W^{-1} - X'X \right] W$$
It remains to show that

$$W^{-1} (X'X)^{-1} W^{-1} - X'X = (X'X + \lambda I)(X'X)^{-1}(X'X + \lambda I) - X'X = 2\lambda I + \lambda^2 (X'X)^{-1}$$
is a positive semi-definite matrix. First, since $X'X$ is p.s.d., $(X'X)^{-1}$ is also p.s.d. (a short proof is comparing the eigenvalues of a matrix and its inverse: the eigenvalues of $(X'X)^{-1}$ are the reciprocals of those of $X'X$, and thus also non-negative). Also, if you add $\lambda I$ to a matrix, its eigenvalues increase by $\lambda$. Hence with $2\lambda I$, we increase the already non-negative eigenvalues of $\lambda^2 (X'X)^{-1}$, and $2\lambda I + \lambda^2 (X'X)^{-1}$ is also p.s.d.
After some derivation, we can show that the difference in variances is equal to the following quadratic form:

$$\text{Var}(\hat{\beta}_{OLS}) - \text{Var}(\hat{\beta}_R) = \sigma^2 \, W \left[ 2\lambda I + \lambda^2 (X'X)^{-1} \right] W$$
Since, by the preceding discussion, all matrices involved are p.s.d., the difference is positive semi-definite and the variance of the OLS estimator is (weakly) larger than the variance of the Ridge estimator.
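As a final sanity check, here is a minimal sketch (Python/NumPy, with an arbitrary simulated design matrix and $\sigma^2 = 1$ of my choosing) verifying that the eigenvalues of $\text{Var}(\hat{\beta}_{OLS}) - \text{Var}(\hat{\beta}_R)$ are non-negative for a range of $\lambda \geq 0$:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))
XtX = X.T @ X
sigma2 = 1.0

for lam in [0.0, 0.1, 1.0, 10.0, 100.0]:
    W = np.linalg.inv(XtX + lam * np.eye(4))       # W = (X'X + lambda I)^{-1}
    var_ols = sigma2 * np.linalg.inv(XtX)
    var_ridge = sigma2 * W @ XtX @ W
    # smallest eigenvalue of the difference; ~0 (up to numerical noise) at lambda = 0
    min_eig = np.linalg.eigvalsh(var_ols - var_ridge).min()
    print(f"lambda = {lam:6.1f}: smallest eigenvalue of the difference = {min_eig:.6f}")
```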
Conclusion
In this post, I have set out some properties of the Ridge estimator, arguably the easiest shrinkage estimator to understand. I have focused on some standard theoretical results and tried to explain them in a way that works for me. Thank you for reading!