Random Effects Regression: A Mathematical Exposition
By Bas Machielsen
May 23, 2025
Introduction
Hi all, I wanted to write a short primer on random effects regression, since I’m working on this for a course I’m teaching,
and I think the exposition in most textbooks isn’t that clear.
To improve this, I present a mathematical derivation of the random effects model here.
I should note that I'm not a fan of the random effects model at all: in many cases it should be seen as a curiosity rather than a solution. However, for didactic purposes, it makes sense to analyze the random effects model and observe that it is a weighted average of the OLS and FE estimators.
Set-up
Imagine you have data for N individuals (or firms, countries, etc.), denoted by i=1,…,N, observed over T time periods, denoted by t=1,…,T. (For simplicity, we’ll assume a balanced panel, meaning each individual is observed for all T periods, though RE can handle unbalanced panels.)
A generic linear model for such data could be:
$$y_{it} = \beta_0 + \beta_1 x_{1it} + \beta_2 x_{2it} + \dots + \beta_k x_{kit} + v_{it}$$
Where:
$y_{it}$ is the dependent variable for individual $i$ at time $t$.
$x_{jit}$ is the $j$-th independent variable for individual $i$ at time $t$.
$\beta_0$ is the overall intercept.
$\beta_1, \dots, \beta_k$ are the coefficients for the independent variables.
$v_{it}$ is the error term for individual $i$ at time $t$.
The problem is that $v_{it}$ likely contains unobserved individual-specific characteristics that are constant over time (e.g., innate ability, firm culture) as well as purely random noise.
The Random Effects Model
The random effects model explicitly decomposes the error term $v_{it}$ into two components:
$$v_{it} = u_i + \epsilon_{it}$$
where $u_i$ is the time-invariant individual-specific effect and $\epsilon_{it}$ is the idiosyncratic error.
So, the model becomes:
$$y_{it} = \beta_0 + \beta_1 x_{1it} + \dots + \beta_k x_{kit} + u_i + \epsilon_{it}$$
We can also write the model by combining $\beta_0$ and $u_i$ into an individual-specific intercept:
$$y_{it} = (\beta_0 + u_i) + \beta_1 x_{1it} + \dots + \beta_k x_{kit} + \epsilon_{it}$$
Here, $\alpha_i = \beta_0 + u_i$ is the random intercept for individual $i$.
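A quick way to see what this data-generating process looks like is to simulate it. Below is a minimal sketch in Python with NumPy (all parameter values and variable names are invented for the illustration), generating a balanced panel with one regressor:

```python
import numpy as np

rng = np.random.default_rng(0)

N, T = 500, 5                                 # individuals and time periods
beta0, beta1 = 1.0, 2.0                       # true coefficients (invented values)
sigma_u, sigma_eps = 1.0, 0.5                 # std. devs of u_i and eps_it

u = rng.normal(0, sigma_u, size=N)            # random effect u_i, constant over t
x = rng.normal(size=(N, T))                   # one regressor x_it
eps = rng.normal(0, sigma_eps, size=(N, T))   # idiosyncratic error eps_it
y = beta0 + beta1 * x + u[:, None] + eps      # y_it = beta0 + beta1*x_it + u_i + eps_it
```

Each row of `y` is one individual; the single draw `u[i]` is shared by all $T$ of that individual's observations, which is exactly what induces the serial correlation analyzed below.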
Assumptions
We assume the following about $u_i$, $\epsilon_{it}$, and their relationship:
Random effect $u_i$:
$E[u_i] = 0$ (the mean of the individual effects is zero; any non-zero mean is absorbed into $\beta_0$).
$\text{Var}(u_i) = \sigma_u^2$ (the variance of the individual effects is constant).
$\text{Cov}(u_i, u_j) = 0$ for $i \neq j$ (individual effects are uncorrelated across individuals).
Idiosyncratic error $\epsilon_{it}$:
$E[\epsilon_{it}] = 0$.
$\text{Var}(\epsilon_{it}) = \sigma_\epsilon^2$ (the variance of the idiosyncratic errors is constant – homoscedasticity).
$\text{Cov}(\epsilon_{it}, \epsilon_{is}) = 0$ for $t \neq s$ (no serial correlation in the idiosyncratic errors for a given individual, after accounting for $u_i$).
$\text{Cov}(\epsilon_{it}, \epsilon_{js}) = 0$ for $i \neq j$ (idiosyncratic errors are uncorrelated across individuals).
No correlation between $u_i$ and $\epsilon_{it}$:
$\text{Cov}(u_i, \epsilon_{jt}) = 0$ for all $i, j, t$ (the individual random effects are uncorrelated with the idiosyncratic errors).
The Structure of the Composite Error Term
The composite error term is $v_{it} = u_i + \epsilon_{it}$. Let's look at its properties:
$E[v_{it}] = E[u_i] + E[\epsilon_{it}] = 0 + 0 = 0$.
$\text{Var}(v_{it}) = \text{Var}(u_i) + \text{Var}(\epsilon_{it}) + 2\,\text{Cov}(u_i, \epsilon_{it}) = \sigma_u^2 + \sigma_\epsilon^2$ (the covariance term vanishes by the third assumption).
Now, consider the covariance of the composite error terms for the same individual $i$ at two different time periods $t$ and $s$ ($t \neq s$):
$$\text{Cov}(v_{it}, v_{is}) = \text{Cov}(u_i + \epsilon_{it}, u_i + \epsilon_{is}) = \text{Cov}(u_i, u_i) + \text{Cov}(u_i, \epsilon_{is}) + \text{Cov}(\epsilon_{it}, u_i) + \text{Cov}(\epsilon_{it}, \epsilon_{is})$$
which, using all three assumptions, equals $\text{Var}(u_i) + 0 + 0 + 0 = \sigma_u^2$.
This is a key result: for a given individual $i$, the error terms $v_{it}$ and $v_{is}$ are correlated across time because they share the same $u_i$ component. The correlation is:
$$\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\epsilon^2}$$
This correlation is often called the intra-class correlation coefficient (ICC). It represents the proportion of the total variance in the error term that is attributable to the individual-specific effect $u_i$.
If $\sigma_u^2 = 0$, then $\rho = 0$: there is no individual-specific random effect, and OLS on the pooled data would be appropriate.
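The covariance structure derived above is easy to verify by simulation. A minimal sketch (Python/NumPy; the variance values are invented so that $\rho = 0.25$):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 200_000, 2
sigma_u2, sigma_eps2 = 1.0, 3.0               # implies rho = 1 / (1 + 3) = 0.25

u = rng.normal(0, np.sqrt(sigma_u2), size=N)
eps = rng.normal(0, np.sqrt(sigma_eps2), size=(N, T))
v = u[:, None] + eps                           # composite error v_it = u_i + eps_it

cov_ts = np.cov(v[:, 0], v[:, 1])[0, 1]        # Cov(v_i1, v_i2): close to sigma_u2
rho = np.corrcoef(v[:, 0], v[:, 1])[0, 1]      # intra-class correlation: close to 0.25
print(cov_ts, rho)
```

With many individuals, the empirical covariance across periods converges to $\sigma_u^2$ and the correlation to $\rho$, as the derivation predicts.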
Estimation: Generalized Least Squares (GLS)
Because of the serial correlation in the composite error term $v_{it}$ (i.e., $\text{Cov}(v_{it}, v_{is}) = \sigma_u^2 \neq 0$), Ordinary Least Squares (OLS) applied to the pooled data $y_{it} = X_{it}'\beta + v_{it}$ is still unbiased and consistent, but it is inefficient, and the usual standard errors are incorrect.
The efficient estimator is Generalized Least Squares (GLS). The composite errors have a non-spherical variance-covariance matrix, and GLS transforms the model so that the transformed errors are spherical again; by the Gauss-Markov theorem, OLS on the transformed model is then efficient. The GLS estimator is:
$$\hat{\beta}_{GLS} = (X'\Omega^{-1}X)^{-1} X'\Omega^{-1} y$$
where $\Omega$ is the variance-covariance matrix of the composite error vector $v$.
For panel data, $\Omega$ has a block-diagonal structure, with each block $\Omega_i$ corresponding to individual $i$.
$\Omega_i$ (a $T \times T$ matrix for individual $i$) has:
$\sigma_u^2 + \sigma_\epsilon^2$ on the diagonal.
$\sigma_u^2$ on the off-diagonals.
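As an illustration, one block $\Omega_i$ can be built directly, and with $\Omega$ treated as known, the GLS formula above can be applied to simulated data. A minimal sketch (Python/NumPy; all parameter values are invented for the example):

```python
import numpy as np

sigma_u2, sigma_eps2, T = 1.0, 0.5, 4

# Omega_i: sigma_u2 + sigma_eps2 on the diagonal, sigma_u2 off the diagonal
Omega_i = sigma_u2 * np.ones((T, T)) + sigma_eps2 * np.eye(T)

# Simulate a panel (rows grouped by individual) and apply the GLS formula
rng = np.random.default_rng(2)
N = 300
X = np.column_stack([np.ones(N * T), rng.normal(size=N * T)])
beta_true = np.array([1.0, 2.0])
u = np.repeat(rng.normal(0, np.sqrt(sigma_u2), N), T)    # u_i repeated over t
y = X @ beta_true + u + rng.normal(0, np.sqrt(sigma_eps2), N * T)

Omega_inv = np.kron(np.eye(N), np.linalg.inv(Omega_i))   # block-diagonal inverse
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
```

Building the full $NT \times NT$ matrix with a Kronecker product is only for exposition; real implementations exploit the block structure (or the quasi-demeaning below) instead.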
In practice, $\sigma_u^2$ and $\sigma_\epsilon^2$ are unknown, so we have to use a Feasible GLS (FGLS) procedure:
Estimate $\sigma_u^2$ and $\sigma_\epsilon^2$ (e.g., from OLS residuals).
How to do this?
The overall variance of the pooled OLS residuals is a natural estimator of $\text{Var}(v_{it}) = \sigma_u^2 + \sigma_\epsilon^2$. This is the first equation we need.
Next, consider the average residual for each individual $i$:
$$\bar{e}_i = \frac{1}{T} \sum_t \hat{e}_{it}$$
These $\bar{e}_i$ are estimates of $\bar{v}_i = \frac{1}{T} \sum_t v_{it} = \frac{1}{T} \sum_t (u_i + \epsilon_{it}) = u_i + \bar{\epsilon}_i$ (since $u_i$ is constant for individual $i$).
Now, let's find the variance of these individual-average residuals across individuals. Assuming $u_i$ and $\epsilon_{it}$ are uncorrelated, and the $\epsilon_{it}$ are serially uncorrelated for a given $i$:
$$\text{Var}(\bar{e}_i) = \text{Var}(u_i + \bar{\epsilon}_i) = \sigma_u^2 + \frac{\sigma_\epsilon^2}{T}$$
This is the second equation. Solving the two equations yields estimates of $\sigma_u^2$ and $\sigma_\epsilon^2$.
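These two moment conditions can be turned into estimates. A minimal sketch (Python/NumPy, on simulated data with known true values so the recovery can be checked; the solving step follows the two equations in the text):

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 2000, 5
sigma_u2, sigma_eps2 = 1.0, 0.5                # true values, to be recovered

x = rng.normal(size=(N, T))
u = rng.normal(0, np.sqrt(sigma_u2), N)
y = 1.0 + 2.0 * x + u[:, None] + rng.normal(0, np.sqrt(sigma_eps2), (N, T))

# Pooled OLS residuals e_it
X = np.column_stack([np.ones(N * T), x.ravel()])
beta_ols, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
e = (y.ravel() - X @ beta_ols).reshape(N, T)

s2_v = e.var()                   # estimates sigma_u2 + sigma_eps2     (first equation)
s2_ebar = e.mean(axis=1).var()   # estimates sigma_u2 + sigma_eps2 / T (second equation)

# Solve the two equations: s2_v - s2_ebar = sigma_eps2 * (T - 1) / T
sigma_eps2_hat = (s2_v - s2_ebar) * T / (T - 1)
sigma_u2_hat = s2_v - sigma_eps2_hat
```

Note that standard software typically uses refinements of this idea (e.g., Swamy-Arora degrees-of-freedom corrections), but the logic is the same two-equation system.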
Then, transform (quasi-demean) the data: regress $y_{it} - \theta \bar{y}_i$ on the $x_{jit} - \theta \bar{x}_{ji}$, where $\bar{y}_i = \frac{1}{T} \sum_t y_{it}$ (the mean of $y$ for individual $i$), and similarly for the $\bar{x}_{ji}$, with:
$$\theta = 1 - \sqrt{\frac{\sigma_\epsilon^2}{T_i \sigma_u^2 + \sigma_\epsilon^2}}$$
$T_i$ is the number of observations for individual $i$. If the panel is balanced, $T_i = T$.
The reason for this is the following:
The transformation $y_{it} - \theta \bar{y}_i$ (and similarly for the $x$'s) creates a new error term $v_{it}^* = (u_i + \epsilon_{it}) - \theta (u_i + \bar{\epsilon}_i)$.
We choose $\theta = 1 - \sqrt{\sigma_\epsilon^2 / (T_i \sigma_u^2 + \sigma_\epsilon^2)}$ because this specific value makes the covariance between transformed errors for the same individual at different times, $\text{Cov}(v_{it}^*, v_{is}^*)$ (for $t \neq s$), equal to zero.
With this $\theta$, the variance of the transformed errors also simplifies to $\text{Var}(v_{it}^*) = \sigma_\epsilon^2$, meaning they are homoscedastic and serially uncorrelated.
Applying OLS to this "quasi-demeaned" data is equivalent to GLS on the original data, yielding efficient estimates.
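The quasi-demeaning step can be sketched as follows (Python/NumPy; for clarity the true variances are plugged in directly, whereas a full FGLS run would use the estimates from the previous step, and all other values are invented):

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 1000, 5
sigma_u2, sigma_eps2 = 1.0, 0.5    # treated as known here; FGLS plugs in estimates

x = rng.normal(size=(N, T))
u = rng.normal(0, np.sqrt(sigma_u2), N)
y = 1.0 + 2.0 * x + u[:, None] + rng.normal(0, np.sqrt(sigma_eps2), (N, T))

theta = 1 - np.sqrt(sigma_eps2 / (T * sigma_u2 + sigma_eps2))

# Quasi-demean: subtract theta times each individual's mean (the intercept too)
y_star = y - theta * y.mean(axis=1, keepdims=True)
x_star = x - theta * x.mean(axis=1, keepdims=True)
const_star = np.full(N * T, 1 - theta)     # transformed constant: 1 - theta

X_star = np.column_stack([const_star, x_star.ravel()])
beta_re, *_ = np.linalg.lstsq(X_star, y_star.ravel(), rcond=None)
```

Running OLS on the starred variables recovers the coefficients efficiently; note that the constant must be transformed as well, since $1 - \theta \cdot 1 = 1 - \theta$.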
Interpretation
Observe the behavior of $\theta$:
If $\sigma_u^2 = 0$ (no random effect), then $\theta = 0$: the RE model becomes pooled OLS.
If $T_i \to \infty$, then $\theta \to 1$: the RE model behaves like the Fixed Effects (FE) model (which uses full demeaning).
If $\sigma_\epsilon^2 = 0$ (all variation is due to $u_i$), then $\theta = 1$ (for $T_i > 0$), also behaving like FE.
So, the RE estimator is a weighted average of the between estimator (using $\bar{y}_i$ and $\bar{x}_i$) and the within estimator (FE, using $y_{it} - \bar{y}_i$ and $x_{it} - \bar{x}_i$).
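These limiting cases can be checked numerically with a small sketch (Python/NumPy):

```python
import numpy as np

def theta(sigma_u2, sigma_eps2, T):
    """theta = 1 - sqrt(sigma_eps2 / (T * sigma_u2 + sigma_eps2))"""
    return 1 - np.sqrt(sigma_eps2 / (T * sigma_u2 + sigma_eps2))

print(theta(0.0, 1.0, 5))       # no random effect: theta = 0, pooled OLS
print(theta(1.0, 1.0, 5))       # intermediate case: between OLS and FE
print(theta(1.0, 1.0, 10_000))  # T large: theta close to 1, approaches FE
print(theta(1.0, 0.0, 5))       # no idiosyncratic noise: theta = 1, FE
```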