Random Effects Regression: A Mathematical Exposition

By Bas Machielsen

May 23, 2025

Introduction

Hi all, I wanted to write a short primer on random effects regression, since I'm working on this for a course I'm teaching, and I think the exposition in most textbooks isn't that clear. To improve on this, I present a mathematical derivation of the random effects model here. I should note that I'm not a fan of the random effects model at all; I think it should be seen as a curiosity rather than a solution in many cases. For didactic purposes, however, it makes sense to analyze the random effects model and observe that its estimator is a weighted average of the OLS and FE estimators.

Set-up

Imagine you have data for $N$ individuals (or firms, countries, etc.), denoted by $i = 1, \dots, N$, observed over $T$ time periods, denoted by $t = 1, \dots, T$. (For simplicity, we'll assume a balanced panel, meaning each individual is observed for all $T$ periods, though RE can handle unbalanced panels.)

A generic linear model for such data could be:

$$ y_{it} = \beta_0 + \beta_1 x_{1it} + \beta_2 x_{2it} + \dots + \beta_k x_{kit} + v_{it} $$

Where:

  • $y_{it}$ is the dependent variable for individual $i$ at time $t$.

  • $x_{jit}$ is the $j$-th independent variable for individual $i$ at time $t$.

  • $\beta_0$ is the overall intercept.

  • $\beta_1, \dots, \beta_k$ are the coefficients for the independent variables.

  • $v_{it}$ is the error term for individual $i$ at time $t$.

The problem is that $v_{it}$ likely contains unobserved individual-specific characteristics that are constant over time (e.g., innate ability, firm culture) and also purely random noise.

The Random Effects Model

The random effects model explicitly decomposes the error term $v_{it}$ into two components:

$$ v_{it} = u_i + \epsilon_{it} $$

where $u_i$ is a time-invariant individual-specific effect and $\epsilon_{it}$ is an idiosyncratic error.

So, the model becomes:

$$ y_{it} = \beta_0 + \beta_1 x_{1it} + \dots + \beta_k x_{kit} + u_i + \epsilon_{it} $$

We can also write the model by combining $\beta_0$ and $u_i$ into an individual-specific intercept:

$$ y_{it} = (\beta_0 + u_i) + \beta_1 x_{1it} + \dots + \beta_k x_{kit} + \epsilon_{it} $$

Here, $\alpha_i = \beta_0 + u_i$ is the random intercept for individual $i$.
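To make the setup concrete, here is a minimal sketch simulating this data-generating process in Python. The parameter values ($\sigma_u = 1$, $\sigma_\epsilon = 0.5$, one regressor) are illustrative assumptions, not part of the model above; later snippets reuse these arrays.

```python
import numpy as np

rng = np.random.default_rng(0)

N, T = 500, 5                   # individuals and time periods (balanced panel)
beta0, beta1 = 1.0, 2.0         # illustrative true coefficients (assumed)
sigma_u, sigma_eps = 1.0, 0.5   # assumed standard deviations of u_i and eps_it

u = rng.normal(0.0, sigma_u, size=N)            # random intercepts u_i
x = rng.normal(size=(N, T))                     # regressor, independent of u_i
eps = rng.normal(0.0, sigma_eps, size=(N, T))   # idiosyncratic errors eps_it
y = beta0 + beta1 * x + u[:, None] + eps        # y_it = b0 + b1*x_it + u_i + eps_it
```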

Assumptions

We assume the following things about $u_i$, $\epsilon_{it}$, and their relationship:

Random Effect $u_i$:

  • $E[u_i] = 0$ (The mean of the individual effects is zero; any non-zero mean is absorbed into $\beta_0$).

  • $\text{Var}(u_i) = \sigma^2_u$ (The variance of the individual effects is constant).

  • $\text{Cov}(u_i, u_j) = 0$ for $i \neq j$ (Individual effects are uncorrelated across individuals).

Idiosyncratic Error $\epsilon_{it}$:

  • $E[\epsilon_{it}] = 0$.

  • $\text{Var}(\epsilon_{it}) = \sigma^2_\epsilon$ (The variance of the idiosyncratic errors is constant – homoscedasticity).

  • $\text{Cov}(\epsilon_{it}, \epsilon_{is}) = 0$ for $t \neq s$ (No serial correlation in idiosyncratic errors for a given individual, after accounting for $u_i$).

  • $\text{Cov}(\epsilon_{it}, \epsilon_{js}) = 0$ for $i \neq j$ (Idiosyncratic errors are uncorrelated across individuals).

No correlation between $u_i$ and $\epsilon_{it}$:

  • $\text{Cov}(u_i, \epsilon_{jt}) = 0$ for all $i, j, t$. The individual random effects are uncorrelated with the idiosyncratic errors.

(Implicitly, the RE model also assumes that $u_i$ is uncorrelated with the regressors $x_{jit}$; this is the key assumption that distinguishes it from fixed effects and that makes pooled OLS and GLS consistent below.)

The Structure of the Composite Error Term

The composite error term is $v_{it} = u_i + \epsilon_{it}$. Let's look at its properties:

  • $E[v_{it}] = E[u_i] + E[\epsilon_{it}] = 0 + 0 = 0$.

  • $\text{Var}(v_{it}) = \text{Var}(u_i) + \text{Var}(\epsilon_{it}) + 2\,\text{Cov}(u_i, \epsilon_{it}) = \sigma^2_u + \sigma^2_\epsilon$ (due to the third assumption).

Now, consider the covariance of the composite error terms for the same individual $i$ but at different time periods $t$ and $s$ ($t \neq s$):

$$ \begin{aligned} \text{Cov}(v_{it}, v_{is}) &= \text{Cov}(u_i + \epsilon_{it},\, u_i + \epsilon_{is}) \\ &= \text{Cov}(u_i, u_i) + \text{Cov}(u_i, \epsilon_{is}) + \text{Cov}(\epsilon_{it}, u_i) + \text{Cov}(\epsilon_{it}, \epsilon_{is}) \\ &= \text{Var}(u_i) + 0 + 0 + 0 = \sigma_u^2, \end{aligned} $$

where the last step uses all three assumption groups.

This is a key result: for a given individual $i$, the error terms $v_{it}$ and $v_{is}$ are correlated across time because they share the same $u_i$ component. The correlation is:

$$ \text{Corr}(v_{it}, v_{is}) = \frac{\text{Cov}(v_{it}, v_{is})}{\sqrt{\text{Var}(v_{it})\,\text{Var}(v_{is})}} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\epsilon^2} $$

This correlation is often called the intra-class correlation coefficient (ICC), denoted by $\rho$. It represents the proportion of the total variance in the error term that is attributable to the individual-specific effect $u_i$:

$$ \rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\epsilon^2} $$

If $\sigma_u^2 = 0$, then $\rho = 0$: there is no individual-specific random effect, and OLS on the pooled data would be appropriate.
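Continuing the simulation above, we can check this empirically: the covariance between $v_{it}$ and $v_{is}$ for $t \neq s$ should be close to $\sigma_u^2 = 1$, and the ICC close to $1 / (1 + 0.25) = 0.8$. A minimal sketch:

```python
v = u[:, None] + eps   # composite errors v_it = u_i + eps_it

# average empirical covariance of v_it and v_is over all pairs t != s
pair_covs = [np.cov(v[:, t], v[:, s])[0, 1]
             for t in range(T) for s in range(t + 1, T)]
cov_ts = np.mean(pair_covs)        # should be close to sigma_u^2 = 1
icc = cov_ts / v.var()             # should be close to rho = 0.8
print(cov_ts, icc)
```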

Estimation: Generalized Least Squares (GLS)

Because of the serial correlation in the composite error term $v_{it}$ (i.e., $\text{Cov}(v_{it}, v_{is}) = \sigma_u^2 \neq 0$), Ordinary Least Squares (OLS) applied to the pooled data $y_{it} = X_{it}'\beta + v_{it}$ will still be unbiased and consistent, but it will be inefficient, and the standard errors will be incorrect.

The efficient estimator is Generalized Least Squares (GLS). The composite errors are not spherical: they are serially correlated within individuals, so the Gauss-Markov conditions fail for OLS on the original data. GLS reweights the data so that the transformed errors are spherical again, and by the Gauss-Markov theorem the resulting estimator is efficient. The GLS estimator is:

$$ \beta_{GLS} = (X'\Omega^{-1}X)^{-1} X'\Omega^{-1}Y $$

where $\Omega$ is the variance-covariance matrix of the composite error vector $v$. For panel data, $\Omega$ has a block-diagonal structure, with each block $\Omega_i$ corresponding to individual $i$. $\Omega_i$ (a $T \times T$ matrix for individual $i$) has:

  • $\sigma_u^2 + \sigma_\epsilon^2$ on the diagonal.
  • $\sigma_u^2$ on the off-diagonals.
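Equivalently, $\Omega_i = \sigma_\epsilon^2 I_T + \sigma_u^2 \iota_T \iota_T'$, where $\iota_T$ is a $T$-vector of ones. A sketch of the construction in numpy, using the true variances from the simulation (in practice these would be estimates):

```python
# one T x T block: sigma_eps^2 on the diagonal plus sigma_u^2 everywhere
Omega_i = sigma_eps**2 * np.eye(T) + sigma_u**2 * np.ones((T, T))

# full NT x NT matrix for a balanced panel: N identical blocks on the diagonal
Omega = np.kron(np.eye(N), Omega_i)
```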

In practice, $\sigma_u^2$ and $\sigma_\epsilon^2$ are unknown, so we have to use a Feasible GLS (FGLS) procedure:

  • Estimate $\sigma_u^2$ and $\sigma_\epsilon^2$ (e.g., from OLS residuals.)
    • How to do this?
    • The overall variance of the OLS residuals $\hat{e}_{it}$ is a natural estimator of $\text{Var}(v_{it}) = \sigma_u^2 + \sigma_\epsilon^2$. This is the first equation we need.
    • Consider the average residual for each individual $i$: $\bar{e}_i = (1/T) \sum_t \hat{e}_{it}$.
    • These $\bar{e}_i$ are estimates of $\bar{v}_i = (1/T) \sum_t v_{it} = (1/T) \sum_t (u_i + \epsilon_{it}) = u_i + \bar{\epsilon}_i$ (since $u_i$ is constant for individual $i$).
    • Now, let's find the variance of these individual-average residuals across individuals: $\text{Var}(\bar{e}_i) = \text{Var}(u_i + \bar{\epsilon}_i)$.
    • Assuming $u_i$ and $\epsilon_{it}$ are uncorrelated, and the $\epsilon_{it}$ are serially uncorrelated for a given $i$:
    • $\text{Var}(u_i + \bar{\epsilon}_i) = \text{Var}(u_i) + \text{Var}(\bar{\epsilon}_i) = \sigma_u^2 + \text{Var}\left((1/T) \sum_t \epsilon_{it}\right)$
    • $= \sigma_u^2 + (1/T^2) \sum_t \text{Var}(\epsilon_{it}) = \sigma_u^2 + (1/T^2) \cdot T \cdot \sigma_\epsilon^2 = \sigma_u^2 + \sigma^2_\epsilon / T$
    • So the sample variance of the $\bar{e}_i$ values, call it $s^2_{\bar{e}}$, estimates $\sigma_u^2 + \sigma_\epsilon^2 / T$. Hence, our second equation is: $\sigma^2_u + \sigma^2_\epsilon / T = s^2_{\bar{e}}$.
    • We have two equations in two unknowns. Writing $s_v^2$ for the overall residual variance, subtracting the second equation from the first gives $\hat{\sigma}^2_\epsilon = \frac{T}{T-1}(s_v^2 - s^2_{\bar{e}})$, and then $\hat{\sigma}^2_u = s^2_{\bar{e}} - \hat{\sigma}^2_\epsilon / T$.
  • Construct the estimate $\hat{\Omega}$ using $\hat{\sigma}_u^2$ and $\hat{\sigma}_\epsilon^2$.
  • Compute $\beta_{FGLS} = (X'\hat{\Omega}^{-1}X)^{-1} X'\hat{\Omega}^{-1}Y$ (a code sketch of the whole procedure follows below).
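Here is a sketch of the full FGLS procedure on the simulated data, exploiting the block-diagonal structure of $\hat{\Omega}$ so that the full $NT \times NT$ matrix is never formed. The variable names are my own; the steps follow the list above.

```python
# Step 1: pooled OLS to obtain residuals
X = np.column_stack([np.ones(N * T), x.ravel()])      # [constant, x_it]
Y = y.ravel()
b_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
e = (Y - X @ b_ols).reshape(N, T)                     # residuals e_it

# Step 2: variance components from the two moment equations
s_v2 = e.var()                    # estimates sigma_u^2 + sigma_eps^2
s_ebar2 = e.mean(axis=1).var()    # estimates sigma_u^2 + sigma_eps^2 / T
sig_eps2 = T * (s_v2 - s_ebar2) / (T - 1)
sig_u2 = max(s_ebar2 - sig_eps2 / T, 0.0)   # clamp in case the estimate is negative

# Step 3: FGLS block by block: beta = (sum_i Xi' Oi^-1 Xi)^-1 (sum_i Xi' Oi^-1 yi)
Omega_i = sig_eps2 * np.eye(T) + sig_u2 * np.ones((T, T))
Oi_inv = np.linalg.inv(Omega_i)
XtOX = np.zeros((2, 2))
XtOY = np.zeros(2)
for i in range(N):
    Xi = np.column_stack([np.ones(T), x[i]])
    XtOX += Xi.T @ Oi_inv @ Xi
    XtOY += Xi.T @ Oi_inv @ y[i]
b_fgls = np.linalg.solve(XtOX, XtOY)    # close to (beta0, beta1) = (1, 2)
```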

The FGLS transformation (often called “quasi-demeaning” or “partial demeaning”) for the random effects model can be written as:

$$ y_{it} - \theta \bar{y}_i = (1 - \theta)\beta_0 + \beta_1 (x_{1it} - \theta \bar{x}_{1i}) + \dots + \beta_k (x_{kit} - \theta \bar{x}_{ki}) + (v_{it} - \theta \bar{v}_i) $$

where:

  • $\bar{y}_i = (1/T) \sum_t y_{it}$ (the mean of $y$ for individual $i$); similarly for $\bar{x}_{ji}$.
  • $\theta = 1 - \sqrt{\dfrac{\sigma^2_\epsilon}{T_i \sigma^2_u + \sigma^2_\epsilon}}$
  • $T_i$ is the number of observations for individual $i$. If the panel is balanced, $T_i = T$.
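In code, the transformation takes only a few lines (continuing the FGLS sketch above; note that the constant regressor is also scaled, by $1 - \theta$):

```python
theta = 1.0 - np.sqrt(sig_eps2 / (T * sig_u2 + sig_eps2))

y_star = y - theta * y.mean(axis=1, keepdims=True)    # y_it - theta * ybar_i
x_star = x - theta * x.mean(axis=1, keepdims=True)    # x_it - theta * xbar_i
X_star = np.column_stack([(1 - theta) * np.ones(N * T), x_star.ravel()])
b_qd, *_ = np.linalg.lstsq(X_star, y_star.ravel(), rcond=None)
# b_qd coincides (up to floating point) with b_fgls computed above
```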

The reason for this is the following:

  1. The transformation $y_{it} - \theta \bar{y}_i$ (and similarly for the $x$'s) creates a new error term $v^*_{it} = (u_i + \epsilon_{it}) - \theta (u_i + \bar{\epsilon}_i)$.
  2. We choose $\theta = 1 - \sqrt{\frac{\sigma^2_\epsilon}{T_i \sigma^2_u + \sigma^2_\epsilon}}$ because this specific value makes the covariance between transformed errors for the same individual at different times, $\text{Cov}(v^*_{it}, v^*_{is})$ (for $t \neq s$), equal to zero (a short derivation follows below).
  3. With this $\theta$, the variance of the transformed errors also simplifies to $\text{Var}(v^*_{it}) = \sigma^2_\epsilon$, meaning they are homoscedastic and serially uncorrelated.
  4. Applying OLS to this “quasi-demeaned” data is equivalent to GLS on the original data, yielding efficient estimates.
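To see point 2 explicitly (in the balanced case $T_i = T$), write the transformed error as $v^*_{it} = (1 - \theta) u_i + \epsilon_{it} - \theta \bar{\epsilon}_i$. Using $\text{Cov}(\epsilon_{it}, \bar{\epsilon}_i) = \text{Var}(\bar{\epsilon}_i) = \sigma_\epsilon^2 / T$ and $\text{Cov}(\epsilon_{it}, \epsilon_{is}) = 0$, we get for $t \neq s$:

$$ \begin{aligned} \text{Cov}(v^*_{it}, v^*_{is}) &= (1-\theta)^2 \sigma_u^2 - 2\theta \frac{\sigma_\epsilon^2}{T} + \theta^2 \frac{\sigma_\epsilon^2}{T} \\ &= (1-\theta)^2 \left( \sigma_u^2 + \frac{\sigma_\epsilon^2}{T} \right) - \frac{\sigma_\epsilon^2}{T} \end{aligned} $$

Setting this to zero gives $(1-\theta)^2 = \frac{\sigma_\epsilon^2}{T \sigma_u^2 + \sigma_\epsilon^2}$, which is exactly the $\theta$ defined above.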

Interpretation

Observe the behavior of $\theta$:

  • If $\sigma_u^2 = 0$ (no random effect), then $\theta = 0$. The RE model becomes pooled OLS.
  • If $T_i \rightarrow \infty$, then $\theta \rightarrow 1$. The RE model behaves like the Fixed Effects (FE) model (which uses full demeaning).
  • If $\sigma_\epsilon^2 = 0$ (all variation is due to $u_i$), then $\theta = 1$ (provided $\sigma_u^2 > 0$), also behaving like FE.

So, the RE estimator is a weighted average of the between estimator (which uses $\bar{y}_i$ and $\bar{x}_i$) and the within estimator (FE, which uses $y_{it} - \bar{y}_i$ and $x_{it} - \bar{x}_i$).
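A quick numeric illustration of these limits, using the assumed $\sigma_u = 1$ and $\sigma_\epsilon = 0.5$ from the simulation above:

```python
for T_i in (2, 5, 20, 100):
    th = 1 - np.sqrt(sigma_eps**2 / (T_i * sigma_u**2 + sigma_eps**2))
    print(T_i, round(th, 3))   # theta rises toward 1 as T_i grows (RE -> FE)
```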
