Conley Standard Errors

Author

Bas Machielsen

Published

March 30, 2025

The Logic and Mathematics of Conley (1999) Standard Errors

Standard errors as proposed by Conley (1999) address an important issue in econometrics: the potential for spatial correlation between observations. This technical approach provides a method for estimating the variance-covariance matrix of estimators, particularly in the presence of dependence that is related to a measure of distance. The core of the methodology is an extension of the Generalized Method of Moments (GMM) framework, analogous to the Newey-West procedure for time series data that accounts for serial correlation.

Accounting for Spatial Interdependence

In many economic applications, observations are not independent. For instance, the economic outcomes of neighboring regions, the performance of firms in the same industry, or the behaviors of individuals within a social network can be correlated. Standard ordinary least squares (OLS) or GMM point estimates remain consistent in the presence of such spatial correlation (as long as the moment conditions hold), but the usual standard errors are biased, typically downward, leading to invalid inference.

Conley’s approach provides a non-parametric way to correct for this. The fundamental idea is that the correlation between the errors of two observations is a function of the distance between them. Observations that are “close” to each other are allowed to have correlated errors, while the correlation is assumed to diminish as the distance increases, eventually becoming zero beyond a certain cutoff point. This is a more flexible approach than, for example, clustering standard errors by discrete groups, as it allows for a smoother decay of correlation with distance.

The Mathematical Framework: A Spatial HAC Estimator

Conley’s method is a type of Heteroskedasticity and Autocorrelation Consistent (HAC) estimator, adapted for a spatial context. Let’s consider a model estimated via GMM, where the moment conditions are given by:

\(E[g(W_i, \beta_0)] = 0\)

where \(W_i\) is the data for observation \(i\) and \(\beta_0\) is the true parameter vector. The GMM estimator \(\hat{\beta}\) is chosen to make the sample moments as close to zero as possible. The asymptotic variance-covariance matrix of the GMM estimator is given by:

\(V = (G' \Omega^{-1} G)^{-1}\)

where \(G = E[\nabla_\beta g(W_i, \beta_0)]\) and \(\Omega\) is the variance-covariance matrix of the moment conditions. In the presence of spatial correlation, \(\Omega\) is not a diagonal matrix. The key contribution of Conley (1999) is to provide a consistent estimator for this \(\Omega\) matrix.
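As a concrete illustration of assembling this sandwich, here is a minimal numpy sketch. The matrices `G` and `Omega` are toy values assumed purely for illustration, standing in for the Jacobian of the moment conditions and the (non-diagonal) moment covariance matrix:

```python
import numpy as np

# Toy inputs, assumed purely for illustration: G stacks the expected
# derivatives of the moment conditions (3 moments x 2 parameters), and
# Omega is the non-diagonal variance-covariance matrix of the moments.
G = np.array([[1.0, 0.2],
              [0.3, 1.0],
              [0.5, 0.4]])
Omega = np.array([[1.0, 0.2, 0.1],
                  [0.2, 1.5, 0.3],
                  [0.1, 0.3, 2.0]])

# Asymptotic variance of the efficient GMM estimator: (G' Omega^{-1} G)^{-1}
V = np.linalg.inv(G.T @ np.linalg.solve(Omega, G))
```

The resulting `V` is a symmetric positive definite 2x2 matrix; Conley's contribution concerns how to estimate `Omega` consistently, which is where the next formula comes in.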

The estimator for the variance-covariance matrix of the sample moments, \(\hat{\Omega}\), is constructed as a weighted sum of the cross-products of the moment conditions. For a sample of size \(N\), the \((j,k)\)th element of \(\hat{\Omega}\) is given by:

\(\hat{\Omega}_{jk} = \frac{1}{N} \sum_{i=1}^{N} \sum_{l=1}^{N} K(d(i,l)) \, g_j(W_i, \hat{\beta}) \, g_k(W_l, \hat{\beta})\)

Here:

* \(g_j(W_i, \hat{\beta})\) is the \(j\)-th moment condition for observation \(i\), evaluated at the estimated parameters.
* \(d(i,l)\) is the distance between observation \(i\) and observation \(l\). This distance can be geographical or, more generally, defined in any relevant “economic” space.
* \(K(\cdot)\) is a kernel function that down-weights the covariance between observations as the distance between them increases.
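The double sum above can be written compactly in matrix form: if \(g\) is the \(N \times J\) matrix of moment conditions and \(K\) the \(N \times N\) matrix of kernel weights, then \(\hat{\Omega} = \frac{1}{N} g' K g\). Below is a sketch in Python under a uniform kernel; the function name and interface are illustrative choices, not part of the original method:

```python
import numpy as np

def spatial_hac_omega(g, coords, cutoff):
    """Estimate Omega-hat = (1/N) sum_i sum_l K(d(i,l)) g_i g_l'
    using a uniform kernel (weight 1 within `cutoff`, 0 beyond).

    g      : (N, J) moment conditions evaluated at beta-hat
    coords : (N, 2) locations used to compute d(i, l)
    cutoff : distance beyond which covariances are zeroed out
    """
    n = g.shape[0]
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise distances d(i, l)
    K = (d <= cutoff).astype(float)         # uniform kernel weights
    # (g' K g)_{jk} = sum_i sum_l K(d(i,l)) g_{ij} g_{lk}
    return g.T @ K @ g / n

# Illustrative simulated data
rng = np.random.default_rng(0)
g = rng.normal(size=(50, 2))
coords = rng.uniform(0, 10, size=(50, 2))
omega_hat = spatial_hac_omega(g, coords, cutoff=2.0)
```

Note that with a cutoff of zero only the self-pairs survive, and the estimator collapses to the familiar heteroskedasticity-robust \(\frac{1}{N} \sum_i g_i g_i'\).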

The Role of the Kernel and the Cutoff Distance

The choice of the kernel function and a cutoff distance is crucial. The kernel function determines the weights applied to the cross-products of the moment conditions at different distances. A common choice is the uniform kernel, where all pairs of observations within a certain cutoff distance are given equal weight (typically 1), and pairs beyond that distance are given a weight of 0.

Another option is the Bartlett kernel (or triangular kernel), which gives linearly declining weights as the distance approaches the cutoff. Unlike the uniform kernel, this weighting scheme ensures a positive semi-definite variance-covariance matrix estimate.
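The two kernels can be sketched in a few lines of Python (the function names here are illustrative):

```python
import numpy as np

def uniform_kernel(d, cutoff):
    # Equal weight for every pair within the cutoff, zero beyond it
    return np.where(d <= cutoff, 1.0, 0.0)

def bartlett_kernel(d, cutoff):
    # Weight 1 at distance 0, declining linearly to 0 at the cutoff
    return np.where(d < cutoff, 1.0 - d / cutoff, 0.0)

d = np.array([0.0, 5.0, 10.0])
w_uniform = uniform_kernel(d, cutoff=10.0)    # weights 1, 1, 1
w_bartlett = bartlett_kernel(d, cutoff=10.0)  # weights 1, 0.5, 0
```

At half the cutoff distance, the Bartlett kernel has already halved the weight on a pair's cross-product, while the uniform kernel still counts it fully.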

The cutoff distance, often denoted as \(\bar{d}\), represents the threshold beyond which the spatial correlation is assumed to be negligible. The choice of this cutoff is a key empirical decision and can be guided by prior knowledge of the spatial dependence structure of the data.

The Complete Conley Variance-Covariance Matrix Estimator

Let \(X\) be the matrix of independent variables. The Conley spatial HAC (heteroskedasticity and autocorrelation consistent) robust variance-covariance matrix estimator for an OLS estimator \(\hat{\beta}\) is:

\(\hat{V}_{Conley} = (X'X)^{-1} N \hat{S} (X'X)^{-1}\)

where \(\hat{S}\) is the long-run covariance matrix of the moment conditions, estimated as:

\(\hat{S} = \hat{\Gamma}_0 + \sum_{j=1}^{N-1} w(j, L) (\hat{\Gamma}_j + \hat{\Gamma}_j')\)

Here:

* \(\hat{\Gamma}_j\) is the sample spatial autocovariance at “lag” \(j\).
* \(w(j, L)\) is a weight function, determined by the kernel, that depends on the distance lag \(j\) and the bandwidth or cutoff \(L\).

More explicitly, the estimator for the variance-covariance matrix of the parameters, in the context of OLS with spatially correlated errors, is constructed as follows:

Let \(\hat{u}_i\) be the residual for observation \(i\) from the initial OLS regression. The “meat” of the sandwich estimator is the estimated variance-covariance matrix of the moment conditions, \(\hat{\Psi}\). An element of this matrix, corresponding to the covariance between regressor \(j\) and regressor \(k\), is estimated by:

\(\hat{\Psi}_{jk} = \frac{1}{N} \sum_{i=1}^{N} \sum_{l=1}^{N} K\left(\frac{d(i,l)}{d_{cutoff}}\right) (x_{ij} \hat{u}_i) (x_{lk} \hat{u}_l)\)

where \(K(\cdot)\) is the kernel function and \(d_{cutoff}\) is the distance cutoff. Since \(\hat{\Psi}\) already aggregates the distance-weighted cross-products of the moment conditions, it is the “meat” of the sandwich, and the robust variance-covariance matrix estimator for the OLS coefficients takes the same form as above:

\(\hat{V}_{Conley} = (X'X)^{-1} \, N \hat{\Psi} \, (X'X)^{-1}\)

This formulation highlights how the pairwise cross-products of the regressor-weighted residuals \(x_{ij}\hat{u}_i\) are weighted according to the distance between observations, thereby accounting for the spatial dependence in the error structure. The result is more reliable standard errors and, consequently, more accurate hypothesis tests when spatial correlation is a concern.
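Putting the pieces together, here is a minimal end-to-end sketch of Conley standard errors for OLS with a uniform kernel. The function name, the simulated data, and the choice of Euclidean distance are all illustrative assumptions, not part of Conley's original implementation:

```python
import numpy as np

def conley_se(X, y, coords, cutoff):
    """OLS with Conley standard errors under a uniform kernel.

    Follows the sandwich V = (X'X)^{-1} B (X'X)^{-1}, where
    B_{jk} = sum_i sum_l K(d(i,l)) (x_ij u_i)(x_lk u_l).
    """
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    u = y - X @ beta                          # OLS residuals
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))     # pairwise distances d(i, l)
    K = (d <= cutoff).astype(float)           # uniform kernel weights
    Xu = X * u[:, None]                       # rows are x_i * u_i
    bread = np.linalg.inv(X.T @ X)
    V = bread @ (Xu.T @ K @ Xu) @ bread       # the sandwich
    return beta, np.sqrt(np.diag(V))

# Illustrative simulated data: 200 points scattered over a 10 x 10 region
rng = np.random.default_rng(1)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
beta, se = conley_se(X, y, coords, cutoff=1.0)
```

As a sanity check, setting the cutoff to zero keeps only the self-pairs, so the estimator reduces to the usual heteroskedasticity-robust (White) standard errors.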