Short Memo About Inverse Probability Weighting
By Bas Machielsen
November 30, 2024
Introduction
In this memo, I want to briefly write down the logic behind using inverse probability-weighted (IPW) estimators of the ATT (average treatment effect on the treated) in a causal inference context.
Setup
- Let \(Y(1)\) and \(Y(0)\) denote the potential outcomes under treatment and no treatment, respectively.
- The ATT is defined as:
$$ ATT = \mathbb{E}[Y(1) - Y(0) \mid T = 1], $$
where \(T \in \{0,1\}\) indicates treatment status.

We assume that, conditional on covariates \(X\), treatment is independent of \(Y(0)\), an assumption known as unconfoundedness.
The IPW estimator uses weights to adjust for the selection bias in treatment assignment. The weights are derived from the propensity score, $$ e(X) = P(T = 1 \mid X), $$ which is the probability of receiving treatment given covariates \(X\).
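As an illustration, here is a minimal sketch of estimating \(e(X)\) with a logistic regression. The data-generating process and the use of scikit-learn are my own assumptions for the example, not part of the memo:

```python
# Minimal sketch: estimating the propensity score e(X) = P(T = 1 | X)
# with a logistic regression on simulated data (assumed setup).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000
X = rng.normal(size=(n, 2))                                  # covariates
true_e = 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.25 * X[:, 1]))) # true e(X)
T = rng.binomial(1, true_e)                                  # treatment assignment

model = LogisticRegression().fit(X, T)
e_hat = model.predict_proba(X)[:, 1]                         # estimated e(X)
```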
Idea
The first part of the \(ATT\), \(\mathbb{E}[Y(1) \mid T=1]\), is identified directly from the observed outcomes of the treated group. The idea is to estimate \(\mathbb{E}[Y(0) \mid T=1]\) using the observed outcomes from the untreated group, for which \(T=0\). The IPW estimator for \(\mathbb{E}[Y(0) \mid T=1]\) is:
$$ \frac{\mathbb{E}\left[Y \times 1(T=0) \times \frac{e(X)}{1-e(X)}\right]}{\mathbb{E}\left[1(T=0) \times \frac{e(X)}{1-e(X)}\right]}, $$
where, in practice, the expectations \(\mathbb{E}(\cdot)\) are replaced by their sample analogues.
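In sample terms, this is a weighted mean of the untreated outcomes with weights \(e(X_i)/(1-e(X_i))\). A minimal sketch, again under an assumed data-generating process with a known treatment effect:

```python
# Minimal sketch of the sample analogue of the IPW estimator for
# E[Y(0) | T = 1]: a weighted mean of untreated outcomes with
# weights e(X) / (1 - e(X)). The data-generating process is assumed.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 100_000
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-X))                  # true propensity score
T = rng.binomial(1, e)
Y0 = 1 + X + rng.normal(size=n)           # potential outcome Y(0)
Y = np.where(T == 1, Y0 + 2, Y0)          # observed outcome; true ATT = 2

Xmat = X.reshape(-1, 1)
e_hat = LogisticRegression().fit(Xmat, T).predict_proba(Xmat)[:, 1]
w = (T == 0) * e_hat / (1 - e_hat)        # weights are zero for treated units

ey0_t1_hat = np.sum(w * Y) / np.sum(w)    # estimate of E[Y(0) | T = 1]
att_hat = Y[T == 1].mean() - ey0_t1_hat   # should be close to 2
print(att_hat)
```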
Unbiasedness Proof
I evaluate this estimator and show that its expected value equals \(\mathbb{E}[Y(0) \mid T=1]\).
- For \(T=0\), the observed outcome \(Y\) equals \(Y(0)\). Thus, the numerator of the IPW estimator becomes:
$$ \mathbb{E}\left[Y \times 1(T=0) \times \frac{e(X)}{1-e(X)}\right] = \mathbb{E}\left[Y(0) \times 1(T=0) \times \frac{e(X)}{1-e(X)}\right]. $$
Using the law of iterated expectations, and making use of the independence between \(Y(0)\) and \(T\) conditional on \(X\), we condition on \(X\):
$$ \mathbb{E}\left[Y(0) \times 1(T=0) \times \frac{e(X)}{1-e(X)}\right] = \mathbb{E}\left[\mathbb{E}[Y(0) \mid X] \times \mathbb{E}\left[1(T=0) \times \frac{e(X)}{1-e(X)} \,\Big|\, X\right]\right]. $$
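Spelling out this step: first condition on \(X\) via the law of iterated expectations; then, because \(Y(0) \perp T \mid X\) and \(e(X)/(1-e(X))\) is a function of \(X\), the inner expectation factors:
$$ \mathbb{E}\left[\mathbb{E}\left[Y(0) \times 1(T=0) \times \frac{e(X)}{1-e(X)} \,\Big|\, X\right]\right] = \mathbb{E}\left[\mathbb{E}[Y(0) \mid X] \times \mathbb{E}\left[1(T=0) \times \frac{e(X)}{1-e(X)} \,\Big|\, X\right]\right]. $$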
Since \(\mathbb{E}[1(T=0) \mid X] = P(T=0 \mid X) = 1 - e(X)\), the inner conditional expectation simplifies to
$$ \mathbb{E}\left[1(T=0) \times \frac{e(X)}{1-e(X)} \,\Big|\, X\right] = (1 - e(X)) \times \frac{e(X)}{1-e(X)} = e(X). $$
Substituting this back, and applying the law of iterated expectations once more, the numerator simplifies to:
$$ \mathbb{E}[Y(0) \cdot e(X)]. $$
- Using the law of iterated expectations again (conditioning on \(X\)), the denominator of the IPW estimator, which normalizes the weights, simplifies in the same way:
$$ \mathbb{E}\left[1(T=0) \times \frac{e(X)}{1-e(X)}\right] = \mathbb{E}[e(X)]. $$
- Combining terms, the IPW estimator becomes:
$$ \frac{\mathbb{E}[Y(0) \times e(X)]}{\mathbb{E}[e(X)]}. $$
Now note that the denominator equals \(P(T=1)\), since \(\mathbb{E}[e(X)] = \mathbb{E}[P(T=1 \mid X)] = P(T=1)\), and the numerator equals \(\mathbb{E}[Y(0) \times 1(T=1)]\) (again by unconfoundedness and iterated expectations). Thus, by the definition of conditional expectation*, the ratio equals \(\mathbb{E}[Y(0) \mid T=1]\).
*: For an explicit derivation of this, see below
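Since the argument runs entirely through population expectations, it can also be checked numerically: in a simulation where \(Y(0)\) is known for every unit, the ratio \(\mathbb{E}[Y(0) \cdot e(X)]/\mathbb{E}[e(X)]\) should match the directly computed mean of \(Y(0)\) among the treated. A quick sketch, under an assumed data-generating process:

```python
# Monte Carlo check of the identity E[Y(0) e(X)] / E[e(X)] = E[Y(0) | T = 1].
# In a simulation, Y(0) is known for everyone, so both sides are computable.
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-X))             # true propensity score e(X)
T = rng.binomial(1, e)
Y0 = 1 + 2 * X + rng.normal(size=n)  # potential outcome Y(0), depends on X

lhs = np.mean(Y0 * e) / np.mean(e)   # the weighted expression from the proof
rhs = Y0[T == 1].mean()              # direct mean of Y(0) among the treated
print(lhs, rhs)                      # the two agree up to simulation noise
```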
Conclusion
This derivation showed that the inverse probability-weighted estimator is an unbiased estimator of the counterfactual outcome \(Y(0)\) for the treated group. The derivation is fundamental and pops up a lot in the treatment evaluation literature. I have loosely based it on Imbens and Rubin (2015).
Appendix: Explicit Derivation
Here is an explicit derivation of why the estimator \( \frac{\mathbb{E}[Y(0) \times e(X)]}{\mathbb{E}[e(X)]} \) recovers \( \mathbb{E}[Y(0) \mid T = 1] \).
1. Using the Law of Total Expectation:
The expectation \( \mathbb{E}[Y(0) \mid T = 1] \) can be expressed by conditioning on \( X \) within the treated group:
$$ \mathbb{E}[Y(0) \mid T = 1] = \int \mathbb{E}[Y(0) \mid X] \cdot P(X \mid T = 1) \, dX. $$
Here, \( \mathbb{E}[Y(0) \mid X] \) is the expected potential outcome under no treatment at a given \( X \) (by unconfoundedness, \( \mathbb{E}[Y(0) \mid X, T = 1] = \mathbb{E}[Y(0) \mid X] \)), and \( P(X \mid T = 1) \) is the distribution of \( X \) in the treated group.
2. Reweighting the Untreated Group to Match \( P(X \mid T = 1) \):
We can rewrite \( P(X \mid T = 1) \) in terms of \( P(X) \) and \( e(X) \) using Bayes' rule:
$$ P(X \mid T = 1) = \frac{P(T = 1 \mid X) \cdot P(X)}{P(T = 1)} = \frac{e(X) \cdot P(X)}{\mathbb{E}[e(X)]}, $$
where the last equality uses \( P(T = 1) = \mathbb{E}[e(X)] \). Substituting this into the formula for \( \mathbb{E}[Y(0) \mid T = 1] \), we have:
$$ \mathbb{E}[Y(0) \mid T = 1] = \int \mathbb{E}[Y(0) \mid X] \cdot \frac{e(X) \cdot P(X)}{\mathbb{E}[e(X)]} \, dX. $$
3. Expressing as a Weighted Expectation:
Recognizing that the integral over \( P(X) \) represents an expectation over the population distribution of \( X \), we have:
$$ \mathbb{E}[Y(0) \mid T = 1] = \frac{\int \mathbb{E}[Y(0) \mid X] \cdot e(X) \cdot P(X) \, dX}{\mathbb{E}[e(X)]}. $$
The numerator, \( \int \mathbb{E}[Y(0) \mid X] \cdot e(X) \cdot P(X) \, dX \), is equivalent to \( \mathbb{E}[Y(0) \cdot e(X)] \), the expectation of \( Y(0) \cdot e(X) \) over the population (by the law of iterated expectations, since \( e(X) \) is a function of \( X \)). Hence:
$$ \mathbb{E}[Y(0) \mid T = 1] = \frac{\mathbb{E}[Y(0) \cdot e(X)]}{\mathbb{E}[e(X)]}. $$
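As a quick sanity check, take a hypothetical binary covariate with \( P(X=1) = \tfrac{1}{2} \), \( e(0) = 0.2 \), \( e(1) = 0.8 \), \( \mathbb{E}[Y(0) \mid X=0] = 1 \), and \( \mathbb{E}[Y(0) \mid X=1] = 3 \) (numbers chosen purely for illustration). Then:
$$ \frac{\mathbb{E}[Y(0) \cdot e(X)]}{\mathbb{E}[e(X)]} = \frac{0.5 (1 \cdot 0.2) + 0.5 (3 \cdot 0.8)}{0.5 (0.2) + 0.5 (0.8)} = \frac{1.3}{0.5} = 2.6, $$
while computing directly, \( P(X=1 \mid T=1) = \frac{0.8 \cdot 0.5}{0.5} = 0.8 \), so \( \mathbb{E}[Y(0) \mid T=1] = 0.2 \cdot 1 + 0.8 \cdot 3 = 2.6 \), as expected.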