Propensity Score Perplexities

Today, I share yet another problem that I encountered in real life (slightly modified) and that would make an excellent addition to a problem set.

Exercise 1 (Inverse propensity weighted regression): You want to test the effect of treatment 1 on an individual’s responsiveness to treatment 2. You therefore run a series of experiments where you randomize both treatments 1 and 2. However, due to operational constraints, the probability of receiving treatment 1 differs across experiments (but the probability of getting treatment 2 is stable). Let $D_1, D_2$ respectively be dummies for treatments 1 and 2.

Model this problem as in Rubin (1974), and give a counterexample to show that, in general, fitting the model $Y = \alpha + \beta_1 D_1 + \beta_2 D_2 + \beta_3 D_1 D_2 + \varepsilon$ using OLS on the pooled data and looking at $\beta_3$ will give the wrong answer.
Let $e$ index the experiments you are trying to pool together and let $p_e$ be the probability of receiving treatment 1 within experiment $e$. Consider the following procedure: Estimate the model $\frac{Y_{i,e,t}}{p_e} = \alpha_{t} + \beta_t \frac{D_2}{p_e} + \varepsilon$ via OLS on the individuals who received treatment 1, pooled across experiments. Next, estimate the model $\frac{Y_{i,e,c}}{1 - p_e} = \alpha_{c} + \beta_c \frac{D_2}{1 - p_e} + \varepsilon$ via OLS on the individuals who did not receive treatment 1. Show that under the standard regularity conditions, $\hat\beta_t - \hat\beta_c$ consistently estimates the difference in the effect of treatment 2 between $D_1 = 1$ and $D_1 = 0$. Will this estimator be unbiased?
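To make the setting concrete, here is a minimal simulation sketch in Python (NumPy). Everything in it is an assumption for illustration: two hypothetical experiments of equal size with $p_e = 0.8$ and $p_e = 0.2$, treatment 2 assigned with probability 0.5, an effect of treatment 2 equal to 2 in the first experiment and 0 in the second, and no true interaction between the treatments. It compares the pooled OLS interaction coefficient with a weighted fit of the same saturated model, using weights $D_1/p_e + (1-D_1)/(1-p_e)$. The weighted fit is a stand-in for the general idea of inverse propensity weighting, not the two-step procedure described above.

```python
import numpy as np


def simulate(n_per_exp=100_000, seed=0):
    """Pool two hypothetical experiments that differ in P(D1 = 1)."""
    rng = np.random.default_rng(seed)
    # (p_e, effect of treatment 2) per experiment; values are made up.
    experiments = [(0.8, 2.0), (0.2, 0.0)]
    d1, d2, y, p = [], [], [], []
    for p_e, tau2 in experiments:
        d1_e = rng.binomial(1, p_e, n_per_exp)
        d2_e = rng.binomial(1, 0.5, n_per_exp)  # stable across experiments
        # No D1 effect and no interaction: the true interaction is 0.
        y_e = tau2 * d2_e + rng.normal(0.0, 1.0, n_per_exp)
        d1.append(d1_e)
        d2.append(d2_e)
        y.append(y_e)
        p.append(np.full(n_per_exp, p_e))
    return tuple(np.concatenate(a) for a in (d1, d2, y, p))


def interaction_coef(d1, d2, y, weights=None):
    """Coefficient on D1 * D2 in the saturated model, optionally weighted."""
    X = np.column_stack([np.ones_like(y), d1, d2, d1 * d2])
    if weights is not None:
        # Weighted least squares: scale each row by sqrt(weight).
        s = np.sqrt(weights)
        X, y = X * s[:, None], y * s
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[3]


d1, d2, y, p = simulate()
naive = interaction_coef(d1, d2, y)
ipw_weights = d1 / p + (1 - d1) / (1 - p)
weighted = interaction_coef(d1, d2, y, ipw_weights)
print(f"pooled OLS interaction:   {naive:.3f}")
print(f"weighted OLS interaction: {weighted:.3f}")
```

In this particular design the pooled coefficient settles near 1.2 even though the true interaction is zero, because the $D_1 = 1$ group is dominated by the first experiment and the $D_1 = 0$ group by the second. The weighted fit rebalances the experiments and settles near 0.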

Exercise 2: Consider the setting above.

Now, suppose you just want to estimate the treatment effect for treatment 1. Show that the following “regression through the mean” approach will be consistent: First, fit the model $\frac{Y_{i,e,t}}{p_e} = \frac{\alpha_{t}}{p_e} + \varepsilon$ via OLS. Second, fit the model $\frac{Y_{i,e,c}}{1 - p_e} = \frac{\alpha_{c}}{1-p_e} + \varepsilon$ via OLS. Show that $\hat\alpha_t - \hat\alpha_c$ consistently estimates the pooled average treatment effect across experiments.
The above estimator is almost the standard inverse propensity weighted difference in means. Give expressions for $\frac{\hat\mu_{IPWE,t}}{\hat\alpha_t}$ and $\frac{\hat\mu_{IPWE,c}}{\hat\alpha_c}$.
Use the previous part and Slutsky’s theorem to show the asymptotic equivalence of the two estimators. Are there any settings where we would expect one to perform better in finite samples?
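For experimenting with the finite-sample question, here is a rough Python (NumPy) sketch of the standard inverse propensity weighted difference in means for treatment 1, next to a normalized (Hajek-style) variant that divides by the sum of the weights instead of the sample size. The normalized variant is my own point of comparison and is not claimed to be identical to the regression estimator in the exercise. The two experiments, their baselines, and their effects are made-up values, chosen so that the pooled average treatment effect is 2.

```python
import numpy as np


def simulate(n_per_exp=100_000, seed=0):
    """Pool two hypothetical experiments with different P(D1 = 1)."""
    rng = np.random.default_rng(seed)
    # (p_e, baseline outcome, effect of treatment 1); values are made up.
    designs = [(0.8, 10.0, 1.0), (0.2, 0.0, 3.0)]
    d, y, p = [], [], []
    for p_e, base, tau in designs:
        d_e = rng.binomial(1, p_e, n_per_exp)
        y_e = base + tau * d_e + rng.normal(0.0, 1.0, n_per_exp)
        d.append(d_e)
        y.append(y_e)
        p.append(np.full(n_per_exp, p_e))
    return np.concatenate(d), np.concatenate(y), np.concatenate(p)


def ipw_ate(d, y, p, normalize=False):
    """Inverse propensity weighted difference in means.

    normalize=False divides by the sample size; normalize=True divides
    by the sum of the weights (the Hajek-style variant).
    """
    w_t = d / p
    w_c = (1 - d) / (1 - p)
    if normalize:
        return np.sum(w_t * y) / np.sum(w_t) - np.sum(w_c * y) / np.sum(w_c)
    return np.mean(w_t * y) - np.mean(w_c * y)


d, y, p = simulate()
print(f"unnormalized IPW: {ipw_ate(d, y, p):.3f}")
print(f"normalized IPW:   {ipw_ate(d, y, p, normalize=True):.3f}")
# The ratio of the two treated-arm means equals the average weight,
# which should be close to 1 in large samples.
print(f"mean treated weight: {np.mean(d / p):.4f}")
```

Rerunning this with a small `n_per_exp` over many seeds, or adding a large constant to `y`, is one way to see where the two versions start to differ: the normalized estimate is unchanged by a constant shift in the outcome, while the unnormalized one is not.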