I take a break from my usual pure theory to present something a little more applied. Suppose you estimate the OLS model

$$y_i = \beta_0 + \beta_1 D_i + \beta_2 x_i + \beta_3 D_i x_i + \varepsilon_i,$$

where $D_i$ is a dummy variable and $x_i$ is continuous. For conceptual simplicity, suppose that the conditional expectation function is truly as specified above and that the errors are homoskedastic. Upon estimating this model, you find that not only is the interaction coefficient $\beta_3$ not statistically significant, but neither is $\beta_1$. This is a bit surprising, since if you estimate the model

$$y_i = \alpha_0 + \alpha_1 D_i + \alpha_2 x_i + u_i,$$
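you find that the coefficient on the dummy, $\alpha_1$, is highly significant. The point estimates are also totally different. This points to the first subtlety of interpreting interaction coefficients. In the top model, in population, the coefficient $\beta_1$ on the dummy $D$ satisfies

$$\beta_1 = E[y \mid D = 1, x = 0] - E[y \mid D = 0, x = 0].$$

By contrast, in the bottom regression, we have

$$\alpha_1 = \beta_1 + \beta_3 \, \frac{\operatorname{Cov}(\tilde{D}, D x)}{\operatorname{Var}(\tilde{D})},$$

where $\beta_3$ is the interaction coefficient and $\tilde{D}$ is the residual from the population linear projection of $D$ on a constant and $x$. Even if we go so far as to assume that the population CEF is correctly specified by the interactions model, under unequal slopes these two values are not in general the same. In fact, the only way they are the same is if either
- $\beta_3 = 0$, so that the slopes are equal after all,
- $E[x \mid D = d] = 0$ for $d \in \{0, 1\}$ in population, or
- Both of the above fail, but in a way that happens to just cancel out.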
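The gap between the two dummy coefficients can be checked numerically. The sketch below is only an illustration under assumptions of my own: the data-generating process, the sample size, and the names `beta1`, `beta3`, `alpha1`, and `kappa` are all hypothetical. It fits the interacted and the additive regression on the same simulated data and compares the additive dummy coefficient with the interacted one plus the interaction coefficient times a Frisch-Waugh-Lovell correction factor, an identity that holds exactly in sample.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50_000

# Hypothetical data-generating process; every number here is illustrative.
D = rng.integers(0, 2, size=N).astype(float)   # dummy variable
x = rng.normal(loc=2.0 + D, scale=1.0)         # continuous, mean depends on D
y = 1.0 + 0.5 * D + 2.0 * x + 1.5 * D * x + rng.normal(size=N)


def ols(X, y):
    """Return OLS coefficients computed by least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


ones = np.ones(N)
b = ols(np.column_stack([ones, D, x, D * x]), y)  # interacted model
a = ols(np.column_stack([ones, D, x]), y)         # additive model
beta1, beta3, alpha1 = b[1], b[3], a[1]

# Residual of D after projecting it on a constant and x.
Z = np.column_stack([ones, x])
D_tilde = D - Z @ ols(Z, D)
kappa = (D_tilde @ (D * x)) / (D_tilde @ D_tilde)

print(f"beta1 (interacted)     = {beta1:.3f}")
print(f"alpha1 (additive)      = {alpha1:.3f}")
print(f"beta1 + beta3 * kappa  = {beta1 + beta3 * kappa:.3f}")
```

With unequal slopes and a regressor whose group means are far from zero, as in this made-up design, the two dummy coefficients come out far apart, while the last two printed numbers agree up to floating-point error.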
Now, suppose you find yourself in the situation (as I did) where variation in $x$ was assigned according to an experiment. The experiment worked in the following way. First, a proportion $p$ of all units is chosen at random from the population to be part of the experiment. Those units in the experiment are randomly assigned a value of $x$ from some set $\mathcal{X}$. The rest of the units are assigned by default to a value of $c$ (in fact, assume that $c$ equals the average of $x$ among the experimental units, which makes the analysis somewhat simpler in a bit). You first run the model only on those units tagged as in the experiment. When you run the interacted model, you find that the estimate of $\beta_1$, the coefficient on the dummy, has a high variance. Your idea is then to use all of the other units to decrease the variance of the estimator. Does this work? To my surprise, the standard errors hardly changed at all.
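A small simulation makes the surprise concrete. This is a sketch under assumptions that are not in the original setup: the set of treatment levels, the coefficients, the sample sizes `n` and `m`, and the choice of the default value `c` as the mean of the level set are all hypothetical. It fits the interacted model on the experimental units only and then on the pooled sample, and compares the homoskedastic standard error of the dummy coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)


def ols_se(X, y):
    """OLS coefficients and homoskedastic standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta, np.sqrt(sigma2 * np.diag(XtX_inv))


def design(D, x):
    """Columns: constant, dummy, continuous regressor, interaction."""
    return np.column_stack([np.ones_like(x), D, x, D * x])


def outcome(D, x):
    """Hypothetical correctly specified CEF plus homoskedastic noise."""
    return 1.0 + 0.5 * D + 2.0 * x + 0.3 * D * x + rng.normal(size=x.size)


n, m = 1_000, 50_000                       # experimental and default units
levels = np.array([8.0, 9.0, 10.0, 11.0, 12.0])
c = levels.mean()                          # assumed default value of x

D_exp = rng.integers(0, 2, size=n).astype(float)
x_exp = rng.choice(levels, size=n)         # randomized assignment
y_exp = outcome(D_exp, x_exp)

D_def = rng.integers(0, 2, size=m).astype(float)
x_def = np.full(m, c)                      # everyone else sits at c
y_def = outcome(D_def, x_def)

_, se_exp = ols_se(design(D_exp, x_exp), y_exp)
D_all = np.concatenate([D_exp, D_def])
x_all = np.concatenate([x_exp, x_def])
y_all = np.concatenate([y_exp, y_def])
_, se_all = ols_se(design(D_all, x_all), y_all)

se_ratio = se_all[1] / se_exp[1]
print(f"SE of dummy coefficient, experiment only: {se_exp[1]:.4f}")
print(f"SE of dummy coefficient, pooled sample:   {se_all[1]:.4f}")
print(f"ratio: {se_ratio:.3f}")
```

In this design the pooled sample is about fifty times larger than the experimental one, yet the ratio of the two standard errors stays close to one, because the extra units carry no variation in the continuous regressor.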
Since a model with interaction terms is equivalent to running OLS separately on each subsample, it suffices to analyze the standard error of the intercept term in a single OLS model. I assume sampling of the following form. There is a fixed number $n$ of observations where $x$ varies. Additionally, we have some sample of $m$ observations where $x$ is fixed to be $\bar{x}$, the mean of $x$ from the first sample, and $y$ is drawn from the same distribution as in the first sample (this is why it is important that $x$ is determined experimentally). What is the asymptotic behavior of the variance as $m \to \infty$? Recall that for homoskedastic (and correctly specified) OLS, the variance estimator is given by

$$\widehat{\operatorname{Var}}(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1},$$

where $X$ stacks a constant and $x_i$ for all $n + m$ observations.
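We are in particular interested in the (1,1) coordinate of the above matrix, which I rewrite as

$$\left[\hat{\sigma}^2 (X'X)^{-1}\right]_{11} = \hat{\sigma}^2 \, \frac{\frac{1}{n+m} \sum_i x_i^2}{\sum_i (x_i - \bar{x})^2},$$

where the sums run over all $n + m$ observations and $\bar{x}$ is the mean of $x$, which is the same in the original and in the pooled sample because the added points sit exactly at the original mean. Denote by $s_x^2$ the variance (with divisor $n$) of $x$ in the original $n$ observations. Since sample variances and moments are variances and moments in their own right, we can use the law of total variance to show that the above expression equals

$$\hat{\sigma}^2 \, \frac{\frac{n}{n+m} s_x^2 + \bar{x}^2}{n s_x^2} = \hat{\sigma}^2 \left[ \frac{1}{n+m} + \frac{\bar{x}^2}{n s_x^2} \right],$$

which converges in probability to $\sigma^2 \bar{x}^2 / (n s_x^2)$ as $m \to \infty$ and is therefore $\Theta_p(1)$,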
assuming $\bar{x} \neq 0$ (the numerator comes from the assumption that the $m$ additional data points identically have $x = \bar{x}$), where a sequence of random variables $Z_m$ is said to be $\Theta_p(1)$ if both $Z_m = O_p(1)$ and $1/Z_m = O_p(1)$ (and $Z_m$ is $O_p(1)$ means that it is “bounded in probability”, or, for every $\epsilon > 0$, $P(|Z_m| > M_\epsilon) < \epsilon$ for all sufficiently large $m$, for some fixed, finite $M_\epsilon$). Here this is just a fancy way of saying that the random variable converges to a nonzero constant. In fact, in the above, since I assume my original sample of $n$ observations to be fixed from the perspective of our asymptotics, the only non-constant part stems from the uncertainty in estimating the standard errors (through $\hat{\sigma}^2$). Therefore, I find exactly the result that increasing the sample past a certain point hardly changes the standard errors. This seems counterintuitive at first, but after some reflection, it seems to simply be a stark reminder that intercepts and interactions are not what we expect them to be.
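The plateau in the intercept variance can also be verified directly from the design matrix. The following is a minimal sketch with hypothetical inputs: the five values of `x`, the known error variance `sigma2 = 1`, and the function names are my own choices. It appends `m` points at the sample mean of `x`, reads off the (1,1) entry of `sigma2 * inv(X'X)`, and compares it with the textbook simple-regression expression `sigma2 * (1/(n+m) + mean(x)**2 / (n * var(x)))` applied to the pooled sample.

```python
import numpy as np


def intercept_variance(x, m, sigma2=1.0):
    """(1,1) entry of sigma2 * inv(X'X) after adding m points at mean(x)."""
    x_all = np.concatenate([x, np.full(m, x.mean())])
    X = np.column_stack([np.ones_like(x_all), x_all])
    return sigma2 * np.linalg.inv(X.T @ X)[0, 0]


def closed_form(x, m, sigma2=1.0):
    """sigma2 * (1/(n+m) + xbar^2 / (n * s_x^2)), with s_x^2 using divisor n."""
    n = x.size
    return sigma2 * (1.0 / (n + m) + x.mean() ** 2 / (n * x.var()))


# Hypothetical original sample: mean 10, variance 2 (divisor n).
x = np.array([8.0, 9.0, 10.0, 11.0, 12.0])

for m in (0, 10, 1_000, 100_000):
    print(m, intercept_variance(x, m), closed_form(x, m))
```

For these illustrative values the variance starts at 10.2 with no extra points and can never fall below 10 however many points are added at the mean, so the standard error shrinks by only about one percent.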