We have debated the issue with "bad controls" in the past, and today's post on GWG reminded me of the topic. This is not really an RI, but I've been working through a parametric example / model of bad control to better understand the problem, and I figured maybe it could be of interest to others, although it's somewhat technical. Hopefully the mods won't mind too much. Alternatively you could say I'm RIing myself (slightly).
(tagging /u/besttrousers and /u/commentsrus who may be interested)
The model is inspired by Montgomery et al. (2016), who also have some further references to the literature on bad controls (apparently this topic is known in other fields as post-treatment bias). Compared to them, I'm using continuous rather than binary variables and different notation.
So imagine the following statistical model:
T is a binary 0/1 random variable, ε and η are random normal shocks
T, ε, η are mutually independent, exogenous and everything
outcome y is given by y = β{y,0} + β{y,T} * T + β{y,x} * x + ε
covariate x is given by x = β{x,0} + β{x,T} * T + β{x,ε} * ε + η
Interpretation: the betas are coefficients. T is a randomized binary treatment (say, teaching randomly chosen coal miners programming). The outcome y (e.g. earnings after some period) depends on treatment T and unobserved "skill" ε (which could stand for intelligence or motivation), but also on an additional covariate x (e.g. how much time the coal miner spends on self-study after taking the course). The covariate in turn can depend on treatment, skill, and an orthogonal idiosyncratic shock η. We may expect that taking the course makes one more likely to study programming in one's free time, but also that those with higher skill will be more likely to self-study.
The model is not cast directly into a potential outcome framework, but hopefully we can agree on how to define the causal effect of treatment on outcome:
β{y,T} is the "partial" causal effect, i.e. the change in earnings if we had the person take the course but kept self-study time fixed
β{y,T} + β{y,x} * β{x,T} is the "composite" causal effect, i.e. the overall change in earnings caused by taking the course, including the effect of the increase in self-study
Which is the "true" causal effect? It really depends on what we care about. Sometimes it's the overall effect that matters, but we may also be interested in the partial effect (after all, if the whole effect of the course worked through motivating workers to study on their own, maybe there are cheaper ways to do that).
Moving on to the main result, let's say we have a large sample of observations on (y, x, T). What will we estimate when we regress y on T (and an intercept)? And what happens when we regress y on both x and T?
regressing on T and intercept only, the coefficient on treatment will be (in plim sense) equal to β{y,T} + β{y,x} * β{x,T}
regressing on T, x and intercept, the coefficient on treatment will be β{y,T} - β{x,T} * (β{x,ε} * Var[ε]) / (β{x,ε}^2 * Var[ε] + Var[η])
(I have discovered a truly marvelous proof of this, which reddit is too narrow to contain. Simulations seem to confirm. Proof is left as an exercise for the dear reader.)
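For anyone who would rather check than prove, here is a minimal simulation sketch of the model above. The parameter values, the 50/50 treatment probability, unit-variance mean-zero shocks, and the names `simulate` and `cov` are illustrative assumptions, not anything taken from Montgomery et al. It sticks to the Python standard library and uses the textbook two-regressor OLS formula instead of a regression package.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def simulate(n=200_000, seed=0,
             b_y0=1.0, b_yT=2.0, b_yx=0.5,    # outcome equation (assumed values)
             b_x0=0.0, b_xT=1.0, b_xe=1.0,    # covariate equation (assumed values)
             sd_eps=1.0, sd_eta=1.0):
    """Draw from the model and return (short, long, composite, long_plim)."""
    rng = random.Random(seed)
    T, x, y = [], [], []
    for _ in range(n):
        t = float(rng.random() < 0.5)          # randomized binary treatment
        eps = rng.gauss(0.0, sd_eps)           # unobserved "skill"
        eta = rng.gauss(0.0, sd_eta)           # idiosyncratic shock
        xi = b_x0 + b_xT * t + b_xe * eps + eta
        yi = b_y0 + b_yT * t + b_yx * xi + eps
        T.append(t)
        x.append(xi)
        y.append(yi)

    s_tt, s_xx, s_tx = cov(T, T), cov(x, x), cov(T, x)
    s_ty, s_xy = cov(T, y), cov(x, y)

    # Regression of y on T and intercept: slope is Cov(T, y) / Var(T).
    short = s_ty / s_tt
    # Regression of y on T, x and intercept: coefficient on T.
    long_ = (s_xx * s_ty - s_tx * s_xy) / (s_tt * s_xx - s_tx ** 2)

    # Large-sample values claimed in the post.
    composite = b_yT + b_yx * b_xT
    long_plim = b_yT - b_xT * (b_xe * sd_eps ** 2) / (b_xe ** 2 * sd_eps ** 2 + sd_eta ** 2)
    return short, long_, composite, long_plim


if __name__ == "__main__":
    short, long_, composite, long_plim = simulate()
    print(f"y on T:    estimate {short:.3f}, formula {composite:.3f}")
    print(f"y on T, x: estimate {long_:.3f}, formula {long_plim:.3f}")
```

With these assumed parameters the two formulas evaluate to 2.5 and 1.5, while the partial effect β{y,T} is 2, so the regression with x misses it by 0.5. Setting `b_xT=0` or `b_xe=0` makes the second formula collapse to β{y,T}.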
What these nasty math formulas mean is that:
regressing on T only will correctly estimate the composite effect (after all, T was randomized)
regressing on both x and T will not correctly estimate the partial effect if β{x,T}!=0 and β{x,ε}!=0 (essentially, x is endogenous, and if it is correlated with T, the endogeneity will also contaminate the estimate of the coefficient on T). It would, however, estimate the partial effect consistently if either β{x,T}=0 or β{x,ε}=0.
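The endogeneity argument can be made explicit with a short derivation. This is only a sketch of one possible route, assuming the shocks are mean-zero; the symbols \tilde{x}, \gamma and u are introduced here for illustration and are not part of the model as stated. The idea is to project ε on the part of x that is not explained by T:

```latex
\begin{align*}
\tilde{x} &\equiv x - \beta_{x,0} - \beta_{x,T}\,T
           = \beta_{x,\varepsilon}\,\varepsilon + \eta \\
\gamma &\equiv \frac{\mathrm{Cov}(\varepsilon, \tilde{x})}{\mathrm{Var}(\tilde{x})}
        = \frac{\beta_{x,\varepsilon}\,\mathrm{Var}[\varepsilon]}
               {\beta_{x,\varepsilon}^{2}\,\mathrm{Var}[\varepsilon] + \mathrm{Var}[\eta]} \\
\varepsilon &= \gamma\,\tilde{x} + u,
  \qquad \mathrm{Cov}(u, \tilde{x}) = \mathrm{Cov}(u, T) = 0 \\
y &= (\beta_{y,0} - \gamma\,\beta_{x,0})
   + (\beta_{y,T} - \gamma\,\beta_{x,T})\,T
   + (\beta_{y,x} + \gamma)\,x + u
\end{align*}
```

Since u is uncorrelated with both T and x, the regression on T and x converges to β{y,T} - γ * β{x,T} for T, which matches the expression above, and to β{y,x} + γ for x, so the coefficient on x is biased as well. The bias on T disappears when β{x,T}=0, or when β{x,ε}=0 (which makes γ=0).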
To sum up: the bad control problem is legit in the sense that even if we want to estimate the partial causal effect, adding more controls will not correctly estimate it if those controls are themselves affected by treatment and endogenous. Nor is there any guarantee that it will give us an estimate closer to the partial effect than the composite effect is, since the size and sign of the bias depend on several coefficients.
OTOH, if we truly want the partial causal effect (like if we want to isolate the pure discrimination component of the GWG), not controlling for covariates is not the answer either. The correct solution would be to instrument for x, but that's easier said than done.
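To illustrate what instrumenting for x would buy us, here is a hedged sketch that extends the model with a purely hypothetical instrument z: a variable that shifts x (coefficient `b_xz`, my invention), is independent of T, ε and η, and does not enter the outcome equation. Nothing in the model above supplies such a variable, which is exactly the "easier said than done" part. Intercepts are set to zero for brevity, and the IV estimate is obtained by solving the two moment conditions Cov(T, residual) = 0 and Cov(z, residual) = 0 with Cramer's rule.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def iv_demo(n=200_000, seed=1,
            b_yT=2.0, b_yx=0.5,             # outcome equation (assumed values)
            b_xT=1.0, b_xe=1.0, b_xz=1.0):  # covariate equation; b_xz is hypothetical
    """Return (iv_T, iv_x, ols_T) for the model extended with instrument z."""
    rng = random.Random(seed)
    T, z, x, y = [], [], [], []
    for _ in range(n):
        t = float(rng.random() < 0.5)   # randomized binary treatment
        zi = rng.gauss(0.0, 1.0)        # hypothetical instrument, excluded from y
        eps = rng.gauss(0.0, 1.0)       # unobserved "skill"
        eta = rng.gauss(0.0, 1.0)       # idiosyncratic shock
        xi = b_xT * t + b_xe * eps + b_xz * zi + eta
        yi = b_yT * t + b_yx * xi + eps
        T.append(t)
        z.append(zi)
        x.append(xi)
        y.append(yi)

    s_tt, s_tx, s_ty = cov(T, T), cov(T, x), cov(T, y)
    s_zt, s_zx, s_zy = cov(z, T), cov(z, x), cov(z, y)
    s_xx, s_xy = cov(x, x), cov(x, y)

    # Just-identified IV: instruments (T, z) for regressors (T, x).
    det = s_tt * s_zx - s_tx * s_zt
    iv_T = (s_ty * s_zx - s_tx * s_zy) / det
    iv_x = (s_tt * s_zy - s_zt * s_ty) / det

    # OLS of y on T and x, for comparison (still biased).
    ols_T = (s_xx * s_ty - s_tx * s_xy) / (s_tt * s_xx - s_tx ** 2)
    return iv_T, iv_x, ols_T


if __name__ == "__main__":
    iv_T, iv_x, ols_T = iv_demo()
    print(f"IV  coefficient on T: {iv_T:.3f}")
    print(f"IV  coefficient on x: {iv_x:.3f}")
    print(f"OLS coefficient on T: {ols_T:.3f}")
```

Under these assumptions the IV coefficients should land near the partial effect β{y,T} and near β{y,x}, while OLS with x as a control stays off. All of this hinges on z being a valid instrument, which the simulation imposes by construction and real data will not.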