The Trump Swing Revisited
Workhorse linear regression models have a problem: the relationship between the response and the conditioners may not be linear, or even linearizable. This is not a big issue when the true relationship is at least monotonic. When it is not, it is easy to get spurious results. In fact, even when the underlying pattern is completely random, regressing a response on a bunch of conditioners is bound to turn up significant relationships where none exist, through sheer chance. Overfitting is the bane of econometrics. Moreover, while we may be able to visualize one-on-one relationships via scatter plots, that becomes less and less feasible as we add more conditioners to the model. We cannot check all the conditional relationships, so some of the slopes we find significant may be an artifact of our choice of conditioners. Whether or not we should control for particular sets of variables
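
To make the two failure modes above concrete, here is a minimal simulation sketch in Python using only NumPy. It is not taken from the analysis in this post: the function names (`ols_t_stats`, `count_spurious`, `slope_on_u_shape`), the sample sizes, and the 1.96 cutoff (a normal approximation to the 5% two-sided critical value) are all assumptions chosen for illustration. The first experiment regresses pure noise on many unrelated conditioners and counts how many slopes look significant; the second fits a straight line to a U-shaped relationship.

```python
import numpy as np


def ols_t_stats(X, y):
    """Return OLS coefficients and t-statistics. X must already include an intercept column."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)  # unbiased estimate of the error variance
    se = np.sqrt(sigma2 * np.diag(xtx_inv))
    return beta, beta / se


def count_spurious(n_obs=500, n_cond=40, n_sims=200, seed=0):
    """Average number of 'significant' slopes per regression when the response is pure noise."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        conditioners = rng.standard_normal((n_obs, n_cond))
        X = np.column_stack([np.ones(n_obs), conditioners])
        y = rng.standard_normal(n_obs)  # unrelated to every conditioner by construction
        _, t = ols_t_stats(X, y)
        # Skip the intercept; 1.96 is an assumed normal-approximation cutoff.
        hits += int(np.sum(np.abs(t[1:]) > 1.96))
    return hits / n_sims


def slope_on_u_shape(n_obs=2000, seed=1):
    """Fit a straight line to y = x^2 + noise, with x symmetric around zero."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, n_obs)
    y = x**2 + 0.1 * rng.standard_normal(n_obs)
    X = np.column_stack([np.ones(n_obs), x])
    beta, t = ols_t_stats(X, y)
    return beta[1], t[1]


if __name__ == "__main__":
    print("avg. spurious slopes per regression:", count_spurious())
    print("linear slope and t-stat on U-shaped data:", slope_on_u_shape())
```

With a 5% test and 40 unrelated conditioners, one would expect about two slopes per regression to clear the significance bar by chance alone, which is the sense in which "significant" findings are guaranteed. In the second experiment the fitted slope should sit near zero even though the response depends strongly on the conditioner, which is the non-monotonic case where a linear model misleads.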