Introductory Econometrics: A Modern Approach 6th Edition by Jeffrey M Wooldridge

Question

Edition 6, ISBN: 130527010X

Exercise 20

This exercise shows that, in a simple regression model, adding a dummy variable for missing data on the explanatory variable produces a consistent estimator of the slope coefficient if the “missingness” is unrelated to both the unobservable and observable factors affecting y. Let m be a variable such that m = 1 if we do not observe x and m = 0 if we observe x. We assume that y is always observed. The population model is

$$y = \beta_0 + \beta_1 x + u, \qquad E(u \mid x) = 0.$$

(i) Provide an interpretation of the stronger assumption

$$E(u \mid x, m) = 0.$$

In particular, what kind of missing data schemes would cause this assumption to fail?

(ii) Show that we can always write

$$y = \beta_0 + \beta_1 (1-m)x + \beta_1 m x + u.$$
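A short worked derivation, starting from the population model y = β0 + β1x + u: the identity only uses the fact that (1 – m) + m = 1, so x can always be split into the part multiplied by (1 – m) and the part multiplied by m.

```latex
\begin{aligned}
y &= \beta_0 + \beta_1 x + u \\
  &= \beta_0 + \beta_1 \big[(1-m)x + m x\big] + u
     \qquad \text{since } (1-m) + m = 1 \\
  &= \beta_0 + \beta_1 (1-m)x + \beta_1 m x + u .
\end{aligned}
```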

(iii) Let {(x_i, y_i, m_i): i = 1, ..., n} be random draws from the population, where x_i is missing when m_i = 1. Explain the nature of the variable z_i = (1 – m_i)x_i. In particular, what does this variable equal when x_i is missing?
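In a data set, a missing x_i is usually stored as a missing-value code rather than a number. The NumPy sketch below builds m_i and z_i from a made-up five-observation sample, assuming np.nan is the missing-value marker (both the sample and the marker are illustrative assumptions). It shows that z_i equals x_i when x_i is observed and 0 when x_i is missing.

```python
import numpy as np

# Hypothetical five-observation sample; np.nan marks an unobserved x_i.
x_raw = np.array([12.0, np.nan, 7.5, np.nan, 3.0])

# m_i = 1 if x_i is missing, 0 if it is observed.
m = np.isnan(x_raw).astype(float)

# z_i = (1 - m_i) * x_i: equals x_i when observed and 0 when missing.
# np.where is used because (1 - m) * nan would give nan, not 0.
z = np.where(m == 1.0, 0.0, x_raw)

print("m:", m)
print("z:", z)
```

The explicit np.where reflects a floating-point detail: 0 times NaN is NaN, so the convention "z_i = 0 when x_i is missing" has to be imposed rather than obtained by multiplication.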

(iv) Let ρ = P(m = 1) and assume that m and x are independent. Show that

$$\operatorname{Cov}[(1-m)x,\; m x] = -\rho(1-\rho)\mu_x^2,$$

where μ_x = E(x). What does this imply about estimating β1 from the regression of y_i on z_i, i = 1, …, n?
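A worked version of the covariance calculation, with ρ = P(m = 1), μ_x = E(x), and m independent of x. The key step is that (1 – m)m = 0 for every value of a binary m, so the cross-moment vanishes.

```latex
\begin{aligned}
\operatorname{Cov}[(1-m)x,\; m x]
  &= E\big[(1-m)m\,x^2\big] - E\big[(1-m)x\big]\,E\big[m x\big] \\
  &= 0 - (1-\rho)\mu_x \cdot \rho\,\mu_x
     \qquad \text{since } (1-m)m = 0 \text{ and } m \text{ is independent of } x \\
  &= -\rho(1-\rho)\mu_x^2 .
\end{aligned}
```

Because the regression of y_i on z_i alone leaves β1·mx in the error term, and that term is correlated with z whenever μ_x ≠ 0 and 0 < ρ < 1, the slope from that simple regression is generally inconsistent for β1.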

(v) If m and x are independent, it can be shown that

$$m x = \delta_0 + \delta_1 m + v,$$

where v is uncorrelated with m and z = (1 – m)x. Explain why this makes m a suitable proxy variable for mx. What does this mean about the coefficient on z_i in the regression of y_i on z_i, m_i, i = 1, …, n?
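The contrast between (iv) and (v) can be checked with a small Monte Carlo. Writing y = β0 + β1z + β1mx + u and substituting mx = δ0 + δ1m + v gives y = (β0 + β1δ0) + β1z + β1δ1m + (u + β1v), so adding m as a regressor should recover β1, whereas dropping m leaves β1·mx in the error. The sketch below is illustrative only: the parameter values (x normal with mean 3 and variance 1, ρ = 0.3, β0 = 1, β1 = 2, standard normal u) and the helper names `ols` and `plim_slope_z_only` are assumptions, not part of the exercise.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients for y on the columns of X (X includes a constant column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def plim_slope_z_only(beta1, rho, mu, sigma2):
    """Probability limit of the slope from regressing y on (1, z) only.
    Uses Cov(z, mx) = -rho(1 - rho)mu^2 and
    Var(z) = (1 - rho)(sigma2 + mu^2) - (1 - rho)^2 mu^2."""
    cov_z_mx = -rho * (1 - rho) * mu**2
    var_z = (1 - rho) * (sigma2 + mu**2) - (1 - rho) ** 2 * mu**2
    return beta1 * (1 + cov_z_mx / var_z)

# Illustrative parameter choices (assumptions, not from the exercise).
rng = np.random.default_rng(0)
n, beta0, beta1 = 200_000, 1.0, 2.0
mu, sigma2, rho = 3.0, 1.0, 0.3

x = rng.normal(mu, np.sqrt(sigma2), n)
u = rng.normal(0.0, 1.0, n)
m = (rng.random(n) < rho).astype(float)  # missingness independent of x and u
y = beta0 + beta1 * x + u                # y is always observed
z = (1.0 - m) * x                        # x is fully simulated, so no NaN issue here

ones = np.ones(n)
b_z_only = ols(np.column_stack([ones, z]), y)      # y_i on z_i
b_z_and_m = ols(np.column_stack([ones, z, m]), y)  # y_i on z_i, m_i

print("slope, y on z:      ", b_z_only[1])
print("theoretical plim:   ", plim_slope_z_only(beta1, rho, mu, sigma2))
print("slope, y on z and m:", b_z_and_m[1])
```

Under these assumed values the slope from y on z alone settles near its probability limit, far from β1 = 2, while the slope on z in the regression that also includes m settles near 2.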

(vi) Suppose for a population of children, y is a standardized test score, obtained from school records, and x is family income, which is reported voluntarily by families (and so some families do not report their income). Is it realistic to assume m and x are independent? Explain.

Answer 1

(i)

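E(u | x, m) = 0 means that, once we condition on x and on the missing-data indicator m, the unobserved factors u that affect y have mean zero; knowing whether x is missing does not change the expected value of u. This is stronger than E(u | x) = 0. The assumption fails when the missing-data mechanism is systematically related to the unobservables, for example when the probability that x is missing depends on y (and therefore on u), so the data are not missing completely at random (MCAR). In that case, analyses based only on the complete cases can produce biased estimators.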
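To illustrate how the assumption in (i) can fail, here is a minimal NumPy sketch under an assumed, hypothetical missing-data rule: x is treated as missing whenever y exceeds a cutoff, so missingness depends on u through y. The parameter values and the helper `ols` are illustrative choices, not part of the exercise.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients for y on the columns of X (X includes a constant column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Illustrative setup (assumptions): same linear model as above, but x goes
# missing whenever y exceeds a cutoff, so m is related to u through y.
rng = np.random.default_rng(1)
n, beta0, beta1, cutoff = 200_000, 1.0, 2.0, 8.0
x = rng.normal(3.0, 1.0, n)
u = rng.normal(0.0, 1.0, n)
y = beta0 + beta1 * x + u
m = (y > cutoff).astype(float)  # hypothetical rule: large y => x not reported
z = (1.0 - m) * x

obs = m == 0.0
b_complete = ols(np.column_stack([np.ones(obs.sum()), x[obs]]), y[obs])  # complete cases only
b_dummy = ols(np.column_stack([np.ones(n), z, m]), y)                     # y on z and m, full sample

print("complete-case slope:", b_complete[1])
print("dummy-method slope: ", b_dummy[1])
```

Because m_i gives the observations with missing x_i their own intercept and z_i = 0 for them, the slope on z_i is numerically the complete-case slope, so in this setup both inherit the same selection bias away from β1 = 2.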