
Introductory Econometrics: A Modern Approach 6th Edition by Jeffrey M Wooldridge

Edition 6, ISBN: 130527010X
Exercise 20

This exercise shows that in a simple regression model, adding a dummy variable for missing data on the explanatory variable produces a consistent estimator of the slope coefficient if the “missingness” is unrelated to both the unobservable and observable factors affecting y. Let m be a variable such that m = 1 if we do not observe x and m = 0 if we observe x. We assume that y is always observed. The population model is

$$y = \beta_0 + \beta_1 x + u, \qquad E(u \mid x) = 0.$$

(i) Provide an interpretation of the stronger assumption

$$E(u \mid x, m) = 0.$$

In particular, what kind of missing data schemes would cause this assumption to fail?

(ii) Show that we can always write

$$y = \beta_0 + \beta_1 (1 - m)x + \beta_1 m x + u.$$

(iii) Let $\{(x_i, y_i, m_i): i = 1, \ldots, n\}$ be random draws from the population, where $x_i$ is missing when $m_i = 1$. Explain the nature of the variable $z_i = (1 - m_i)x_i$. In particular, what does this variable equal when $x_i$ is missing?

(iv) Let $\rho = P(m = 1)$ and assume that $m$ and $x$ are independent. Show that

$$\mathrm{Cov}[(1 - m)x,\; mx] = -\rho(1 - \rho)\mu_x^2,$$

where $\mu_x = E(x)$. What does this imply about estimating $\beta_1$ from the regression of $y_i$ on $z_i$, $i = 1, \ldots, n$?

(v) If m and x are independent, it can be shown that

$$mx = \delta_0 + \delta_1 m + v,$$

where $v$ is uncorrelated with $m$ and $z = (1 - m)x$. Explain why this makes $m$ a suitable proxy variable for $mx$. What does this mean about the coefficient on $z_i$ in the regression

$y_i$ on $z_i$, $m_i$, $i = 1, \ldots, n$?

(vi) Suppose for a population of children, y is a standardized test score, obtained from school records, and x is family income, which is reported voluntarily by families (and so some families do not report their income). Is it realistic to assume m and x are independent? Explain.
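
As a rough numerical illustration of parts (iv) and (v), the following Python sketch simulates the setup under assumptions that are not part of the exercise: normally distributed x and u, and the hypothetical parameter values beta0 = 1, beta1 = 2, P(m = 1) = 0.3, and E(x) = 5. The function name `simulate_missing_indicator` and all of its defaults are illustrative only. It compares the slope on z from regressing y on z alone with the slope on z when the missing-data dummy m is also included, and it computes the sample covariance between (1 - m)x and mx.

```python
import numpy as np


def simulate_missing_indicator(n=200_000, beta0=1.0, beta1=2.0,
                               rho=0.3, mu_x=5.0, seed=0):
    """Simulate y = beta0 + beta1*x + u with x missing independently of (x, u).

    All distributions and parameter values are assumptions for illustration.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(mu_x, 1.0, size=n)         # explanatory variable
    u = rng.normal(0.0, 1.0, size=n)          # unobservables, E(u | x) = 0
    y = beta0 + beta1 * x + u

    # m = 1 means x is not observed; drawn independently of x and u.
    m = (rng.random(n) < rho).astype(float)
    z = (1.0 - m) * x                         # equals x if observed, 0 if missing

    ones = np.ones(n)
    # Regression of y on (1, z): omits the term beta1 * m * x.
    b_short = np.linalg.lstsq(np.column_stack([ones, z]), y, rcond=None)[0]
    # Regression of y on (1, z, m): m acts as a proxy for m * x.
    b_long = np.linalg.lstsq(np.column_stack([ones, z, m]), y, rcond=None)[0]

    return {
        "slope_z_only": b_short[1],
        "slope_with_m": b_long[1],
        "cov_z_mx": np.cov(z, m * x)[0, 1],
    }


if __name__ == "__main__":
    for name, value in simulate_missing_indicator().items():
        print(f"{name}: {value:.3f}")
```

Under these assumed values, the sample covariance should be close to -0.3(0.7)(25) = -5.25, the slope from the regression on z alone should be far from beta1, and the slope on z with m included should be close to beta1.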

Step-by-step solution

Step 1 of 8

(i)

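$E(u \mid x, m) = 0$ means that the other factors affecting y, collected in u, have mean zero given both x and the missing-data indicator m, so u is uncorrelated with x and with m. The assumption fails when the data are not missing completely at random (MCAR), that is, when whether x is observed depends on unobserved factors that affect y. In that case, an analysis based only on the complete cases can be biased.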
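
To illustrate one missing data scheme under which E(u | x, m) = 0 fails, the sketch below makes x missing whenever y exceeds its sample median, so that m depends on u. This scheme, the normal distributions, the parameter values, and the function name `slope_under_selection_on_y` are all assumptions chosen for illustration; they are not part of the exercise or of this solution.

```python
import numpy as np


def slope_under_selection_on_y(n=200_000, beta0=1.0, beta1=1.0, seed=0):
    """Slope on z from regressing y on (1, z, m) when missingness depends on y.

    Hypothetical design: x is missing for the upper half of the y values.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(2.0, 1.0, size=n)
    u = rng.normal(0.0, 1.0, size=n)
    y = beta0 + beta1 * x + u

    # m = 1 (x missing) when y is above its median, so m is related to u.
    m = (y > np.median(y)).astype(float)
    z = (1.0 - m) * x

    design = np.column_stack([np.ones(n), z, m])
    coef = np.linalg.lstsq(design, y, rcond=None)[0]
    return coef[1]  # coefficient on z


if __name__ == "__main__":
    print(f"slope on z: {slope_under_selection_on_y():.3f} (assumed beta1 = 1.0)")
```

In this setup the coefficient on z should fall well below the assumed beta1 = 1, because selecting on y truncates the complete cases and including the dummy m does not repair that.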


Step 2 of 8


Step 3 of 8


Step 4 of 8


Step 5 of 8


Step 6 of 8


Step 7 of 8


Step 8 of 8
