Measurement Error and the Education Premium
If you take a look at the Census, education appears to be extremely lucrative. Back in 1975, dropouts earned about 20% less than high school grads, college grads earned over 50% more than high school grads, and holders of advanced degrees earned over 100% more than high school grads. Nowadays, the differences are larger still: dropouts earn over one-third less than high school grads, college grads earn 83% more than high school grads, and holders of advanced degrees earn almost three times as much as high school grads.
Unlike many labor economists, I freely admit that an appreciable fraction of these wage gaps is actually caused by pre-existing ability. Education hardly deserves full credit for the observed education premia. Lately, though, I’ve been thinking about a data problem that potentially leads us to understate the full effect of education: measurement error.
The trouble is that the Census is based on self-reports. While self-reports are far from worthless, they’re also far from perfect. Imagine, then, that 10% of high school graduates check the “college grad” box, and 10% of college grads incorrectly check the “high school” box. What happens?
Suppose the average high school grad earns $25,000 per year, and the average college grad earns $75,000 per year. That’s a 200% college premium. If the two groups are the same size, however, the Census will tabulate their average earnings to be .9*$25,000+.1*$75,000=$30,000 for high school grads, and .9*$75,000+.1*$25,000=$70,000 for college grads. That’s a mere 133% college premium.
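The arithmetic above can be checked with a few lines of Python. This is only a sketch of the hypothetical example in the text: the function names are made up for illustration, and it assumes the two groups are the same size, so that each reported group is a 90/10 mix.

```python
def tabulated_means(true_hs, true_college, misreport_rate):
    """Average earnings by *reported* education when the same share of each
    group checks the other group's box (assumes equal-sized groups)."""
    reported_hs = (1 - misreport_rate) * true_hs + misreport_rate * true_college
    reported_college = (1 - misreport_rate) * true_college + misreport_rate * true_hs
    return reported_hs, reported_college


def premium_pct(low, high):
    """Percentage by which `high` exceeds `low`."""
    return 100 * (high - low) / low


true_hs, true_college = 25_000, 75_000
rep_hs, rep_college = tabulated_means(true_hs, true_college, 0.10)

print(round(rep_hs), round(rep_college))            # 30000 70000
print(round(premium_pct(true_hs, true_college)))    # 200
print(round(premium_pct(rep_hs, rep_college)))      # 133
```

Raising `misreport_rate` toward 0.5 shrinks the measured premium toward zero, which is the sense in which misreporting leads the raw tabulation to understate the true gap.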
Statisticians have a standard way to correct for problems like these. Just measure the reliability of your suspect variable (roughly, the share of its measured variance that reflects true differences rather than noise), then apply the appropriate correction. For example, education has a reliability of about .9. If you ignore this measurement error when you estimate the education premium in the General Social Survey, controlling for a short IQ test (WORDSUM) and age, you get the following results. (logrealinc is the log of family income.)
-----------------------------------------------------------------------------
logrealinc |      Coef.   Std. Err.        t   P>|t|   [95% Conf.   Interval]
-----------+-----------------------------------------------------------------
educ       |   .0970101    .0023115    41.97   0.000     .0924794    .1015407
wordsum    |   .0715542    .0032552    21.98   0.000     .0651739    .0779345
age        |  -.0002933    .0003638    -0.81   0.420    -.0010063    .0004198
_cons      |   8.280534    .0335161   247.06   0.000      8.21484    8.346227
-----------------------------------------------------------------------------
Long story short: Ignoring measurement error, the education premium is 9.7% per year of education.
If you correct for measurement error in education, however, the education coefficient goes up to 11.3%:
-----------------------------------------------------------------------------
logrealinc |      Coef.   Std. Err.        t   P>|t|   [95% Conf.   Interval]
-----------+-----------------------------------------------------------------
educ       |   .1125098    .0026644    42.23   0.000     .1072873    .1177322
wordsum    |   .0606613    .0033714    17.99   0.000     .0540531    .0672695
age        |   .0002703    .0003649     0.74   0.459    -.0004449    .0009855
_cons      |    8.12053    .0361073   224.90   0.000     8.049757    8.191303
-----------------------------------------------------------------------------
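For intuition about where a correction like this comes from, here is a minimal sketch of the textbook "classical measurement error" formula. It is an assumption-laden illustration, not the routine that produced the table above: it assumes only one regressor is mismeasured, that its error is pure noise, and that the controls are error-free. The `r2_on_controls` input (the R-squared from regressing the mismeasured variable on the other regressors) is hypothetical, since the post does not report it.

```python
def attenuation_factor(reliability, r2_on_controls=0.0):
    """Share of the true slope that naive OLS recovers, in large samples, when
    one regressor has classical measurement error and the controls have none.

    r2_on_controls: R-squared from regressing the mismeasured variable on the
    other regressors (a hypothetical input; not reported in the post).
    """
    if not 0.0 <= r2_on_controls < reliability <= 1.0:
        raise ValueError("need 0 <= r2_on_controls < reliability <= 1")
    return (reliability - r2_on_controls) / (1.0 - r2_on_controls)


def corrected_slope(naive_slope, reliability, r2_on_controls=0.0):
    """Undo the attenuation by dividing the naive slope by the factor above."""
    return naive_slope / attenuation_factor(reliability, r2_on_controls)


naive_educ = 0.0970101  # naive education coefficient from the first table
for r2 in (0.0, 0.25):  # illustrative values only
    print(r2, round(corrected_slope(naive_educ, 0.9, r2), 4))
```

With no controls the correction is just division by the reliability. The more the other regressors already explain the mismeasured variable, the larger the correction becomes, because noise makes up a bigger share of whatever variation is left over.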
If you’re a cheerleader for education, you’ll seize on these results to argue that standard estimates consistently understate the education premium. There’s just one problem with this reaction: ALL variables have measurement error! When you correct for only one form of measurement error, ignoring all the others, you stack the deck in favor of the variable you fix. Look at what would have happened if we corrected the original results for IQ’s measurement error (reliability=.74), ignoring all other data problems:
-----------------------------------------------------------------------------
logrealinc |      Coef.   Std. Err.        t   P>|t|   [95% Conf.   Interval]
-----------+-----------------------------------------------------------------
educ       |   .0835483    .0026555    31.46   0.000     .0783434    .0887533
wordsum    |    .109542    .0049556    22.10   0.000     .0998287    .1192552
age        |  -.0009268    .0003671    -2.52   0.012    -.0016464   -.0002072
_cons      |   8.253405    .0334372   246.83   0.000     8.187866    8.318944
-----------------------------------------------------------------------------
If we correct for mismeasurement of IQ, and ignore mismeasurement of education, the estimated education premium actually falls to 8.4%. At the same time, the measured effect of IQ jumps from .07 (one more question right on the ten-question test boosts income by 7%) to .11 (one more question right on the ten-question test boosts income by 11%).
If you really take measurement error seriously, you have to correct for all forms of measurement error. When you do, it’s entirely possible for the measured effect of education not to rise. Look at what happens if we simultaneously correct for measurement error of education and IQ:
-----------------------------------------------------------------------------
logrealinc |      Coef.   Std. Err.        t   P>|t|   [95% Conf.   Interval]
-----------+-----------------------------------------------------------------
educ       |   .0989883    .0031315    31.61   0.000     .0928503    .1051263
wordsum    |   .0948702    .0052509    18.07   0.000     .0845781    .1051623
age        |  -.0003461     .000371    -0.93   0.351    -.0010733    .0003812
_cons      |   8.116265    .0359477   225.78   0.000     8.045805    8.186724
-----------------------------------------------------------------------------
Correcting for both forms of measurement error sharply inflates the estimated effect of IQ, but the education premium is virtually identical to the naive estimate that ignores measurement error entirely.
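The same pattern shows up in a toy calculation with no data at all. The sketch below works out what OLS converges to in large samples when two correlated regressors are each measured with classical error, and what happens when the noise variance is removed from one, the other, or both. Everything except the two reliabilities is a made-up assumption for illustration: the true effects (0.10 each), the 0.5 correlation between true education and true IQ, and the standardized scales. It is not a re-analysis of the GSS, and it leaves out age.

```python
def solve_2x2(a, b, c, d, y1, y2):
    """Solve [[a, b], [c, d]] @ [x1, x2] = [y1, y2] by Cramer's rule."""
    det = a * d - b * c
    return (y1 * d - b * y2) / det, (a * y2 - c * y1) / det


# Hypothetical population values, chosen only for illustration.
TRUE_EDUC, TRUE_IQ = 0.10, 0.10   # true effects on log income
RHO = 0.5                         # correlation of true education with true IQ
REL_EDUC, REL_IQ = 0.90, 0.74     # reliabilities quoted in the post

# True regressors have variance 1, so cov(x, y) follows from the true effects.
# Classical noise is unrelated to income, so these covariances are unchanged.
cov_educ_y = TRUE_EDUC + RHO * TRUE_IQ
cov_iq_y = RHO * TRUE_EDUC + TRUE_IQ


def large_sample_coefs(fix_educ, fix_iq):
    """(educ, iq) coefficients OLS converges to. "Fixing" a variable strips
    its noise variance; otherwise measured variance is true variance / reliability."""
    var_educ = 1.0 if fix_educ else 1.0 / REL_EDUC
    var_iq = 1.0 if fix_iq else 1.0 / REL_IQ
    return solve_2x2(var_educ, RHO, RHO, var_iq, cov_educ_y, cov_iq_y)


naive = large_sample_coefs(False, False)
educ_only = large_sample_coefs(True, False)
iq_only = large_sample_coefs(False, True)
both = large_sample_coefs(True, True)

for label, (b_educ, b_iq) in [("naive", naive), ("educ only", educ_only),
                              ("IQ only", iq_only), ("both", both)]:
    print(f"{label:10s} educ={b_educ:.3f} iq={b_iq:.3f}")
```

Under these assumptions the education coefficient is about 0.102 with no correction, 0.116 when only education is fixed, 0.087 when only IQ is fixed, and exactly the assumed 0.100 when both are fixed. The naive estimate lands close to the fully corrected one because the downward bias from noisy education is roughly offset by the credit education picks up for poorly measured IQ.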
What would happen if we had a typical long list of control variables, each and every one corrected for measurement error? It’s entirely reasonable to expect the education premium to fall. After all, measurement error corrections have larger effects for variables measured with low reliability, and most variables are less reliably measured than education.
HT: Steve Miller, who ran the Stata errors-in-variables regressions for me just days before the birth of his fifth child.
The post appeared first on Econlib.