Explanation for missing data points in the incwage variable


After downloading datasets for 1980, 1990, 2000, 2010, and 2012, we reviewed the data and noticed that, on average, a third of respondents each year list “0” as their income under the incwage variable.

Is there an explanation for why so many people are listing zero?

We are trying to analyze the racial breakdown for a given occupation by income. We are considering dropping (not including) all data points that list “0” as their incwage. We are concerned about how this will skew the data.

Can you give any more information about why respondents list an occupation but mark their incwage as zero? Do you have any guidelines around including or not including these data points and the effect that will have on the analysis?

Lastly, we are applying the perwt weight to our tabulations. If we were to drop all data points that list “0” as their income, what effect would applying perwt have on the data?

Thanks so much,




Since INCWAGE measures the total wage or salary income for all persons age 16 and over, persons who are not in the labor force, unemployed, or self-employed are considered to be within the question universe. For people in these groups we would expect income from wages and salary to be zero. In addition, the universe for the occupation question includes people who worked in the last 5 years, so even people who are not in the labor force (and therefore have an INCWAGE of 0) are assigned an occupation if they worked at all in the preceding 5 years.

So, instead of dropping cases with INCWAGE==0, you can restrict your analysis to employed persons (using the EMPSTAT variable) who are not self-employed (using the CLASSWKR variable). If you would like to include people who are self-employed, you can simply add the value of the variable INCBUS00, or the variables INCBUS and INCFARM (depending on the year), to INCWAGE. Do note that self-employment income can be negative in cases where the respondent’s business or farm was operating at a loss in that particular year.
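As a rough illustration of the restriction described above, here is a minimal pandas sketch. The data frame is made up, and the code values (EMPSTAT 1 for employed, CLASSWKR 2 for wage and salary workers) are assumptions that should be checked against the codebook for your extract. It also assumes any N/A codes in the income variables have already been recoded to missing.

```python
import pandas as pd

# Made-up rows standing in for an extract; not real data.
df = pd.DataFrame({
    "EMPSTAT":  [1, 1, 3, 2, 1],
    "CLASSWKR": [2, 1, 2, 2, 2],
    "INCWAGE":  [42000, 0, 0, 0, 31000],
    "INCBUS00": [0, -5000, 0, 0, 2000],
})

# Assumed code values; verify them in the codebook for your samples.
EMPLOYED = 1
WAGE_WORKER = 2

# Employed persons who are not self-employed, instead of INCWAGE != 0.
wage_workers = df[(df["EMPSTAT"] == EMPLOYED) & (df["CLASSWKR"] == WAGE_WORKER)]

# Alternative: keep the self-employed and add business income to wages.
# Business income can be negative when the business ran at a loss.
employed = df[df["EMPSTAT"] == EMPLOYED].copy()
employed["EARNINGS"] = employed["INCWAGE"] + employed["INCBUS00"]
```

For years that provide INCBUS and INCFARM rather than INCBUS00, the same sum would use those two columns instead.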

Removing cases from your analysis will not affect PERWT. PERWT essentially provides the number of people in the total U.S. population represented by that single respondent. The PERWT value is stored on the person record and does not change based on the total number of persons included in the analysis. So, if a respondent has a PERWT value of 27, that respondent represents 27 people in the total population.
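To make this concrete, here is a small pandas sketch with made-up values. It shows that subsetting leaves each remaining record's PERWT unchanged, so weighted counts and weighted means simply describe the population represented by the records you kept. The EMPSTAT code of 1 for employed is an assumption to verify in the codebook.

```python
import pandas as pd

# Made-up rows; not real data.
df = pd.DataFrame({
    "RACE":    [1, 1, 2, 2],
    "EMPSTAT": [1, 3, 1, 1],
    "INCWAGE": [40000, 0, 30000, 50000],
    "PERWT":   [27, 10, 20, 30],
})

# Subset to employed persons (code 1 is assumed here).
kept = df[df["EMPSTAT"] == 1]

# Weighted count: the sum of PERWT over the records that remain.
weighted_counts = kept.groupby("RACE")["PERWT"].sum()

# Weighted mean wage by group: sum(INCWAGE * PERWT) / sum(PERWT).
weighted_income = (kept["INCWAGE"] * kept["PERWT"]).groupby(kept["RACE"]).sum()
weighted_mean = weighted_income / weighted_counts
```

The record with a PERWT of 27 still counts as 27 people after the subset; only the group totals change, because fewer records contribute to them.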

I hope this helps.