I am aware of at least 8 kinds of “missingness” in the IPUMS CPS data (and I think IPUMS USA as well).
There are numerical values that are replaced by specific integers denoting that the variable has been top-coded or bottom-coded, without providing information about the top- or bottom-coded value.
There are variable- and year-specific values of top- or bottom-coded variables equal to the mean value of the top- or bottom-coded observations (unless these have all been replaced by swap values now).
There are data quality flags that indicate whether an answer was given by the person themselves or by somebody else on their behalf.
There is some way of indicating that there was nobody to interview – not sure if this will generally be a value in the variable itself or in the data quality flags.
There are refusals to answer specific questions – also not sure if this will be a value in the variable or in data quality flags.
There are “don’t know” answers that may be distinct from refusals to answer.
Not in universe values, perhaps of several kinds.
For missing types 3 through 6, these values may be blank, or they may be imputed; I’d want to distinguish these cases.
I have two questions about these various kinds of missing values:
First, is this list complete, or are there other kinds of missing that are quantitatively important? (I’m not going to fuss about, e.g. fragment values).
Second, are these kinds of missing consistently coded across questions, in terms of the entries that indicate each kind of missingness, such that a data preparation function could identify and treat them uniformly? Also, are variables consistent as to how information about missingness appears, e.g. in a quality flag versus in an answer? And if so, is this information gathered somewhere that I could see?
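To make the second question concrete, here is a minimal Python sketch of the kind of data preparation function I have in mind. Everything in it is an assumption for illustration: the variable names (VAR_A, VAR_B), the numeric codes, and the `Missing` categories are hypothetical placeholders, not actual IPUMS codes. The point is only that if such a lookup table existed, every variable could be handled the same way.

```python
from enum import Enum


class Missing(Enum):
    """Kinds of missingness I would like to distinguish (illustrative only)."""
    TOP_CODED = "top-coded"
    BOTTOM_CODED = "bottom-coded"
    NOT_IN_UNIVERSE = "not in universe"
    REFUSED = "refused"
    DONT_KNOW = "don't know"
    BLANK = "blank"


# HYPOTHETICAL lookup table: these variable names and codes are placeholders,
# NOT real IPUMS values. A real table would come from the documentation.
CODEBOOK = {
    "VAR_A": {9999999: Missing.NOT_IN_UNIVERSE, 9999998: Missing.TOP_CODED},
    "VAR_B": {0: Missing.NOT_IN_UNIVERSE, 97: Missing.REFUSED, 98: Missing.DONT_KNOW},
}


def classify(variable, value, codebook=CODEBOOK):
    """Split a raw entry into (clean_value, missing_kind).

    Exactly one element of the returned pair is None: a usable value has no
    missing kind, and a missing entry has no usable value.
    """
    if value is None:
        # An empty cell carries no code, so it gets its own category.
        return None, Missing.BLANK
    kind = codebook.get(variable, {}).get(value)
    if kind is not None:
        return None, kind
    # No special code registered for this variable: keep the value as data.
    return value, None
```

With a table like this, the same code (say, 0) could mean not in universe for one variable and be a legitimate value for another, which is exactly the kind of inconsistency I am asking about.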
You folks are terrific.
Peace, Andrew Hoerner