Hello.
I was searching for more information on the data quality flags for parental birthplace in the context of the CPS data. What exactly does “not allocated” mean? Is this equivalent to saying that these values were not imputed? And how exactly are the values allocated?
Thank you very much for your attention.
CPS data quality flags for parental place of birth (QFBPL and QMBPL) indicate whether certain non-responses, such as “Don’t know”, “Refused”, or “Blank”, were imputed by the Census Bureau in the corresponding variable. Accordingly, a response recorded as not allocated means that a valid response was provided and no imputation was required. The Census Bureau uses a variety of imputation techniques, which are covered in the CPS Design and Methodology technical paper (see page 132 for the Edits and Imputations section) and summarized on this webpage. Here is the relevant section of the linked technical paper, which describes two editing procedures used for these variables:
Relational imputation infers the missing value from other characteristics on the person’s record or within the household. For instance, if race is missing, it is assigned based on the race of another household member, or failing that, taken from the previous record on the file. Similarly, if relationship data are missing, it is assigned by looking at the age and sex of the person in conjunction with the known relationships of other household members. Missing occupation codes are sometimes assigned by analyzing the industry codes and vice versa. This technique is used as appropriate across all edits.
The ‘‘hot deck’’ imputation [also referred to as allocation] method assigns a missing value from a record with similar characteristics. Hot decks are defined by variables such as age, race, and sex. Other characteristics used in hot decks vary depending on the nature of the unanswered question. For instance, most labor force questions use age, race, sex, and occasionally another correlated labor force item such as full- or part-time status. This means the number of cells in labor force hot decks are relatively small, perhaps fewer than 100. On the other hand, the weekly earnings hot deck is defined by age, race, sex, usual hours, occupation, and educational attainment. This hot deck has several thousand cells. All CPS items that require imputation for missing values have an associated hot deck. The initial values for the hot decks are the ending values from the preceding month. As a record passes through the editing procedures, it will either donate a value to each hot deck in its path or receive a value from the hot deck.
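To make the hot deck description above more concrete, here is a minimal Python sketch of the general idea. It is not the Census Bureau's code: the function name, the `age_group`, `sex`, and `hours` fields, and the starting deck are all hypothetical. Each cell, defined by the chosen characteristics, holds the most recent valid value seen; a record with a valid value donates it to its cell, and a record with a missing value receives whatever its cell currently holds.

```python
def hot_deck_impute(records, item, cell_vars, initial_deck=None):
    """Sequential hot deck sketch (illustrative only, not Census Bureau code).

    records      -- list of dicts, processed in file order
    item         -- name of the field to impute (None means missing)
    cell_vars    -- fields that define a hot deck cell
    initial_deck -- starting cell values, e.g. carried over from the prior month
    """
    deck = dict(initial_deck or {})
    output = []
    for original in records:
        rec = dict(original)  # copy so the input list is left unchanged
        cell = tuple(rec[v] for v in cell_vars)
        if rec[item] is None and cell in deck:
            rec[item] = deck[cell]      # receive a value from the hot deck
            rec["allocated"] = True
        else:
            if rec[item] is not None:
                deck[cell] = rec[item]  # donate a value to the hot deck
            rec["allocated"] = False
        output.append(rec)
    return output, deck


# Hypothetical records; "hours" is the item with missing values.
records = [
    {"age_group": "25-34", "sex": "F", "hours": 40},
    {"age_group": "25-34", "sex": "F", "hours": None},
    {"age_group": "55-64", "sex": "M", "hours": None},
]
# Assumed starting deck, standing in for the preceding month's ending values.
initial_deck = {("55-64", "M"): 35}

imputed, final_deck = hot_deck_impute(
    records, "hours", ["age_group", "sex"], initial_deck
)
```

In this sketch the `allocated` field plays the role of the data quality flag: it is true only for records that received a value from the deck.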
Thanks a lot for the reply. If I may, I have a few follow-up questions.
- When the value label for the data quality flag says, for instance, “refused to value”, it refers to the first imputation method, while “refused to allocated value” refers to the second imputation method. Is that correct? If not, is there a way to know which imputation method was used?
- What does “value to value” mean? What about “value to allocated value”? And “value to blank”? Are these actual imputations? In terms of this last case, “value to blank”, I see that the corresponding variable is not blank. So what is happening precisely in this case?
Thanks for your reply. I will try to answer each of your questions in turn.
- Yes, this is correct based on our interpretation of the documentation. The first round of CPS edits assigns values when they can be reasonably inferred, for example from the respondent’s own values for the variable collected in prior months or from household members who share demographics. When the value of the data quality flag is “[Blank/Don’t know/Refused] to allocated value”, the value has been allocated using “hot deck” imputation.
- For both QFBPL and QMBPL in the original CPS data, there are no instances of “value to value”, “value to allocated value” or “value to blank” (* see next paragraph) across all years of the BMS and ASEC (except one case in Dec 1994, which is likely a special case or error). Our variables occasionally share metadata across multiple variables and will display codes that are not always available in that particular variable. This is to improve readability and comparability between variables and samples across time.
- On your last point, thank you for bringing this to our attention. There is an error in QMBPL. In BMS samples beginning in February 1999, the following values in the original CPS data were misassigned in IPUMS data: 41 (“Blank to allocated value”) was incorrectly assigned to 42 (“Don’t know to allocated value”); 42 (“Don’t know to allocated value”) was incorrectly assigned to 43 (“Refused to allocated values”); and 43 (“Refused to allocated values”) was incorrectly assigned to 50 (“Value to blank”). We have made this fix and it will likely be released in the next CPS data release. For finding this error, we would like to reward you with an IPUMS coffee mug. Please send us an email at ipums@umn.edu if you would like to receive it!
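Until the corrected data are released, the misassignment described above can be reversed by hand. The following is a minimal Python sketch, assuming your extract predates the fix; the function name and the record layout (`year`, `month`, `qmbpl`) are hypothetical. It relies on the points above: only BMS samples from February 1999 onward are affected, and “Value to blank” (50) does not otherwise occur in these samples.

```python
# Reverse of the misassignment: code found in the IPUMS data -> original CPS code.
QMBPL_FIX = {
    42: 41,  # shown as "Don't know to allocated value", originally "Blank to allocated value"
    43: 42,  # shown as "Refused to allocated values", originally "Don't know to allocated value"
    50: 43,  # shown as "Value to blank", originally "Refused to allocated values"
}


def fix_qmbpl(code, year, month):
    """Return the corrected QMBPL code for one BMS record.

    Records before February 1999 and codes outside the mapping
    are returned unchanged.
    """
    if (year, month) >= (1999, 2):
        return QMBPL_FIX.get(code, code)
    return code


# Hypothetical records for illustration.
records = [
    {"year": 1998, "month": 12, "qmbpl": 42},  # before February 1999: left as-is
    {"year": 1999, "month": 2, "qmbpl": 42},
    {"year": 2005, "month": 6, "qmbpl": 50},
    {"year": 2005, "month": 6, "qmbpl": 0},    # code not in the mapping: left as-is
]
fixed = [fix_qmbpl(r["qmbpl"], r["year"], r["month"]) for r in records]
```

Do not apply this to an extract created after the fix is released, since that would shift codes that are already correct.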
Thanks a lot! In the case of QMARST there are instances of “value to value” and “value to allocated value”. Could you help me understand what they imply?
The Census Bureau has published this report that provides more information on how they perform edits and allocation. IPUMS does not create or perform the edits/allocations indicated by the data quality flag. My interpretation of “Value to value” is that the Census Bureau replaced the value with the respondent’s own response to the question in a previous iteration of the BMS. Such edits are likely performed either as privacy control measures or because of invalid or illogical responses caused by enumerator, computer, or respondent error. “Value to allocated value” likely indicates that the Census Bureau used hot deck allocation to replace the value, for the same reasons.