Parsing NIU and Missing codes in the DDI file

shrikrishna_bhogaonk · April 12, 2024, 2:34am

I was working with ipumsr to parse an IPUMS extract file, and I had a question about some of the variable metadata.

Now when I have categorical data, most of the information is contained in a <catgry> tag, which specifies the levels or groups in that variable as well as the numerical index. For example the index 1 corresponds to the month of January.

However, it seems like some variables have specific categorical-like data saved in the <codInstr> tag. For the variable INCTOT ("Total Personal Income"), for example, the tag contains "Codes", followed by "999999999 = N.I.U." and "999999998 = Missing. (1962-1964 only)" on separate lines.

It seems a bit odd to include some categorical information in the codInstr tag. I was wondering how common this case is. Are there other variables that have categorical codings listed in the codInstr tag? Are the references in that tag usually limited to NIU and Missing info, or could there be other codings as well?

Thanks for any help you can provide.

Ivan_Strahof · April 16, 2024, 4:17pm

Yes, it is relatively common for continuous IPUMS USA variables to also include values that are categorical. You can view the labels for these values using ipums_val_labels() or by viewing the INCTOT variable (e.g. if your data was named data, you could do data$INCTOT). Information from both the catgry and codInstr tags is captured here.
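For anyone reading the DDI directly rather than through ipumsr, here is a minimal Python sketch of collecting labels from both tags. It is not how ipumsr does it internally. The XML fragment is hand-written for illustration (a real IPUMS DDI file is namespaced and much larger), and the names value_labels, CODE_LINE and DDI_FRAGMENT, as well as the "code = label" line pattern, are assumptions based on the INCTOT example in the question.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, hand-written fragment; not copied from a real IPUMS DDI file.
DDI_FRAGMENT = """
<dataDscr>
  <var name="MONTH">
    <catgry><catValu>1</catValu><labl>January</labl></catgry>
    <catgry><catValu>2</catValu><labl>February</labl></catgry>
  </var>
  <var name="INCTOT">
    <codInstr>Codes
999999999 = N.I.U.
999999998 = Missing. (1962-1964 only)</codInstr>
  </var>
</dataDscr>
"""

# Assumed shape of a coding line inside <codInstr>: "<integer> = <label>".
CODE_LINE = re.compile(r"^\s*(-?\d+)\s*=\s*(.+?)\s*$")


def value_labels(var):
    """Collect {code: label} from a <var> element's catgry and codInstr tags."""
    labels = {}
    # Ordinary categories: one <catgry> per level.
    for cat in var.findall("catgry"):
        value = cat.findtext("catValu")
        label = cat.findtext("labl")
        if value is not None and label is not None:
            labels[int(value.strip())] = label.strip()
    # Special codes written as free text, one "code = label" per line.
    instr = var.findtext("codInstr") or ""
    for line in instr.splitlines():
        match = CODE_LINE.match(line)
        if match:
            # Keep the catgry label if the same code appears in both places.
            labels.setdefault(int(match.group(1)), match.group(2))
    return labels


root = ET.fromstring(DDI_FRAGMENT.strip())
labels_by_var = {v.get("name"): value_labels(v) for v in root.iter("var")}
```

Lines in codInstr that do not look like "code = label" (such as the leading "Codes" line) are simply skipped, so free-text notes in that tag are not mistaken for labels.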

These codes are also mentioned in the codes tab for each variable. The codes tab for INCTOT notes these as specific variable codes. This is most commonly used for missing and NIU data, but it is also used in other cases such as for bottom and top codes. For example, all respondents who report an INCTOT value above a certain threshold are assigned the same top code by the Census Bureau to preserve confidentiality (see the IPUMS User Guide page on threshold values). This is also mentioned in the codes tab on the website. We recommend that users review the codes tabs for all variables in their extract in order to code respondents correctly for their analysis.
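To illustrate why these codes matter for analysis, here is a small Python sketch that masks the two INCTOT codes from the question before computing a mean. The income values are made up, the helper names mask_special and mean_ignoring_none are hypothetical, and top and bottom codes are deliberately left alone because their values depend on the sample and should be taken from the codes tab.

```python
# The two special INCTOT codes quoted in the question; check the codes tab
# for the full list that applies to your samples.
SPECIAL_CODES = {999999999: "N.I.U.", 999999998: "Missing"}


def mask_special(values, special=SPECIAL_CODES):
    """Replace special codes with None so they cannot enter arithmetic."""
    return [None if v in special else v for v in values]


def mean_ignoring_none(values):
    """Mean of the non-masked values, or None if nothing is left."""
    kept = [v for v in values if v is not None]
    return sum(kept) / len(kept) if kept else None


# Made-up example values, not real IPUMS data.
incomes = [25000, 999999999, 40000, 999999998, 0]
clean = mask_special(incomes)

naive_mean = sum(incomes) / len(incomes)  # inflated by the special codes
clean_mean = mean_ignoring_none(clean)    # uses only the real incomes
```

Note that a genuine income of 0 is kept, since it is a real value rather than a code.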

shrikrishna_bhogaonk · April 18, 2024, 5:05pm

Thank you, Ivan. This was very helpful, and I understand now. I am working on writing a package to process some of the IPUMS data and was just trying to understand the layout. Thanks for your help.
