Hi there - I’m working with 1900 Census data and struggling with the CHBORN variable. All the codebooks indicate that there should be a first category of “N/A” coded as “00”; however, when I download data, I never get this category. The first code is “1”, which according to the codebook should be “no children”.
I’m working with the 100% file, so I’d expect the N/A category not to be empty. I’ve used the IPUMS online table generator (which uses the 1900 5% sample), and it does yield an N/A category, which seems to confirm my suspicion that there should be some N/As (coded 00) in the 100% 1900 download.
I am subsetting the 100% sample by SEX (2), AGE (15-49), RACE (1,2), STATEICP (several states), LINK1900 (1), and LINK1910 (1). I have tried removing all of these filters, but still do not get the N/A (00) data for CHBORN. I’ve also explored whether this discrepancy can be explained by other variables, like MARST, but no luck.
I’ve downloaded the data in different formats to make sure there wasn’t an error with how I was reading the data into RStudio. No difference between .dat and .csv downloads.
I originally noticed this because I’m also working with the 1910 Census, which does include an N/A category (coded “00”), which made me worry something wasn’t matching up between these years.
So, it seems the online table generator is showing that N/As are part of the dataset, but they are dropped in my download. Any thoughts on why this is happening??
My only hunch comes from the quality control variable QCHBORN: a very large number of entries in the “no children” (1) category are coded as QCHBORN=4 (hot deck allocation by IPUMS). Could these be the missing N/As?
Any advice would be much appreciated - thank you!
Variables are coded N/A when the observation is not in the universe for that variable. Often this is because the respondent was not asked the question that corresponds to the variable. In the case of CHBORN, the universe tab notes that for the 1900 5% and 100% samples, the universe is Females, age 12+. None of the respondents within this universe will have an N/A code of 00 for CHBORN in these samples; only males (of any age) and females under 12 will be coded 00, since they are not in the universe. The universe for CHBORN, however, differs across years. In the 1910 100% sample, the universe is defined as Ever-married females, age 12+.
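To make the universe logic concrete, here is a rough Python sketch of the rule as described above. It is only an illustration, not how IPUMS assigns codes: the function name and the `ever_married` flag are hypothetical, and the only code value assumed is SEX = 2 for female, taken from the filter in the question.

```python
def in_chborn_universe(year, sex, age, ever_married):
    """Return True if a person falls inside the CHBORN universe.

    Rules follow the description in this thread only:
      1900 (5% and 100%): females, age 12+
      1910 (100%):        ever-married females, age 12+
    `ever_married` is a hypothetical boolean the caller derives elsewhere.
    """
    is_female = sex == 2  # SEX code 2 = female, as used in the poster's filter
    if not is_female or age < 12:
        return False
    if year == 1900:
        return True
    if year == 1910:
        return ever_married
    raise ValueError(f"universe for {year} is not covered in this sketch")


# A never-married woman aged 20 is in universe in 1900 but not in 1910,
# so she would get a real CHBORN value in 1900 and N/A (00) in 1910.
print(in_chborn_universe(1900, sex=2, age=20, ever_married=False))
print(in_chborn_universe(1910, sex=2, age=20, ever_married=False))
```

A file restricted to women ages 15-49 contains only rows for which the 1900 branch returns True, which is why no N/A codes show up there.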
I took a look at your extracts and found that you used the select cases feature so that your data file consists only of women ages 15-49. All of these observations fit the universe for CHBORN in the 1900 5% and 100% samples, which is why you see no N/A codes there. The 1910 sample adds the restriction that the woman must have been married at some point, so some women in your 1910 data fall outside the universe and are coded N/A.
You mention that you tried removing all of these filters but still did not get the N/A (00) data for CHBORN. I’m not sure this was successful: all of your extracts that include the 1900 5% or 100% samples (extracts 24-25, 28-34) still have this restriction in the select cases tool. To check, I created my own extract of the 1900 100% sample without any filters (except selecting cases from Delaware) and had N/A cases in my file. I hope this helps you with your analysis!
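If it helps to verify a new extract quickly, a frequency count of CHBORN will show whether any N/A rows came through. The sketch below is a minimal example using only Python's standard library; the `tabulate` helper and the four toy rows are made up for illustration, and it assumes a .csv extract with a header row containing a CHBORN column. Converting to `int` means a code written as "00" in the codebook and one read as "0" from the file are counted together.

```python
import csv
import io
from collections import Counter


def tabulate(csv_file, column="CHBORN"):
    """Count how often each code of `column` appears in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(int(row[column]) for row in reader)


# Toy stand-in for an unfiltered extract (not real IPUMS data):
# a man and a girl under 12 are out of universe, so both carry CHBORN = 0.
toy_extract = io.StringIO(
    "SEX,AGE,CHBORN\n"
    "1,30,0\n"
    "2,8,0\n"
    "2,25,1\n"
    "2,40,4\n"
)

counts = tabulate(toy_extract)
print("N/A rows:", counts.get(0, 0))
```

For a real download, pass `open("your_extract.csv", newline="")` instead of the toy text. A count of zero N/A rows suggests the select cases restriction is still applied to the extract.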
Thank you, @Ivan_Strahof - I really appreciate you taking time to look through the extracts and clarify the discrepancy in universe between those years. Many thanks for your help!