Quick question regarding the universe restriction for CHBORN in 1940.
It appears that the 1940 1% sample has an additional universe restriction (females age 14+) that isn’t explicitly indicated in the enumeration form or instructions and that has been dropped for the 1940 complete count. Was this additional universe restriction put in place by IPUMS?
This is a good question and is certainly a little confusing. The first thing to note is that this additional universe restriction is also present in the unedited source variable (US1940A_CHILDS) behind the integrated CHBORN variable. Specifically, every case with CHBORN==0 “N/A” has either a 99 “inappropriate” or BB “N/A” value in US1940A_CHILDS. You can also verify this by looking at the case counts on the codes tabs of CHBORN and US1940A_CHILDS.
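The check described above can also be run directly on an extract. The following is only a minimal sketch: the rows are made-up illustrative records, not real IPUMS data, and it assumes the extract carries both CHBORN and US1940A_CHILDS, with the source variable read as a string.

```python
from collections import Counter

# Hypothetical extract rows for illustration only. The non-N/A source codes
# ("00", "02") are placeholders, not documented US1940A_CHILDS values.
rows = [
    {"CHBORN": 0, "US1940A_CHILDS": "BB"},
    {"CHBORN": 0, "US1940A_CHILDS": "99"},
    {"CHBORN": 1, "US1940A_CHILDS": "00"},
    {"CHBORN": 3, "US1940A_CHILDS": "02"},
]

def source_codes_for_na(records):
    """Count US1940A_CHILDS codes among cases where CHBORN is 0 (N/A)."""
    return Counter(r["US1940A_CHILDS"] for r in records if r["CHBORN"] == 0)

na_source_codes = source_codes_for_na(rows)

# Any N/A case whose source code is neither 99 nor BB would contradict the claim.
unexpected = {code: n for code, n in na_source_codes.items()
              if code not in {"99", "BB"}}

print(dict(na_source_codes))
print("unexpected source codes:", unexpected)
```

An empty `unexpected` dictionary on the real extract would be consistent with the statement that every N/A case in CHBORN traces back to a 99 or BB in the source variable.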
The issue really boils down to the marital status variable MARST. As the comparability tab for MARST notes: “In the IPUMS, all persons under age 12 are coded as ‘never married/single’ for the years 1880-1930, though these censuses did not impose a minimum age.” This rule is followed in the construction of the 1940 full count file, but when the Census Bureau released the 1940 public use microdata files, they decided to restrict the question about marital status to those who are 14 years or older. So the CHBORN variable is correct in both the 1% and full count files, but because the 1% sample restricts the marital status question to those who are 14 years or older, only such individuals will have valid values for CHBORN or the source variable US1940A_CHILDS.
Hi Jeff,
Thanks for answering dallm031’s question regarding chborn. I’m trying to do something similar, I think: use the chborn variable to estimate completed fertility for women in the 1900-1960 censuses. It seems that many observations of chborn in each census before 1960 (I used either the 5% or 1% samples) are NA; the exact percentage depends on the census. Was there only a subgroup that received this question, or did it have really low response rates? I’m a little worried about a selection effect here. Any advice on how best to measure completed fertility for women using the IPUMS data would be much appreciated. I’ve seen publications from the Census Bureau that reported the numbers of children ever born over this period, so I’m wondering how they did this if the data are largely missing.
Thanks, Geoff
The CHBORN Universe Tab might answer this question. Specifically, in 1940 and 1950, in addition to being a married female aged 14 or older, respondents needed to be on the “sample line.” These sample-line individuals were designated to be asked an extra set of questions. On the original enumeration forms, sample lines were pre-designated (i.e., no matter who fell on lines 14 and 29, they would get the extra questions). When the 1940 and 1950 samples were drawn, however, households were explicitly selected so that each one contains a sample-line individual. Everyone else in those households has no response to the sample-line questions, which is likely why you are observing the relatively large number of N/A observations in these years. More information on the sample-line characteristics is available on this page. Additionally, for representative statistics using sample-line characteristics, you should use the SLWT sampling weight variable.
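As a rough illustration of the SLWT advice above, here is a minimal sketch of a weighted mean of children ever born. It rests on assumptions that should be checked against the CHBORN codes tab: that code 0 is N/A and that code k stands for k-1 children (so code 1 is "no children"). The rows are invented for the example, and top-coded categories are not handled.

```python
def weighted_mean_children(records, skip_codes=(0,)):
    """SLWT-weighted mean number of children among cases with a usable CHBORN code.

    Assumes (hypothetically) that CHBORN code k represents k - 1 children and
    that the codes in `skip_codes` (N/A by default) should be excluded.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for r in records:
        code = r["CHBORN"]
        if code in skip_codes:
            continue
        children = code - 1  # assumed recode from CHBORN code to a count
        weighted_sum += r["SLWT"] * children
        weight_total += r["SLWT"]
    # Return None rather than dividing by zero when no case qualifies.
    return weighted_sum / weight_total if weight_total else None

# Made-up records for illustration only.
rows = [
    {"CHBORN": 1, "SLWT": 100.0},  # no children
    {"CHBORN": 3, "SLWT": 100.0},  # 2 children
    {"CHBORN": 0, "SLWT": 100.0},  # N/A, excluded
    {"CHBORN": 4, "SLWT": 200.0},  # 3 children
]

mean_children = weighted_mean_children(rows)
print(mean_children)
```

Passing `skip_codes=(0, 99)` would also exclude a missing-value code, if the file being used has one.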
With regard to the Complete Count 1940 IPUMS CHBORN variable, it is noted above that:
… when the Census Bureau released the 1940 public use microdata files, they decided to restrict the question about marital status to only those who are 14 years or older. So, this means that the CHBORN variable is correct in both the 1% and full count files, but because the 1% sample restricts the question about marital status to only those who are 14 years or older, only such individuals will have valid values for CHBORN or the source variable US1940A_CHILDS …
The above (my bold) is sort of ambiguous with regard to the Complete Count 1940; when I restrict my complete count 1940 IPUMS data extract of females by:
if 20 <= age <= 44;
if marst <= 5;
I get CHBORN values of the expected N/A (indicating line-number issue requiring SLWT), zero to n child values, but also a “99” category. So I am wondering why – given that the extract sent includes females only, and I constrained the data to only include ever-married women between ages 20-44 – I find a “99” category in the 1940 complete count CHBORN.
I am working with a subset of states, and among these states (n=1,062,608) about 9,370 cases, or 0.9% of the sample, have the value “99”.
Thanks so much for assistance.
This 99 category in the complete count file includes missing and illegible responses for in-universe persons, but it looks like the value label was inadvertently omitted. My apologies for the confusion. The IPUMS USA team will update the value label accordingly and confirm that there aren’t valid cases being masked by our current assignments.
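Until the label is updated, one way to keep the three kinds of CHBORN values apart in the subset described above is sketched below. This is only an illustration on invented rows: it assumes SEX == 2 codes females, reuses the age and MARST restrictions from the question, and treats 0 as N/A and 99 as missing or illegible, as described in this thread.

```python
from collections import Counter

def classify_chborn(code):
    """Label a CHBORN code as N/A, missing/illegible, or a valid response."""
    if code == 0:
        return "N/A"
    if code == 99:
        return "missing/illegible"
    return "valid"

def tabulate_subset(records):
    """Tabulate CHBORN classes for ever-married women aged 20-44."""
    subset = [
        r for r in records
        if r["SEX"] == 2            # assumed code for female
        and 20 <= r["AGE"] <= 44
        and r["MARST"] <= 5         # ever married, as in the question's filter
    ]
    return Counter(classify_chborn(r["CHBORN"]) for r in subset)

# Made-up records for illustration only.
rows = [
    {"SEX": 2, "AGE": 30, "MARST": 1, "CHBORN": 3},
    {"SEX": 2, "AGE": 25, "MARST": 1, "CHBORN": 99},
    {"SEX": 2, "AGE": 40, "MARST": 5, "CHBORN": 0},
    {"SEX": 2, "AGE": 50, "MARST": 1, "CHBORN": 2},   # outside age range
    {"SEX": 1, "AGE": 30, "MARST": 1, "CHBORN": 0},   # not female
]

counts = tabulate_subset(rows)
print(dict(counts))

# Share of 99s reported in the question: 9,370 of 1,062,608 cases.
share_99 = 9370 / 1062608
print(f"{share_99:.2%}")
```

On the figures quoted in the question, the share works out to a little under 0.9%, matching the rounded value given there.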