I’m interested in running some analyses, including regressions, using variables that appear only in the 1970 census in IPUMS (IND5YR and OCC5YR). Is it possible to pool the different 1970 1% samples that include these variables (1970 1% state fm1, 1% metro fm1, 1% neigh fm1) in order to increase my sample size, on the rationale that each sample very likely contains (or can be verified to contain) different individuals? If so, would any weights (e.g. PERWT) be needed to adjust the data for these analyses, or is that unnecessary (e.g. because the 1970 samples are “flat”, i.e. unweighted, IPUMS samples)?
Thanks to Chris for digging this thread up! As Jeff says in the linked forum response, it is possible to pool the three 1970 1% form 1 samples to create a 3% sample of the form 1 data, since these samples are, for all practical purposes, mutually exclusive. This type of procedure is referenced on page 8 of the 1970 codebook. Since each 1970 sample is a 1-in-100 national random sample, every observation has the same weight (i.e. the weights are flat), so the weights do not need to be modified before running a regression analysis. The only case where you would want to modify the weights is if your analysis produces aggregate counts, such as the number of respondents who worked in agriculture in 1965. Because you’re appending three samples whose weights each sum to the total US population in 1970, all of your aggregate counts will be inflated by a factor of three. The solution is to divide your weights (PERWT or HHWT) by the number of pooled samples.
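As a rough illustration of the weight adjustment described above, here is a minimal Python sketch. It assumes each extract has already been read into a list of records; the toy records, the AGRIC_1965 flag, the PERWT_POOLED column, and both helper functions are hypothetical names made up for this example, not part of IPUMS.

```python
# Toy stand-ins for the three 1970 1% form 1 extracts (hypothetical records).
# In a flat 1-in-100 sample every person carries the same weight, PERWT = 100.
# AGRIC_1965 is an illustrative flag, not an IPUMS variable.
state = [{"SAMPLE": "state", "PERWT": 100.0, "AGRIC_1965": 1},
         {"SAMPLE": "state", "PERWT": 100.0, "AGRIC_1965": 0}]
metro = [{"SAMPLE": "metro", "PERWT": 100.0, "AGRIC_1965": 1},
         {"SAMPLE": "metro", "PERWT": 100.0, "AGRIC_1965": 0}]
neigh = [{"SAMPLE": "neigh", "PERWT": 100.0, "AGRIC_1965": 0},
         {"SAMPLE": "neigh", "PERWT": 100.0, "AGRIC_1965": 1}]


def pool_samples(*samples):
    """Append the samples and add a weight divided by the number pooled."""
    n_samples = len(samples)
    pooled = []
    for sample in samples:
        for record in sample:
            row = dict(record)  # copy so the input records are not modified
            row["PERWT_POOLED"] = record["PERWT"] / n_samples
            pooled.append(row)
    return pooled


def weighted_count(records, flag, weight):
    """Sum the chosen weight over records where the flag equals 1."""
    return sum(r[weight] for r in records if r[flag] == 1)


pooled = pool_samples(state, metro, neigh)

# Using the original weights triple-counts the population;
# the adjusted weights bring the estimate back to the right scale.
naive = weighted_count(pooled, "AGRIC_1965", "PERWT")
adjusted = weighted_count(pooled, "AGRIC_1965", "PERWT_POOLED")
print(naive, adjusted)
```

The original PERWT is kept alongside the adjusted column so that unweighted or flat-weighted regressions can still be run on the same pooled records.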
Hello, thank you for your previous help. I have a slight modification to my earlier question that I was hoping to get your thoughts on.
I’m creating a dummy variable that equals 1 if certain conditions held in a respondent’s county in 1970. However, the 1970 samples identify location at different levels: down to the county in one 1970 1% sample (‘metro’), only to the state in another (‘state’), and only to the region in the third (‘neigh’). For some states and regions (call this group A) I know for certain that these conditions are met in every county in the state/region, while for other states/regions (group B) I cannot be certain (since the dummy = 1 for some counties in the state/region but = 0 for others).
Would it be possible for me to run regressions that pool the group A observations from the 1970 state and neigh 1% samples with all observations from the 1970 metro sample (dropping all the group B observations from the state and neigh samples)? If this is not possible with just the flat samples, could it be possible with some sort of re-weighting?
I’m glad that my earlier response was helpful. While providing analytical advice is beyond the scope of the IPUMS User Support Team, I can share that there is no technical limitation to pooling together a subset of these three samples. This can be done with just the flat sample without any reweighting, unless you are looking to generate estimates that are aggregate counts. In this case, you would follow the procedure above and divide the weights by the number of pooled samples (i.e. three in the case that you outline). As you note, your estimates from such an analysis would only be valid for the observations that you retain in your sample. I will leave it to you to determine if this is appropriate or how this impacts your interpretation of your results.
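For anyone following along, the subsetting step discussed in this exchange might look something like the Python sketch below. It is only a sketch under stated assumptions: the geography fields are labeled STATEFIP and REGION here for illustration, the codes in the two group A sets are placeholders rather than real group A members, and build_analysis_sample is a hypothetical helper.

```python
# Hypothetical group A codes (placeholders, not a real classification).
GROUP_A_STATES = {"27"}
GROUP_A_REGIONS = {"11"}

# Toy records standing in for the three extracts (hypothetical values).
metro = [{"SAMPLE": "metro", "COUNTY": "0530"},
         {"SAMPLE": "metro", "COUNTY": "0010"}]
state = [{"SAMPLE": "state", "STATEFIP": "27"},   # group A: keep
         {"SAMPLE": "state", "STATEFIP": "06"}]   # group B: drop
neigh = [{"SAMPLE": "neigh", "REGION": "11"},     # group A: keep
         {"SAMPLE": "neigh", "REGION": "42"}]     # group B: drop


def build_analysis_sample(metro, state, neigh, group_a_states, group_a_regions):
    """Keep every metro record, plus only the group A records from the
    state and neigh samples; group B records are dropped."""
    kept = list(metro)
    kept += [r for r in state if r["STATEFIP"] in group_a_states]
    kept += [r for r in neigh if r["REGION"] in group_a_regions]
    return kept


analysis = build_analysis_sample(
    metro, state, neigh, GROUP_A_STATES, GROUP_A_REGIONS
)
print(len(analysis), [r["SAMPLE"] for r in analysis])
```

Keeping the SAMPLE field on every record makes it easy to check afterwards how many observations each source sample contributed to the retained set.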