Combining 1970 samples


I’m interested in running some analyses, including regressions, using variables that appear only in the 1970 census in IPUMS (IND5YR and OCC5YR). Is it possible to pool the different 1% samples from the 1970 census that include these variables (1970 1% state fm1, 1% metro fm1, 1% neigh fm1) in order to increase my sample size, on the rationale that each sample very likely (or verifiably) contains different individuals? If so, would any weights (e.g. PERWT) be needed to adjust the data for these analyses, or is that unnecessary (e.g. because the 1970 samples are “flat,” or unweighted, IPUMS samples)?

Thank you in advance for the help!


It sounds like you are running into the same issue I had a while ago. I found this thread to be exactly what I was looking for:

Thanks to Chris for digging this thread up! As Jeff says in the linked forum response, it is possible to pool the three 1970 1% Form 1 samples to create a 3% sample of the Form 1 data, since these samples are, for all practical purposes, mutually exclusive. This type of procedure is referenced on page 8 of the 1970 codebook. Because each 1970 sample is a 1-in-100 national random sample, every observation carries the same weight (i.e. the weights are flat), so the weights do not need to be modified before running a regression analysis. The only case where you would want to modify the weights is if you produce aggregate counts in your analysis, such as the number of people who worked in agriculture in 1965. Since you are appending three samples whose weights each sum to the total US population in 1970, all of your aggregate counts will be inflated by a factor of three. The solution is to divide your weights (PERWT or HHWT) by the number of pooled samples, which here is three.
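
In case it helps to see the weight adjustment spelled out, here is a minimal Python sketch of the idea. It uses small made-up records rather than a real IPUMS extract: the `SAMPLE` labels, the `AG1965` flag, the `PERWT_POOLED` column, and the `pool_samples` function are all hypothetical names chosen for illustration, and the only assumption taken from the discussion above is that every person in a flat 1-in-100 sample has PERWT = 100.

```python
# Toy stand-ins for the three 1970 1% Form 1 samples (NOT real IPUMS data).
# AG1965 is a hypothetical flag for "worked in agriculture in 1965".
state = [{"SAMPLE": "state", "PERWT": 100.0, "AG1965": i == 0} for i in range(4)]
metro = [{"SAMPLE": "metro", "PERWT": 100.0, "AG1965": i == 0} for i in range(4)]
neigh = [{"SAMPLE": "neigh", "PERWT": 100.0, "AG1965": i == 0} for i in range(4)]


def pool_samples(*samples):
    """Append the samples and add a weight divided by the number of samples pooled."""
    n_samples = len(samples)
    pooled = []
    for sample in samples:
        for record in sample:
            row = dict(record)  # copy so the original records are left untouched
            row["PERWT_POOLED"] = row["PERWT"] / n_samples
            pooled.append(row)
    return pooled


pooled = pool_samples(state, metro, neigh)

# Aggregate count using the original weights: inflated by the number of samples.
naive_count = sum(r["PERWT"] for r in pooled if r["AG1965"])

# Aggregate count using the adjusted weights: back on the population scale.
adjusted_count = sum(r["PERWT_POOLED"] for r in pooled if r["AG1965"])

print(naive_count, adjusted_count)
```

Because the adjusted weight is still identical for every record, a regression gives the same coefficients whichever weight column is used; the division only matters for totals such as the count above.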

Thank you Ivan and Chris for that information and for the reference to cite! Super helpful.