How do I decide which IPUMS-USA sample to use for a given year?


In some years, multiple IPUMS-USA samples are available. Which sample to use depends on your research questions; typically, the decision involves trading precision on some variables for greater detail on others. Aside from the obvious distinction between USA and Puerto Rico samples, there are four major dimensions along which the samples vary:

  1. Density
    For the 1880, 1900, 1930, and 1980-2000 data, multiple densities are available. Lower-density samples are smaller and therefore quicker to work with, but may not contain as much geographic detail as the higher-density samples. Users who want to study small subgroups of the population should use higher-density samples, which contain more cases of interest.

  2. Oversamples
    The 1860-1870 and 1900-1910 samples contain oversamples of racial and ethnic minorities. (While not technically an oversample, a 3% sample of households containing elderly people is also available in 1990.) These oversamples yield a greater number of minority cases; researchers who are interested in such subgroups should consider using them.

  3. Weights
    In 1990 and 2000, both weighted and unweighted samples are available. (It is also possible to select unweighted subsamples of the 1940 and 1950 data.) Unweighted samples may contain fewer cases from sparsely populated areas, but are easier to work with for statistical analyses that do not accept weights. For more information, see Sample Weights in the IPUMS introduction.

  4. Variables
    For 1970-2000, samples from the same year often contain different sets of variables. This is particularly important when using the 1970 data: many variables appear only in “Form 1” samples, others appear only in “Form 2” samples, and information on neighborhood characteristics is available in some samples but not others. In the other years, it is mostly geographic identifiers that vary across samples. For example, some samples do not include identifiers for metropolitan areas, while others include them but suppress state identification for some cases.
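
The density and weight trade-offs described above can be made concrete with a little arithmetic. The following is a minimal sketch, not an IPUMS tool: the population figure, the three records, and the `in_group` field are made up for illustration, and the only IPUMS name used is PERWT, the person-weight variable. It shows roughly how many cases of a small subgroup a 1% versus a 5% sample would be expected to yield, and how a weighted estimate can differ from a simple unweighted count in a weighted sample.

```python
# Hypothetical illustration: none of these numbers come from IPUMS data.

def expected_cases(subgroup_population, density):
    """Expected number of sampled cases for a subgroup at a given sample density."""
    return subgroup_population * density


def weighted_share(records, weight_key="PERWT"):
    """Share of the weighted population whose record has in_group set to True."""
    total = sum(r[weight_key] for r in records)
    in_group = sum(r[weight_key] for r in records if r["in_group"])
    return in_group / total


# A subgroup of 20,000 people: about 200 expected cases at 1%, about 1,000 at 5%.
cases_1pct = expected_cases(20_000, 0.01)
cases_5pct = expected_cases(20_000, 0.05)

# Three made-up person records; "in_group" is a hypothetical flag, not an IPUMS variable.
records = [
    {"PERWT": 20, "in_group": True},
    {"PERWT": 60, "in_group": False},
    {"PERWT": 20, "in_group": False},
]

# Ignoring weights treats every record equally: 1 of 3 records is in the group.
unweighted_share = sum(r["in_group"] for r in records) / len(records)

# Using weights counts the people each record represents: 20 of 100.
weighted = weighted_share(records)

print(cases_1pct, cases_5pct, unweighted_share, weighted)
```

In an unweighted sample every record represents the same number of people, so the two shares would coincide; in a weighted sample they can diverge, which is why software that cannot apply weights is easier to use with unweighted samples.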

More detail on each sample can be found on our samples page and in the User Guide’s section on sample designs. For some years, original codebooks describing the various samples are also available.