Any issues pooling single-year data from 1880 to 2018?

Let’s say I want to combine all census and ACS samples that exist in IPUMS (what’s listed as “default sample from each year” below) and make trend estimates (for example, the percent female that are dentists) from 1880 to the present day. Are there any issues I should be aware of?


Given the long time horizon of your analysis, I would encourage you to think about how concepts and variables have changed over time. IPUMS harmonization methods facilitate comparisons across time, and known issues are noted on the comparability tab of the variable-level documentation (e.g., OCC1950 comparability information). Using your example of female dentists, the occupation codes have changed considerably over time. Harmonized occupation variables from IPUMS seek to account for these changes, but some occupations simply disappear because of the methodology used to map occupation codes between schemes. Dentists are a persistent occupation category, but this may not be true for all occupations. In addition, changes to data editing procedures and similar factors may make it difficult to compare occupations over the full period.
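As a rough illustration of the kind of trend estimate discussed above, here is a minimal Python sketch that computes, for each year, the weighted percent of dentists who are female. It assumes extract records carrying the IPUMS USA variables YEAR, SEX, OCC1950, and PERWT, that SEX code 2 means female, and that OCC1950 code 32 identifies dentists; please verify both codes against the variable documentation. The function name and the toy records are hypothetical and are not real IPUMS data.

```python
from collections import defaultdict

# Assumed codes; check them against the IPUMS codes pages before use.
DENTIST_OCC1950 = 32
FEMALE_SEX = 2


def female_share_of_dentists(records):
    """Return {year: weighted percent of dentists who are female}."""
    total_weight = defaultdict(float)
    female_weight = defaultdict(float)
    for rec in records:
        if rec["OCC1950"] != DENTIST_OCC1950:
            continue  # keep dentists only
        total_weight[rec["YEAR"]] += rec["PERWT"]
        if rec["SEX"] == FEMALE_SEX:
            female_weight[rec["YEAR"]] += rec["PERWT"]
    return {
        year: 100.0 * female_weight[year] / total_weight[year]
        for year in sorted(total_weight)
    }


# Toy records for illustration only (not real data).
toy_records = [
    {"YEAR": 1950, "SEX": 1, "OCC1950": 32, "PERWT": 300.0},
    {"YEAR": 1950, "SEX": 2, "OCC1950": 32, "PERWT": 100.0},
    {"YEAR": 1950, "SEX": 2, "OCC1950": 0, "PERWT": 250.0},  # not a dentist
    {"YEAR": 2018, "SEX": 1, "OCC1950": 32, "PERWT": 100.0},
    {"YEAR": 2018, "SEX": 2, "OCC1950": 32, "PERWT": 100.0},
]

shares_by_year = female_share_of_dentists(toy_records)
```

Because each year is estimated separately with its own person weights, no pooling adjustment is involved in this kind of trend table.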

It doesn’t sound like you plan to pool samples (other than in your extract definition, so that you can look at time trends). In case I have misunderstood, here is a bit of information on weighting and pooling. For highly specific or small populations, it may be preferable to pool samples (or to use the multi-year ACS datasets in recent years). A common correction for pooled analyses is to divide the sampling weights by the number of samples pooled together (note that weights in the multi-year ACS datasets have already been corrected for the pooling of multiple 1-year datasets). A more accurate method is to multiply the sampling weight in sample x by (the sample size in sample x) / (the pooled sample size). If the combined samples all have roughly the same sample size, the two methods will be approximately equivalent.
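The two weight corrections described above can be sketched in Python as follows. This is only an illustration: the function names and the toy weights are hypothetical, and "sample size" is taken here to mean the number of person records in each sample.

```python
def pooled_weights_equal(weights_by_sample):
    """Divide every weight by the number of samples pooled together."""
    n_samples = len(weights_by_sample)
    return {
        sample: [w / n_samples for w in weights]
        for sample, weights in weights_by_sample.items()
    }


def pooled_weights_proportional(weights_by_sample):
    """Multiply each weight by (sample size) / (pooled sample size)."""
    pooled_n = sum(len(weights) for weights in weights_by_sample.values())
    return {
        sample: [w * len(weights) / pooled_n for w in weights]
        for sample, weights in weights_by_sample.items()
    }


# Toy weights for illustration only: two samples of unequal size,
# each representing a population of 200.
toy_weights = {
    "sample_a": [50.0, 50.0, 50.0, 50.0],  # 4 records
    "sample_b": [100.0, 100.0],            # 2 records
}

equal_adjusted = pooled_weights_equal(toy_weights)
proportional_adjusted = pooled_weights_proportional(toy_weights)
```

In this toy case both adjustments bring the pooled weights back to a total of about 200, but the proportional version gives the larger sample more influence (two thirds of the total weight rather than half), which is where the two methods differ when sample sizes are unequal.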

Terrific. Thank you! Are there any issues with tracking race and ethnicity across this time period? For example, what if I wanted to track the race and ethnicity of dentists over time?


There are definitely changes to these variables over time. I am linking the comparability information for the RACE and HISPAN variables, as well as the list of race/ethnicity/nativity variables for reference.

