Any issues pooling single-year data from 1880 to 2018?

dpl2001 · November 4, 2020, 1:49pm

Hi all,

Let’s say I want to combine all ACS samples that exist in IPUMS (what’s listed as “default sample from each year” below) and make trend estimates (for example, the percent female that are dentists) from 1880 to the present day. Any issues I should be aware of?

-Dan

KariWilliams · November 4, 2020, 9:35pm

Given the long time horizon of your analysis, I would encourage you to think about changes in concepts or variables. IPUMS harmonization methods facilitate comparisons across time, and known issues are noted on the comparability tab of the variable-level documentation (e.g., OCC1950 comparability information). Using your example of female dentists, the occupation codes have changed considerably over time. Harmonized occupation variables from IPUMS seek to account for these changes, but some occupations simply disappear because of the methodology used to map occupation codes between schemes. Dentists are a persistent occupation category, but that is not true of every occupation, and changes to data editing procedures and similar factors may make occupations difficult to compare over this full period.
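
As a rough illustration of a year-by-year trend estimate on a harmonized occupation variable, here is a minimal Python sketch. It takes one reading of the example (the weighted percent of dentists who are female) and assumes the extract includes YEAR, SEX, OCC1950, and PERWT. The OCC1950 code used for dentists and the sample rows are assumptions for illustration only, so check the codebook for your own extract.

```python
from collections import defaultdict

# Assumed IPUMS USA codes; verify both against your extract's codebook.
DENTIST_OCC1950 = 32  # assumed OCC1950 code for "Dentists"
FEMALE = 2            # SEX: 1 = male, 2 = female


def female_share_of_dentists(rows):
    """Return {year: weighted percent of dentists who are female}."""
    female = defaultdict(float)
    total = defaultdict(float)
    for r in rows:
        if r["OCC1950"] != DENTIST_OCC1950:
            continue  # keep only records coded as dentists
        total[r["YEAR"]] += r["PERWT"]
        if r["SEX"] == FEMALE:
            female[r["YEAR"]] += r["PERWT"]
    return {y: 100.0 * female[y] / total[y] for y in sorted(total)}


# Made-up records, not real IPUMS data.
rows = [
    {"YEAR": 1880, "SEX": 1, "OCC1950": 32, "PERWT": 300.0},
    {"YEAR": 1880, "SEX": 2, "OCC1950": 32, "PERWT": 100.0},
    {"YEAR": 1880, "SEX": 2, "OCC1950": 100, "PERWT": 100.0},  # not a dentist
    {"YEAR": 2018, "SEX": 1, "OCC1950": 32, "PERWT": 100.0},
    {"YEAR": 2018, "SEX": 2, "OCC1950": 32, "PERWT": 100.0},
]

trend = female_share_of_dentists(rows)
print(trend)
```

Because each year is estimated separately, the person weights are used as delivered and no pooling adjustment is applied in this sketch.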

It doesn’t sound like you have plans to pool samples (other than in your extract definition so you can look at time trends). I am, however, providing a bit of information on weighting and pooling in case I have misunderstood. For highly specific/small populations, it may be preferable to pool samples (or use the multi-year ACS datasets in recent years). Dividing the sampling weights by the number of samples pooled together is a common correction to weights for pooled analyses (note that weights in the multi-year ACS datasets have already been corrected for the pooling of multiple 1-year datasets). That being said, a more accurate method is to multiply the sampling weight in sample x by (the sample size in sample x) / (the pooled sample size). If the combined samples all have roughly the same sample size, then the two methods discussed will be approximately equivalent.
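
The two weight adjustments described above can be sketched in Python as follows. The function names, sample labels, and weights are hypothetical and made up for illustration; in a real extract the weights would come from a variable such as PERWT.

```python
def equal_share_weights(samples):
    """Divide every weight by the number of samples pooled together."""
    k = len(samples)
    return {name: [w / k for w in weights] for name, weights in samples.items()}


def size_proportional_weights(samples):
    """Multiply each weight by (sample size in sample x) / (pooled sample size)."""
    pooled_n = sum(len(weights) for weights in samples.values())
    return {
        name: [w * len(weights) / pooled_n for w in weights]
        for name, weights in samples.items()
    }


def weighted_total(adjusted):
    """Sum of all adjusted weights across the pooled samples."""
    return sum(sum(weights) for weights in adjusted.values())


# Two hypothetical samples of unequal size (2 records vs. 6 records).
unequal = {"sample_a": [100.0, 100.0], "sample_b": [50.0] * 6}
print(weighted_total(equal_share_weights(unequal)))
print(weighted_total(size_proportional_weights(unequal)))

# With equal sample sizes, the two adjustments give the same weights.
equal = {"sample_a": [100.0, 100.0], "sample_b": [50.0, 50.0]}
print(equal_share_weights(equal) == size_proportional_weights(equal))
```

With unequal sample sizes the second method gives the larger sample more influence on the pooled estimate, which is why the two totals differ here.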

dpl2001 · November 4, 2020, 9:50pm

Terrific. Thank you! Any issues with tracking race and ethnicity across this time period? Like if I wanted to track race and ethnicity of say dentists across time?

Best,
Dan

KariWilliams · November 5, 2020, 5:40pm

There are definitely changes to these variables over time. I am linking the comparability information for the RACE and HISPAN variables, as well as the list of race/ethnicity/nativity variables for reference.

dpl2001 · November 5, 2020, 6:00pm

Thank you for your help with this!

Best,
Dan
