ASEC sample issue

Hi! Not sure if this is the best place to ask, but I wasn’t sure where to turn. I downloaded CPS ASEC data for 2023 and the numbers don’t add up. For example, a simple sum of the population weights suggests that the weighted total population is 81 million people, which is half of what it should be. I’m probably missing something obvious, but I’d appreciate any help I could get. Thanks!

sum(repdata$ASECWT_1, na.rm = TRUE)
[1] 81223731

I took a look at your most recent data extracts and noticed that they included linked longitudinal ASEC samples; between this and your reference to ASECWT_1 in your code, I assume you are seeing the unexpected population counts using a linked, longitudinal file.

There are a few differences when it comes to working with the linked samples as opposed to the cross-sectional data. We provide a link to documentation on these linked samples on the sample selection page (right above the checkboxes for specific samples). The documentation notes that these samples should be weighted using LNKFW1YWT, which provides additional adjustment for the probability that a respondent links across the ASEC samples.
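As a rough illustration of the weight swap described above, here is a minimal sketch in R. The data frame is a made-up stand-in for a linked extract (the values are invented), and the exact column name of the linked weight in your file is an assumption; check the variable list of your extract, since linked files may attach a suffix to variable names.

```r
# Toy stand-in for a linked ASEC extract; all values are invented.
# The column name LNKFW1YWT is assumed to appear unsuffixed here.
repdata <- data.frame(
  ASECWT_1  = c(1500, 2200, 1800, 2000),
  LNKFW1YWT = c(3100, 4500, 3600, 4100)
)

# Weighted total using the cross-sectional weight (understates the population
# because only linked records are present in the file).
total_asecwt <- sum(repdata$ASECWT_1, na.rm = TRUE)

# Weighted total using the longitudinal weight recommended for linked samples.
total_linked <- sum(repdata$LNKFW1YWT, na.rm = TRUE)

print(c(ASECWT_1 = total_asecwt, LNKFW1YWT = total_linked))
```

On a real linked extract, the second total is the one to compare against published population figures.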

ASECWT_1 produces this much smaller population size because the linked files only include records of persons who are linked to the following year’s ASEC. Two main factors affect whether a person can be linked: their month-in-sample (MISH) and whether they are part of the ASEC oversample (ASECOVERP). As a result of the rotating panel structure, only persons in their first four months of the survey (MISH >= 1 & MISH <= 4) will be sampled again a year later in the following ASEC. This effectively cuts the sample that can be linked in half. Additionally, longitudinal extracts do not currently link persons who are part of the ASEC oversamples (see this working paper for a discussion of the oversample issue and solutions the IPUMS-CPS team is exploring). Finally, even someone who satisfies these criteria will not necessarily appear in the linked sample: they may have refused to respond to the survey or moved out of the sampled housing unit. LNKFW1YWT adjusts for this by using a raking procedure that recalibrates ASECWT based on the population that was linked; you can learn more about weighting linked datasets in our user note.
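To see how much weight the eligibility rules above remove before any attrition, you could flag link-eligible records in a cross-sectional ASEC extract. The sketch below uses a made-up data frame with invented values, and it assumes ASECOVERP is coded 0 for records outside the oversample and 1 for oversample records; please confirm the coding in the variable documentation before relying on it.

```r
# Toy stand-in for a cross-sectional ASEC extract; all values are invented.
asec <- data.frame(
  MISH      = c(1, 3, 4, 5, 6, 8, 2, 7),
  ASECOVERP = c(0, 0, 0, 0, 0, 0, 1, 1),  # assumed coding: 0 = no, 1 = oversample
  ASECWT    = c(1000, 1200, 900, 1100, 1000, 800, 500, 500)
)

# A record can be linked forward only if it is in months 1-4 of the rotation
# and is not part of the ASEC oversample.
eligible <- asec$MISH >= 1 & asec$MISH <= 4 & asec$ASECOVERP == 0

# Share of the cross-sectional weight carried by link-eligible records.
eligible_share <- sum(asec$ASECWT[eligible]) / sum(asec$ASECWT)
print(eligible_share)
```

The share in a real extract will differ from this toy example, and it will still overstate the linked sample because it ignores nonresponse and movers.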