How does IPUMS CPS variable SERIAL behave across years and downloads?


Based on what I’ve read in the IPUMS technical documentation and on this forum, I understand that when a user selects an IPUMS CPS sample, IPUMS CPS automatically generates a unique ID, SERIAL, for each household. So, for example, if I select a 5-year March CPS sample with variables drawn from both the basic and the ASEC questionnaires, then every household in that sample will have a unique IPUMS-generated SERIAL number. The SERIAL variable ignores the rotating design of the CPS: a household that appears in two consecutive March surveys (which, I believe, happens about half of the time) does not keep the same SERIAL value. If this is correct, then all households in this 5-year sample will have unique SERIAL numbers, and it will not be possible to tell whether two household records are actually the same household observed in two different years.


Question 1: Is the above true? If so, why does IPUMS CPS also recommend using both YEAR and SERIAL to uniquely identify households?

Question 2: If I download 10 variables for a 5-year March CPS sample and then realize that I need 2 more variables, will the SERIAL values match between the original download (of 10 variables) and the additional download (of 2 variables)? In other words, within a given sample, are the generated SERIAL values the same in every download? I understand that if I only downloaded the additional two variables for 3 years of March data, I would not be able to merge them into my original 5-year dataset, because SERIAL numbers are not comparable across samples.

Thanks for any insights

For clarification, a “sample” is the available data in a single month (e.g. the March 2013 sample). A “data extract” is what a user downloads from IPUMS-CPS, which includes the variables and samples selected by the user.

The SERIAL variable is not newly generated each time a user creates a data extract. Rather, IPUMS-CPS has pre-generated a SERIAL value for each household in each monthly sample. For example, every time someone downloads the March 2012 sample (or combination of samples including March 2012), the households in the March 2012 sample will always have the same assigned SERIAL value. In your 5-year example above, not every household would have a unique SERIAL value, because the same SERIAL values can be reused in multiple months. As a result, you would need to use both YEAR and SERIAL to uniquely identify a household. Similarly, you would need to include MONTH if your dataset had included observations from non-March months. Finally, it is not possible to track households across years using SERIAL.
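To make the key logic concrete, here is a minimal sketch using only the Python standard library. The household records are made-up toy values (not real IPUMS data); only the field names YEAR, MONTH, and SERIAL mirror the IPUMS CPS variables. It checks whether a given combination of fields identifies each household record uniquely.

```python
from collections import Counter

# Hypothetical toy household records from a multi-year March extract.
# Field names mirror IPUMS CPS variables; the values are invented.
households = [
    {"YEAR": 2011, "MONTH": 3, "SERIAL": 1},
    {"YEAR": 2011, "MONTH": 3, "SERIAL": 2},
    # SERIAL 1 and 2 are reused in the 2012 sample; these are NOT
    # necessarily the same households as in 2011.
    {"YEAR": 2012, "MONTH": 3, "SERIAL": 1},
    {"YEAR": 2012, "MONTH": 3, "SERIAL": 2},
]

def is_unique_key(records, fields):
    """Return True if the combination of `fields` identifies each record uniquely."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return all(n == 1 for n in counts.values())

print(is_unique_key(households, ["SERIAL"]))                   # False
print(is_unique_key(households, ["YEAR", "SERIAL"]))           # True
print(is_unique_key(households, ["YEAR", "MONTH", "SERIAL"]))  # True
```

With March-only data, YEAR and SERIAL together suffice; once non-March months are mixed in, MONTH has to join the key as well.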

As for your second question, it is possible to merge a dataset containing five years of March data with another containing three (overlapping) years of March data. As mentioned above, SERIAL is not generated each time a user creates a data extract; therefore, households that appear in both extracts will have the same SERIAL number in each. Keep in mind that you would need to merge your data on both SERIAL and YEAR.
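A minimal sketch of such a merge in plain Python, assuming household-level records. VAR_A and VAR_B are hypothetical placeholder variables, and all values are invented for illustration. Rows in the first extract whose (YEAR, SERIAL) pair is absent from the second extract simply receive no new columns.

```python
# Hypothetical toy household-level extracts; VAR_A and VAR_B are placeholders.
extract_a = [  # first download (e.g. 10 variables, 5 March samples)
    {"YEAR": 2010, "SERIAL": 1, "VAR_A": 10},
    {"YEAR": 2011, "SERIAL": 1, "VAR_A": 20},
    {"YEAR": 2011, "SERIAL": 2, "VAR_A": 30},
]
extract_b = [  # second download (e.g. 2 variables, overlapping years only)
    {"YEAR": 2011, "SERIAL": 1, "VAR_B": "x"},
    {"YEAR": 2011, "SERIAL": 2, "VAR_B": "y"},
]

def merge_on_year_serial(left, right):
    """Left-join `right` onto `left` using the (YEAR, SERIAL) household key.

    Assumes one record per (YEAR, SERIAL) in `right`, i.e. household-level data.
    """
    index = {(r["YEAR"], r["SERIAL"]): r for r in right}
    merged = []
    for row in left:
        match = index.get((row["YEAR"], row["SERIAL"]), {})
        merged.append({**row, **match})
    return merged

for row in merge_on_year_serial(extract_a, extract_b):
    print(row)
```

Keying on SERIAL alone would wrongly attach the 2011 value of VAR_B for SERIAL 1 to the unrelated 2010 household that happens to share SERIAL 1.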

Hope this helps.

Thank you, Tim. I see that I had misunderstood what was meant by “sample”. This makes sense; I appreciate your answer.