1. For a person in a household with a specified household identifier CPSID, how many, and which, digits are supposed to be the same in their person identifier, CPSIDP?

2. If you have a person’s CPSID in a file, can the digits of CPSIDP that differ from it be determined from other variables?

The reason I am asking is that I’ve divided my (almost) complete IPUMS-CPS data into 22 files, and I think I have established that something like a third of the aggregate size comes from repeated variables that are added automatically. I need all those variables, but I don’t need 22 copies of each of them. So I am trying to identify the minimal set of variables I need to link individuals and households within the ASEC (only), keep that set in every file for safety’s sake, and keep the rest only in the files for their respective variable groups. I think that minimal set consists of CPSID plus any digits of CPSIDP that are, or can be, different from those of the household the person is in. Does that seem right?

Are the answers to questions (1) and (2) above unchanged if a person moves into an ongoing household after the first month that the household is in the survey? How is PERNUM handled for such individuals? Will their PERNUM values always be higher than the PERNUM values of the people who were present in the household's first month in the survey?


I’ll answer each question one at a time:

(1) For any individual observation, the first 13 digits of CPSID and CPSIDP should be exactly the same.
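If you want to verify this on your own extract, here is a minimal Python sketch of the comparison. The function name, the 14-digit width used for zero-padding, and the ID values are all made up for illustration and are not taken from real data; the 13-digit prefix length is the figure given above.

```python
def shares_household_prefix(cpsid: int, cpsidp: int,
                            prefix_len: int = 13, width: int = 14) -> bool:
    """Return True if the first `prefix_len` digits of CPSID and CPSIDP match.

    `width` is an assumed total number of digits, used only to zero-pad
    the values so that string slicing lines up.
    """
    household_digits = str(cpsid).zfill(width)
    person_digits = str(cpsidp).zfill(width)
    return household_digits[:prefix_len] == person_digits[:prefix_len]


# Made-up IDs, for illustration only.
same_household = shares_household_prefix(20100100012300, 20100100012301)
other_household = shares_household_prefix(20100100012300, 20100100045601)
```

Running this over every person record and counting the cases that return False would tell you whether the rule holds throughout your files.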

(2) This is a good question. PERNUM does not necessarily correspond to the within-household digits of CPSIDP, so the answer to your question is no. This is related to the issue you raise at the end of your question. CPSIDP is a unique ID variable designed to enable linking across CPS samples, and a specific individual may not be present in every sample. PERNUM, by contrast, counts individuals within each household within each sample.
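To make the distinction concrete, here is a rough Python sketch that flags records where the trailing digits of CPSIDP differ from PERNUM. The records are invented for illustration (they are not real CPS values), and the helper name, the 14-digit width, and the 13-digit prefix length are assumptions carried over from answer (1).

```python
# Invented example records: two people in one household in a single sample.
records = [
    {"CPSID": 20100100012300, "CPSIDP": 20100100012301, "PERNUM": 1},
    {"CPSID": 20100100012300, "CPSIDP": 20100100012303, "PERNUM": 2},
]


def suffix_matches_pernum(record: dict, prefix_len: int = 13,
                          width: int = 14) -> bool:
    """Compare the digits of CPSIDP after the household prefix with PERNUM."""
    person_digits = str(record["CPSIDP"]).zfill(width)
    suffix = int(person_digits[prefix_len:])
    return suffix == record["PERNUM"]


# Records whose CPSIDP could not be rebuilt from CPSID and PERNUM.
mismatches = [r for r in records if not suffix_matches_pernum(r)]
```

If a check like this turns up any mismatches in your own data, CPSIDP cannot be reconstructed from CPSID and PERNUM, so the safer choice is to keep the full CPSIDP in every file.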