CPSIDP Demographic Mismatches

This question is directed to Tim Moreland:

I wrote a few weeks back re: Veterans Supplement Income Information, and received a very helpful response explaining that it is necessary to link respondents to their outgoing rotation month. I have since been using the new variable “cpsidp” to link these respondents accordingly. In the article “Making full use of the longitudinal design of the CPS,” the authors mention that when using “cpsidp” a small number of linked records do not match up on sex, race, or age. I have found this to be true in my merging as well, and am able to bypass the mismatches in age and sex by including those variables in my merge (in Stata: merge 1:1 cpsidp age sex using filename.dta). But of course with race you run into the additional difficulty of needing to recode the variable, because its categorical structure changed in 2003.

My question: What is the best way to locate and remove these mismatches from my data after or while merging? I am only interested in keeping the “plausible” matches, as referenced by the article above, rather than “all” matches linked by “cpsidp.” To give an example, the process currently matches a handful of veteran responses to ORG respondents below the age of 10. I am tempted to add more demographic or identifying variables to the “merge” command in order to sift out the unlikely matches, but the more variables I add, the more the process resembles how I was merging before “cpsidp.” Is there an easier way?

Thank you in advance for any help you’re able to provide.


Caroline Crawford


Unfortunately, the Census Bureau's change in how it codes the race variable creates a problem without an obvious solution. Prior to 2003, the race question forced respondents to choose a single race, while later years allowed respondents to choose multiple races. Presumably, a respondent’s multiple-race response will typically still include the race from that respondent’s single-race response. You might, therefore, consider keeping only those matches where this is true and where the respondent also matches on sex and on age (the same age, or one year older in the later interview). Ultimately, the decision of how to account for this change in the coding scheme is left to the discretion of the researcher.
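To make the rule above concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the function name, the record layout, and the race labels are hypothetical, and it assumes you have already recoded the race variable from both coding schemes into sets of comparable category labels (that recoding is the researcher's own decision). The same logic should carry over to a filter applied after a merge on cpsidp alone in Stata.

```python
def is_plausible_match(earlier, later, max_age_gain=1):
    """Return True if two cpsidp-linked records could be the same person.

    Each record is a dict with hypothetical keys:
      "age"   - age in years
      "sex"   - sex code
      "races" - set of race labels after your own recoding (assumption)
    """
    same_sex = earlier["sex"] == later["sex"]

    # Age may stay the same or rise by at most max_age_gain between interviews.
    age_gain = later["age"] - earlier["age"]
    age_ok = 0 <= age_gain <= max_age_gain

    # Every race reported earlier must still appear in the later response,
    # so a single-race answer followed by a multiple-race answer that
    # includes it counts as consistent.
    race_ok = earlier["races"] <= later["races"]

    return same_sex and age_ok and race_ok


# Made-up linked pairs: (cpsidp, earlier record, later record).
linked = [
    (1, {"age": 34, "sex": 1, "races": {"white"}},
        {"age": 35, "sex": 1, "races": {"white", "asian"}}),
    (2, {"age": 61, "sex": 2, "races": {"black"}},
        {"age": 61, "sex": 2, "races": {"black"}}),
    (3, {"age": 52, "sex": 1, "races": {"white"}},
        {"age": 8, "sex": 1, "races": {"white"}}),   # implausible age
    (4, {"age": 40, "sex": 1, "races": {"white"}},
        {"age": 40, "sex": 2, "races": {"white"}}),  # sex mismatch
    (5, {"age": 29, "sex": 2, "races": {"white"}},
        {"age": 30, "sex": 2, "races": {"black"}}),  # race not retained
]

plausible_ids = [pid for pid, a, b in linked if is_plausible_match(a, b)]
print(plausible_ids)
```

Filtering after the merge, rather than adding age and sex to the merge keys, keeps respondents who had a birthday between interviews and leaves the tolerance (max_age_gain) as an explicit choice.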

For reference, IPUMS-USA created a variable (RACESING) that bridges the multiple-race responses into single-race responses. Reading the variable page for RACESING might be helpful for thinking through this process with the CPS data.

I hope this helps, and I am sorry I could not provide a more direct solution.