Linking basic monthly data across four consecutive months

Hello. I am trying to link basic monthly CPS data across four consecutive months and, using Stata, check the quality of my CPSIDP links by comparing AGE, SEX, and RACE across months. My ultimate goal is to calculate the quarterly rate at which individuals move from one industry to another between January 2014 and April 2015. I am very new to the CPS, and I want to make sure that my understanding of linking monthly data and verifying the matches is correct.

CPSIDP contains a unique household ID (i.e., 6 digits) and person ID (4 digits), as well as the YEAR and MONTH of first participation in the survey. So the steps I thought I could take are:

  1. Keep the observations whose year and month of first participation in the survey, as encoded in CPSIDP, fall between January 2014 and April 2015;
  2. Create a dummy variable equal to 1 if an individual’s AGE, SEX, and RACE do not change across the four consecutive months starting with the first interview (identified with MISH); and
  3. Among individuals whose step 2 dummy equals 1, create a dummy variable equal to 1 if the individual’s IND changes from one industry to another.
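The three steps above can also be sketched outside Stata. The Python below is a minimal sketch, not IPUMS sample code: it assumes each person-month is a dict keyed by IPUMS variable names (CPSIDP, MISH, AGE, SEX, RACE, IND), and it assumes, as described in the question, that the first six digits of CPSIDP give the year and month of first participation. The helper names (`first_month`, `demographics_consistent`, `industry_changed`, `link_first_four_months`) are hypothetical. One deliberate departure from step 2: AGE may rise by one year, because a birthday can fall inside a four-month window and a strict no-change rule would reject some true matches.

```python
from collections import defaultdict

# Hypothetical helpers illustrating steps 1-3; not IPUMS sample code.
# Each record is a dict with keys CPSIDP, MISH, AGE, SEX, RACE, IND.

def first_month(cpsidp):
    """Step 1 helper: YYYYMM of first participation, assumed (per the
    question) to be the leading six digits of CPSIDP."""
    return int(str(cpsidp)[:6])

def demographics_consistent(recs, max_age_gain=1):
    """Step 2: SEX and RACE never change, and every AGE equals the
    first-month AGE or exceeds it by at most max_age_gain (a birthday may
    fall inside the window)."""
    first = recs[0]
    return all(
        r["SEX"] == first["SEX"]
        and r["RACE"] == first["RACE"]
        and 0 <= r["AGE"] - first["AGE"] <= max_age_gain
        for r in recs
    )

def industry_changed(recs):
    """Step 3: True if IND takes more than one value across the months."""
    return len({r["IND"] for r in recs}) > 1

def link_first_four_months(records, start=201401, end=201504):
    """Return {CPSIDP: (valid_link, ind_changed)} for persons who entered the
    survey between start and end (YYYYMM) and appear in each of MISH 1-4."""
    people = defaultdict(dict)
    for r in records:
        people[r["CPSIDP"]][r["MISH"]] = r  # a duplicate MISH overwrites
    out = {}
    for pid, by_mish in people.items():
        if not start <= first_month(pid) <= end:
            continue  # step 1: entered the survey outside the window
        if not all(m in by_mish for m in (1, 2, 3, 4)):
            continue  # not observed in all four consecutive months
        recs = [by_mish[m] for m in (1, 2, 3, 4)]
        valid = demographics_consistent(recs)
        out[pid] = (valid, valid and industry_changed(recs))
    return out
```

Because `valid and industry_changed(recs)` short-circuits, the industry flag can only be True for links that passed the demographic check, which mirrors step 3.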

Could anyone let me know whether my approach is correct? Is there a guide on how to link the basic monthly data from IPUMS CPS? Lastly, which weight should I use in this process?

Thank you very much for your help!

Your general understanding is correct. There is a whole page on linking the CPS, including many resources. Materials from a summer 2018 workshop on using the CPS longitudinally are also available here; that page includes sample code for linking and for validating links. Given how you’ve described your analysis (looking for a transition within a four-month period), the correct weights to use would be LNKFWMIS14WT and LNKFWMIS58WT.
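A weighted quarterly transition rate can then be computed as the weighted count of validated links with an industry change divided by the weighted count of all validated links. This is a sketch under stated assumptions: the per-person layout (`valid_link`, `ind_changed`) and the function name are hypothetical, and it assumes one linking-weight value per linked person. The `weight_key` parameter defaults to LNKFWMIS14WT; the names suggest LNKFWMIS58WT covers a window in months in sample 5-8, but check the IPUMS linking-weight documentation for exactly which record each weight should be read from.

```python
def weighted_transition_rate(people, weight_key="LNKFWMIS14WT"):
    """Weighted share of validated links whose industry changed.

    `people` is a hypothetical list of dicts, one per linked person, with
    boolean keys 'valid_link' and 'ind_changed' plus a linking-weight column.
    Returns NaN when no validated link carries positive weight.
    """
    switched = total = 0.0
    for p in people:
        if not p["valid_link"]:
            continue  # drop links that failed the AGE/SEX/RACE check
        w = p[weight_key]
        total += w
        if p["ind_changed"]:
            switched += w
    return switched / total if total > 0 else float("nan")
```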

From personal experience, I would advise you to be cautious in how you define within-quarter industry transitions. Industry (especially at the most detailed level) has significant measurement error in survey data. When you are looking at transitions, this measurement error is magnified, because an incorrectly reported code in one month can lead to a false industry transition.
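One way to act on this caution, offered as an illustrative heuristic rather than a standard IPUMS rule, is to count a switch only when the industry code is stable on both sides of the change. The sketch below assumes a four-month window of IND codes (the codes in the examples are arbitrary) and contrasts a naive any-change flag with a stricter version that requires months 1-2 to agree, months 3-4 to agree, and the two pairs to differ. Both function names are hypothetical.

```python
def naive_switch(inds):
    """Flags any change in IND across the window (sensitive to coding noise)."""
    return len(set(inds)) > 1

def persistent_switch(inds):
    """Stricter, illustrative rule for a four-month window: IND must agree in
    months 1-2, agree in months 3-4, and differ between the two pairs."""
    if len(inds) != 4:
        raise ValueError("expected IND codes for exactly four months")
    m1, m2, m3, m4 = inds
    return m1 == m2 and m3 == m4 and m2 != m3
```

The stricter rule ignores a one-month coding blip such as 770, 6170, 770, 770, but it also misses genuine switches between months 1 and 2 or between months 3 and 4, so it trades false positives for some false negatives.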

Thank you very much for the information. I have reviewed the sample code for linking and validation, and it helped me a lot. I was also wondering whether the files referenced in the sample linking code (i.e. …) are publicly available.

Those files are just the Stata command files that are auto-generated when you submit an extract request in fixed-width format. The command files read the fixed-width data into Stata and define variable formats and labels. The file number (e.g., 00224) depends on your extract number, so yours will differ. A short video tutorial showing how to use the command file can be found here: IPUMS - Open file in Stata - YouTube
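Conceptually, a fixed-width command file does two things: it slices each line of the data file at fixed column positions, and it attaches value labels. The Python sketch below mimics that idea only; the column positions, the `LAYOUT` and `VALUE_LABELS` tables, and the function names are hypothetical placeholders, and the real positions and labels come from your own extract's command file or codebook.

```python
# Hypothetical layout: variable -> (1-based start column, width).
# Real positions and labels come from your own extract's command file/codebook.
LAYOUT = {"YEAR": (1, 4), "MONTH": (5, 2), "CPSIDP": (7, 14), "AGE": (21, 2)}
VALUE_LABELS = {"MONTH": {1: "January", 2: "February", 3: "March"}}

def parse_line(line, layout=LAYOUT):
    """Slice one fixed-width line into integer variables."""
    return {
        name: int(line[start - 1 : start - 1 + width])
        for name, (start, width) in layout.items()
    }

def label(name, value, labels=VALUE_LABELS):
    """Return the value label if one is defined, else the raw value."""
    return labels.get(name, {}).get(value, value)
```

The IPUMS-generated Stata command file does the equivalent in Stata syntax, which is why using it directly is the simpler route.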

Thank you very much for the information. I was able to compare my code with the sample and run it. I tried to validate the links over 8 months, as in the sample code, for the period January 2014 to April 2015, and I get an all_match ratio of 96.40, which seemed too high to me. Is it normal to get such a high value? I just want to confirm that the value makes sense.

Thank you very much for your help again!
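For reference, a validation ratio of this kind can be computed as the percentage of persons linked across all eight months in sample whose SEX and RACE never change and whose AGE stays plausible. This is a hedged sketch, not the workshop's all_match code: the record layout and the function name `all_match_ratio` are assumptions, and AGE may rise by up to two years because, under the CPS 4-8-4 rotation, MISH 1 and MISH 8 are 15 months apart. The choice of denominator (here, only persons observed in all eight MISH) can change the ratio substantially, so it is worth checking when a result looks surprisingly high or low.

```python
from collections import defaultdict

def all_match_ratio(records, max_age_gain=2):
    """Percent of CPSIDPs observed in all eight MISH whose SEX and RACE never
    change and whose AGE rises by 0..max_age_gain relative to MISH 1."""
    by_person = defaultdict(dict)
    for r in records:
        by_person[r["CPSIDP"]][r["MISH"]] = r  # a duplicate MISH overwrites
    complete = matched = 0
    for months in by_person.values():
        if sorted(months) != list(range(1, 9)):
            continue  # denominator: only persons seen in every MISH 1-8
        complete += 1
        first = months[1]
        if all(
            m["SEX"] == first["SEX"]
            and m["RACE"] == first["RACE"]
            and 0 <= m["AGE"] - first["AGE"] <= max_age_gain
            for m in months.values()
        ):
            matched += 1
    return 100.0 * matched / complete if complete else float("nan")
```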

Actually, I found a mistake in my code; after fixing it, I get a validation match rate of around 16%.