CPS monthly survey linkage to create panel data

Hello IPUMS community,

I have been working with the CPS ASEC, but lately I have been trying to get my hands on the monthly samples. After going through a handful of posts on the IPUMS forum, I was pretty confused until I came across this post, which was very clear about how I should approach the monthly data.

Despite the clear guidelines, I'm still trying to grasp the concept of linking, because I am not sure which approach will work best for building the panel. Should I create the 8-month panel (from the first exercise of the 2018 Stata workshop), or should I just create a one-year panel by linking CPSIDP? There is also the question of validating links by age, race, and sex.

My main purpose is to see how people change jobs and how those changes affect their wages in a few selected states.

My sample covers 1995-2005, but I haven't been able to code the matching properly. I think it would be best to see the maximum number of times a CPSIDP shows up within 16 months of the person's first interview, which fits my research interest perfectly.

Exercise 2 of the 2018 Stata workshop also shows code to track the maximum number of times a unique CPSIDP shows up (even 1 or 2 calendar years later, as when the first interview takes place in December two years earlier).
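Counting how often a CPSIDP appears, and how many months separate its first and last interview, is a simple group-by. The sketch below is in Python for illustration only, not Stata: it assumes records are `(cpsidp, year, month)` tuples, and the helper `appearances` and the toy respondents "A" and "B" are hypothetical. The Python `ym` mirrors the convention of Stata's `ym()`, and `len(ts)` plays the role of `_N` within `by cpsidp`.

```python
from collections import defaultdict

def ym(year, month):
    # Same convention as Stata's ym(): months since January 1960, so ym(1960, 1) == 0
    return (year - 1960) * 12 + (month - 1)

def appearances(records):
    """Map each cpsidp to (number of records, months between first and last interview)."""
    times = defaultdict(list)
    for cpsidp, year, month in records:
        times[cpsidp].append(ym(year, month))
    return {pid: (len(ts), max(ts) - min(ts)) for pid, ts in times.items()}

# Hypothetical toy data: "A" completes all 8 interviews starting in December 1995;
# "B" is only observed in two consecutive months.
a_months = [(1995, 12), (1996, 1), (1996, 2), (1996, 3),
            (1996, 12), (1997, 1), (1997, 2), (1997, 3)]
records = [("A", y, m) for y, m in a_months] + [("B", 1996, 5), ("B", 1996, 6)]

print(appearances(records))  # {'A': (8, 15), 'B': (2, 1)}
```

Respondent "A" appears in three calendar years (1995 through 1997) yet spans only 15 months from first to last interview.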

Since I'm a beginner with the monthly samples, I wanted to make sure I'm starting off in the right direction conceptually.

Any suggestions or additional help with the coding would be much appreciated!

To be more precise about my research interest, I think it would be more suitable to keep track of a unique CPSIDP until it completes the 16-month cycle of the 4-8-4 rotation pattern. So, after reading a few more relevant posts and some very helpful feedback from the IPUMS community, I came up with the following commands to create the 16-month panel (which I adapted slightly from the commands given in this post: FAILURE: Linking to the Same Calendar Month across Two Consecutive Years from JULY 1994 to AUGUST 1995):
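The 4-8-4 rotation (in sample for 4 consecutive months, out for 8, back in for 4) can be written out as a lookup of how many months after the first interview each month-in-sample (MISH) falls. This Python sketch is for illustration; `mish_offsets` and `lag` are hypothetical helper names, and the schedule is the nominal one (a household that misses an interview will not follow it exactly).

```python
def mish_offsets():
    """Months elapsed since MISH 1 for each month-in-sample under the 4-8-4 rotation."""
    first_spell = [0, 1, 2, 3]        # MISH 1-4: four consecutive months
    second_spell = [12, 13, 14, 15]   # MISH 5-8: after eight months out of sample
    return dict(zip(range(1, 9), first_spell + second_spell))

def lag(mish_from, mish_to):
    """Calendar months between two interviews of the same person."""
    offsets = mish_offsets()
    return offsets[mish_to] - offsets[mish_from]

offsets = mish_offsets()
print(offsets)                                   # {1: 0, 2: 1, 3: 2, 4: 3, 5: 12, 6: 13, 7: 14, 8: 15}
print("MISH 1 -> MISH 8:", lag(1, 8), "months")  # 15
print("MISH 4 -> MISH 8:", lag(4, 8), "months")  # 12
print("Calendar months spanned:", offsets[8] + 1)  # 16
```

Under this schedule the cycle touches 16 calendar months, but MISH 8 falls 15 months after MISH 1 and 12 months after MISH 4; no two interviews of the same person are 16 months apart.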

gen time = ym(year, month)
forvalues t=372(1)680{
keep if (time==`t' & mish==8)|(time==`t'-16 & mish==4)
order cpsidp year month
sort cpsidp year month
by cpsidp: gen obs=_N
by cpsidp: drop if obs!=2

In the commands above, I changed only the third line from the one provided in that post, using time==`t'-16 instead of time==`t'-12, since I'm trying to link observations across 16 months instead of building a one-year panel.

Also, the commands in that post are for EARNWEEK, whereas my outcomes are INCWAGE and UHRSWEEK1. Are mish==8 and mish==4 the correct values, given that I'm creating a 16-month panel whereas the original poster was creating a one-year panel?

I hope I'm making sense. If my post goes a little off track because I don't yet have a good grip on the basic concepts, any suggestions or advice from the IPUMS community are very welcome!

My gut feeling is that, since I'd like to track each unique ID from the beginning, the values I want are mish==8 and mish==1.
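Which pair of MISH values and which lag actually link the same person can be checked on a toy respondent. This is a minimal Python sketch, not Stata: it assumes each record carries `cpsidp`, `mish` and a Stata-style monthly `time`, uses the nominal 4-8-4 offsets, and the `link` helper and respondent "A" are hypothetical.

```python
def ym(year, month):
    # Stata ym() convention: months since January 1960
    return (year - 1960) * 12 + (month - 1)

def link(records, mish_early, mish_late, lag_months):
    """Pair each MISH `mish_late` record with the same person's
    MISH `mish_early` record observed exactly `lag_months` earlier."""
    early = {(r["cpsidp"], r["time"]): r for r in records if r["mish"] == mish_early}
    pairs = []
    for r in records:
        if r["mish"] == mish_late:
            match = early.get((r["cpsidp"], r["time"] - lag_months))
            if match is not None:
                pairs.append((match, r))
    return pairs

# Hypothetical respondent first interviewed in January 1995 (time 420)
start = ym(1995, 1)
offsets = [0, 1, 2, 3, 12, 13, 14, 15]  # nominal 4-8-4 rotation, MISH 1..8
person = [{"cpsidp": "A", "mish": m, "time": start + off}
          for m, off in zip(range(1, 9), offsets)]

print(len(link(person, 4, 8, 16)))  # 0: the lag used in the code above
print(len(link(person, 4, 8, 12)))  # 1: MISH 4 -> MISH 8
print(len(link(person, 1, 8, 15)))  # 1: MISH 1 -> MISH 8
```

Pairing MISH 1 with MISH 8 needs a 15-month lag, and MISH 4 with MISH 8 needs 12; a 16-month lag pairs nobody.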

We've already corresponded about this by email, but I also wanted to reply to your post in case others have a similar question. While we don't generally provide individual code review, since you've modified code provided by IPUMS CPS, I'll give the following advice:

From what I read, this is what your code is doing:

gen time = ym(year, month)
forvalues t=372(1)680{
keep if (time==`t' & mish==8)|(time==`t'-16 & mish==4)
order cpsidp year month
sort cpsidp year month
by cpsidp: gen obs=_N
by cpsidp: drop if obs!=2

  1. Loop through values of t from 372 to 680. Since you used ym(), time gives the number of months since January 1960, so the loop runs from January 1991 (372) to September 2016 (680), a much wider range than your 1995-2005 sample, which corresponds to values 420 through 551.

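  2. The keep command as written will drop all the observations in the data. In the first iteration (t = 372), it keeps only records that are MISH 8 in January 1991 (time 372) or MISH 4 in September 1989 (time 356, i.e. 372 minus 16); with a 1995-2005 extract, that is none at all. Even if some records survived, the second iteration (t = 373) would delete them, because none of them has time 373 or 357.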

  3. Generate obs, which gives the number of times a given cpsidp appears in the data.

  4. Drop all individuals who do not appear exactly twice in the data. (I'm assuming you're using only ASEC data for this. In that case this will keep anyone who has two years of ASEC, which is what you want. If you're using basic monthly data, this will drop anyone who doesn't appear exactly twice in the data.)
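Because Stata's ym() counts months since January 1960, translating loop bounds back into calendar months is plain integer arithmetic. The Python sketch below reproduces that convention for illustration; the Python `ym` mirrors Stata's function of the same name, while `year_month` is a hypothetical inverse. In Stata itself, `display %tm 372` shows the same conversion.

```python
def ym(year, month):
    """Stata-style monthly date: months since January 1960 (ym(1960, 1) == 0)."""
    return (year - 1960) * 12 + (month - 1)

def year_month(t):
    """Hypothetical inverse of ym(): turn a monthly date back into (year, month)."""
    years, month_index = divmod(t, 12)
    return (1960 + years, month_index + 1)

print(year_month(372), year_month(680))  # (1991, 1) (2016, 9): the loop bounds
print(ym(1995, 1), ym(2005, 12))         # 420 551: bounds for a 1995-2005 sample
```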

Overall, you have at least two errors in the code which will result in dropping all the observations.
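The effect of running `keep` on every pass of `forvalues` can be simulated on toy records. This Python sketch is illustrative only: the function names and the two respondents are hypothetical, it uses the nominal 4-8-4 schedule, and it deliberately uses the 12-month MISH 4 to MISH 8 lag so that the lag is not what empties the data. It compares re-filtering the survivors on each pass with a single filter that accepts a match in any month.

```python
def ym(year, month):
    # Stata ym() convention: months since January 1960
    return (year - 1960) * 12 + (month - 1)

OFFSETS = [0, 1, 2, 3, 12, 13, 14, 15]  # nominal 4-8-4 rotation: months after MISH 1

def person(pid, year, month):
    """All eight interviews of a hypothetical respondent first seen in (year, month)."""
    start = ym(year, month)
    return [{"cpsidp": pid, "mish": m, "time": start + off}
            for m, off in zip(range(1, 9), OFFSETS)]

def condition(r, t, lag):
    # Same test as the keep line: MISH 8 in month t, or MISH 4 `lag` months earlier
    return (r["time"] == t and r["mish"] == 8) or (r["time"] == t - lag and r["mish"] == 4)

def keep_in_loop(records, months, lag):
    """Mimic `keep if ...` run once per month: each pass filters the previous pass's survivors."""
    data = list(records)
    for t in months:
        data = [r for r in data if condition(r, t, lag)]
    return data

def keep_all_months(records, months, lag):
    """Keep a record if the condition holds for ANY month t (one filter over the full data)."""
    return [r for r in records if any(condition(r, t, lag) for t in months)]

records = person("A", 1995, 1) + person("B", 1995, 2)
months = range(ym(1996, 4), ym(1996, 5) + 1)  # the MISH 8 months of A and B

print(len(keep_in_loop(records, months, 12)))     # 0: nothing survives the second pass
print(len(keep_all_months(records, months, 12)))  # 4: MISH 4 and MISH 8 for both people
```

In Stata terms, one option (an assumption here, not IPUMS-provided code) is to mark qualifying records inside the loop with `replace` and run `keep` only once after the loop ends.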

Here are some other resources that may be helpful in understanding how to use the CPS longitudinally: