FAILURE: Linking to the Same Calendar Month across Two Consecutive Years from JULY 1994 to AUGUST 1995

I am trying to link individuals across years using CPSIDP so that I can look at EARNWEEK.

To do so, I retrieved data from 1990 to 2015. (Because I am looking at EARNWEEK, I only consider MISH==4 and MISH==8. At the moment I have both ASEC and Basic Monthly Survey data.)

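    gen time = ym(year, month)

    forvalues t = 372(1)680 {
        * work on a copy so each iteration starts from the full data
        preserve
        * keep MISH 8 in month t and MISH 4 in the same month one year earlier
        keep if (time == `t' & mish == 8) | (time == `t' - 12 & mish == 4)
        order cpsidp year month
        sort cpsidp year month
        * keep only persons who appear exactly twice
        by cpsidp: gen obs = _N
        drop if obs != 2
        * report how many records matched for this month
        quietly count
        display "t = `t': " r(N) " matched records"
        restore
    }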

Now, before I even start to verify the quality of the matches on CPSIDP, I get 0 matched observations from July 1994 to August 1995; i.e., I cannot link CPSIDP in July 1994 with July 1993, August 1994 with August 1993, … , and August 1995 with August 1994.

Is this expected? Drew, Flood, and Warren (2014), Table 5, appear to have successfully matched responses using CPSIDP from March 1995 to March 1994, which I cannot reproduce.

Actually, this is a known and expected limitation. Page 126 of the article by Drew et al. discusses this and other known limitations to linking. Specifically, changes in the numbering schemes for housing units prevent linking based on household identification across some pairs of years, including 1984 to 1985, 1985 to 1986, 1994 to 1995, and 1995 to 1996.

Thank you so much Jeff!

So I will take it that, even with CPSIDP, I cannot link in these months. (I thought the sentence you mentioned meant that it was impossible to link individuals using CPSID, but that linking with CPSIDP was possible.)

I have a follow-up question as well: with the code provided above, it WAS possible to link on CPSIDP for the months after August 1995 (each matched to the same month one year earlier). Are those records wrongly linked until 1996, then?

p.s. Happy new year!

If records link on CPSIDP, then you can presume that these are valid links. Of course, you can verify these links by using the methods described in this presentation.
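One common way to sanity-check linked records is to compare characteristics that should not change, or should change only predictably, between the two interviews. The lines below are a minimal sketch of that idea, not necessarily the method in the presentation. It assumes the data in memory hold exactly two records per cpsidp (MISH 4, then MISH 8 a year later) and that the extract includes age, sex, and race. The 0 to 2 year age tolerance and the variable names same_sex, same_race, age_diff, and plausible are illustrative choices.

```stata
* Assumption: two records per cpsidp, ordered MISH 4 then MISH 8
sort cpsidp year month

* within each person, compare the first and second record
by cpsidp: gen byte same_sex  = sex[1]  == sex[2]
by cpsidp: gen byte same_race = race[1] == race[2]
by cpsidp: gen age_diff = age[2] - age[1]

* flag pairs where sex and race agree and age rose by 0 to 2 years
* (the tolerance is an assumption, not an official rule)
gen byte plausible = same_sex & same_race & inrange(age_diff, 0, 2)

* one row per person: summarize on the MISH 8 record
tabulate plausible if mish == 8
```

Race coding can change across survey years, so a race mismatch may deserve a closer look rather than automatic rejection.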

Happy New Year!

I hope it is all right to dig up a post from so long ago, but may I ask what the 372 and 680 stand for in the first line of this command: forvalues t=372(1)680{ ?

As far as I know, for a yearly panel running from 2005 to 2020 it would look like forvalues t=2005(1)2020{ . Is it different when working with the basic monthly samples of the CPS?

The 372 and 680 are in Stata’s internal date format, which records the number of months since January 1960. Type:

display tm(1991m1)

in the Stata Command window and you’ll see the result is 372.
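A few more conversions, as a small sketch, show how to move between calendar months and the internal month counter in both directions. The months chosen here are just the ones discussed in this thread.

```stata
* calendar month -> internal value (months since 1960m1)
display tm(1990m1)
* shows 360
display tm(2015m12)
* shows 671
display ym(1994, 7)
* shows 414 (July 1994)

* internal value -> calendar month
display %tm 680
* shows 2016m9
```

Running format time %tm on the time variable from the original code makes listings show values such as 1994m7 instead of 414.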


I don’t understand why 372 and 680 would be part of the original code. @jplee retrieved data from 1990 to 2015, so would it not be forvalues t=360(1)671 ? That would be 1990m1 and 2015m12.

I don’t remember exactly what I did back then, but I probably posted example code, so there may be a discrepancy between the years and months I actually used and those in the code. Please use the date values that Stata returns, as suggested by @Matthew_Bombyk.
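To avoid hard-coding the numbers, the loop bounds can be computed with tm() and stored in locals. The sketch below assumes an extract covering 1990m1 to 2015m12; the local names first and last are illustrative. It starts at 1991m1 because each month t is matched to month t-12, so the first usable month is twelve months after the start of the data.

```stata
* Hypothetical bounds for a 1990m1-2015m12 extract; adjust to your data
local first = tm(1991m1)
local last  = tm(2015m12)

forvalues t = `first'(1)`last' {
    * print each month in readable form next to its internal value
    display %tm `t' "  (internal value " `t' ")"
}
```

The body of the loop here only prints the months; the matching commands from the original post would go in its place.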
