Inconsistency with some of the CPS observations

Hi.

If I understand correctly, the first 6 digits of the IPUMS variable CPSIDP are the year and month in which a particular household was first in the CPS. So if I create a calendar date from that year and month and add (mish - 1) months when mish <= 4, or (mish + 7) months when mish > 4 (following the 4-8-4 rotation: four months in sample, eight months out, four months in), I should get the same date as the one given by the variables month and year.
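A minimal Python sketch of this calculation. It assumes CPSIDP is stored as a 14-digit integer whose first six digits are YYYYMM, as in the examples here, and that mish is available as its numeric code 1-8 rather than the text labels ("five", "eight"). The function names are hypothetical, chosen for illustration.

```python
def month_offset(mish: int) -> int:
    """Months elapsed since the first interview, given month-in-sample (1-8).

    CPS 4-8-4 rotation: MISH 1-4 are consecutive months; after 8 months
    out, MISH 5-8 resume 12 months after MISH 1.
    """
    if not 1 <= mish <= 8:
        raise ValueError(f"mish must be between 1 and 8, got {mish}")
    return mish - 1 if mish <= 4 else mish + 7


def expected_survey_month(cpsidp: int, mish: int) -> tuple[int, int]:
    """Return the (year, month) implied by the CPSIDP prefix and mish."""
    prefix = str(cpsidp)[:6]  # YYYYMM of first appearance (assumed layout)
    start_year, start_month = int(prefix[:4]), int(prefix[4:])
    # Work in a running month count so the offset can carry across years.
    index = start_year * 12 + (start_month - 1) + month_offset(mish)
    return index // 12, index % 12 + 1
```

For the two records discussed in this post, `expected_survey_month(19981000005401, 8)` returns `(2000, 1)` and `expected_survey_month(20210706771601, 5)` returns `(2022, 7)`.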

For the most part, this is what happens. Here is an example where it holds:
| year | month | mish | cpsidp | surdate | caldate |
|------|-------|------|--------|---------|---------|
| 2000 | january | eight | 19981000005401 | 2000m1 | 2000m1 |

Here caldate is the calendar date built from the year and month variables, and surdate is the date given by the first 6 digits of cpsidp plus (mish + 7) months (see above). Notice that surdate is the same as caldate in this example.

And here is an example of when there is an error:
| year | month | mish | cpsidp | surdate | caldate |
|------|-------|------|--------|---------|---------|
| 2021 | july | five | 20210706771601 | 2022m7 | 2021m7 |

Notice how surdate (the date from cpsidp plus mish + 7 months) is one year ahead of caldate. For the two dates to match, mish in this case would have to be one, not five.

In an extract consisting of all Basic Monthly CPS surveys from January 2000 through July 2021, I found that over 6 percent of the observations show this error. Here is the breakdown:

| zz | Freq. | Percent | Cum. |
|---:|---:|---:|---:|
| 0 | 31,770,795 | 93.84 | 93.84 |
| 1 | 160,873 | 0.48 | 94.32 |
| 2 | 110,930 | 0.33 | 94.64 |
| 3 | 90,260 | 0.27 | 94.91 |
| 12 | 1,529,077 | 4.52 | 99.43 |
| 13 | 121,941 | 0.36 | 99.79 |
| 14 | 49,177 | 0.15 | 99.93 |
| 15 | 23,082 | 0.07 | 100.00 |
| Total | 33,856,135 | 100.00 | |

zz is the difference between surdate and caldate, in months.
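A rough Python sketch of how zz and a Stata-style one-way tabulation could be computed. It assumes numeric month (1-12) and mish (1-8) codes and the YYYYMM layout of the CPSIDP prefix; the functions `zz` and `tabulate` are hypothetical names, and the records in the usage line are the two examples from this post.

```python
from collections import Counter


def month_offset(mish: int) -> int:
    # 4-8-4 rotation: MISH 1-4 are consecutive; MISH 5 is 12 months after MISH 1.
    return mish - 1 if mish <= 4 else mish + 7


def zz(year: int, month: int, cpsidp: int, mish: int) -> int:
    """surdate minus caldate, measured in months."""
    prefix = str(cpsidp)[:6]  # YYYYMM of the person's first appearance
    surdate = int(prefix[:4]) * 12 + int(prefix[4:]) - 1 + month_offset(mish)
    caldate = year * 12 + month - 1
    return surdate - caldate


def tabulate(values):
    """(value, freq, percent, cumulative percent) rows, like Stata's tabulate."""
    counts = Counter(values)
    total = sum(counts.values())
    rows, cum = [], 0.0
    for value in sorted(counts):
        pct = 100 * counts[value] / total
        cum += pct
        rows.append((value, counts[value], round(pct, 2), round(cum, 2)))
    return rows


records = [(2000, 1, 19981000005401, 8), (2021, 7, 20210706771601, 5)]
print(tabulate(zz(*r) for r in records))
```

On these two records the printed result is `[(0, 1, 50.0, 50.0), (12, 1, 50.0, 100.0)]`.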

To fix this discrepancy, I need to know which variable was miscoded: MISH or CPSIDP?

Thank you.

–Stuart M. Glosser
Professor of Economics (Emeritus)
University of Wisconsin at Whitewater

After consulting with our CPS team, we don't think this is an error in either MISH or CPSIDP. Rather, it results from MISH being a household-level variable while CPSIDP is an individual-level variable. In your second example, the individual does first appear in the July 2021 survey. However, a MISH of five indicates that July 2021 is also the fifth month-in-sample for the occupants of this residence. That means the residence was first surveyed in July 2020, but this individual first appears in it in July 2021.

There are a few possible reasons for such a finding. One possibility is that the individual moved into the residence sometime between November 2020 and June 2021, while the residence was out of the sample (its fourth month-in-sample was October 2020). They may have moved in with family already living at the residence; one way to check for this is to see whether the first 6 digits of CPSID and CPSIDP differ for this observation. It could also be that the entire previous household moved out and a new household moved in.
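The reasoning above can be sketched in Python: step back from the survey month by the MISH offset to find when the residence was first interviewed, then compare that with the month encoded in CPSIDP. This is an illustration only. It assumes numeric month and mish codes and that both CPSID and CPSIDP begin with a YYYYMM prefix; all function names are hypothetical.

```python
def month_offset(mish: int) -> int:
    # 4-8-4 rotation: MISH 1-4 are consecutive; MISH 5 is 12 months after MISH 1.
    return mish - 1 if mish <= 4 else mish + 7


def household_start(year: int, month: int, mish: int) -> int:
    """YYYYMM of the residence's first interview, implied by the household-level mish."""
    index = year * 12 + (month - 1) - month_offset(mish)
    return (index // 12) * 100 + index % 12 + 1


def person_start(cpsidp: int) -> int:
    """YYYYMM of the person's first appearance (assumed CPSIDP prefix)."""
    return int(str(cpsidp)[:6])


def joined_later(year: int, month: int, mish: int, cpsidp: int) -> bool:
    """True if the person first appears after the residence entered the sample."""
    return person_start(cpsidp) > household_start(year, month, mish)


def prefixes_differ(cpsid: int, cpsidp: int) -> bool:
    """The CPSID vs. CPSIDP check suggested above (assumes both start with YYYYMM)."""
    return str(cpsid)[:6] != str(cpsidp)[:6]
```

For the second example, `household_start(2021, 7, 5)` returns `202007` and `joined_later(2021, 7, 5, 20210706771601)` returns `True`. That matches a person who entered an already-sampled residence, not a coding error. For the first example, `joined_later(2000, 1, 8, 19981000005401)` returns `False`.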

Hope this is helpful, and please let me know if you have any further questions,