Longitudinal Sampling Bias?

In the documentation explaining how to link CPS respondents over time, I found the following claim: “if the occupants of a residence move out, they are replaced in the sample by the new people who move in. The prior occupants of the residence are no longer included in the CPS” (Drew, Flood, and Warren 2014). I interpret this to mean that if a household is sampled for, say, four months and then moves sometime during the next eight months, it will generally not be included in the CPS for the second four months of sampling (MISH in {5, 6, 7, 8}). But “no longer included” suggests to me that their responses from the first four months of sampling are still included; it is just that the CPS does not track the household to its new residence. This strikes me as a source of non-random sampling in the longitudinal aspect of the CPS, since we only observe MISH > 1 for non-movers. Am I misinterpreting what this documentation is saying, or is this correct, and if so, are there weights that account for this sampling non-randomness?

I apologize for asking this question here if it has an easy answer posted somewhere, but I have been unable to find additional discussion of this topic even in the documentation cited.

Thank you!

The CPS follows households, not people. Extending your example, if occupants (family A) move out of a household in the 8 months between MISH 4 and MISH 5, and new occupants (family B) move in, the household will remain in the CPS, but will report on family A for the first four months (MISH 1-4) and family B for the second four months (MISH 5-8). IPUMS CPS has constructed a number of weights for leveraging the panel component of the CPS: LNKFW1MWT, LNKFW1YWT, LNKFW8WT, LNKFWMIS14WT, and LNKFWMIS58WT. More information about linking is available from our 2018 workshop on leveraging the panel component of the CPS; you may find the presentation on weights of particular interest.
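For concreteness, here is a minimal sketch of the 4-8-4 rotation behind that example: four consecutive monthly interviews, eight months out, then four more interviews. The function name is made up for illustration and is not an IPUMS tool.

```python
def months_since_first_interview(mish: int) -> int:
    """Calendar months elapsed since the MISH = 1 interview under the 4-8-4 rotation."""
    if not 1 <= mish <= 8:
        raise ValueError("MISH takes values 1 through 8")
    # MISH 1-4 are consecutive months; MISH 5-8 fall in the same
    # calendar months one year later, after the 8-month break.
    return mish - 1 if mish <= 4 else (mish - 5) + 12


# Hypothetical helper output: offset in months for each MISH value.
schedule = {mish: months_since_first_interview(mish) for mish in range(1, 9)}
print(schedule)
```

The gap between the MISH 4 and MISH 5 offsets is the window in which family A could be replaced by family B at the same address.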

Hi Kari,

EDIT: mish = 0 should be mish = 1, mish != 0 should be mish != 1, and mish > 0 should be mish > 1. See later comment.

Thanks for clarifying how movers are replaced in the sample! It is less clear to me now whether there is the sampling bias I worried about. I did, however, try to compute whether cross-state migration rates from the March ASEC (asecflag = 1, asecoverp = 0) for the mish = 0 group and the mish != 0 group differ from each other. They should not be statistically different if the sampling does not differ along this margin. This calculation can be performed just by regressing an indicator for whether migrate1 = 5 on an indicator for whether mish = 0, including a constant, and weighting by asecwth. This gives me an overall rate (the constant) of 0.0239, and a statistically significant difference between the mish = 0 and mish != 0 groups of 0.0026 (t = 6.96). This seems like a non-trivial difference of more than 10% of the base migration rate. Do you know what is going on here? In other words, is the correct takeaway from this regression that cross-state movers are being undersampled in the mish > 0 group, or am I still missing something about the sampling process?

Thanks again!

I am not quite sure I understand your question. There are no 0 values for MISH; they should be between 1 and 8 only. Based on this I am not sure how to interpret a comparison of MISH = 0 and MISH > 0 groups. It seems reasonable to me to wonder if people who moved in the past year (as measured by MIGRATE1) remain in the household for all 16 months (8 observations) at different rates than non-movers; there are certainly ways of testing this (e.g., linking individuals over time, validating those linkages and creating a maximum MISH value for valid links, and then comparing MIGRATE1 values). As a final note, MIGRATE1 is a person-level variable, but ASECWTH is a household-level weight.
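As a rough sketch of the linking approach mentioned above, assuming person records keyed by CPSIDP and a deliberately simple validation rule (same SEX, AGE changing by 0 to 2 years between interviews): the rule, the function name, and the toy records are illustrative assumptions, not an official IPUMS procedure.

```python
from collections import defaultdict


def max_valid_mish(records, max_age_gain=2):
    """Highest MISH each CPSIDP reaches before a link fails validation."""
    by_person = defaultdict(list)
    for r in records:
        by_person[r["CPSIDP"]].append(r)

    result = {}
    for pid, rows in by_person.items():
        rows.sort(key=lambda r: r["MISH"])
        last = rows[0]
        for r in rows[1:]:
            age_change = r["AGE"] - last["AGE"]
            # Illustrative rule: stop at the first implausible link.
            if r["SEX"] != last["SEX"] or not 0 <= age_change <= max_age_gain:
                break
            last = r
        result[pid] = last["MISH"]
    return result


# Toy records (made up): person 1 is present for all 8 interviews,
# person 2's identifier is reused by a different person at MISH 5,
# person 3 is observed only once.
records = (
    [{"CPSIDP": 1, "MISH": m, "SEX": 1, "AGE": 40 if m <= 4 else 41} for m in range(1, 9)]
    + [{"CPSIDP": 2, "MISH": m, "SEX": 2, "AGE": 30} for m in range(1, 5)]
    + [{"CPSIDP": 2, "MISH": 5, "SEX": 1, "AGE": 55}]
    + [{"CPSIDP": 3, "MISH": 1, "SEX": 2, "AGE": 25}]
)
longest = max_valid_mish(records)
print(longest)
```

The resulting maximum MISH per person could then be tabulated against MIGRATE1 from that person's ASEC record.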

Hi Kari,

Sorry for the typo. I am comparing the mish = 1 group to the mish > 1 group. My thinking is that the cross-state migration rates I calculate for these two groups should be the same (statistically indistinguishable) if the sampling process does not differ between them. Mish = 1 is a natural benchmark because these entries are unaffected by the way that CPS deals with movers in the sample as you described above.

But the statistics differ across these two groups, as described in my previous comment. I weighted the regression by asecwt instead of asecwth (thanks for pointing that out), but the results change very minimally in magnitude and the overall conclusion is the same.

So I am curious if there is an undersampling of the cross-state movers in the mish > 1 group relative to the mish = 1 group or if there is still something I am missing about the sampling process.
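For reference, the comparison I am running can be sketched as follows. With a single binary regressor and a constant, the weighted regression coefficients are just weighted group means: the constant is the weighted cross-state migration rate in the MISH > 1 group, and the coefficient on the MISH = 1 indicator is the difference in weighted rates. The function names and toy rows are made up for illustration; the real data would be the ASEC person records weighted by ASECWT.

```python
def weighted_rate(rows, weight_key="ASECWT"):
    """Weighted share of rows with MIGRATE1 == 5 (moved between states)."""
    total = sum(r[weight_key] for r in rows)
    movers = sum(r[weight_key] for r in rows if r["MIGRATE1"] == 5)
    return movers / total


def mover_rate_gap(rows):
    """Return (rate for MISH > 1, rate for MISH = 1 minus rate for MISH > 1)."""
    first = [r for r in rows if r["MISH"] == 1]
    later = [r for r in rows if r["MISH"] > 1]
    rate_later = weighted_rate(later)
    return rate_later, weighted_rate(first) - rate_later


# Toy rows (made up), not actual CPS data.
rows = [
    {"MISH": 1, "MIGRATE1": 5, "ASECWT": 100.0},
    {"MISH": 1, "MIGRATE1": 1, "ASECWT": 300.0},
    {"MISH": 3, "MIGRATE1": 5, "ASECWT": 100.0},
    {"MISH": 3, "MIGRATE1": 1, "ASECWT": 900.0},
]
intercept, slope = mover_rate_gap(rows)
print(intercept, slope)
```

This reproduces only the point estimates; the t-statistic additionally depends on how the standard errors are computed.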


Thanks for the clarification.

I will comment on the composition of an analytical sample; the actual sampling of individuals doesn’t change (e.g., the same households are included; which households are included in subsequent interviews is not driven by sampling). Also, I will leave interpretation of your regression results up to you, but will note that a value of 2-8 in MISH doesn’t necessarily mean that household members were present for the first month in sample (MISH = 1). MISH values greater than 1 simply indicate that a household has already been interviewed at least once. MISH is part of leveraging the panel component of the CPS, but I would caution against using it without other information about linking.

All of that being said, I would expect anyone who joins (or replaces) a household in MISH 2 or later to have moved in the past year (though not necessarily from outside of the state), and it seems reasonable to me that the sample composition for people who can be linked (and whose links appear valid) across multiple interviews might look different from those who cannot be linked after the first month.

CPS does suffer from a variety of seam and cohort effects when the samples do not match exactly. (Blame the people, not the statisticians: it’s the respondents who introduce these oddities. These effects drive the methodologists nuts.) Think about this: the CPS interview in the first month is face-to-face; the interviews in subsequent months are mostly by phone (I think it’s about 85%, but a CPS data geek can correct me). If a family moves out, and the interviewer calls their number in month 13 (MISH == 5) after the 8-month break, one of the following things happens:

  • this is a cell number, and they report having moved, so they are no longer eligible because CPS is a sample of residences, and they are no longer in the residence that is supposed to be interviewed.
  • this is a landline number, and it got disconnected because the family moved out.
  • this is a landline number that somehow stayed (in rented housing where the landlord pays for the line; weird but who knows). The person who picks up the phone says, “why are you saying I filled the survey a year ago? I did not. Go away” (expletives omitted).

Any of these would necessitate a personal visit by a field interviewer, and even then, the unit may not respond. The CPS statisticians would have to reweight the units remaining in the sample to compensate for that, but migration is unlikely to be a weighting variable. (Again, a CPS expert can correct me here; I would not try to weight to something that is only exhibited by 2% of the sample unless I am really, really desperate.) I am not aware of any generally good source of administrative data for migration (which is a statement about my knowledge). Various tracking systems do exist, e.g. LexisNexis or something like that, but people who have minimal interaction with government and minimal electronic footprint, such as laborers or seasonal workers who are paid in cash and who pay in cash for everything, would likely be missed from these – and CPS does not use these anyway because tracking individuals is not within the mandate nor within the methodology limits of the survey.

I would be hard pressed to say whether a difference of 0.26 percentage points (0.0026) is a real population effect or the effect of a methodology quirk like the one described above.

Thanks for your comment! My hope would be that they do reweight units remaining in the sample to account for the moving issue, but it’s unclear to me whether this occurs. I agree that this is probably not a worry if your study uses the entire sample, but the study I was working on assessed cross-state migration rates of various wage groups. Since wages don’t get reported until someone is in the ORG (MISH = 4 or MISH = 8), but people disappear from the CPS if they move, I became concerned about potential bias: movers drop out of the CPS, but non-movers, who are part of the denominator in a migration rate, do not (unless they drop out for other reasons such as mortality, but I wasn’t as concerned about differential bias between movers and non-movers for these other reasons). I became especially concerned when I found that basic cross-state migration rates (forget about wage groups) differ when you compute them conditional on different MISH values. The IRS has “population” migration data, but it does not allow for a demographic breakdown. I was not aware of the LexisNexis data and so will look into that. Thank you!