I was warned that using two consecutive years of MORG data from NBER would result in double-counting of households because of the nature of the rotation and the fact that the MORG extracts contain all months in a year. Is it correct that when I use IPUMS online analysis (SDA) and select ORG data, there is no such problem because only the month of March is provided for each year? If so, can I build up my sample size by combining several years of ORG data in the online analysis without any double-counting problem?
The online analysis system in IPUMS CPS only allows for analysis using the ASEC samples. Because adjacent ASEC samples are one year apart and the CPS rotation brings respondents back into the sample a year later, an individual can appear in a combined set of these samples up to two times. Restricting your sample to only those in the outgoing rotation groups (i.e., observations with MISH 4 or 8) does not solve this: individuals could still be included in the combined sample twice. For example, if an individual has MISH==4 in March of a given year, then they will have MISH==8 in March of the next year. This is because the CPS follows a 4-8-4 sampling design where individuals are included in the sample for 4 months, excluded for 8, and then included again for 4 more months. So, in order to prevent double-counting of individuals, you’ll want to restrict your sample to only MISH==4 or only MISH==8, not both.
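To make the rotation arithmetic concrete, here is a minimal Python sketch of the 4-8-4 pattern described above. It is only an illustration: the function name `rotation_schedule`, the `person_id` field, and the toy records are hypothetical and do not come from IPUMS or NBER files. In real data you would use the MISH variable from your extract.

```python
def rotation_schedule(entry_month):
    """Map month index -> month-in-sample (MISH) for one respondent under
    the 4-8-4 design: in for 4 months, out for 8, in for 4 more."""
    schedule = {}
    for k in range(4):
        schedule[entry_month + k] = k + 1        # MISH 1 to 4
        schedule[entry_month + 12 + k] = k + 5   # MISH 5 to 8, after 8 months out
    return schedule

# Invert the schedule to find the month in which each MISH value occurs.
sched = rotation_schedule(entry_month=0)
month_of = {mish: month for month, mish in sched.items()}

# Number of months between the MISH 4 and MISH 8 interviews.
gap = month_of[8] - month_of[4]

# Toy pooled March records (hypothetical): (person_id, year, mish)
records = [
    ("A", 2018, 4), ("A", 2019, 8),  # same person, one year apart
    ("B", 2018, 8),
    ("C", 2019, 4),
]

# Keeping both outgoing rotation groups still includes person "A" twice.
org_only = [r for r in records if r[2] in (4, 8)]

# Keeping only MISH == 4 leaves each person at most once.
mish4_only = [r for r in records if r[2] == 4]

print("months between MISH 4 and MISH 8:", gap)
print("rows with MISH in (4, 8):", len(org_only))
print("rows with MISH == 4:", len(mish4_only))
```

The same filter with `r[2] == 8` would work equally well; the point is to pick one of the two outgoing groups rather than both.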
Replying years later because this almost answers what I am searching for!
The user above is tracking every March, but how do we control for the double-counting of individuals if we are downloading and using monthly (12 months over multiple years) data?
Thanks in advance!
If you pool adjacent months of the CPS, you will definitely capture individuals more than once given the rotation pattern. I am not aware of explicit guidance on this issue; unless you are linking CPS samples to leverage the panel component, the data are typically treated as a repeated cross-section. You might consult the literature in your field, or other work that uses pooled basic monthly CPS data, to see how others handle or discuss this issue. Another idea is to restrict your analysis so that each individual appears only once (e.g., keep only MISH == 1) and compare the results with those you get when you treat the data as a repeated cross-section. Finally, I am linking to a related post that includes some ideas from a colleague of mine for thinking about standard errors in this situation.
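As a rough illustration of the comparison suggested above, the following Python sketch simulates a pooled 24-month window under the 4-8-4 rotation and counts how often each person appears, with and without the MISH == 1 restriction. Everything here is a toy assumption: one hypothetical person enters the sample each month, and `person_id`, `months_in_sample`, and the window length are made up for the example.

```python
from collections import Counter

def months_in_sample(entry_month):
    """Return (month, MISH) pairs for a person entering at entry_month
    under the 4-8-4 design."""
    first = [(entry_month + k, k + 1) for k in range(4)]
    second = [(entry_month + 12 + k, k + 5) for k in range(4)]
    return first + second

window = range(0, 24)  # pooled months 0..23 (two years of monthly data)

# Assumption: one person enters per month, starting early enough that
# the first pooled month already contains all eight rotation groups.
pooled = []
for person_id, entry in enumerate(range(-15, 24)):
    for month, mish in months_in_sample(entry):
        if month in window:
            pooled.append({"person_id": person_id, "month": month, "mish": mish})

# Repeated cross-section view: people show up multiple times.
appearances = Counter(row["person_id"] for row in pooled)

# Restricted view: keep only each person's first month in sample.
mish1 = [row for row in pooled if row["mish"] == 1]
mish1_appearances = Counter(row["person_id"] for row in mish1)

print("pooled rows:", len(pooled))
print("max appearances per person (pooled):", max(appearances.values()))
print("rows after MISH == 1 restriction:", len(mish1))
print("max appearances per person (MISH == 1):", max(mish1_appearances.values()))
```

The restriction removes repeats at the cost of discarding most rows, which is why comparing estimates from both versions can be informative.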