CPS March 2001 SCHIP Expansion, and other questions

Hello all,

I was wondering whether there is a flag in the IPUMS ASEC extract that distinguishes observations from the original March Annual Demographic File (ASEC) from those in the SCHIP expansion. On the NBER website you can download either the original 2001 sample or the 2001 SCHIP file, which was released for public use from 2002 onwards (Current Population Survey (CPS) Supplements: Annual Demographic File | NBER). I am currently using IPUMS ASEC from 1979 to 2022, and it would be far from ideal for me to either splice in the 2001 file after harmonizing or switch data sources entirely JUST to exclude the SCHIP file. Maybe there’s something else I’m not seeing or understanding, so any help would be greatly appreciated!

My other question concerns a variable that is also only available in the original March supplement (to my knowledge), H_UNDER18 (https://www2.census.gov/programs-surveys/cps/datasets/2022/march/asec2022_ddl_pub_full.pdf), which counts the number of persons under 18 in the household. Last I checked, calculating it manually in Stata -bys serial: egen h_under18 = total(age<18)- does not match the original variable. Why is this the case? Is there a better way, or am I doing this wrong?


There is no single variable that identifies respondents who are part of the ASEC SCHIP oversample specifically. Observations in either of the two ASEC oversamples (Hispanic and SCHIP) are identified with the variable ASECOVERP, which assigns an observation to the oversample if its household was included in the ASEC but was not administered the March Basic Monthly CPS. The variable does not identify these respondents perfectly; between 1976 and 1988 it is sometimes impossible to distinguish true oversample records from records that IPUMS is unable to link with confidence between the March Basic Monthly and ASEC files (for more information, see Flood et al., 2020). Observations in the Hispanic oversample also cannot be easily distinguished from SCHIP households; the latter includes Hispanic individuals and the former includes households with young children. However, dropping all observations with ASECOVERP = 1 will create a sample that excludes observations from both oversamples.
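As a minimal sketch of that last step in Stata, assuming your extract includes ASECOVERP and that variable names are lowercase (as in the standard IPUMS Stata files), you could first check how many records are flagged in each year and then drop them. Note that this removes both oversamples together, not only the SCHIP records:

```stata
* Sketch only: assumes the extract contains year and asecoverp.
* Inspect how many person records are flagged as oversample in each year.
tab year asecoverp, missing

* Keep only records that are not flagged as part of an ASEC oversample.
* This excludes the Hispanic and SCHIP oversamples together.
drop if asecoverp == 1
```

If you only want to change the 2001 sample, adding a year condition (drop if asecoverp == 1 & year == 2001) restricts the drop to that year, but whether that is appropriate depends on how you want later years to compare.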

Alternatively, you might try to leverage the method used to assign respondents to the oversample. Flood & Pacas (2017) identify “split-path” assignment and month-in-sample 9 (MIS 9) assignment as the two methods used to draw respondents into the SCHIP oversample. Reverse engineering these methods to identify respondents who were included in the ASEC as part of the SCHIP oversample requires linking their ASEC oversample responses to responses from other months in the panel. This can be very tedious work, and in some cases it might not be possible at all. One case where it is possible is outlined in the attached image. ASEC SCHIP oversample respondents can be linked to their ASEC record in the subsequent year if they were part of the split-path sample such that either the month before the ASEC (i.e., February) was their fourth month-in-sample or the month after the ASEC (i.e., April) was their first month-in-sample; month-in-sample is recorded in the variable MISH. This is the extent of my knowledge on the matter, but I hope these resources are still helpful for your analysis.

Regarding your second question, H_UNDER18 appears in IPUMS as the unharmonized variable UH_HH18_A1. The variable description notes that “comparisons of this variable with a created count variable (count if 0 lte age lte 17) indicate that hh18 values are too high in some years (2 percent of the cases in 1990); in some, the hh18 value is higher than total count of people in the household (numper).” The IPUMS variable AGE comes from a different person-level variable and therefore may not exactly match data from H_UNDER18. Moreover, I noticed that your code does the summation within SERIAL. However, SERIAL is only unique to households within a given survey month and year. If your sample includes multiple months or years, you will need to include MONTH and YEAR in your bys prefix as well.
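A corrected version of your command might look like the sketch below. It assumes a person-level extract with lowercase variable names (year, month, serial, age) and that UH_HH18_A1 is included in the extract for the years you are checking; the names h_under18 and mismatch are just illustrative:

```stata
* Sketch only: count persons under 18 within each household.
* Households are identified by year, month, and serial together.
bysort year month serial: egen h_under18 = total(age < 18)

* Flag households where the computed count differs from the
* unharmonized Census count, wherever that variable is available.
gen byte mismatch = (h_under18 != uh_hh18_a1) if !missing(uh_hh18_a1)
tab year mismatch
```

Given the notes in the variable description, some remaining mismatches would not be surprising even with the corrected grouping.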