I am having trouble locating something I have found on the IPUMS site in the past: information about young children in the sample during the first few years of IPUMS-CPS microdata. I know that prior to some early date (1966?) data on children under some age (14?) was not collected. Can you tell me where that is documented?
I am also puzzled by some places where this information isn't. For instance, NUMPREC is described as fully comparable across years. Is this an error? If not, what is the rationale for treating it as comparable between years that do and don't count children?
Finally, I would be extremely interested if you knew of any study, by anyone, ever, that imputes children to households for these early years. I believe such imputation is necessary if we are to compute adult-equivalent, equal-population quantiles that reach back to these early years, as is necessary for comparability with distribution estimates for later years.
PS You guys rock! --Andrew
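A small worked example of why missing children matter for adult-equivalent measures. This is a toy sketch, not anything from IPUMS: it assumes a square-root equivalence scale (income divided by the square root of household size), which is one common choice used here only for illustration, and the income figure and household composition are hypothetical.

```python
import math


def equivalized_income(household_income, household_size):
    """Square-root equivalence scale (an assumed choice): income / sqrt(size)."""
    return household_income / math.sqrt(household_size)


income = 40_000  # hypothetical household income
full = equivalized_income(income, 4)      # 2 adults + 2 children under 14
observed = equivalized_income(income, 2)  # children under 14 absent from the file
print(round(full), round(observed))  # 20000 28284
```

In a person-weighted (equal-population) quantile the omitted children also drop out of the population count, so the household both looks better off per adult equivalent and carries less weight than it should.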
You are correct that children under the age of 14 are not included in the microdata in every year: they are excluded from the ASEC prior to 1968 and from the basic monthly data prior to 1982, though they do appear in a few supplement-containing months during 1976-1981. This is documented in the basic monthly sample notes and the ASEC sample notes.
Regarding comparability notes for individual variables, I suspect that because this is a change in sample composition, the sample notes are a more relevant place for this information than the individual variable descriptions. I am also not aware of a variable that reports household or family size in the samples that omit children (which would be the most logical place for this information). Although NUMPREC often equals household size, it specifically reports the number of person records in the data file that follow a given household record. Because the children are not included in the data file, they do not have a record; therefore, I would not expect the variable to flag this as a comparability issue. Some of the linking variables (e.g., CPSIDP) direct users to a working paper on the challenges of creating linkages to leverage the panel component of the CPS, which does discuss this omission as a comparability issue.
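To make the NUMPREC distinction concrete, here is a toy illustration (not IPUMS code) of a hierarchical file in which a household record is followed by its person records. The `Person` type, the `numprec` helper, and the ages are all hypothetical; the point is only that a count of person records falls short of household size when children under 14 have no record.

```python
from dataclasses import dataclass


@dataclass
class Person:
    age: int


def numprec(person_records):
    """NUMPREC-style count: number of person records following a household record."""
    return len(person_records)


# Hypothetical household: two adults and two children aged 15 and 6.
household = [Person(41), Person(39), Person(15), Person(6)]

# In samples that omit children under 14, the 6-year-old has no person record.
records_without_young_children = [p for p in household if p.age >= 14]

print(numprec(household))                       # 4
print(numprec(records_without_young_children))  # 3, although 4 people live there
```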
I am not aware of any work that imputes children in the CPS for these years.
Thanks Kari! I find your explanations pretty believable. Personally, I think this sort of DRY (don't repeat yourself) documentation makes more sense for computer code than for material intended to be read by human beings: typical personal computers have working memory for billions of things, while human working memory holds four to seven. But I don't expect anyone will pay much attention to my views on the matter.
Could you say a few words about the role of the DoL/Census in preparing this documentation, relative to the role of IPUMS? Does IPUMS pretty much just pass on the documentation it gets, or does it engage in some summary or reorganization? If the latter, is there any way I can tell whether a given piece of documentation text comes from Census/DoL or from IPUMS? I think there are places where I should direct questions to the actual preparer of the documentation. For example, in the within-ASEC-file documentation of the levels of different variables, there are some consistent patterns that appear to divide the variables into groups with somewhat differently formatted presentations. I have not been able to tell whether these groupings are conceptual or just historical accidents. I suspect that if I want a reliable answer, I need not just a Census person, but some particular Census person.
Our documentation is based on what we can glean from the original documentation and the data. A large part of what we do is associate data provider documentation with individual variables (as opposed to the typical tome of documentation that accompanies a single data file, which makes it hard to track a variable over time because you need to review each year's documentation). That being said, we also work to synthesize available information into summary guides and link to original CPS documentation (see the Original Codebooks and the Current Population Reports and Technical Papers on the summary guides page). If you have questions about specific variables or pieces of documentation, I am happy to share the information I have and flag any questions that are best directed to the Census Bureau.
I asked this question 4 years ago and am still interested in an updated answer, especially from recent research. I am also expanding the scope of my inquiry somewhat: I would like to find any paper, based on any Federal survey from any time, that develops a credible methodology for imputing children to households or mothers in years when that information is missing. I can force my results to be consistent with Census data (using the RAS algorithm for iterative proportional fitting), but I would be much happier if I had some way of assuring that these imputations are not just hallucinations of mine, or phlogiston statistics. I believe such imputation is necessary if we are to compute adult-equivalent, equal-population quantiles that reach back to the early years of the CPS/ASEC, as is necessary for comparability with distribution estimates for later years.
P.S. You guys rock! --Andrew
P.P.S. I also want to know if there is any easy way to find all the data quality indicators, in light of recent research showing that hot deck and other widely used imputation procedures underestimate income relative to administrative records by about 5%, especially since nonresponse to income questions in the CPS/ASEC is greatest in the very highest income brackets.
This type of imputation is not a topic I'm familiar with, and I'm not aware of any methods currently in use. If you pursue the IPF algorithm you suggest, I encourage you to review the ASEC variables for the 1962-1967 samples (those without children under 14) to identify which concepts are most important. These might include marital status (MARST), the presence of children aged 14+ in the household, or variables that may capture childcare demands (e.g., MAJORACT or WHYPTLWK). Note that IPUMS CPS does not have any data from years prior to 1962.
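Since the RAS algorithm (iterative proportional fitting) comes up in this exchange, here is a minimal, generic sketch of two-dimensional RAS in Python with NumPy. It is not an IPUMS or Census procedure, and everything specific in it is hypothetical: the function name `ras`, the seed table, the row categories (marital-status groups, loosely following the MARST suggestion above), the columns (number of own children under 14: 0, 1, 2, 3+), and all target totals are made-up numbers. The sketch assumes a strictly positive seed and row and column targets with the same grand total.

```python
import numpy as np


def ras(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Fit a strictly positive 2-D seed table to target row and column totals (RAS / IPF)."""
    x = np.asarray(seed, dtype=float).copy()
    r = np.asarray(row_targets, dtype=float)
    c = np.asarray(col_targets, dtype=float)
    if not np.isclose(r.sum(), c.sum()):
        raise ValueError("row and column targets must share the same grand total")
    for _ in range(max_iter):
        # R step: rescale each row so it sums to its row target.
        x *= (r / x.sum(axis=1))[:, np.newaxis]
        # S step: rescale each column so it sums to its column target.
        x *= (c / x.sum(axis=0))[np.newaxis, :]
        # Columns now match exactly; stop once the rows also still match.
        if np.allclose(x.sum(axis=1), r, rtol=tol, atol=0.0):
            return x
    raise RuntimeError("RAS did not converge within max_iter iterations")


# Hypothetical seed: rows = marital-status groups, columns = children under 14 (0, 1, 2, 3+).
seed = np.array([[30.0, 40.0, 20.0, 10.0],
                 [60.0, 25.0, 10.0, 5.0],
                 [50.0, 30.0, 15.0, 5.0]])
row_targets = np.array([500.0, 300.0, 200.0])         # made-up row margins
col_targets = np.array([400.0, 250.0, 200.0, 150.0])  # made-up column margins

fitted = ras(seed, row_targets, col_targets)
print(np.round(fitted, 1))
```

RAS only guarantees that the fitted table reproduces the target margins while keeping the seed's cross-product ratios; it does not by itself validate the joint distribution inside the table, so the quality of the seed and the choice of margins carry most of the substantive weight.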
Depending on your research question, you may be interested in using decennial census data from IPUMS USA. While you lose the temporal precision of the annual CPS data, there is significant overlap in topical coverage, including employment and earnings data in the IPUMS USA variables EMPSTAT (employment status), INCWAGE (wage income), and FTOTINC (total family income). I will also note that we have annual county-level natality data available through IPUMS NHGIS, which may help you address the lower temporal precision of the decennial data.
Regarding data quality flags, IPUMS CPS provides these data quality indicators as variable flags. You can view each variable’s associated flag by opening the Flags tab on the associated variable’s documentation page (e.g., INCWAGE) and clicking on the link. Since most flag variables start with the letter Q, a way to search for all of them is to use the alphabetical drop-down menu from the variable selection menu and look at all Q variables. Flags can also be added in bulk from the extract request page by clicking the Select Data Quality Flags button. Note that flags are not available for all variables and may be available only in certain samples for other variables.
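If you already have an extract loaded, a quick way to see which flag variables came along is to pair Q-prefixed columns with their base variables. This is a hypothetical helper, not an IPUMS tool, and it rests on an assumption: that a flag is named "Q" plus the base variable name (as with INCWAGE and QINCWAGE). Since only most flags follow that pattern, treat the result as a starting list and confirm each pairing on the variable's Flags tab.

```python
def pair_quality_flags(columns):
    """Map each base variable to its Q-prefixed flag column when both are present.

    Assumed naming convention: flag name == "Q" + base variable name.
    Q-prefixed columns whose base variable is not in the extract are skipped.
    """
    names = set(columns)
    pairs = {}
    for name in sorted(names):
        if name.startswith("Q") and len(name) > 1 and name[1:] in names:
            pairs[name[1:]] = name
    return pairs


# Hypothetical column list from an extract that included data quality flags.
cols = ["YEAR", "SERIAL", "PERNUM", "INCWAGE", "QINCWAGE", "MARST", "QMARST", "AGE"]
print(pair_quality_flags(cols))  # {'INCWAGE': 'QINCWAGE', 'MARST': 'QMARST'}
```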