I’m trying to use monthly CPS data to determine the number of children under 5, with all parents in the labor force, and with family incomes below various levels.
For starters, I tried to reproduce the Census estimate of the number of children under 5 in Hawaii (the state I’m focusing on) for each year. According to the Census, there were 82,785 children ages 0 to 4 in 2021. However, the output of the R code below is 47,324. The method I used is as follows:
1. Start with monthly CPS data for Hawaii.
2. Filter to keep one row per household in each year-month combination (where a “household” is all rows with the same values of HRHHID and HRHHID2).
3. For each row, multiply number of children under 5 (NCHLT5) by WTFINL, sum these values and divide by 12 (since we’re using 12 months of data for each YEAR).
I’m assuming I’m doing something wrong, since the value I calculated is so far from the actual estimate, and hope to learn how to do this calculation correctly. In addition, I’m also interested in learning how to properly determine if all parents in a given family are in either the civilian labor force or armed forces, and how to count the number of children under 5 in those families.
# IPUMS CPS monthly data for Hawaii
d %>%
  group_by(HRHHID, HRHHID2, YEAR, MONTH) %>%
  slice(1) %>%
  ungroup() %>%
  group_by(YEAR) %>%
  summarize(
    total.children.under5 = sum(NCHLT5 * WTFINL) / 12
  )
It sounds like you’re interested in the total number of children across households at this stage rather than the number of children individual household members have. NCHLT5 is useful in analyzing the latter (e.g. finding the number of people with at least one child under 5). However, NCHLT5 reports each person’s own children under 5 in the household, so different household members have different values, and the first row kept for a household may belong to someone with no children of their own. That makes it a poor basis for estimating the former. Instead, I recommend using the variable AGE to identify records of children under 5. While this variable won’t tell you who the parent is or how many children they have, counting the observations across the entire sample for which AGE < 5 will give you a consistent unweighted count of the total number of children.
More specifically, when you produce your weighted estimate, instead of filtering to keep one row per household, you’ll want to filter to keep every row where AGE < 5. Since you’re generating point estimates from 12 months of pooled data, you should then divide WTFINL for each observation by 12; each month’s weights already sum to the full population, so pooling 12 months without rescaling would count it roughly 12 times. You can then sum the rescaled WTFINL over the remaining observations, which gives the weighted total number of children. You may also be interested in the survey package and the summary functions documented under surveysummary (such as svytotal), which allow you to produce weighted estimates from survey data without needing to sum the weights yourself. However, you will still need to divide WTFINL by the number of months that you pool. Note that for the most accurate comparison with Census 2021 (American Community Survey) estimates, you’ll want to use the January-December 2021 CPS. Also, keep in mind that the CPS is a survey of the non-institutionalized US population whereas the ACS has no such restrictions, so the two estimates will not match exactly.
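As a rough sketch of the steps described above (not a definitive implementation), the calculation might look like the following. The toy data frame stands in for a real IPUMS CPS extract and its values are made up; it pools only two months, so the divisor is computed from the data rather than hard-coded as 12.

```r
library(dplyr)

# Toy stand-in for a person-level IPUMS CPS extract; all values are invented.
d <- tibble(
  YEAR   = c(2021, 2021, 2021, 2021, 2021, 2021),
  MONTH  = c(1, 1, 1, 2, 2, 2),
  AGE    = c(3, 34, 1, 4, 40, 7),
  WTFINL = c(1500, 1400, 1600, 1550, 1450, 1500)
)

under5 <- d %>%
  group_by(YEAR) %>%
  # Count the months pooled in each year BEFORE filtering,
  # so months with no sampled children still count (12 for a full year).
  mutate(n_months = n_distinct(MONTH)) %>%
  # Keep every person record for a child under 5 (not one row per household).
  filter(AGE < 5) %>%
  # Rescale each weight by the number of pooled months, then sum.
  summarize(
    total.children.under5 = sum(WTFINL / n_months),
    .groups = "drop"
  )

print(under5)
```

With a full January-December extract, n_months would be 12 for each year, which matches the divide-by-12 step described above.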
I mentioned that AGE won’t tell you which record represents the parent of the child under 5, but you can use the attach characteristics tool to add variables in your data cart, such as LABFORCE and INCTOT, for both the child’s father and mother to the child’s record. You can then not only count the total number of children under 5, but also restrict the count to children whose mother and/or father is in the labor force, or filter on the parents’ total personal income. This tool is available on the extract request page once you’ve opened your data cart and chosen to create your data extract. Note that the tool uses the IPUMS constructed variables MOMLOC and POPLOC to connect records. If you plan to use these, I strongly recommend reviewing the description and comparability tabs for these variables as well as the original documentation for family interrelationships and documentation on the updated pointers.
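To illustrate the idea, here is a minimal sketch of counting children under 5 whose co-resident parents are all in the labor force. Several things here are assumptions rather than confirmed details: the attached variable names LABFORCE_MOM and LABFORCE_POP, the use of NA for a parent who is not present in the household, and the code 2 meaning "in the labor force". Please check the variable documentation and your own extract, including how Armed Forces members and any second mother or father are coded, before relying on it.

```r
library(dplyr)

# Toy child-level rows; all values are invented.
# LABFORCE_MOM / LABFORCE_POP are ASSUMED names for the attached variables,
# NA is ASSUMED to mean "no such parent in the household",
# and 2 is ASSUMED to mean "in the labor force".
kids <- tibble(
  YEAR         = 2021,
  MONTH        = c(1, 1, 1, 2, 2),
  AGE          = c(2, 4, 0, 3, 1),
  WTFINL       = c(1500, 1600, 1400, 1550, 1450),
  LABFORCE_MOM = c(2, 1, 2, 2, NA),
  LABFORCE_POP = c(2, 2, NA, 1, 2)
)

# Number of months pooled (12 for a full year of monthly data).
n_months <- n_distinct(kids$MONTH)

# Hypothetical helper: an absent parent does not disqualify the child.
in_lf_or_absent <- function(x) is.na(x) | x == 2

all_parents_lf <- kids %>%
  filter(AGE < 5) %>%
  # Require at least one co-resident parent.
  filter(!(is.na(LABFORCE_MOM) & is.na(LABFORCE_POP))) %>%
  # Every parent who is present must be in the labor force.
  filter(in_lf_or_absent(LABFORCE_MOM), in_lf_or_absent(LABFORCE_POP)) %>%
  summarize(total.children.under5 = sum(WTFINL) / n_months)

print(all_parents_lf)
```

Whether children with no co-resident parent should be excluded, as they are here, is a design choice that depends on how you define "all parents in the labor force".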