Fertility rate using YNGCH

Hi!
I am trying to use the ACS to compute fertility data in the US. I’d like to later look at some geographical splits so I was attracted by the large sample size.

I know the ACS has a fertyr variable, but since I also wanted to look at years before 2005, I downloaded the decennial census samples back to 1960 too. So I am now trying to construct the TFR for the US using this data and, given that I do not have access to fertyr for the decennial census, I thought of just using a dummy=1 if YNGCH==0, that is, if the age of the youngest child is 0.
In my head, any bias in this computation would have been upward, since it’s possible that people had a child in the previous calendar year but the child’s age is still 0 at the time of the questionnaire.
However, when I compute the TFR with this data, I get numbers that are significantly lower than the official figures I have seen. For example, for 2022 I get a TFR around 1.5, while I think it’s supposed to be a bit above 1.7. If I do it using fertyr, I get the numbers I expected (at least broadly), but I don’t have the older years.

I am trying to understand if this is a data problem, or a code problem. Here’s an example of my R code for the computation:

library(dplyr)

filtered_data <- data_copy %>%
  filter(age >= 15 & age <= 49 & sex == 2) %>% # Restrict to women of reproductive age
  mutate(age_group = cut(age, breaks = seq(15, 50, by = 5), right = FALSE, include.lowest = TRUE))

asfr_data <- filtered_data %>%
  group_by(year, age_group) %>%
  summarise(
    births = sum((yngch == 0) * perwt, na.rm = TRUE), # Weighted women whose youngest own child is age 0
    population = sum(perwt, na.rm = TRUE), # Total women in this age group
    asfr = births / population, # Age-Specific Fertility Rate
    .groups = "drop"
  )

tfr_data <- asfr_data %>%
  group_by(year) %>%
  summarise(
    tfr = sum(asfr * 5), # Sum ASFRs and multiply by the interval (5 years)
    .groups = "drop"
  )

Any help would be appreciated! Thanks.

The census and ACS include information on each member of each household included in the sample, essentially a household roster. Household rosters are then used to construct variables about families and relationships, such as YNGCH. You can use this information on household members to indirectly estimate fertility; however, these estimates are unlikely to exactly match official fertility rate estimates derived from birth data.

The variable YNGCH reports the age of the youngest own child (if any) residing with each individual. This variable is not reported directly in the ACS, but is constructed by IPUMS. We use our own algorithms to identify the most likely parent-child relationships in each household, and link parents to children (see the variables MOMLOC and POPLOC). YNGCH is created based on these links. The parent-child links we create identify non-biological (e.g., step or adoptive) relationships between parents and children, as well as biological relationships. For this reason, and because YNGCH is not derived from self-reported information about family interrelationships, YNGCH does not measure fertility exactly.

Since children do not always live with their biological mother, using household roster data to calculate the age-specific fertility rates that feed into the total fertility rate is an imperfect method. Imagine, for instance, that a 16-year-old gives birth to a baby who is adopted by a 30-year-old. Scenarios like this bias the age-specific fertility rates for both the 16-year-olds and the 30-year-olds, and therefore the total fertility rate. This is not an issue when working with natality data, which include information on biological parents. Additionally, the fertility rate measures the number of live births per woman. Infant mortality will bias your estimate downward, since some infants who survived birth did not survive long enough to be observed in a survey dataset. This bias will also differ across groups of women based on maternal age, race, socioeconomic status, geography, etc., since infant mortality rates differ by group.

Young children are undercounted in the decennial census and the ACS. This undercount is likely contributing to your underestimate of the total fertility rate. Note also that your method excludes twins, triplets, and other multiples, since you are measuring whether or not someone has an infant, rather than how many infants they have. This will bias your estimate further downward.
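To illustrate the point about multiples, here is a minimal sketch in R that counts every co-resident infant linked to a woman through MOMLOC instead of flagging YNGCH==0. It assumes your extract contains SERIAL, PERNUM, and MOMLOC (lowercased here), and that MOMLOC holds the PERNUM of the mother within the same household, with 0 meaning no mother is present. The toy data frame and its values are made up purely for illustration, so the resulting rates are not realistic. Weighting by the mother's perwt is a choice carried over from your code, not a recommendation.

```r
library(dplyr)

# Hypothetical toy extract: household 1 has twins aged 0, household 3 has
# one infant and one older child, households 2 and 4 have no children.
persons <- data.frame(
  year   = 2022,
  serial = c(1, 1, 1, 2, 3, 3, 3, 4),
  pernum = c(1, 2, 3, 1, 1, 2, 3, 1),
  sex    = c(2, 1, 2, 2, 2, 1, 2, 2),
  age    = c(27, 0, 0, 27, 32, 0, 3, 33),
  momloc = c(0, 1, 1, 0, 0, 1, 1, 0), # assumed: PERNUM of mother, 0 = none
  perwt  = 100
)

# Count infants (age 0) attached to each mother within a household.
infants <- persons %>%
  filter(age == 0, momloc > 0) %>%
  mutate(pernum = momloc) %>% # re-key each infant to its mother's PERNUM
  count(year, serial, pernum, name = "n_infants")

# Attach the infant counts to women of reproductive age.
women <- persons %>%
  filter(sex == 2, age >= 15, age <= 49) %>%
  left_join(infants, by = c("year", "serial", "pernum")) %>%
  mutate(
    n_infants = ifelse(is.na(n_infants), 0, n_infants),
    age_group = cut(age, breaks = seq(15, 50, by = 5), right = FALSE)
  )

# Compare "has any infant" (equivalent to the YNGCH == 0 dummy)
# with "number of infants", which also counts multiples.
asfr <- women %>%
  group_by(year, age_group) %>%
  summarise(
    asfr_any = sum((n_infants > 0) * perwt) / sum(perwt),
    asfr_all = sum(n_infants * perwt) / sum(perwt),
    .groups = "drop"
  )

tfr <- asfr %>%
  group_by(year) %>%
  summarise(
    tfr_any = 5 * sum(asfr_any),
    tfr_all = 5 * sum(asfr_all),
    .groups = "drop"
  )

print(tfr)
```

In this toy example tfr_all exceeds tfr_any only because of the twins in household 1. Counting infants this way addresses multiples alone; the undercount of young children, infant mortality, and infants who do not live with their biological mother still affect the estimate.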

Hi Isabel, thanks so much for your very precise answer. It seems, then, that to look at historical fertility I will need to resort to other data.
Thanks!