Hi!
I am trying to use the ACS to compute fertility data in the US. I’d like to later look at some geographical splits so I was attracted by the large sample size.
I know the ACS has a fertyr variable, but since I also want to look at years before 2005, I downloaded the decennial census data from 1960 too. I am now trying to construct the TFR for the US from this data and, since fertyr is not available in the decennial samples, I thought of using a dummy equal to 1 if YNGCH==0, that is, if the age of the youngest child is 0.
In my head, any bias in this computation would be upward, since a child born in the previous calendar year could still be age 0 at the time of the questionnaire.
However, when I compute the TFR this way, my numbers are significantly lower than the official figures I have seen. For example, for 2022 I get a TFR of around 1.5, while I think it’s supposed to be a bit above 1.7. If I use fertyr instead, I get the numbers I expected (at least broadly), but then I don’t have the older years.
I am trying to understand if this is a data problem, or a code problem. Here’s an example of my R code for the computation:
library(dplyr)

filtered_data <- data_copy %>%
  filter(age >= 15 & age <= 49 & sex == 2) %>% # Restrict to women of reproductive age
  mutate(age_group = cut(age, breaks = seq(15, 50, by = 5), right = FALSE, include.lowest = TRUE)) # 5-year age groups

asfr_data <- filtered_data %>%
  group_by(year, age_group) %>%
  summarise(
    births = sum((yngch == 0) * perwt, na.rm = TRUE), # Weighted women whose youngest child is age 0
    population = sum(perwt, na.rm = TRUE), # Weighted total women in this age group
    asfr = births / population, # Age-Specific Fertility Rate
    .groups = "drop"
  )

tfr_data <- asfr_data %>%
  group_by(year) %>%
  summarise(
    tfr = sum(asfr * 5), # Sum ASFRs and multiply by the interval width (5 years)
    .groups = "drop"
  )
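To help separate a code problem from a data problem, here is a minimal sketch that runs the same steps on made-up data where the TFR is known in advance. The toy data frame, its 100 women per age, and the 10% share with an infant are all assumptions for illustration, not ACS values; yngch is coded 99 for the other women just as a placeholder.

```r
library(dplyr)

# Toy data (made up, not ACS): 100 women at each age 15-49, each with weight 1.
# Exactly 10 women per age have yngch == 0; the rest get a placeholder value of 99.
toy <- expand.grid(age = 15:49, id = 1:100) %>%
  mutate(
    year  = 2022,
    sex   = 2,
    perwt = 1,
    yngch = ifelse(id <= 10, 0, 99)
  )

# Same steps as above: restrict, bin into 5-year groups, compute ASFRs, sum to a TFR.
toy_tfr <- toy %>%
  filter(age >= 15 & age <= 49 & sex == 2) %>%
  mutate(age_group = cut(age, breaks = seq(15, 50, by = 5), right = FALSE)) %>%
  group_by(year, age_group) %>%
  summarise(
    births     = sum((yngch == 0) * perwt),
    population = sum(perwt),
    asfr       = births / population,
    .groups    = "drop"
  ) %>%
  group_by(year) %>%
  summarise(tfr = sum(asfr * 5), .groups = "drop")

# By construction: 7 age groups * 5 years * 0.10 = 3.5
print(toy_tfr)
```

If this gives 3.5, the arithmetic of the pipeline is doing what I intend, which would point toward how YNGCH==0 identifies births rather than toward the TFR calculation itself.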
Any help would be appreciated! Thanks.