I was calculating the monthly totals of WTFINL and PANLWT:
* sum using panel weight
egen panl_sum = total(panlwt), by(date sex)
* sum using cross-sectional weight
egen cross_sum = total(wtfinl) if age >= 16 & popstat==1, by(date sex)
I am doing so for 1996-2023. For the period between January 2000 and December 2002, the difference between the two sums becomes substantial, reaching almost two million.
The graph below plots this difference for both sexes.
collapse (max) cross = cross_sum panl = panl_sum, by(date sex)
gen check = panl - cross
twoway (line check date if sex==1, sort) (scatter check date if sex==2, sort)
I have not found any information on this in the forum or in the documentation of the variables WTFINL and PANLWT. Is this a known issue, and is there a workaround?
Any help or hint would be highly appreciated.
Sums of WTFINL and PANLWT generally yield very similar counts, with the exception of the 2000-2002 BMS (exclusive of the April and June 2001 BMS). From the WTFINL comparability information:
“After the original release of the 2000-2002 CPS public use files, the Census Bureau released a set of updated weights for these samples that incorporate updated population counts from the 2000 Census. These revised weights are made available as part of WTFINL for these years except for the April and June files from 2001. Due to the inclusion of an oversample in these months, the revised weights yield improbable totals. If users wish to apply the revised weights to these months, they are available in UH_WGT_B2.”
Based on my exploration of the Census Bureau documentation and conversations with IPUMS colleagues, it seems that the Census Bureau did not release an updated version of PANLWT along with the updated version of WTFINL. This means that for the 2000-2002 BMS (except April and June of 2001), PANLWT and WTFINL are based on population estimates from different censuses: PANLWT is based on 1990 data, while WTFINL is based on 2000 data. If it’s important to you to have the two weights match, you can use the original version of WTFINL, UH_ZWGT_B1, which is based on the 1990 decennial census. Otherwise, now that you know why estimates using these two weights differ in 2000-2002, you can continue to use PANLWT and WTFINL as is.
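If you do want the two totals on the same population base, a minimal sketch of the substitution might look like the following. It assumes that date is a Stata monthly date (%tm), that uh_zwgt_b1 is included in your extract, and that it is on the same scale as wtfinl (please check this, since unharmonized weights can carry implied decimals). The names wt1990 and cross_sum_1990 are only illustrative.

```stata
* Assumption: date is a monthly (%tm) date and uh_zwgt_b1 is scaled like wtfinl.
* Start from the published cross-sectional weight.
gen double wt1990 = wtfinl

* In the 2000-2002 samples, swap in the original 1990-based weight.
* April and June 2001 are left alone because WTFINL was not revised there.
replace wt1990 = uh_zwgt_b1 if inrange(date, tm(2000m1), tm(2002m12)) ///
    & !inlist(date, tm(2001m4), tm(2001m6))

* Recompute the monthly total with the same universe as before.
egen double cross_sum_1990 = total(wt1990) if age >= 16 & popstat==1, by(date sex)
```

Comparing cross_sum_1990 with panl_sum in the same way as before should show whether the 2000-2002 gap closes once both weights rest on 1990-based population estimates.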
Thanks very much to you and your colleagues! I had read this information on WTFINL. I was aware that the underlying census had changed, but my understanding was that a revision would also cover all dependent weights.
Thankfully I did not stumble upon a major issue.
Have a nice weekend!