# Is there an issue with the WTFINL variable in the CPS monthly data from 2000-2004?

When I calculate the proportion of males by state and fiscal-year quarter for people older than 17, I see greater variation in the estimates for 2000-2004 that goes away in 2005-2014. I am aware of the issue with the WTFINL variable in April and June 2001 (CPS Small Sample Size April and June 2001: using stata [pweight=wtfinl] - sudden spike in educational attainment), but I am not sure that is the only issue affecting my results. The code below reproduces what I am seeing. It needs age, sex, hispan, statefip, year, month, and wtfinl.

* setwd

* FY Quarter model

* Quarter 1 is Oct, Nov, and Dec

gen quarter = 0

* Quarter 2 is Jan, Feb, and Mar

replace quarter = 1/4 if month == 1 | month == 2 | month == 3

* Quarter 3 is Apr, May, and June

replace quarter = 2/4 if month == 4 | month == 5 | month == 6

* Quarter 4 is July, Aug, and Sep

replace quarter = 3/4 if month == 7 | month == 8 | month == 9

gen year_quarter = year + quarter

replace year_quarter = year_quarter+1 if quarter ==0

drop if age <= 17

gen population = 1

gen male = sex==2

gen hispanic = hispan != 0

keep male hispanic population wtfinl statefip year_quarter

collapse (mean) male hispanic (count) population [pweight = wtfinl], by(statefip year_quarter)

* Most prominent in proportion of males

twoway (line male year_quarter), by(statefip)

* To a lesser, but still noticeable, extent in proportion Hispanic

twoway (line hispanic year_quarter), by(statefip)

* Sharp dip in population circa 2001. Most noticeable in California

twoway (line population year_quarter), by(statefip)

I was able to replicate the additional variability pre-2005; however, this does not appear to be an issue with the WTFINL variable. While the CPS can be used for state-level analyses, researchers should still proceed with caution and be aware of the large standard errors due to small sample sizes (especially with sub-populations such as male or Hispanic). Additionally, the Census Bureau implemented improvements to the second-stage weighting and composite weighting procedures. These improvements are known to increase the stability of state demographic estimates across time. These changes could also be responsible for the lower variance you are seeing in the more recent samples.
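One way to see how much of the pre-2005 movement could be sampling noise is to look at the cell sizes and standard errors behind each state-quarter estimate. The following is only a rough sketch, not an official procedure. It assumes the extract is in memory with year_quarter built as in the question and the age restriction applied, but before the keep and collapse steps. The names male_fixed, yq_id, cell_tag, and n_cell are made up for illustration, and California (statefip == 6) is just an example state. Because the svyset line declares weights only, with no strata or clustering, the standard errors it produces are likely too small.

```stata
* SEX = 1 is male in IPUMS CPS; male_fixed is a hypothetical name
gen byte male_fixed = (sex == 1)

* Integer index for each fiscal-year quarter, because over() in recent
* Stata versions expects nonnegative integer values
egen yq_id = group(year_quarter)

* Unweighted number of respondents in each state-quarter cell,
* summarized once per cell
bysort statefip year_quarter: gen long n_cell = _N
egen byte cell_tag = tag(statefip year_quarter)
summarize n_cell if cell_tag, detail

* Assumption: weights only, no strata or PSU information declared
svyset [pweight = wtfinl]

* Proportion male with standard errors by quarter for one example state
svy, subpop(if statefip == 6): mean male_fixed, over(yq_id)
```

If the quarter-to-quarter swings in the 2000-2004 estimates are of the same order as these standard errors, sampling variability is a plausible explanation. Keep in mind that the same households can appear in more than one month of a quarter, which this sketch does not account for.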

Also note that males correspond to SEX=1. It appears you mistakenly coded males as respondents with SEX=2 in your sample code.
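A quick way to confirm the coding before recoding is to tabulate the variable with and without its value labels. This is a small sketch that assumes the extract carries the standard IPUMS value labels on sex and that no variable named male exists yet in memory.

```stata
* Show the labeled categories, then the underlying numeric codes
tabulate sex, missing
tabulate sex, missing nolabel

* Recode using the male code (SEX = 1)
gen byte male = (sex == 1) if !missing(sex)
```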

Hope this helps.