2014 WTSUPP appears to be doubled

When I sum WTSUPP and EARNWT across several years, I notice that the 2014 total of WTSUPP appears to be doubled. See the results below (year, sum of EARNWT, sum of WTSUPP, number of rows):

```
year  sum(EARNWT)  sum(WTSUPP)    rows
2012    242604455    308827259  201398
2013    244994898    311116170  202634
2014    247257842    626838616  199556
2015    250080007    316167949  199024
```

Was there something unique about 2014 that would account for this? I have downloaded several extracts and keep running into the same issue, even though the number of rows looks normal.
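The per-year totals in the table above can be computed with a grouped sum. This is a minimal sketch: `cps` is a hypothetical toy data frame standing in for an IPUMS CPS extract, and its values are made up for illustration.

```r
# Hypothetical toy stand-in for a CPS extract (values are illustrative only).
cps <- data.frame(
  YEAR   = c(2013, 2013, 2014, 2014, 2014, 2014),
  EARNWT = c(100, 150, 90, 110, 95, 105),
  WTSUPP = c(120, 180, 115, 135, 110, 140)
)

# Sum each weight within YEAR; the formula interface returns one row per year.
totals <- aggregate(cbind(EARNWT, WTSUPP) ~ YEAR, data = cps, FUN = sum)

# Add the number of records per year, matched to each row's YEAR.
totals$rows <- as.vector(table(cps$YEAR)[as.character(totals$YEAR)])
print(totals)
```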

For the 2014 ASEC sample, the Census Bureau divided respondents into a 3/8ths group and a 5/8ths group, with the smaller group receiving redesigned income questions. WTSUPP is assigned to records so that each file is independently representative of the US population. If both parts of the file are analyzed together and weighted with WTSUPP, the estimate ends up being roughly twice the US population. The HFLAG variable can be used to differentiate between the two groups. Please also see our User Note on the 3/8 file redesign.
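To see the doubling directly, one can sum WTSUPP separately within each HFLAG group. This is a minimal sketch, assuming HFLAG == 1 marks the 3/8ths group and HFLAG == 0 the 5/8ths group (consistent with the 2014 counts posted later in this thread, where HFLAG == 1 is the smaller group). The data frame `cps14` and its values are hypothetical.

```r
# Hypothetical 2014-only toy data; assumption: HFLAG 1 = 3/8ths, 0 = 5/8ths.
cps14 <- data.frame(
  HFLAG  = c(1, 1, 0, 0, 0),
  WTSUPP = c(160, 140, 90, 100, 110)
)

# Weighted total within each subsample; each one alone approximates the population.
by_group <- tapply(cps14$WTSUPP, cps14$HFLAG, sum)
print(by_group)

# Summing across both subsamples counts the population twice.
combined <- sum(cps14$WTSUPP)
print(combined)
```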

Hope this helps.

How should we proceed, then, to obtain a sample that is comparable with previous years? Should we use only the 5/8ths of the sample to whom only the usual questions were asked? Isn't there another set of weights that lets us use the whole sample without doubling the weighted totals?

The Census Bureau created completely separate weights for the 3/8ths sample and the 5/8ths sample. Ultimately, whether to use the 5/8ths sample, the 3/8ths sample, or both is up to the researcher. If you are using income variables in your analysis, the 5/8ths sample will be more comparable to pre-2014 ASEC samples (see this paper for a discussion of how income was affected by the redesigned questions). As for pooling the samples, we have not investigated this enough on our end to recommend a specific weight adjustment. Since both samples are independently nationally representative, it seems reasonable to divide WTSUPP by 2. You could also choose to “weight” the WTSUPP values for the 3/8ths sample by 3/8 and those for the 5/8ths sample by 5/8, under the assumption that the larger sample is more accurate. However, I recommend that you contact the Census Bureau directly for further guidance.
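The two pooling options mentioned above can be sketched as new weight columns. This is an illustration of those ad hoc suggestions, not an official Census Bureau or IPUMS adjustment. It again assumes HFLAG == 1 marks the 3/8ths group and HFLAG == 0 the 5/8ths group; `cps14`, `wt_half`, and `wt_prop` are hypothetical names with toy values.

```r
# Hypothetical 2014-only toy data; assumption: HFLAG 1 = 3/8ths, 0 = 5/8ths.
cps14 <- data.frame(
  HFLAG  = c(1, 1, 0, 0, 0),
  WTSUPP = c(160, 140, 90, 100, 110)
)

# Option 1: halve every weight, so each subsample contributes half the total.
cps14$wt_half <- cps14$WTSUPP / 2

# Option 2: scale 3/8ths records by 3/8 and 5/8ths records by 5/8,
# giving the larger subsample more influence.
cps14$wt_prop <- ifelse(cps14$HFLAG == 1,
                        cps14$WTSUPP * 3 / 8,
                        cps14$WTSUPP * 5 / 8)

# Because each subsample's WTSUPP sums to (about) the same population total,
# both pooled weights sum to (about) one population rather than two.
print(c(half = sum(cps14$wt_half), prop = sum(cps14$wt_prop)))
```

Either pooled weight can then be passed wherever WTSUPP would normally be used, for example `weighted.mean(x, cps14$wt_half)`.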

Hope this helps.

Hi. In the extract I downloaded (HHINCOME plus associated variables), I see 199556 observations for 2014, and the numbers 3/8 and 5/8 appear to be approximations; 0.3 and 0.7 appear to be closer to the mark. Did I make a mistake, or …?

```r
> x <- gzfile("./ipums/cps_00006.csv.gz", open = "r")
> ddset <- read.csv(x, header = TRUE)
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  seek on a gzfile connection returned an internal error
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  seek on a gzfile connection returned an internal error
> nrow(ddset)
[1] 8863262
> sum(ddset$YEAR == 2014)
[1] 199556
> sum(ddset$YEAR == 2014 & ddset$HFLAG == 1)
[1] 60141
> 60141/199556
[1] 0.3013741
> sum(ddset$YEAR == 2014 & ddset$HFLAG == 0)
[1] 139415
> 139415/199556
[1] 0.6986259
```
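The same subsample shares can be obtained in one step with `table()` and `prop.table()`. This is a minimal sketch on a hypothetical toy data frame `toy` (not the real extract), where HFLAG is NA outside 2014.

```r
# Hypothetical toy stand-in for the extract; HFLAG is only defined for 2014.
toy <- data.frame(
  YEAR  = c(2013, 2014, 2014, 2014, 2014, 2014),
  HFLAG = c(NA, 1, 0, 0, 1, 0)
)

# Restrict to 2014, tabulate HFLAG, and convert the counts to proportions.
hflag14 <- toy$HFLAG[toy$YEAR == 2014]
shares <- prop.table(table(hflag14))
print(shares)
```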


You haven’t done anything wrong. The 3/8ths and 5/8ths designations are approximate figures. This presentation by the U.S. Census Bureau highlights some key details about the 2014 split sample ASEC.