I have extracted EARNWEEK for a sample of data from 2012-2016. I am using R to read in the data; my read_fwf command is as follows:
raw <- read_fwf(
  file = paste0(input, "/cps_00005.dat"),
  fwf_widths(c(4, 5, 10, 14, 2, 2, 1, 1, 4, 5, 4, 2, 2, 14, 10, 10, 1, 4,
               2, 1, 3, 1, 3, 2, 2, 4, 4, 2, 3, 3, 4, 1, 7, 8, 2, 1, 1, 1),
             c("year", "serial", "hwtsupp", "cpsid", "region", "statefip",
               "asecflag", "hflag", "metarea", "county", "cpi99", "month",
               "pernum", "cpsidp", "wtsupp", "earnwt", "nchild", "relate",
               "age", "sex", "race", "marst", "hispan", "educ99",
               "empstat", "occ", "ind", "classwkr", "uhrsworkt", "uhrswork1",
               "hourwage", "union", "incwage", "earnweek", "wkstat",
               "qempstat", "qocc", "qearnwee")),
  col_types = "iciciiiicciicciiiiiiiiciiiiiiiiiiiiiii")
Documentation for EARNWEEK reads as follows:
“The values in EARNWEEK are in dollars, with no implied decimal places; a value of 500 means that the respondent earned five hundred dollars per week before deduction.”
However, the missing-value code is listed with two decimal places:
Codes
9999.99 = N.I.U. (Not in Universe).
Top codes:
1990-1997: 1923 (Weekly earnings of $1923 or more).
1998-onward: 2885 (Weekly earnings of $2885 or more: ASEC samples only). 2884.61 for non-ASEC samples.
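To make the comparison concrete, here is a minimal sketch, run on made-up values rather than my extract, of how the codes above would look if the raw field carried two implied decimals and were read as an integer. The helper name rescale_earnweek is hypothetical, and the divide-by-100 step is my assumption, not something the documentation states.

```r
# Hypothetical helper: treat the raw integer as having two implied decimals
# (an assumption, not documented behaviour) and recode N.I.U. to NA.
rescale_earnweek <- function(x) {
  dollars <- x / 100
  dollars[x %in% 999999] <- NA_real_  # 9999.99 = N.I.U. (Not in Universe)
  dollars
}

# Made-up raw values for illustration, not taken from the extract
raw_values <- c(100L, 50000L, 288461L, 288500L, 999999L)
rescaled <- rescale_earnweek(raw_values)
print(rescaled)
```

If that assumption holds, 288500 would correspond to the documented ASEC top code of 2885 and 999999 to the N.I.U. code of 9999.99.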
Within R, the minimum value of raw$earnweek is 100 and the maximum value is 999999. The second-highest value appears to be the top code with two implied decimals: 288500. Am I missing something, or does the documentation need to be corrected? Thanks for your time and all that you do!
Best,
Lowell