The Julia code below implements my FILESTAT adjustment algorithm for 2004. I adjusted the syntax so that it uses only standard control-flow statements and logical indexing, so it should be straightforward to reproduce in other languages. Of course, I am happy to answer questions if anything is unclear.
I also wrote a brief technical report on the FILESTAT discrepancies, which gives more details on the adjustment algorithm and the adjusted FILESTAT values. The report and some more code are in this repository: ASEC_FILESTAT_adjustment
@jvandernaald If you write an implementation in another language, I would be happy to include it in the repository.
@Matthew_Bombyk: Obviously, adjusting the FILESTAT variable in the IPUMS database would remove the need for users to adjust it themselves. If that is an option, I would be glad to assist.
(I also found FILESTAT inconsistencies in years after 2006; as you can see from table 2 in the technical report, the adjustment algorithm replicates the original values slightly less well for these years. To investigate, I looked at the observations for which the algorithm produces diverging FILESTAT values in two random years (2006 and 2015). I found that if at least one of the joint filers is above 65, the original FILESTAT does not assign the same value to both spouses but always assigns 1 to one of them. This seems inconsistent to me, but maybe I am missing something? It would be great to clarify this with the Census Bureau.)
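As a side note to the parenthetical above, and separate from the 2004 adjustment code that follows: a minimal sketch of how one might flag couples whose two spouses carry different original FILESTAT values. It uses plain Julia named tuples instead of a data frame so that it runs without any packages. The function name `mismatched_couples` and all values in `records` are made up for illustration; they are not from the ASEC data or the repository.

```julia
# Toy records; the values are invented for illustration, not taken from the ASEC.
records = [
    (SERIAL = 1, RELATE = 101, AGE = 70, FILESTAT = 2),
    (SERIAL = 1, RELATE = 201, AGE = 60, FILESTAT = 1),
    (SERIAL = 2, RELATE = 101, AGE = 45, FILESTAT = 1),
    (SERIAL = 2, RELATE = 201, AGE = 43, FILESTAT = 1),
    (SERIAL = 3, RELATE = 101, AGE = 30, FILESTAT = 5),
]

# Hypothetical helper: returns the SERIALs of households whose head
# (RELATE == 101) and spouse (RELATE == 201) have different FILESTAT values.
function mismatched_couples(records)
    serials = Int[]
    for k in unique(r.SERIAL for r in records)
        hh = filter(r -> r.SERIAL == k, records)
        head = findfirst(r -> r.RELATE == 101, hh)
        spouse = findfirst(r -> r.RELATE == 201, hh)
        # Skip households without both a head and a spouse
        (head === nothing || spouse === nothing) && continue
        if hh[head].FILESTAT != hh[spouse].FILESTAT
            push!(serials, k)
        end
    end
    return serials
end

mismatched = mismatched_couples(records)
```

In the toy data only household 1 is flagged, because its two spouses have FILESTAT 2 and 1; household 3 is skipped since it has no spouse record.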
## Prepare 2004 data
# Note: select! modifies df_ASEC_2004 in place, so df_2004 refers to the same table.
# The column :FILESTAT_adj must already exist in df_ASEC_2004.
df_2004 = select!(df_ASEC_2004, [:SERIAL, :RELATE, :AGE, :ADJGINC, :FILESTAT, :FILESTAT_adj]);
df_2004[!, :num] = 1:size(df_2004, 1);  # row index, used to write results back
hhs_2004 = unique(df_2004.SERIAL);

for k in hhs_2004
    df_tmp = df_2004[df_2004.SERIAL .== k, :]
    RELATE_vec = unique(df_tmp.RELATE)
    if !(201 in RELATE_vec)
        continue  # no spouse (RELATE == 201): keep FILESTAT categories as they are
    else
        num_vec = unique(df_tmp.num)
        age_101 = df_tmp[df_tmp.RELATE .== 101, :AGE][1]
        age_201 = df_tmp[df_tmp.RELATE .== 201, :AGE][1]

        # The writes below go to the first two rows of the household,
        # i.e. they assume that these rows are the head (101) and the spouse (201).
        if age_101 < 65 && age_201 < 65        # both below 65
            df_2004[num_vec[1], :FILESTAT_adj] = 1
            df_2004[num_vec[2], :FILESTAT_adj] = 1
        elseif age_101 >= 65 && age_201 >= 65  # both 65+
            df_2004[num_vec[1], :FILESTAT_adj] = 3
            df_2004[num_vec[2], :FILESTAT_adj] = 3
        else                                   # one 65+, one below 65
            df_2004[num_vec[1], :FILESTAT_adj] = 2
            df_2004[num_vec[2], :FILESTAT_adj] = 2
        end

        # Couples where both spouses have zero adjusted gross income do not need to file
        adjginc_101 = df_tmp[df_tmp.RELATE .== 101, :ADJGINC][1]
        adjginc_201 = df_tmp[df_tmp.RELATE .== 201, :ADJGINC][1]
        if adjginc_101 == 0 && adjginc_201 == 0
            df_2004[num_vec[1], :FILESTAT_adj] = 6
            df_2004[num_vec[2], :FILESTAT_adj] = 6
        end

        # Remaining household members keep their original FILESTAT
        if length(num_vec) > 2
            for l = 3:length(num_vec)
                df_2004[num_vec[l], :FILESTAT_adj] = df_2004[num_vec[l], :FILESTAT]
            end
        end
    end
end
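
For anyone porting this to another language, the rule applied to the two spouses in the loop above can be isolated in a small pure function, which is easier to test than the data frame loop. This is only a sketch of the same rule, not code from the repository; the function name `joint_filestat` and its argument names are my own, and the input values in the usage lines are made up.

```julia
# Hypothetical helper (name not from the original code): FILESTAT code for a
# married couple, following the same rule as the loop above.
function joint_filestat(age_head::Integer, age_spouse::Integer,
                        agi_head::Real, agi_spouse::Real)
    # Both spouses with zero adjusted gross income: no need to file (code 6)
    if agi_head == 0 && agi_spouse == 0
        return 6
    end
    # Count how many of the two spouses are 65 or older
    n_65plus = (age_head >= 65) + (age_spouse >= 65)
    if n_65plus == 0
        return 1   # both below 65
    elseif n_65plus == 2
        return 3   # both 65+
    else
        return 2   # one 65+, one below 65
    end
end

# Usage with made-up inputs
joint_filestat(40, 38, 52_000, 0)   # returns 1
joint_filestat(70, 60, 30_000, 0)   # returns 2
joint_filestat(72, 68, 0, 0)        # returns 6
```

Checking the zero-income case first gives the same result as the loop, where the value 6 overwrites whichever age-based code was assigned before.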