IPUMS redesign research project

What is the status of the redesign project funded by the National Institutes of Health (HD 043392-03S1)? I believe its objective is to provide a way to estimate variances when using IPUMS data. Am I correct, and does that research still hold promise for the estimation of variances?

Hi Chip,

IPUMS has implemented all three of the goals of this project as outlined in the redesign summary. You are correct that the objective of the project was to provide researchers with the tools to derive empirically accurate variances for estimates using IPUMS data. These tools are available in different forms across IPUMS projects, from strata and cluster variables to replicate weights and sample code. If you are interested in calculating variances for estimates from a specific IPUMS project, please feel free to share more details about your project and I will explain the variance estimation process recommended by IPUMS using these tools.

Thanks Ivan. Below is the description of my project. Hope this is what you are looking for.

The variance I would like to calculate relates to the unserved market for federal rental housing tax credits. It is for the percentage of income qualified households for the tax credit program that are paying above the tax credit maximum rent for the apartment they occupy.

The data used for a given program year and geography:

  • Maximum qualifying income level based on household size (from HUD)
  • The maximum rent level for a given apartment size, in number of bedrooms (from HUD)
  • For each household: the annual household income, number of people in the household, their apartment size in terms of numbers of bedrooms, and gross rent paid for the apartment (from IPUMS)

My process:

  1. Determine whether a household is income qualified (this is done through an if-statement iteration based on household size)
  2. Determine whether the income qualified household is paying above the tax credit maximum allowed rent for the apartment size they occupy (this is done through an if-statement iteration based on the number of bedrooms occupied by the household)
  3. Sum the number of income qualified households paying above the allowed program rent and divide that by a sum of the number of income qualified households.
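The three steps above can be sketched in base R with lookup tables in place of if-statements. This is only an illustration under stated assumptions: the data frame has one row per household, all records and limits are made-up placeholders (real limits would come from HUD for the program year and geography), "qualified" is taken to mean income at or below the limit, and each household is weighted by HHWT so the sums represent households in the population.

```r
# Toy household records (assumed one row per household); all values are made up.
hh <- data.frame(
  hhincome = c(25000, 42000, 80000, 28000, 45000),
  numprec  = c(1, 3, 2, 1, 4),
  bedrooms = c(1, 2, 3, 1, 2),
  rent     = c(650, 1300, 2000, 900, 1100),
  hhwt     = c(100, 150, 120, 80, 200)
)

# Hypothetical limits keyed by household size and by number of bedrooms.
income_limit <- c(`1` = 30000, `2` = 40000, `3` = 45000, `4` = 50000)
rent_limit   <- c(`1` = 700, `2` = 1200, `3` = 1500)

# Step 1: income qualified if household income is at or below the limit for its size.
hh$income_qualified <- hh$hhincome <= income_limit[as.character(hh$numprec)]

# Step 2: among qualified households, flag rent above the limit for the unit size.
hh$above_max_rent <- hh$income_qualified &
  hh$rent > rent_limit[as.character(hh$bedrooms)]

# Step 3: weighted share of qualified households paying above the maximum rent.
share_above <- sum(hh$hhwt[hh$above_max_rent]) / sum(hh$hhwt[hh$income_qualified])
share_above
```

The named-vector lookup replaces one if-statement per household size or bedroom count, so a new program year only requires swapping in new limit tables.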

In order to generate empirically derived standard error estimates with IPUMS ACS samples, you will need to use IPUMS-provided replicate weights in a statistical software package such as Stata, R, SPSS, or SAS. Below is sample code you might run in Stata (the income and rent thresholds I used are arbitrary fill-ins for this example). It will generate point estimates for your ratios, standard errors, and 95% confidence intervals. Replicate weights are explained in further detail on this page, which also provides code for R and SAS. You will also need to add the variables REPWT, RENT, HHINCOME, NUMPREC, and BEDROOMS to your extract.


svyset [pweight=hhwt], vce(brr) brrweight(repwt1-repwt80) fay(.5) mse
*This line comes directly from the replicate weight user guide linked above. The only changes are that 
*household weights (HHWT) replace person weights and household replicate weights (REPWT) replace person 
*replicate weights, since your outcome is at the household level.

gen income_qualified = 0
replace income_qualified = 1 if rent != 0 & (hhincome < 30000 & numprec == 1 | hhincome < 50000 & numprec > 1)
*Use any criteria to determine whether a household is income qualified. The rent !=0 condition ensures that 
*your analysis only includes rented housing units.

gen above_tax_max = 0
replace above_tax_max = 1 if income_qualified == 1 & (rent > 700 & bedrooms ==1 | rent > 1200 & bedrooms > 1)
*Use any criteria to determine whether the income qualified household is paying above the tax credit maximum 
*allowed rent for the apartment size they occupy.

svy, subpop(if income_qualified == 1 ): tab above_tax_max, se ci
*Use the subpop option to restrict your analysis to only income qualified households without losing sample 
*design information. 

Ivan, thanks for this detail. I need some more help, however. My database activities have been pretty much limited to Excel and Power BI. Of the packages you mentioned I have used R, though recently only for data conversions. I’ll dig into R for getting at variances and standard errors, but when I follow your link I just see the following code for R once the srvyr package is accessed. I know I would change PERWT to HHWT. Is there some place to go to get the other code that you provided for Stata?

svy <- as_survey(data, weight = PERWT, repweights = matches("REPWTP[0-9]+"), type = "JK1", scale = 4/80, rscales = rep(1, 80), mse = TRUE)
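For intuition about what the scale = 4/80, rscales = rep(1, 80), and mse = TRUE arguments request: the estimate is recomputed once with each replicate weight, and the variance is the scale factor times the sum of squared deviations of those replicate estimates from the full-sample estimate. The base-R sketch below illustrates that arithmetic only. The data and the 80 "replicate weights" are simulated placeholders, not real IPUMS weights, and the helper names weighted_share and replicate_se are hypothetical.

```r
set.seed(1)
n <- 200

# Simulated 0/1 outcome and full-sample weights (made up for illustration).
y  <- rbinom(n, 1, 0.4)
wt <- runif(n, 50, 150)

# 80 made-up replicate weights: the full weight times a random perturbation.
# Result is an n x 80 matrix, one column per replicate.
rep_wts <- sapply(1:80, function(r) wt * runif(n, 0.5, 1.5))

# Estimator: weighted share of records with y == 1.
weighted_share <- function(w) sum(w * y) / sum(w)

# Replicate-weight standard error: sqrt(scale * sum((theta_r - theta)^2)),
# with deviations taken around the full-sample estimate (the mse = TRUE choice).
replicate_se <- function(full_wt, rep_wts, estimator, scale = 4 / ncol(rep_wts)) {
  theta   <- estimator(full_wt)
  theta_r <- apply(rep_wts, 2, estimator)
  sqrt(scale * sum((theta_r - theta)^2))
}

est <- weighted_share(wt)
se  <- replicate_se(wt, rep_wts, weighted_share)
c(estimate = est, se = se)
```

In practice a survey package performs this calculation from the real replicate weights in the extract; the sketch only shows why every replicate weight column has to be present.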

I wrote up this code based on the description of the project you provided; I have not written up similar code in R. You will have to write the code for this analysis and it will need to be modified according to your specific needs. Our team answers questions relating to documentation, but we do not provide code review. You might however find this forum post and the accompanying code helpful. You will need to use the ipumsr package to input and analyze IPUMS data. A guide on using the package, with additional vignettes, is provided on this documentation page.

Thanks for these follow up steps, Ivan. fyi, as an experiment (for me) I requested that Bing AI convert your Stata code to R. Below is the response.

library(survey)
# generate income_qualified variable (rent != 0 keeps only rented units)
mydata$income_qualified <- as.numeric(mydata$rent != 0 & (mydata$hhincome < 30000 & mydata$numprec == 1 | mydata$hhincome < 50000 & mydata$numprec > 1))
# generate above_tax_max variable
mydata$above_tax_max <- as.numeric(mydata$income_qualified == 1 & (mydata$rent > 700 & mydata$bedrooms == 1 | mydata$rent > 1200 & mydata$bedrooms > 1))
# create a survey design object with the household replicate weights
# (svrepdesign() accepts replicate weights; assumes lowercase column names in mydata)
design <- svrepdesign(data = mydata, weights = ~hhwt, repweights = "^repwt[0-9]+$",
                      type = "JK1", scale = 4/80, rscales = rep(1, 80), mse = TRUE)
# subset the design object by income_qualified
sub_design <- subset(design, income_qualified == 1)
# estimate the share above_tax_max with standard error and confidence interval
est <- svymean(~above_tax_max, sub_design)
est
confint(est)