Median Family Income

Hello IPUMS,

I’m trying to replicate estimates from ACS Table B19126 for CA using the 2014-2018 sample, specifically looking at median family income for male and female householders with no spouse present with own children under 18 years of age. My estimates are outside the table’s margins of error.

I think I have coded the “male householder, no wife present, with own children under 18 years of age” and the “female householder, no husband present, with own children under 18 years of age” groups correctly. My estimates are very close to the estimates provided in Table B11003.

I suspect that my treatment of the income variable is incorrect, and I was hoping you could steer me in the right direction. I’ve included my Stata code below, but I’ll summarize here, too.

1) Use the Census Bureau variables to construct family units within the household.
2) Identify never-married own children under the age of 18 by family and create a count by family.
3) Create a total family income variable using inctot. (I first set the NAs to missing. I only add income of related individuals.)
4) Adjust for inflation using the adjust variable.
5) Calculate median family income using epctile and household weights.

I’m getting 45,378 for men and 29,596 for women. The ACS Table reports 46,368 ±667 for men and 30,677 ±257 for women.

Any suggestions or advice would be greatly appreciated.



/*This code replicates estimates in ACS Table B19126, but first we want to make
sure we have the correct population and weights. I use B11003. We can’t use
the IPUMS created variables to replicate family estimates.

First, I make a serial # specific to families in the household using the CB’s original serials. */

egen serialfam = concat(cbserial cbsubfam), format("%15.0g")

/*Next I tag never-married children in the family under the age of 18. “Own child”
is defined as biological, step, or adopted. */

gen ownchild = 0
replace ownchild = 1 if (related==301 | related==302 | related==303) & age<18 & marst==6

/*Then get a count of how many own children under the age of 18 are in the family.
This only works for primary families, because relate is a variable that shows the
relationship to the head of household. */

by serialfam, sort: egen n_ownchild = sum(ownchild)
sort serialfam pernum
list serialfam cbsubfam pernum relate age sex ownchild n_ownchild in 1/50 //it works!

*Now take a look at the estimates in B11003. They are very close.

total personMIL if n_ownchild>0 & relate==1 & marst!=1 [iw=hhwt], over(sex)

*Create a family income variable. First recode the Not Applicable code as missing rather than zero, because the CB includes families with zero income when calculating median family income.

replace inctot=. if inctot==9999999
by serialfam, sort: egen family_inc = sum(inctot) if relate<=10
sort serialfam pernum
list serialfam pernum relate age sex inctot family_inc in 1/50

*We have to adjust income variables for inflation.

gen family_incADJ = family_inc*adjust

epctile family_incADJ if n_ownchild>0 & relate==1 & marst!=1 & family_incADJ!=. [iw=hhwt], percentiles(50) over(sex)

We generally do not expect to exactly replicate official statistics with public use microdata for the ACS. This is because the public use microdata uses a slightly different sample than what is used to generate “official” statistics. You can read more about this detail on this page.

Although the estimates from the microdata are usually very close to the official estimates, they do not always fall within the margin of error of the official estimates. We shouldn’t expect them to, either. This is because the estimates from microdata have their own margin of error, which will be somewhat larger than the one for the official estimates. If you did a t-test for the difference between these two estimates, the difference may not be statistically significant, although the difference you’re finding does look big enough to suspect other things are going on.
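As a rough illustration of that significance check, the sketch below uses only the numbers quoted in this thread. It relies on two points: ACS margins of error are published at the 90 percent confidence level, so a standard error is the margin of error divided by 1.645; and, as a labeled assumption, the microdata standard error is set equal to the published one. Since the microdata standard error should really be somewhat larger, the z statistics here overstate the evidence and are best read as upper bounds.

```
* Convert the published 90 percent margins of error to standard errors.
scalar se_men   = 667/1.645
scalar se_women = 257/1.645

* Gaps between the published medians and the microdata medians quoted above.
scalar diff_men   = 46368 - 45378
scalar diff_women = 30677 - 29596

* ASSUMPTION: the microdata standard error equals the published one.
* The true microdata standard error is likely larger, which would shrink z.
scalar z_men   = diff_men   / sqrt(2*se_men^2)
scalar z_women = diff_women / sqrt(2*se_women^2)

display "z (men)   = " %5.2f z_men
display "z (women) = " %5.2f z_women
```

Under that assumption the statistic is about 1.7 for men and about 4.9 for women, against a 90 percent critical value of 1.645. The gap for men is borderline, while the gap for women is large enough that sampling differences alone are an unlikely explanation.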

Keeping this in mind, I also have a couple of comments on your methods. First, I don’t believe the official tables use the concept of subfamily. Instead, they use the concept of family captured in the IPUMS variable FAMUNIT. Try basing your calculations on this family definition (identified by SAMPLE, SERIAL, and FAMUNIT) and see if you get closer to the official estimate. This shouldn’t be necessary if you use the IPUMS variable FTOTINC, though, since it automatically applies this definition of the family when calculating total income for the householder’s family.
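A minimal sketch of the FTOTINC route follows. It assumes the extract includes FTOTINC, that 9999999 is its Not Applicable code (worth confirming on the variable's codes page), and that n_ownchild has already been built as in the original post. The name ftotincADJ is hypothetical, chosen to mirror family_incADJ.

```
* Sketch only: assumes ftotinc is in the extract and 9999999 is its N/A code.
* Families with zero income keep a value of 0 and stay in the calculation.
gen ftotincADJ = ftotinc*adjust if ftotinc != 9999999

* Same universe and weights as the original epctile call.
epctile ftotincADJ if n_ownchild>0 & relate==1 & marst!=1 & ftotincADJ!=. [iw=hhwt], percentiles(50) over(sex)
```

Because FTOTINC is already summed over the householder's family, this version skips the by-family egen step, so the income total no longer depends on how the family identifier is constructed.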

Hi Matthew.

Thanks so much for your thoughtful and helpful message. You were right. I changed my command to the following:

egen serialfam = concat(sample serial famunit), format("%15.0g")

With no other changes to this code, this produces estimates within the margin of error.

Again, thanks so much. I really appreciate it.