Dear IPUMS staff,
I’m sorry if this is a long question. I’m looking at the health insurance unit (HIU), which separates people living in the same household into smaller family units based on family relationships. The HIU is usually used as a more accurate way to define “family size” and thus eligibility for government-subsidised insurance programs. I’m looking at the distribution of some household characteristics by family income, and I find some weird outliers for a specific income group. I’m curious why that is.
I’m using ACS 2014-2016 data. Here is the Stata code I used to check the distribution, which I think you can run directly:
// I first define the poverty level for each family based on the family size //
gen FPL = hiufpgbase if hiunpers == 1
replace FPL = hiufpgbase + hiufpginc * (hiunpers - 1) if hiunpers != 1
// calculate family income by summing all individuals’ incomes //
replace inctot = . if inctot == 9999999
bysort year hiuid: egen HIU_inc = total(inctot), missing
drop if HIU_inc < 0
// calculate the family income relative to FPL //
gen ratio = HIU_inc / FPL // ratio = 1 means family income is at 100% FPL
// Limit the sample to people age 27-64 & family income below 200% FPL //
keep if age <= 64 & age >= 27
keep if ratio < 2
/* Draw graphs to see the distribution. I use the command “cmogram”, which groups people into equally sized income bins and plots the average outcome in each bin. I checked some household characteristics such as age, sex, and employment status. */
gen male = (sex == 1)
gen employed = (empstat == 1)
cmogram age ratio, cut(1) histopts(bin(40)) scatter line(1) qfitci
cmogram male ratio, cut(1) histopts(bin(40)) scatter line(1) qfitci
cmogram employed ratio, cut(1) histopts(bin(40)) scatter line(1) qfitci
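Since cmogram is a user-written command that may not be installed, here is a rough sketch of the same idea using only built-in commands. It assumes the variables created above (ratio, male, employed) are in memory. The bin variable name ratio_bin and the fixed bin width of 0.05 (5 percentage points of FPL) are my own choices for illustration, so the bins will not necessarily match the ones cmogram draws.

```stata
* Sketch only: unweighted bin means, like the plots above.
* ratio_bin is a hypothetical helper variable holding the lower edge
* of each 0.05-wide income bin (0, 0.05, 0.10, ...).
gen ratio_bin = floor(ratio / 0.05) * 0.05

preserve
* One row per bin: mean of each outcome plus the number of people in the bin
collapse (mean) age male employed (count) n_obs = age, by(ratio_bin)
list ratio_bin age male employed n_obs, clean noobs
restore
```

In this table the rows with ratio_bin around 0.70 should correspond to the group that looks odd in the graphs.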
Something seems to be off for people with family income around 70%-75% of FPL. They are much older on average and a much smaller share of them is employed, but the share who are male looks fine. I also checked many other outcomes, for example marital status and having a college degree, and found similar patterns. Do you have any idea what is happening with this group? Since the sample size is very large, I wouldn’t expect outliers like these.
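In case it helps to narrow this down, one check I can think of is whether family incomes in that range are heaped on a few specific dollar amounts. This is only a sketch under my own assumptions: the 0.70 to 0.75 window is read off the graphs by eye, and n_people is a hypothetical name for the frequency variable.

```stata
* Sketch only: most common (family size, family income) combinations
* among people whose family income is 70%-75% of FPL.
preserve
keep if ratio >= 0.70 & ratio < 0.75
* Collapse to one row per distinct combination, with a count of people
contract hiunpers HIU_inc, freq(n_people)
gsort -n_people
* Show up to the ten most frequent combinations
list hiunpers HIU_inc n_people in 1/`=min(10, _N)', clean noobs
restore
```

If a handful of income values account for most of the people in this window, that would suggest the pattern comes from how income is reported rather than from the HIU definition itself, but I have not confirmed this.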
Thanks a lot!