We have been unable to reproduce the 1950 published statistics on the income distribution from the 1950 individual-level data (Source: https://www.census.gov/prod/www/decennial.html, 1950 Census, page 1-104, table 57). I have also attached a screenshot of the census publication.
Basically, our reasoning is this: keeping sample-line households only and weighting by the sample-line weight should give the number of families and unrelated individuals.
Here is the code we used to generate the number of families and unrelated individuals by income bin:
#delimit;
* keep only the -sample-line person-, for whom there is income data *;
keep if slpernum == pernum;
* drop n/a income;
drop if ftotinc==9999999 |ftotinc==9999998;
* create income bins for 1950, according to the census published bin categories *;
gen bin = .;
replace bin = 1 if ftotinc <500;
replace bin = 2 if ftotinc >=500 & ftotinc <=999;
replace bin = 3 if ftotinc >=1000 & ftotinc <=1499;
replace bin = 4 if ftotinc >=1500 & ftotinc <=1999;
replace bin = 5 if ftotinc >=2000 & ftotinc <=2499;
replace bin = 6 if ftotinc >=2500 & ftotinc <=2999;
replace bin = 7 if ftotinc >=3000 & ftotinc <=3499;
replace bin = 8 if ftotinc >=3500 & ftotinc <=3999;
replace bin = 9 if ftotinc >=4000 & ftotinc <=4499;
replace bin = 10 if ftotinc >=4500 & ftotinc <=4999;
replace bin = 11 if ftotinc >=5000 & ftotinc <=5999;
replace bin = 12 if ftotinc >=6000 & ftotinc <=6999;
replace bin = 13 if ftotinc >=7000 & ftotinc <=9999;
replace bin = 14 if ftotinc >=10000;
* labels use straight quotes; \$ stops Stata from reading $500 etc. as a global macro *;
label define bin 1 "Less than \$500"
2 "\$500 to \$999"
3 "\$1,000 to \$1,499"
4 "\$1,500 to \$1,999"
5 "\$2,000 to \$2,499"
6 "\$2,500 to \$2,999"
7 "\$3,000 to \$3,499"
8 "\$3,500 to \$3,999"
9 "\$4,000 to \$4,499"
10 "\$4,500 to \$4,999"
11 "\$5,000 to \$5,999"
12 "\$6,000 to \$6,999"
13 "\$7,000 to \$9,999"
14 "\$10,000 or more", replace;
label value bin bin;
* collapse count of families by income bin, with weights *;
collapse (count) ftotinc [iw=slwt], by (bin);
rename ftotinc num_family;
sum num_family bin, detail;
* calculate percentages *;
egen percent=pc(num_family);
format percent %9.1f;
format num_family %11.0gc;
* create table: number of people by income bin *;
table bin, c(sum num_family sum percent) format(%11.2gc) center row;
The result is that our total number of families and unrelated individuals (42,807,270) is about 3.7 million, or roughly 8%, smaller than the total in the census publication (46,489,090). The bottom income bins in particular are quite far off.
We are not sure whether ftotinc is defined for unrelated individuals, so we tried to disaggregate families and unrelated individuals. In this second approach, we kept sample-line households only, restricted to relate = 1, and weighted by the sample-line weight, which should give the number of families. We then took sample-line individuals with relate equal to 11 or 12, again weighting by the sample-line weight. Neither count matched the census publication. We also tried inctot in place of ftotinc for unrelated individuals, but the results still did not match.
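For illustration, the second approach looks roughly like the sketch below. It is a simplified outline and not the exact do-file. It assumes the same extract and variable names as the code above (slpernum, pernum, relate, slwt, ftotinc, inctot). Treating 9999999 and 9999998 as the N/A codes for inctot is an assumption carried over from ftotinc, and num_family and num_unrelated are illustrative names.

```stata
#delimit ;
* families: sample-line persons with relate == 1 *;
preserve;
keep if slpernum == pernum & relate == 1;
drop if ftotinc == 9999999 | ftotinc == 9999998;
* weighted total = sum of sample-line weights *;
collapse (sum) num_family = slwt;
list;
restore;
* unrelated individuals: sample-line persons with relate 11 or 12 *;
preserve;
keep if slpernum == pernum & inlist(relate, 11, 12);
* assumed: inctot uses the same N/A codes as ftotinc *;
drop if inctot == 9999999 | inctot == 9999998;
collapse (sum) num_unrelated = slwt;
list;
restore;
#delimit cr
```

Each block restores the full data set afterwards, so the two totals can be compared side by side with the published counts of families and of unrelated individuals.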
We are writing in the hopes that you’ll have some insight into why this might be the case. Perhaps we’ve done something wrong in using the 1950 data? Or perhaps there is some other reason the 1950 individual data do not aggregate to the published statistics?
Thank you,
Yuan