Why does the number of households by income bin from individual-level data not match the census publication?

yuanyvette1129 · April 18, 2017, 6:26pm

We have been unable to match the 1950 published statistics on the income distribution using the 1950 individual-level data (source: https://www.census.gov/prod/www/decennial.html, 1950 Census, page 1-104, table 57). I have also attached a screenshot of the census publication.

Basically, what we did is keep sample-line households only and weight by the sample-line weight, which should give us the number of families and unrelated individuals.

Here is the code we used to generate the number of families and unrelated individuals by income bin:

#delimit ;

* keep only the -sample-line person-, for whom there is income data *;

keep if slpernum == pernum;

* drop n/a income;

drop if ftotinc==9999999 |ftotinc==9999998;

* create income bins for 1950, according to the census published bin categories *;

gen bin = .;

replace bin = 1 if ftotinc <500;

replace bin = 2 if ftotinc >=500 & ftotinc <=999;

replace bin = 3 if ftotinc >=1000 & ftotinc <=1499;

replace bin = 4 if ftotinc >=1500 & ftotinc <=1999;

replace bin = 5 if ftotinc >=2000 & ftotinc <=2499;

replace bin = 6 if ftotinc >=2500 & ftotinc <=2999;

replace bin = 7 if ftotinc >=3000 & ftotinc <=3499;

replace bin = 8 if ftotinc >=3500 & ftotinc <=3999;

replace bin = 9 if ftotinc >=4000 & ftotinc <=4499;

replace bin = 10 if ftotinc >=4500 & ftotinc <=4999;

replace bin = 11 if ftotinc >=5000 & ftotinc <=5999;

replace bin = 12 if ftotinc >=6000 & ftotinc <=6999;

replace bin = 13 if ftotinc >=7000 & ftotinc <=9999;

replace bin = 14 if ftotinc >=10000 & !missing(ftotinc);

* label the bins; \$ guards against Stata reading the dollar amounts as global macro references *;

label define bin 1 "Less than \$500"

2 "\$500 to \$999"

3 "\$1,000 to \$1,499"

4 "\$1,500 to \$1,999"

5 "\$2,000 to \$2,499"

6 "\$2,500 to \$2,999"

7 "\$3,000 to \$3,499"

8 "\$3,500 to \$3,999"

9 "\$4,000 to \$4,499"

10 "\$4,500 to \$4,999"

11 "\$5,000 to \$5,999"

12 "\$6,000 to \$6,999"

13 "\$7,000 to \$9,999"

14 "\$10,000 or more", replace;

label value bin bin;

* collapse count of families by income bin, with weights *;

collapse (count) ftotinc [iw=slwt], by (bin);

rename ftotinc num_family;

sum num_family bin, detail;

* calculate percentages *;

egen percent=pc(num_family);

format percent %9.1f;

format num_family %11.0gc;

* create table: number of people by income bin *;

table bin, c(sum num_family sum percent) format(%11.2gc) center row;

The result is that the total number of families and unrelated individuals we get (42,807,270) is smaller than the total in the census publication (46,489,090), and the bottom bins in particular are quite far off.
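One possible check on the size of the gap, offered only as a sketch: tabulate the weighted number of sample-line records with and without an N/A code in ftotinc, since the code above drops the N/A records before counting. This assumes it is run on the freshly loaded extract (before the keep, drop, and collapse steps) and that 9999998 and 9999999 are the only N/A codes, as in the code above. The variable inc_na is a hypothetical helper name, not part of the extract.

```stata
* Sketch only: run on the original extract, before any keep/drop/collapse.
preserve

* keep the sample-line person in each household
keep if slpernum == pernum

* inc_na (hypothetical helper) flags records whose family income is coded N/A
gen byte inc_na = inlist(ftotinc, 9999998, 9999999)
label define inc_na 0 "ftotinc reported" 1 "ftotinc N/A", replace
label values inc_na inc_na

* weighted count of sample-line records in each group
tabulate inc_na [iw=slwt]

restore
```

If the weighted count in the N/A row is close to the shortfall, the dropped records would account for much of it; if not, the difference lies elsewhere.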

We are not sure whether ftotinc is defined for unrelated individuals, so we tried to disaggregate families and unrelated individuals. In this second approach, we kept sample-line households only, restricted to records where relate = 1, and weighted by the sample-line weight, which should give us the number of families. We then took sample-line individuals where relate is 11 or 12, again weighting by the sample-line weight. Neither count matched the census publication. We also tried using inctot for unrelated individuals, but the results still do not match.
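A minimal sketch of this second approach, assuming the extract contains pernum, slpernum, slwt, relate, ftotinc, and inctot, and that the same N/A codes (9999998 and 9999999) apply to both income variables. The variables unit and inc are hypothetical helper names introduced for illustration.

```stata
* Sketch only: run on the original extract, before any keep/drop/collapse.
preserve

* keep the sample-line person in each household
keep if slpernum == pernum

* unit (hypothetical helper): 1 = household head, used as a proxy for a family;
*                             2 = non-relative, used as a proxy for an unrelated individual
gen byte unit = .
replace unit = 1 if relate == 1
replace unit = 2 if inlist(relate, 11, 12)
label define unit 1 "Head (family proxy)" 2 "Unrelated individual", replace
label values unit unit

* inc (hypothetical helper): family income for heads, personal income for non-relatives
gen long inc = ftotinc if unit == 1
replace inc = inctot if unit == 2

* treat the N/A codes as missing rather than as very large incomes
replace inc = . if inlist(inc, 9999998, 9999999)

* weighted number of units with reported income, by type
tabulate unit if !missing(inc) [iw=slwt]

restore
```

The same bin definitions as in the first approach can then be applied to inc to compare each group with the published table separately.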

We are writing in the hopes that you’ll have some insight into why this might be the case. Perhaps we’ve done something wrong in using the 1950 data? Or perhaps there is some other reason the 1950 individual data do not aggregate to the published statistics?

Thank you,

Yuan

grover · April 20, 2017, 2:22pm

The 1950 PUMS file was originally constructed by the Census Bureau and the FTOTINC variable is a direct representation of what the Census Bureau published. A detailed description of how the Census Bureau drew the 1950 1% sample can be found on the 1950 SAMPLING PROCEDURES page. Unfortunately, I am not able to say why discrepancies like the one you have pointed out exist. I would encourage you to reach out to the Census Bureau as they may have more specific information regarding how the 1950 PUMS file is expected to deviate from the summary tables.
