I’ve been trying to compute summary statistics on income across different years.

However, I realized that there were several issues:

  1. For 1950, any net loss in INCBUSFM, INCOTHER, and INCTOT is coded as “-1” regardless of the actual size of the loss. For instance, even if INCWAGE was 550, INCTOT was coded as -1 because the net loss in INCBUSFM was larger in magnitude than INCWAGE. This is a problem since we do not know the actual amount of the loss.

  2. In some cases (for instance in 1950), INCTOT does not match the sum of INCWAGE, INCBUSFM, and INCOTHER. This seems to be because INCTOT is the sum of the codes rather than of the actual amounts of INCWAGE, INCBUSFM, and INCOTHER. It is a problem when we want to find out what share of INCTOT each of INCWAGE, INCBUSFM, and INCOTHER accounts for.

  3. Each year has a top code (for 1950, $10,000), so even if INCWAGE, INCBUSFM, or INCOTHER actually exceeded $10,000, it is recorded as 10,000, which does not reflect the true value.
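A quick way to see how much each issue matters is to count the affected records. This is a minimal Python sketch, not IPUMS code: it assumes each 1950 person record is a dict keyed by the variable names above, that N/A codes have already been dropped, and that the $10,000 top code applies to each income component. The helper `audit_1950` and the sample records are hypothetical.

```python
# Assumptions: N/A codes already removed; the $10,000 top code (from the
# question) applies to each component; -1 marks a net loss of unknown size.
TOPCODE_1950 = 10_000
LOSS_CODE = -1
COMPONENTS = ("INCWAGE", "INCBUSFM", "INCOTHER")


def audit_1950(records):
    """Count records hit by each issue (a record can count toward several)."""
    counts = {"loss_coded": 0, "sum_mismatch": 0, "top_coded": 0}
    for r in records:
        parts = [r[c] for c in COMPONENTS]
        # Issue 1: a net loss coded as -1 in a component or in the total.
        if LOSS_CODE in (r["INCBUSFM"], r["INCOTHER"], r["INCTOT"]):
            counts["loss_coded"] += 1
        # Issue 2: INCTOT differs from the sum of its components.
        if sum(parts) != r["INCTOT"]:
            counts["sum_mismatch"] += 1
        # Issue 3: at least one component sits at (or above) the top code.
        if any(v >= TOPCODE_1950 for v in parts):
            counts["top_coded"] += 1
    return counts


# Hypothetical records, the first mirroring the INCWAGE = 550 example.
records = [
    {"INCWAGE": 550, "INCBUSFM": -1, "INCOTHER": 0, "INCTOT": -1},
    {"INCWAGE": 3000, "INCBUSFM": 0, "INCOTHER": 200, "INCTOT": 3200},
    {"INCWAGE": 10000, "INCBUSFM": 0, "INCOTHER": 0, "INCTOT": 10000},
]
print(audit_1950(records))
```

Note that loss-coded records will usually also show up as mismatches, since -1 replaces the real amounts.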

These are the limitations I’ve found while working with the data. Do you know of any alternatives or solutions to these issues?

Thanks for your help.



These are difficult issues to work around. I’ll comment on each issue individually.

(1) For the 1950 census, there is really nothing we can do with only the public use data to make up for the fact that any net losses are coded as “-1”.
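Since the loss amounts cannot be recovered, one option is to report coded losses separately instead of treating -1 as a dollar amount. A minimal sketch, assuming a plain list of 1950 INCTOT values with N/A codes removed; `split_losses` and the sample values are hypothetical:

```python
LOSS_CODE = -1  # 1950 code for "net loss, amount unknown"


def split_losses(values):
    """Return (share of records with a coded loss, values without the codes)."""
    if not values:
        raise ValueError("no values to summarize")
    observed = [v for v in values if v != LOSS_CODE]
    share_with_loss = (len(values) - len(observed)) / len(values)
    return share_with_loss, observed


inctot = [-1, 1200, 3400, -1, 2500]  # hypothetical 1950 INCTOT values
share, observed = split_losses(inctot)
print(share)                          # 0.4
print(sum(observed) / len(observed))  # mean among records without a coded loss
```

Assuming whole-dollar amounts, a -1 code stands for a true value of -1 or lower, so a mean that keeps the codes as values can only be an upper bound.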

(2) Note that INCTOT differs from the sum of INCWAGE, INCBUSFM, and INCOTHER for roughly 10% of the 1950 sample. So you’ll be able to calculate shares for about 90% of the sample and perform some sort of adjustment for the rest. One possible alternative is to replace INCTOT with the sum of INCWAGE, INCBUSFM, and INCOTHER in the cases where the two don’t match.
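Both routes (keep only matching records, or replace INCTOT with the component sum) can be sketched in Python. This assumes each 1950 record is a dict keyed by the IPUMS variable names with N/A codes removed; `income_shares`, its `replace_total` flag, and the sample records are hypothetical.

```python
COMPONENTS = ("INCWAGE", "INCBUSFM", "INCOTHER")


def income_shares(records, replace_total=False):
    """Return per-record shares of total income for each component.

    replace_total=False: use only records whose INCTOT equals the component sum.
    replace_total=True:  replace INCTOT with the component sum first.
    Records with a total <= 0 are skipped because shares are undefined there.
    """
    shares = []
    for r in records:
        parts = [r[c] for c in COMPONENTS]
        total = sum(parts) if replace_total else r["INCTOT"]
        if total != sum(parts) or total <= 0:
            continue
        shares.append({c: p / total for c, p in zip(COMPONENTS, parts)})
    return shares


records = [
    {"INCWAGE": 3000, "INCBUSFM": 0, "INCOTHER": 1000, "INCTOT": 4000},
    {"INCWAGE": 2000, "INCBUSFM": 500, "INCOTHER": 0, "INCTOT": 2600},  # mismatch
]
print(income_shares(records))                     # first record only
print(income_shares(records, replace_total=True))  # both records
```

Comparing the two results is one way to check how sensitive the shares are to the roughly 10% of mismatched records.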

(3) There is really no way around top codes. They are an unfortunate reality of working with public use data.
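What you can do is flag top-coded values and state what the summary statistics mean given them. Because a recorded top-coded value is never larger than the true value, the mean of the recorded values is a lower bound on the true mean. A minimal sketch, assuming a list of one income component's values with N/A codes and -1 loss codes removed, and the $10,000 1950 top code from the question; `topcode_summary` and the sample values are hypothetical:

```python
TOPCODE = 10_000  # 1950 top code mentioned in the question


def topcode_summary(values, topcode=TOPCODE):
    """Recorded mean (a lower bound on the true mean) and share at the top code."""
    if not values:
        raise ValueError("no values to summarize")
    n = len(values)
    at_top = sum(1 for v in values if v >= topcode)
    return {"mean_lower_bound": sum(values) / n, "share_topcoded": at_top / n}


incwage = [2500, 4000, 10000, 10000]  # hypothetical 1950 INCWAGE values
print(topcode_summary(incwage))
# {'mean_lower_bound': 6625.0, 'share_topcoded': 0.5}
```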

I hope this helps.