I’ve been trying to do some summary statistics over income across different years.

However, I realized that there were several issues:

  1. For 1950, any net loss in INCBUSFM, INCOTHER, and INCTOT is coded as “-1” regardless of the actual amount. For instance, even if INCWAGE was 550, INCTOT was coded as -1 because the net loss in INCBUSFM was larger than INCWAGE. This is a problem because we do not know the actual amount of the loss.

  2. In some cases, for instance in 1950, INCTOT does not match the sum of INCWAGE, INCBUSFM, and INCOTHER. This is because INCTOT is the sum of the codes and not of the actual amounts of INCWAGE, INCBUSFM, and INCOTHER. That is a problem when we want to find out what percentage of INCTOT each of INCWAGE, INCBUSFM, and INCOTHER accounts for.

  3. Each year has a top code, such as $10,000 for 1950. Even if INCWAGE, INCBUSFM, or INCOTHER exceeded $10,000 in practice, it can only be recorded as 10,000, which doesn’t reflect the true value.

These are the limitations I’ve found while working with the data, and I was wondering if you know of any alternatives or solutions for these issues.

Thanks for your help.

These are difficult issues to work around. I’ll comment on each issue individually.

(1) For the 1950 census, with access only to the public use data, there really is nothing we can do to make up for the fact that any net losses are coded as “-1”.

(2) Regarding cases where INCTOT isn’t exactly the sum of INCWAGE, INCBUSFM, and INCOTHER, note that this affects roughly 10% of the 1950 sample. So you’ll be able to calculate shares directly for about 90% of the sample and apply some sort of adjustment to the rest. One possible alternative is to replace INCTOT with the sum of INCWAGE, INCBUSFM, and INCOTHER in the cases where the two differ.
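As a rough illustration of that approach, here is a minimal Python sketch. The records and the function names are made up for the example, and it assumes not-in-universe records have already been dropped. It computes shares only where INCTOT equals the component sum, optionally falls back to the component sum, and skips any record carrying the 1950 net-loss code of -1, since the size of the loss is unknown.

```python
# Hypothetical 1950-style records; all values are invented for illustration.
records = [
    {"INCWAGE": 2000, "INCBUSFM": 500, "INCOTHER": 0, "INCTOT": 2500},
    {"INCWAGE": 1500, "INCBUSFM": 0, "INCOTHER": 300, "INCTOT": 1850},  # mismatch
    {"INCWAGE": 0, "INCBUSFM": 0, "INCOTHER": 0, "INCTOT": 0},
    {"INCWAGE": 550, "INCBUSFM": -1, "INCOTHER": 0, "INCTOT": -1},  # net loss
]

COMPONENTS = ("INCWAGE", "INCBUSFM", "INCOTHER")
NET_LOSS_CODE = -1  # 1950: any net loss is coded -1, whatever its size


def component_sum(rec):
    """Sum of the three income components for one record."""
    return sum(rec[c] for c in COMPONENTS)


def income_shares(rec, adjust=False):
    """Return each component's share of total income, or None if undefined.

    With adjust=False, records whose INCTOT differs from the component sum
    are skipped. With adjust=True, the component sum replaces INCTOT.
    """
    if any(rec[c] == NET_LOSS_CODE for c in COMPONENTS):
        return None  # magnitude of the loss is unknown
    total = component_sum(rec)
    if rec["INCTOT"] != total and not adjust:
        return None  # INCTOT and the components disagree
    if total <= 0:
        return None  # shares are not meaningful for zero or negative totals
    return {c: rec[c] / total for c in COMPONENTS}


for rec in records:
    print(income_shares(rec), income_shares(rec, adjust=True))
```

Whether to adjust or to drop the mismatched records is a judgment call that depends on the analysis; the sketch only shows how the two options differ.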

(3) There is really no way around top-codes. They are an unfortunate reality when using public use data.

I hope this helps.

Hi @JeffBloem ,

I have a follow-up question. I am using the ACS 5-year data for 2005-09 and the variable INCTOT. I want to know which persons are top-coded on INCTOT. However, this website indicates that the top-coded value for ACS data is “-”, and I do not quite understand what “-” means. My guess is that INCTOT in the ACS is the sum of individual income items such as INCWAGE, INCBUS, …, which are themselves top-coded at 99999. However, when I tabulate INCTOT or INCWAGE, I see no observation with the value 99999 on either variable. This is the case even if I limit my sample to the last year of the 5-year sample, i.e. 2009. The values of INCTOT and INCWAGE seem to be quite continuous until they abruptly jump to 9999999, which denotes N/A. This suggests that these two variables are not top-coded, which should not be true.


Sorry, I have an additional question related to top-coding in the ACS and Census. As mentioned on the website, “Note that INCTOT is the sum of components that are themselves already Top coded.”

Say John’s INCWAGE is 120000 in reality but is top-coded at 99999 in the public data, his INCBUS is -2 (the real value, not bottom-coded), and he has 0 for all other income sources. In this case, will his INCTOT in the public data be shown as 99997, or will it be shown as top-coded (at 99999 in this example)?

Thank you so much!

In the ACS samples, INCTOT is the sum of INCWAGE, INCBUS00, INCSS, INCWELFR, INCSUPP, INCINVST, INCRETIR, and INCOTHER. Most of these variables have their own topcode values in ACS samples, which represent the 99.5th percentile of the income distribution within a given state. Therefore, the topcode on each of these income component variables is not a single special code, but rather a censored income value. Additionally, all values of INCTOT==9999999 indicate that the observation is not in the universe, which in this case means the individual is below the age of 15.

To answer your specific question: If John’s INCWAGE is 120,000, then this is John’s reported wage, with one exception. If 120,000 is equal to the 99.5th percentile of the income distribution within John’s state of residence, then it could be the case that John’s actual wage is larger than 120,000.

INCTOT is simply the sum of all of the components, including any topcoded figures. You can verify this with the data by summing up all of the relevant income component variables within a given sample. There are only two types of cases where the sum of the components does not equal INCTOT. The first is when INCTOT==9999999 or INCWAGE==999999 (i.e., when an observation is NIU). The second is when the sum of the components equals exactly zero. In this case INCTOT==1, as is noted on the codes tab.
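For anyone who wants to run that check, here is a minimal Python sketch. The helper names and the example records are made up; the NIU codes are 9999999 for INCTOT (as given earlier in this reply) and 999999 for INCWAGE, and the zero-sum rule follows the description above rather than anything verified independently here.

```python
ACS_COMPONENTS = (
    "INCWAGE", "INCBUS00", "INCSS", "INCWELFR",
    "INCSUPP", "INCINVST", "INCRETIR", "INCOTHER",
)
INCTOT_NIU = 9999999  # not in universe for INCTOT
INCWAGE_NIU = 999999  # not in universe for INCWAGE


def make_record(inctot, **components):
    """Build a hypothetical record; unspecified components default to 0."""
    rec = {name: 0 for name in ACS_COMPONENTS}
    rec.update(components)
    rec["INCTOT"] = inctot
    return rec


def expected_inctot(rec):
    """Component sum, with a sum of exactly zero reported as 1."""
    total = sum(rec[name] for name in ACS_COMPONENTS)
    return 1 if total == 0 else total


def inctot_matches(rec):
    """True/False for in-universe records, None when the record is NIU."""
    if rec["INCTOT"] == INCTOT_NIU or rec["INCWAGE"] == INCWAGE_NIU:
        return None
    return rec["INCTOT"] == expected_inctot(rec)


# John from the question: topcoded wage of 99999 plus a business loss of 2.
john = make_record(99997, INCWAGE=99999, INCBUS00=-2)
print(expected_inctot(john), inctot_matches(john))
```

In John's case the components add up to 99997, so under this rule INCTOT would simply show 99997 and would not carry a separate topcode flag.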

Thank you so much! Super helpful.

Hi JeffBloem,

I just want to double-check a minor issue with you. This webpage says “The Census Bureau rounds its top-coded values before releasing the public-use data. For example, wage and salary income is rounded to the nearest $1,000”. Does this rule apply to the 1990 5% data?

For instance, only 93 cases would be identified as top-coded on INCWAGE in the 1990 5% data if I round all the state-specific top-code values of INCWAGE to the nearest 1,000. If I do not round them, the number of cases identified would be 35,527.
This only happens when using the 1990 data.
Thank you for your attention.

As is noted on the codes tab for the INCWAGE variable, “… for Census Year 1990, any observed value greater than the Top Code value of $140,000 was coded as the median value greater than $140,000 within that observation’s state.” These state-specific medians are reported in the table on the page referenced in your question. These figures are not rounded to the nearest $1,000.
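To show why rounding changes the count, here is a small Python sketch of flagging top-coded 1990 wages by exact match against the state-specific value. The dollar amounts in the lookup table are invented placeholders keyed by STATEFIP, not the real medians from the table, and the function name is hypothetical.

```python
# Placeholder state-specific values keyed by STATEFIP; NOT the real figures.
STATE_TOPCODE_VALUE_1990 = {1: 197245, 2: 186412}


def is_topcoded_1990(rec, table):
    """Flag a record whose INCWAGE equals its state's top-code value."""
    state_value = table.get(rec["STATEFIP"])
    return state_value is not None and rec["INCWAGE"] == state_value


# Rounding the table to the nearest 1,000 breaks the exact match.
rounded_table = {
    state: round(value, -3) for state, value in STATE_TOPCODE_VALUE_1990.items()
}

person = {"STATEFIP": 1, "INCWAGE": 197245}  # hypothetical top-coded record
print(is_topcoded_1990(person, STATE_TOPCODE_VALUE_1990))
print(is_topcoded_1990(person, rounded_table))
```

With unrounded values in the data, only the unrounded table matches, which is consistent with the much larger count you get without rounding.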

Thank you!