Top-coded variables

jhd40 · July 26, 2018, 4:28am

With the top-coded salary variable in the Higher Ed dataset, what does top-coding mean in practice? Is the value 1. anonymised and marked as missing, 2. is the observation removed from the dataset, or 3. is the value rounded down to the level at which it is top-coded?

Thank you!

JeffBloem · July 26, 2018, 3:28pm

Your case number 3 is the most accurate. The SALARY variable in IPUMS Higher Ed is top-coded at 150,000 US dollars (except in the SESTAT-NSCG and NSRCG surveys, where these values are top-coded at 100,000 US dollars). This means that any SALARY value reported above the top-code is replaced with the top-code value. So, for example, if someone reports a salary of 170,000 US dollars, this value will show up as 150,000 in the data.
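As a rough illustration of what this replacement does to reported values, here is a minimal Stata sketch. The variable names `reported` and `topcoded` and the three toy values are made up for illustration; only the 150,000 threshold comes from the description above, and this is not the actual IPUMS processing code.

```stata
* Toy example: hypothetical reported salaries, not IPUMS data
clear
input long reported
 90000
150000
170000
end

* Replace any value above the top-code with the top-code itself.
* Note: min() ignores missing arguments, so on real data add
* "if !missing(reported)" to avoid turning missings into 150000.
generate long topcoded = min(reported, 150000)

list reported topcoded
```

In this toy data the 170,000 row shows up as 150,000, while the values at or below the threshold are unchanged.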

jhd40 · July 26, 2018, 11:49pm

Thanks for that, I really appreciate it, Jeff! One follow-up question, if you wouldn’t mind. As you mentioned, SALARY in the SESTAT-NSCG and NSRCG surveys is top-coded at $100,000. Examining the data in Stata for the years 1995 and 2013, for example, I see observations from these two surveys actually reaching up to $150,000 in salary. Do you know why this is?

Thank you!

JeffBloem · July 27, 2018, 2:57pm

You are right about this! The documentation is misleading in this case. It should say that SALARY is top-coded at 100,000 US dollars in the SESTAT-NSRCG surveys and at 150,000 US dollars in all other surveys. Sorry for the confusion here. We will update the documentation appropriately.

jhd40 · July 27, 2018, 3:18pm

Apologies, but even in SESTAT-NSRCG surveys, it seems observations are top-coded at $150,000? It seems all surveys include observations up to $150,000.

JeffBloem · July 27, 2018, 3:33pm

I don’t think so. After cleaning out observations with special codes for skips and missing, I find the following:

. by surid, sort : summarize salary

-> surid = NSCG

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      salary |     457043    65154.34    36686.89          0     150000

-> surid = NSRCG

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      salary |      90087    36660.75     20529.4          0     100000
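The cleaning step itself is not shown in the output above. A minimal sketch of how such a check might be run follows, assuming the skip and missing codes are the placeholder values 9999998 and 9999999. Those code values, and the helper variables `salary_clean` and `cap`, are assumptions for illustration, so confirm the codes against the SALARY documentation before relying on this.

```stata
* Work on a copy so the original SALARY values are preserved
generate long salary_clean = salary

* ASSUMED special codes for logical skip / missing; verify in the codebook
replace salary_clean = . if inlist(salary, 9999998, 9999999)

* Observation count and maximum per survey
tabstat salary_clean, by(surid) statistics(n max)

* A top-coded variable piles up at its cap: count the observations
* sitting exactly at each survey's maximum
bysort surid: egen long cap = max(salary_clean)
bysort surid: count if salary_clean == cap & !missing(salary_clean)
```

If the special codes are left in, the per-survey maximum will be one of those placeholder values rather than the top-code, which is one way a check like this can go wrong.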

Perhaps I am missing something?

jhd40 · July 27, 2018, 3:51pm

Apologies, you’re right, sorry!
