Dear IPUMS,
I tabulated incwage in the 1940 full count data to check the proportion of observations that are topcoded (output below).
. fre incwage
incwage -- wage and salary income
-----------------------------------------------------------
| Freq. Percent Valid Cum.
--------------+--------------------------------------------
Valid 0 | 5.03e+07 38.10 38.10 38.10
1 | 6290 0.00 0.00 38.11
2 | 7324 0.01 0.01 38.11
3 | 8242 0.01 0.01 38.12
4 | 6298 0.00 0.00 38.12
5 | 14525 0.01 0.01 38.13
6 | 14935 0.01 0.01 38.14
7 | 6322 0.00 0.00 38.15
8 | 17657 0.01 0.01 38.16
9 | 10371 0.01 0.01 38.17
10 | 43358 0.03 0.03 38.20
11 | 3661 0.00 0.00 38.21
12 | 30600 0.02 0.02 38.23
13 | 5170 0.00 0.00 38.23
14 | 10412 0.01 0.01 38.24
15 | 39489 0.03 0.03 38.27
16 | 15786 0.01 0.01 38.28
17 | 3848 0.00 0.00 38.29
18 | 23881 0.02 0.02 38.30
19 | 2756 0.00 0.00 38.31
: | : : : :
4982 | 9 0.00 0.00 68.18
4983 | 9 0.00 0.00 68.18
4984 | 14 0.00 0.00 68.18
4985 | 22 0.00 0.00 68.18
4986 | 8 0.00 0.00 68.18
4987 | 14 0.00 0.00 68.18
4988 | 21 0.00 0.00 68.18
4989 | 8 0.00 0.00 68.18
4990 | 64 0.00 0.00 68.18
4991 | 13 0.00 0.00 68.18
4992 | 65 0.00 0.00 68.18
4993 | 14 0.00 0.00 68.18
4994 | 7 0.00 0.00 68.18
4995 | 35 0.00 0.00 68.18
4996 | 26 0.00 0.00 68.18
4997 | 12 0.00 0.00 68.18
4998 | 18 0.00 0.00 68.18
4999 | 69 0.00 0.00 68.18
5000 | 393030 0.30 0.30 68.48
5001 | 4.16e+07 31.52 31.52 100.00
Total | 1.32e+08 100.00 100.00
-----------------------------------------------------------
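Then I checked the distribution in the 1940 1% sample. I assumed the two would be similar if the 1% sample is a random sample of the full count, but they differ a lot in the share of individuals whose incwage is coded 5001: 31.52% in the full count versus 0.02% in the 1% sample, where 24.92% of observations are instead coded 999999: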
-> sample = 1940 1%
incwage -- wage and salary income
------------------------------------------------------------
| Freq. Percent Valid Cum.
---------------+--------------------------------------------
Valid 0 | 595645 44.07 44.07 44.07
1 | 109 0.01 0.01 44.07
2 | 143 0.01 0.01 44.08
3 | 75 0.01 0.01 44.09
4 | 84 0.01 0.01 44.10
5 | 117 0.01 0.01 44.10
6 | 133 0.01 0.01 44.11
7 | 68 0.01 0.01 44.12
8 | 148 0.01 0.01 44.13
9 | 274 0.02 0.02 44.15
10 | 505 0.04 0.04 44.19
11 | 49 0.00 0.00 44.19
12 | 412 0.03 0.03 44.22
13 | 60 0.00 0.00 44.23
14 | 121 0.01 0.01 44.24
15 | 460 0.03 0.03 44.27
16 | 182 0.01 0.01 44.28
17 | 37 0.00 0.00 44.29
18 | 276 0.02 0.02 44.31
19 | 38 0.00 0.00 44.31
: | : : : :
4923 | 1 0.00 0.00 74.76
4930 | 1 0.00 0.00 74.76
4935 | 1 0.00 0.00 74.76
4940 | 4 0.00 0.00 74.76
4943 | 1 0.00 0.00 74.76
4946 | 1 0.00 0.00 74.76
4948 | 1 0.00 0.00 74.76
4950 | 5 0.00 0.00 74.76
4952 | 1 0.00 0.00 74.76
4965 | 1 0.00 0.00 74.76
4976 | 1 0.00 0.00 74.76
4980 | 2 0.00 0.00 74.76
4990 | 1 0.00 0.00 74.76
4992 | 186 0.01 0.01 74.78
4993 | 1 0.00 0.00 74.78
4997 | 1 0.00 0.00 74.78
4999 | 1 0.00 0.00 74.78
5000 | 3867 0.29 0.29 75.06
5001 | 273 0.02 0.02 75.08
999999 | 336811 24.92 24.92 100.00
Total | 1351732 100.00 100.00
------------------------------------------------------------
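The comparison between the two tabulations can be reduced to the two codes in question. The following is only a sketch: it assumes both extracts have been appended into one dataset with a variable `sample` identifying the source, and the indicator names `is5001` and `is999999` are made up for illustration.

```stata
* Assumption: the full count and 1% extracts are appended in one dataset,
* and `sample` distinguishes them.
* is5001 and is999999 are hypothetical helper variables.
generate byte is5001   = (incwage == 5001)
generate byte is999999 = (incwage == 999999)

* The mean of a 0/1 indicator is the share of observations with that code.
tabstat is5001 is999999, by(sample) statistics(mean) format(%6.4f)
```

If the two extracts are kept in separate files, the same two `generate` lines followed by `summarize is5001 is999999` in each file would give the same shares.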
Is this because people who are not in universe (NIU) for incwage in the full count data are assigned 5001, the same value as the topcode, rather than 999999? To check, I tabulated incwage in the full count for people younger than 14:
. fre incwage if age<14
incwage -- wage and salary income
-----------------------------------------------------------
| Freq. Percent Valid Cum.
--------------+--------------------------------------------
Valid 0 | 1396944 4.56 4.56 4.56
1 | 532 0.00 0.00 4.57
2 | 306 0.00 0.00 4.57
3 | 577 0.00 0.00 4.57
4 | 221 0.00 0.00 4.57
5 | 360 0.00 0.00 4.57
6 | 340 0.00 0.00 4.57
7 | 102 0.00 0.00 4.57
8 | 354 0.00 0.00 4.57
9 | 159 0.00 0.00 4.57
10 | 436 0.00 0.00 4.58
11 | 79 0.00 0.00 4.58
12 | 261 0.00 0.00 4.58
13 | 63 0.00 0.00 4.58
14 | 63 0.00 0.00 4.58
15 | 347 0.00 0.00 4.58
16 | 109 0.00 0.00 4.58
17 | 36 0.00 0.00 4.58
18 | 167 0.00 0.00 4.58
19 | 19 0.00 0.00 4.58
: | : : : :
4680 | 4 0.00 0.00 4.89
4700 | 4 0.00 0.00 4.89
4720 | 1 0.00 0.00 4.89
4731 | 1 0.00 0.00 4.89
4753 | 1 0.00 0.00 4.89
4800 | 31 0.00 0.00 4.89
4811 | 1 0.00 0.00 4.89
4820 | 1 0.00 0.00 4.89
4840 | 1 0.00 0.00 4.89
4847 | 1 0.00 0.00 4.89
4866 | 1 0.00 0.00 4.89
4868 | 1 0.00 0.00 4.89
4880 | 2 0.00 0.00 4.89
4900 | 2 0.00 0.00 4.89
4920 | 2 0.00 0.00 4.89
4928 | 1 0.00 0.00 4.89
4992 | 1 0.00 0.00 4.89
4998 | 1 0.00 0.00 4.89
5000 | 530 0.00 0.00 4.89
5001 | 2.91e+07 95.11 95.11 100.00
Total | 3.06e+07 100.00 100.00
-----------------------------------------------------------
It seems that my guess is only partly correct: 95.11% of people younger than 14 have incwage of 5001. However, the remaining 4.89% have values of 5000 or lower. Most of these are 0 (4.56%), but some are positive amounts. Why do they have seemingly normal incwage values when they should be NIU?
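
A minimal Stata sketch for counting these cases directly, so the breakdown does not have to be read off the cumulative column. It assumes the 1940 full count extract is in memory with `incwage` and `age`; the local macro name `n_child` is arbitrary.

```stata
* Assumption: the 1940 full count extract is loaded, with incwage and age.
count if age < 14
local n_child = r(N)

* Share of under-14 observations coded 0
count if age < 14 & incwage == 0
display "incwage == 0:       " %6.2f (100 * r(N) / `n_child') "%"

* Share with a positive value at or below 5000
count if age < 14 & inrange(incwage, 1, 5000)
display "incwage 1 to 5000:  " %6.2f (100 * r(N) / `n_child') "%"

* Share coded 5001
count if age < 14 & incwage == 5001
display "incwage == 5001:    " %6.2f (100 * r(N) / `n_child') "%"
```

Separating the zeros from the positive values shows how many under-14 observations actually report wage income, as opposed to a zero that may simply be a different way of recording no income.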