Thanks for the reply, Brandon, but I need to make sure I understand your answer.
You said that “FTOTINC and HHINCOME were topcoded by state and topcoded values were replaced with state medians”. Suppose that the state median for HHINCOME was $50k, that the threshold for the topcode was $100k, and that HH #1 responded with a value of $70k and HH #2 responded with a value of $200k. This means that HHINCOME for households 1 and 2 would be coded as $70k and $50k, respectively. That is, the ordering of these two households by household income would be reversed in the database, compared with their true ordering. Do I have this right?
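To make the scenario concrete, here is a minimal Python sketch of the recoding as I understand it. The function name `apply_topcode`, the dollar figures, and the choice of a strict "greater than" comparison are all my own hypothetical illustration, not anything taken from the official documentation.

```python
def apply_topcode(value, threshold, state_median):
    """Hypothetical recode: a value above the threshold becomes the state median."""
    # Assumption: the comparison is strict (>); the real rule may differ.
    return state_median if value > threshold else value

STATE_MEDIAN = 50_000   # assumed state median from my example
THRESHOLD = 100_000     # assumed topcode threshold from my example

true_incomes = {"HH1": 70_000, "HH2": 200_000}
coded = {hh: apply_topcode(v, THRESHOLD, STATE_MEDIAN)
         for hh, v in true_incomes.items()}

print(coded)  # {'HH1': 70000, 'HH2': 50000}
```

In the true data HH2 ranks above HH1, but in the coded data HH1 ranks above HH2, which is the reversal I am asking about.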
If so, it becomes more difficult to produce a meaningful histogram of HHINCOME. I suppose one would have to search the records for the state's median value, assume that every HH showing that value has in fact been topcoded, and then treat this subset of HHs as if their HHINCOME were larger than the largest value recorded. That is, the largest value recorded for a state would provide a lower-bound estimate of the actual topcode threshold. It also means that any HH whose true income happens to equal the median would wrongly be treated as a topcoded HH. Is all of this true? Or is there a better way to produce a meaningful histogram of HHINCOME?
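The workaround I have in mind could be sketched roughly as follows. This is only an illustration under my own assumptions: the state medians are known, records are simple `(state, hhincome)` pairs, and the helper names `flag_presumed_topcoded` and `lower_bound_threshold` are hypothetical.

```python
def flag_presumed_topcoded(records, state_medians):
    """Mark each (state, hhincome) record whose value equals its state median.

    Caveat: a household whose true income equals the median is flagged too.
    """
    return [(state, inc, inc == state_medians[state]) for state, inc in records]

def lower_bound_threshold(flagged):
    """Largest unflagged value per state: a lower bound on the topcode threshold."""
    bounds = {}
    for state, inc, is_topcoded in flagged:
        if not is_topcoded:
            bounds[state] = max(bounds.get(state, inc), inc)
    return bounds

# Made-up records for one placeholder state "XX" with an assumed median of $50k.
records = [("XX", 30_000), ("XX", 50_000), ("XX", 70_000), ("XX", 95_000)]
medians = {"XX": 50_000}

flagged = flag_presumed_topcoded(records, medians)
bounds = lower_bound_threshold(flagged)

print(bounds)  # {'XX': 95000}
```

For the histogram, the flagged records would then go into a single open-ended top bin (here, "above $95k") rather than being plotted at the median value.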