We are working on an analysis of inequality from 1970 to 2010, and the above-mentioned top-coded variables are skewing our “top 1%” income earners variable. Is there a way to get the true values, or values as close to them as possible (especially from the 1970 and 1980 censuses), so that our research reflects income inequality more realistically? Thanks!

You need to request access to the Census Research Data Center. UMN obviously has one – see Minnesota.

Note that there is one survey that partially uses IRS data – the Survey of Consumer Finances, https://www.federalreserve.gov/econre…. They have a much better representation of top incomes… but it seems to me they do not get the standard errors right. (Getting standard errors for the Gini index in complex surveys is impossible, anyway; but I am talking about the basic standard errors on means and proportions and regression coefficients.)

Thank you! Would you mind expanding on your comment about the impossibility of calculating correct standard errors for the Gini index in survey data? I have used Stata’s svylorenz command. Should I be careful with these standard errors? I know this is off-topic and unrelated to IPUMS, but I thought I would ask anyway.

A few years back, I did my own research into estimation of Gini indices and their standard errors with complex survey data. (I wrote -epctile- and -gconc- back then.) First off, there are probably five or so different definitions of the Gini index for i.i.d. data, and they all have different generalizations for weighted data. So it gets difficult right there. Second, I found that none of the four or so variance estimation methods I was comparing was totally bullet-proof. It is technically a difficult exercise: different twists of complex sampling designs affect the different variance estimation methods differently. The paper has never been published.
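
As an illustration only of the point about competing definitions (this is not the code in -gconc- or -svylorenz-), here is a minimal Python sketch of two textbook forms of the Gini index. The function names gini_weighted and gini_small_sample are hypothetical, and the weights are treated as simple frequency weights, which ignores strata and clusters entirely.

```python
from itertools import product


def gini_weighted(values, weights=None):
    """Gini as half the weighted relative mean absolute difference.

    Assumption: weights act like frequency weights; no survey design
    information (strata, PSUs) is used here.
    """
    if weights is None:
        weights = [1.0] * len(values)
    total_w = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / total_w
    pairs = list(zip(values, weights))
    # Sum of w_i * w_j * |x_i - x_j| over all ordered pairs (i, j).
    abs_diff = sum(
        wi * wj * abs(xi - xj)
        for (xi, wi), (xj, wj) in product(pairs, repeat=2)
    )
    return abs_diff / (2.0 * total_w ** 2 * mean)


def gini_small_sample(values):
    """Unweighted variant that divides by n(n-1) instead of n^2."""
    n = len(values)
    return gini_weighted(values) * n / (n - 1)


incomes = [1, 2, 3, 4]
print(gini_weighted(incomes))          # 0.25
print(gini_small_sample(incomes))      # about 0.333
print(gini_weighted([1, 3], [3, 1]))   # 0.25, same as the data [1, 1, 1, 3]
```

Even on four observations the two definitions disagree (0.25 versus roughly 0.333). With survey weights it is not obvious what should play the role of n - 1, which is one place where the weighted generalizations can part ways.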

Other than that, Stephen Jenkins is definitely one of the top methodologists as far as income inequality is concerned. In -svylorenz-, he implements *a* method for weighted Gini, and *a* method for standard errors that are both defensible.

Thank you for your response! It would be a great contribution to the community if your paper were published, and I hope it will be. I can imagine how difficult variance estimation can be for data from complex survey designs. I’m stuck trying to model heteroskedasticity in a GLM using ML to correct for inefficiency and obtain acceptable variance estimates. Thanks again for your help! I will talk with my advisor to see if it is worth sending a proposal for the non-public census data.