Why do roughly half the states have so few respondents in ACS 2016?


I extracted data from IPUMS USA to look at certain demographic information (race, income, age), and also pulled STATEICP and STATEFIP (plus county, MET2013, and PUMA). I'd like to look at this information state by state, so I sorted by STATEFIP code.

When I organize by state, roughly half the states have fewer than 100 respondents, while the rest have an N in the tens or hundreds of thousands. For instance, CA has 340k and CT has 26. Why is this, and is there a way to make the data less …patchy? I have 5,264,018 rows of data.




Would you be able to share how you are tabulating the data? Are you further limiting cases based on certain criteria? Based on the total number of observations in your data, it sounds like you are working with a fully unzipped file. However, looking at your last extract, the state with the fewest unweighted observations should be Alaska, with 8,797.

Feel free to share your code here or email us at ipums@umn.edu. If you have not narrowed your data in any way, I would recommend re-saving/unzipping your extract just in case something odd happened in the unzipping process.
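
As a quick check on the unweighted counts, something like the following may help. It is only a minimal sketch in Python, assuming the extract was saved as a CSV with a STATEFIP column; the function name `count_by_state` and the tiny sample data are made up for illustration and are not part of any IPUMS tool.

```python
import csv
import io
from collections import Counter


def count_by_state(csv_file, state_column="STATEFIP"):
    """Return unweighted row counts keyed by state code (as an int)."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Convert to int so "06" and "6" are counted as the same state.
        counts[int(row[state_column])] += 1
    return counts


# Tiny made-up sample standing in for a real extract (illustration only).
sample = io.StringIO("YEAR,STATEFIP,AGE\n2016,6,34\n2016,06,51\n2016,9,27\n")
counts = count_by_state(sample)
for state, n in sorted(counts.items()):
    print(state, n)
```

With a real extract, you would pass an opened file object instead of the sample. If the counts for some states come out far below what you expect, that would point toward a filter applied somewhere upstream or a problem with the file itself.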