Difference in Sample Size between ACS 2016 and Census Technical Documentation for ACS PUMS

I’m working with the 2016 ACS, but while looking through the technical documentation on the Census Bureau website I noticed that my sample size differs from what the Census Bureau says the PUMS includes. I thought the published sample sizes might also include the Puerto Rico sample, but the discrepancy persists when I look at individual states. For example, the Census document says the sample should contain 24,100 housing units from Minnesota, but I only have 21,531 with GQ values of 1 or 2. Similarly, the Census Bureau lists 140,606 housing units in the California sample, but I only have 132,809. These aren’t huge differences, but I was wondering whether I am doing something wrong or whether the PUMS sample described by the Census Bureau simply differs slightly from the one available here.

Note: I calculated the state housing unit counts above by counting the observations with PERNUM equal to 1 and GQ equal to 1 or 2 for the appropriate STATEFIP values.
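For anyone trying to reproduce these counts, here is a minimal Python sketch of the counting rule described above. It assumes a rectangular extract saved as CSV with columns STATEFIP, PERNUM and GQ, and it treats GQ codes 1 and 2 as households, as in the question. The function name and the tiny inline sample are made up for illustration; a real extract would be read from a file.

```python
import csv
import io

# Assumption: GQ codes 1 and 2 identify households (as described in the question).
HOUSEHOLD_GQ = {1, 2}


def count_occupied_units(rows, statefip):
    """Count occupied housing units in a rectangular (person-level) extract.

    Each household is counted once, via its first person record (PERNUM == 1).
    """
    count = 0
    for row in rows:
        if (int(row["STATEFIP"]) == statefip
                and int(row["PERNUM"]) == 1
                and int(row["GQ"]) in HOUSEHOLD_GQ):
            count += 1
    return count


# Made-up records standing in for a real extract (27 = Minnesota, 6 = California).
sample_csv = """STATEFIP,SERIAL,PERNUM,GQ
27,1,1,1
27,1,2,1
27,2,1,3
27,3,1,2
6,4,1,1
"""
rows = csv.DictReader(io.StringIO(sample_csv))
print(count_occupied_units(rows, statefip=27))  # 2
```

Because every record in a rectangular extract belongs to a person, this rule can only ever count occupied units.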

The difference between your numbers and those in the technical documentation represents the vacant housing units in the sample. Because you are using rectangularized data (where household information is attached to each person record in the household), your data contain none of these vacant households: a vacant household has no person records to attach its information to. If you create a hierarchical extract, you can use the variable VACANCY to identify these vacant units. I double-checked, and the 2016 1-year ACS PUMS file contains 2,569 vacant households in Minnesota; adding those to the 21,531 occupied housing units you observed gives the 24,100 total housing units published by the Census Bureau.
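The reconciliation can be sketched in Python using the figures quoted in this thread. The vacant-unit counter assumes household records from a hierarchical extract with STATEFIP and VACANCY columns, and it assumes VACANCY = 0 marks an occupied (N/A) unit while any other code marks a vacant one. Check the VACANCY codes for your own extract. The California gap of 7,797 is only what the explanation above implies; it was not checked against a hierarchical extract here. Function names and the sample records are hypothetical.

```python
import csv
import io

# Figures quoted in this thread (2016 1-year ACS PUMS).
PUBLISHED_TOTAL = {"Minnesota": 24_100, "California": 140_606}
OCCUPIED_IN_EXTRACT = {"Minnesota": 21_531, "California": 132_809}
VACANT_REPORTED = {"Minnesota": 2_569}  # only Minnesota was checked in the thread


def count_vacant_units(household_rows, statefip):
    """Count vacant units among household records from a hierarchical extract.

    Assumption: VACANCY == 0 marks an occupied unit (N/A); any other code is vacant.
    """
    return sum(
        1
        for row in household_rows
        if int(row["STATEFIP"]) == statefip and int(row["VACANCY"]) != 0
    )


def implied_vacant(state):
    """Gap between the published total and the occupied units in a rectangular extract."""
    return PUBLISHED_TOTAL[state] - OCCUPIED_IN_EXTRACT[state]


def reconciles(state):
    """True if occupied + vacant units equal the published housing unit total."""
    return OCCUPIED_IN_EXTRACT[state] + VACANT_REPORTED[state] == PUBLISHED_TOTAL[state]


# Made-up household records standing in for a hierarchical extract.
households_csv = """STATEFIP,SERIAL,VACANCY
27,1,0
27,2,1
27,3,0
27,4,3
6,5,2
"""
households = list(csv.DictReader(io.StringIO(households_csv)))
print(count_vacant_units(households, 27))  # 2
print(reconciles("Minnesota"))  # True
print(implied_vacant("California"))  # 7797
```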

I hope this helps!