I’m attempting to estimate broadband access in a particular state. In my dataset I thought I had already excluded group quarters (by restricting gq to 1 or 2), but when I run the frequencies I find a substantial number of households that are still labeled NA(GQ). Is this just missing data? Or am I making a mistake with my group quarters subsetting? Thanks!
As the codes tab of the GQ variable notes, GQ==0 identifies vacant units. These households should not show up in a person-level data file, however, because there are no people in those households. After looking into the ACS data from 2013 through 2018 I am unable to replicate your observation. Therefore, I wonder if you are mistakenly interpreting CIHISPEED==0 “NA(GQ)” as GQ==0.
Thanks for getting back to me! Since I’m new to working with the microdata, it’s certainly possible that I’m confused about something.
I am looking at the 2018 1-year Michigan file. Since I am interested in households, I dropped all except pernum=1, and then dropped gq>2 and gq=0. A crosstab of gq & cihispeed (with household weights) shows the following:
households under 1970 def: na(gq) 603,179; yes 2,639,244; no 711,594
additional households: na(gq) 183; yes 2,842; no 289
It’s the na(gq) that is puzzling me. Since I dropped all but gq=1 and gq=2 (which as I understand it are “regular” households rather than group quarters), why am I getting 600K na(gq)?
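For anyone following along, the subsetting and weighted crosstab described above can be sketched roughly as below. This is only a minimal illustration on made-up records, not the actual Michigan extract. The variable names follow the thread (PERNUM, GQ, CIHISPEED) plus HHWT for the household weight; the CIHISPEED values 10 and 20 for yes/no are assumptions to check against the codes tab, and SERIAL is included only as a household identifier.

```python
from collections import defaultdict

# Toy person-level records (hypothetical values, not real ACS data).
# Assumed codes: CIHISPEED 0 = "NA(GQ)", 10 = yes, 20 = no.
persons = [
    {"SERIAL": 1, "PERNUM": 1, "GQ": 1, "HHWT": 100, "CIHISPEED": 10},
    {"SERIAL": 1, "PERNUM": 2, "GQ": 1, "HHWT": 100, "CIHISPEED": 10},
    {"SERIAL": 2, "PERNUM": 1, "GQ": 1, "HHWT": 80, "CIHISPEED": 0},
    {"SERIAL": 3, "PERNUM": 1, "GQ": 2, "HHWT": 50, "CIHISPEED": 20},
    {"SERIAL": 4, "PERNUM": 1, "GQ": 3, "HHWT": 40, "CIHISPEED": 0},
]

# Keep one record per household (PERNUM == 1) and only GQ 1 or 2.
households = [p for p in persons if p["PERNUM"] == 1 and p["GQ"] in (1, 2)]

# Weighted crosstab: sum HHWT within each (GQ, CIHISPEED) cell.
crosstab = defaultdict(int)
for h in households:
    crosstab[(h["GQ"], h["CIHISPEED"])] += h["HHWT"]

for cell, weight in sorted(crosstab.items()):
    print(cell, weight)
```

Even in this toy version, a GQ 1 household can carry CIHISPEED 0 after the group quarters records are dropped, which is the pattern in question.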
Ah, okay. So, the “NA(GQ)” label is probably misleading in this case. Being in a GQ is only one reason for CIHISPEED==0. Another is that the household does not have access to the internet, so it was never asked about high speed service. You can see this if you look at the questionnaire, where question 9 (on internet access) screens for question 10 (high speed internet access).
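One way to check this in your own extract is to break the CIHISPEED==0 households down by their answer to the internet access question. The sketch below assumes the extract also includes an internet access variable (called CINETHH here) and assumes code 3 means no access; both the variable name and the code values are assumptions to confirm against the variable documentation, and the records are made up.

```python
from collections import Counter

# Toy household records (hypothetical values, not real ACS data).
# Assumed codes: CINETHH 1 = has access, 3 = no access;
# CIHISPEED 0 = "NA(GQ)", 10 = yes, 20 = no.
households = [
    {"GQ": 1, "HHWT": 100, "CINETHH": 1, "CIHISPEED": 10},
    {"GQ": 1, "HHWT": 80, "CINETHH": 3, "CIHISPEED": 0},
    {"GQ": 1, "HHWT": 60, "CINETHH": 1, "CIHISPEED": 20},
    {"GQ": 2, "HHWT": 30, "CINETHH": 3, "CIHISPEED": 0},
]

# Sum household weights of the CIHISPEED == 0 cases by internet access code.
na_by_access = Counter()
for h in households:
    if h["CIHISPEED"] == 0:
        na_by_access[h["CINETHH"]] += h["HHWT"]

print(dict(na_by_access))
```

If the screening explanation holds for the real data, the weighted NA count should sit in the categories that are routed past the high speed question rather than being spread across all access codes.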
Oooh I see. I hadn’t put that together. That makes more sense. Much obliged.