How frequently, and why, would a COUNTYFIP code be missing? In a download of a custom dataset I get the STATEFIP and the PUMA number, but for many of the survey/serial numbers the COUNTYFIP is missing. For instance, in my 2021 sample, for serial number 714094, the state is 27 and the PUMA is 1600, but there is a 0 in the COUNTYFIP field, while for most of the serial records the COUNTYFIP is included. How do people handle this situation?
I think you’re running into the fact that geographic areas with fewer than 100,000 people are not identified in the samples. PUMAs are the smallest geographies that are identified for all of the microdata.
Thanks for the response. Would it not show the county ID if a record does show the PUMA #? Some of these missing county IDs are for counties that did have populations over 100,000 in 2010 in case that matters.
We only identify COUNTYFIP if the following criteria are met (from our COUNTYFIP variable description):
- it was coterminous with a single SEA, county group, or PUMA; or
- it contained multiple SEAs, county groups, or PUMAs, none of which extended into other counties.
PUMAs have a minimum population threshold of 100,000 people. Sometimes a county with a population over 100,000 must be merged with an adjacent county, because the adjacent county has fewer than 100,000 people and can’t be merged with another county to meet that threshold.
In your example, PUMA 1600 does nest within part of Scott County, Minnesota; but PUMA 1700 contains the remaining part of Scott County AND Carver County. Since PUMA 1600 isn’t coterminous with Scott County’s boundary and since 1700 extends beyond Scott County, we do not identify Scott County in the microdata.
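As a rough illustration of that rule (a sketch of how the two criteria above can be read, not the code IPUMS actually runs), a county is identifiable only if every PUMA that touches it lies entirely inside it. The function name and the overlap list below are hypothetical, and county names stand in for FIPS codes. PUMA codes are only unique within a state, so this assumes all pairs come from one state.

```python
from collections import defaultdict

# Hypothetical (county, PUMA) overlap pairs mirroring the Scott/Carver example.
# All pairs are assumed to belong to a single state (Minnesota, STATEFIP 27).
overlaps = [
    ("Scott", 1600),   # PUMA 1600 lies entirely inside Scott County
    ("Scott", 1700),   # PUMA 1700 covers the rest of Scott County...
    ("Carver", 1700),  # ...and also Carver County
]

def identifiable_counties(pairs):
    """Return counties whose PUMAs all lie entirely inside that county."""
    counties_by_puma = defaultdict(set)
    pumas_by_county = defaultdict(set)
    for county, puma in pairs:
        counties_by_puma[puma].add(county)
        pumas_by_county[county].add(puma)
    # A county qualifies when none of its PUMAs extends into another county.
    # This covers both the single-PUMA and the multiple-PUMA criterion.
    return {
        county
        for county, pumas in pumas_by_county.items()
        if all(counties_by_puma[p] == {county} for p in pumas)
    }

print(identifiable_counties(overlaps))
```

With these pairs neither county qualifies, because PUMA 1700 spans both of them, which is why records in PUMA 1600 show a 0 in COUNTYFIP.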
Users are welcome to assign their own county codes to the microdata using a different methodology than the one we use. We provide links to the PUMA composition files here.
Dave Van Riper
IPUMS Research Scientist
Thanks Dave… I was thinking that what you wrote might be the situation I was facing. Does IPUMS have documentation of the approaches it uses (like what you described) that I can access?
I don’t have specific documentation related to our methodology, but I can give you some basic pointers on how we did our county identification.
- Use IPUMS NHGIS to get the following data file:
** Dataset: 2010_SF1a
** Geog level: Census block
** Data table: P1. Total Population
- This data file will have geographic identifiers for state, county and 2010 PUMA
- Summarize this dataset by State, County, and PUMA, creating a total population count for each unique combination of these three geographic identifiers
- If a particular PUMA has one and only one record in the summarized dataset, then you know that PUMA nests within a county
** This is how you could assign a county code to Minnesota 1600
- We then processed the summarized dataset to determine which counties are identified
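The summarizing steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the block rows and populations are made up, county names stand in for FIPS codes, and a real NHGIS extract would be read from its CSV file with whatever column names that file uses.

```python
from collections import defaultdict

# Made-up block-level rows: (state, county, puma, population).
# Real input would be the NHGIS block file with table P1 (total population).
blocks = [
    ("27", "Scott", "1600", 120),
    ("27", "Scott", "1600", 80),
    ("27", "Scott", "1700", 50),
    ("27", "Carver", "1700", 70),
]

def summarize(rows):
    """Total population for each unique (state, county, puma) combination."""
    totals = defaultdict(int)
    for state, county, puma, pop in rows:
        totals[(state, county, puma)] += pop
    return dict(totals)

def nested_pumas(totals):
    """Map (state, puma) to its county when the PUMA has exactly one record."""
    counties = defaultdict(list)
    for state, county, puma in totals:
        counties[(state, puma)].append(county)
    return {key: found[0] for key, found in counties.items() if len(found) == 1}

totals = summarize(blocks)
print(nested_pumas(totals))
```

Here PUMA 1600 appears in a single summarized record, so it can be assigned to Scott County, while PUMA 1700 appears in two records and cannot be assigned to one county.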
I think this will start you in a good direction.
Dave, thanks for the description of your methodology. That and the prior email now give me a good understanding of the county field in IPUMS tables.