For the 1- and 3-year samples, why are there so many "0" values for the COUNTY variable?

I've created a county FIPS variable using the COUNTY and STATEFIP variables in order to merge the IPUMS dataset with another dataset. However, I see that there are a lot of zero values for the COUNTY variable, specifically for all of Alaska as well as for certain counties in each state.

Is there documentation explaining these zero values?

The COUNTY variable description states that:

0000 = County not identifiable from public-use data (1950-onward only)

But I am wondering if you can provide more specific details about why it was not available. Was the population in these counties not sampled?

The first thing to note is that COUNTY is not currently available for 2012 data files due to the change in PUMA codes and boundaries (detailed in the release note for the 2012 data files).

Counties are not actually identified in the Census data after 1950, but IPUMS-USA used the smallest available geographic unit to construct counties. More details on this construction are available in the COUNTY variable description, as well as in a downloadable Excel file listing the counties that were identifiable in each sample. Respondents who lived in counties that are not identifiable (because the boundaries of the component units extended into other counties) were given a code of ‘0’. The lack of a county code does not necessarily imply that the county was not sampled, just that the component geographic units do not perfectly match up with the county boundaries.
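If it helps with the merge, here is a minimal Python sketch of building a county FIPS code while treating COUNTY=0 as "not identifiable" rather than as a real county. The function name and the sample records are made up for illustration. It also assumes that the COUNTY code carries a trailing suffix digit, so that dropping the last digit gives the 3-digit county FIPS code (for example 1170 becomes 117); please check this against the COUNTY variable description before relying on it.

```python
def county_fips(statefip, county):
    """Return a 5-digit county FIPS string, or None when COUNTY is 0."""
    if county == 0:
        # County not identifiable from public-use data.
        return None
    # Assumption: the last digit of COUNTY is a suffix, so integer
    # division by 10 leaves the 3-digit county FIPS code.
    return f"{statefip:02d}{county // 10:03d}"


# Hypothetical records standing in for rows of an extract.
records = [
    {"STATEFIP": 1, "COUNTY": 1170},
    {"STATEFIP": 1, "COUNTY": 0},
]
for r in records:
    r["COUNTY_FIPS"] = county_fips(r["STATEFIP"], r["COUNTY"])
```

Records that come back as None should be kept out of the county-level merge instead of being matched to a FIPS code ending in 000.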

I hope this helps.

Thank you for your response. Is there any way to identify the 0 counties? Even if the county boundaries do not perfectly match up, I would imagine that they still fall within a county boundary? I am also only interested in the 1 and 3 year samples for 2008-2011.

thank you.

You can attempt to identify counties yourself using the PUMA Maps that IPUMS-USA makes available. The maps include County boundaries so you can tell which PUMAs are a part of which counties. Because PUMAs are dependent on population (each PUMA must contain at least 100,000 residents) some PUMAs may encompass multiple counties, making it impossible to tell in which county a person lives.

I hope this helps.

Thank you. Does the geographic data also have geographic boundaries smaller than the city or county level, for example zip codes? I know that on the American FactFinder website there is an option to download variables at the zip code level, so do we have this option with this site?

The smallest geographic unit available in IPUMS-USA is the PUMA. Because PUMAs are constructed based on population size as well as geographic features, PUMA sizes will differ based on how densely populated an area is. This is meant to help protect the identities of individuals within the sample. You can read more about PUMAs on the Census Bureau’s Geography page.

So I would need to download the data set with the PUMA variable, then link it to the dataset with the county variable?

Would comparing the PUMA variable to the PUMA map allow me to identify additional counties that are not identified in the COUNTY variable for the 2008-2011 1-year and 3-year samples?

It is not necessary to download the PUMA variable and COUNTY variable in separate datasets. Both can be downloaded in a single dataset through IPUMS-USA. If you are asking whether you can link a new extract containing the PUMA variable to a separate dataset containing county in order to geographically identify the entire country, then yes, that is possible.

The variable COUNTY already gives all of the “identifiable” counties (using IPUMS criteria), but if you are comfortable defining counties that do not match PUMA boundaries, then the PUMA maps will be helpful in this pursuit. Again, some PUMAs may actually contain multiple counties, so it will not be possible to uniquely identify every county even when relaxing the boundary requirement.
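As a rough sketch of that process in Python: suppose you read the maps and write down every county that each PUMA touches. The PUMAs that touch exactly one county are the ones you can assign. The overlap list, PUMA codes, and county names below are placeholders, not real IPUMS values, and the function name is made up for illustration.

```python
from collections import defaultdict

# Hypothetical overlap list transcribed from the PUMA maps:
# one (STATEFIP, PUMA, county name) row per county a PUMA touches.
overlaps = [
    (1, 100, "County A"),
    (1, 200, "County B"),
    (1, 300, "County C"),
    (1, 300, "County D"),  # PUMA 300 spans two counties
]


def single_county_pumas(rows):
    """Map (STATEFIP, PUMA) to a county for PUMAs that touch only one county."""
    counties = defaultdict(set)
    for statefip, puma, county in rows:
        counties[(statefip, puma)].add(county)
    return {
        key: next(iter(names))
        for key, names in counties.items()
        if len(names) == 1
    }


assignable = single_county_pumas(overlaps)
```

In this made-up example, PUMAs 100 and 200 can be assigned to a county, while PUMA 300 cannot because its residents could live in either County C or County D.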

I hope this helps.

Thank you for your help. I am looking at the PUMA maps and I do see that there may be a few that I can identify. For example, for the PUMAs that fall within a county:

Alabama: statefip=1 puma=200, falls within Madison county

Alabama: statefip=1 puma=1900, falls within Montgomery county

Is this correct?

Also, I am wondering why Shelby County in Alabama (statefip=1; puma=1300) didn’t get a county designation? The PUMA and the county boundaries appear to coincide exactly.


You are correct that PUMA=200 and PUMA=1900 in Alabama fall entirely within a single county. So it is safe to say that the people from PUMA=200 all live in Madison county, but they do not represent all of the people who live in Madison county. This is important to note if you are generating summary statistics at the county level.

PUMA=1300 in Alabama does seem to match boundaries with Shelby County, and Shelby County is on the list of Counties identifiable in 1950-onward data for 2000 and ACS samples. You should be safe assigning PUMA=1300 and STATEFIP=1 to COUNTY=1170.
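A minimal Python sketch of that kind of recode, filling in COUNTY only where it is currently 0, might look like the following. The dictionary and function names are made up for illustration. The Shelby County code is the one given above; the Madison and Montgomery codes are my assumption that the same pattern applies to their FIPS codes (089 and 101), so please verify them against the list of identifiable counties first.

```python
# (STATEFIP, PUMA) -> COUNTY code for PUMAs that fall inside one county.
PUMA_TO_COUNTY = {
    (1, 1300): 1170,  # Shelby County, code as given above
    (1, 200): 890,    # Madison County (assumed code, from FIPS 089)
    (1, 1900): 1010,  # Montgomery County (assumed code, from FIPS 101)
}


def fill_county(record):
    """Return COUNTY, falling back to the PUMA crosswalk when COUNTY is 0."""
    if record["COUNTY"] != 0:
        return record["COUNTY"]
    key = (record["STATEFIP"], record["PUMA"])
    # Records whose PUMA is not in the crosswalk stay at 0.
    return PUMA_TO_COUNTY.get(key, 0)


# Hypothetical record standing in for one row of an extract.
row = {"STATEFIP": 1, "PUMA": 1300, "COUNTY": 0}
row["COUNTY"] = fill_county(row)
```

Keep in mind the caveat above: for PUMA=200 and PUMA=1900, the recoded respondents are only part of their county's population, so county-level summary statistics built from them will not cover the whole county.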

I hope this helps.

Thank you. Do the PUMAs fall along census tract boundaries? If so, is there a way to identify the census tracts within each PUMA boundary?

I have not extensively looked into the correspondence between PUMA and census tract boundaries, so I cannot say that they universally line up. However, based on a brief viewing of the TIGERweb Decennial map application for Census 2000 (since the 2000 Census PUMA boundaries were used for the 2005-2011 ACS samples), it seems that most PUMA boundaries do line up with census tract boundaries. So it would be possible to define PUMAs as groups of census tracts. You could use the resources available through TIGERweb to construct such a definition.

I hope this helps.