Can some people in a county be identified at the county level while others are not?


I am currently doing research that requires identifying at least the county each individual lives in. However, I found that many counties have no observations (IPUMS-CPS). Here are my questions:

  1. I know that some counties are not in the CPS sample, and that some are in the sample but not identifiable for confidentiality reasons. Are all counties listed on this page in the sample?


If a county code on this page has no observations, may I assume that residents of that county are assigned county code 0 for confidentiality reasons?

  2. Within a metropolitan area, I found that each county's share of the observations in the Metropolitan Statistical Area (MSA) is not close to that county's share of the MSA population. I downloaded basic monthly data from May 2004 to December 2017. For example:

The metfips 37980 (Philadelphia-Camden-Wilmington, PA-NJ-DE-MD Metropolitan Statistical Area) contains county codes 34005, 34007, 34015, 42017, 42029, 42045, 42091, 42101, 10003, 24015, and 34033 by the definition of this MSA. From ACS one-year population estimates, I know that the population of county 10003 (New Castle County) is approximately 9% of the population of this MSA. But observations with county code 10003 make up 38.61% of the observations with metfips code 37980.

I am confused by this inconsistency.

Does this mean that not every household in the MSA has an equal probability of being selected?

Is the assumption that the number of observations in each county within the MSA is proportional to the population of this county wrong?

  3. Given the many examples like the one in question 2, is it possible that some individuals in one county (for example, 01073) are assigned county code 01073 while others are assigned county code 0?

  4. In some MSAs, the county code is zero for all observations with that metfips code. For example, metfips 25540 has no observations that can be identified at the county level.


I want to know whether the observations are proportional to the county population. Is there any way to identify residence at the county or even city level for these observations?

  5. I noticed that some counties can be identified during some time periods but not after a specific date. How does this happen? Could a population decrease be large enough to make a county no longer identifiable?

Thank you so much!



I’ll address each question individually.

(1) The county codes displayed on the County Codes: May, 2004-July 2005 page list the codes available in samples between May 2004 and July 2005. Note that not all counties listed there will be available in all samples within this range. The list is designed as a reference for identifying the county name associated with each COUNTY code. So, yes, if a county has no identified observations, then the individuals who actually live in that county will have a county code of 0 (not identified).

(2) It looks like the issue here relates to the use of sampling weights. The CPS uses a complex sampling methodology, so households do not all have the same probability of selection, and raw observation counts are not expected to be proportional to population. In order to construct representative population estimates, we need to use sampling weights to correct for the sampling procedure. Based on my calculations, using the basic monthly samples from May 2004 through December 2017 and the household sampling weight (HWTFINL), I find that approximately 9 percent of the population in METFIPS==37980 is in COUNTY==10003. More details about sampling weights in general are available here.
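
To illustrate the difference between unweighted and weighted shares, here is a minimal Python sketch. It assumes one record per household with METFIPS, COUNTY, and HWTFINL fields; the function name and the toy records are made up for illustration and are not real CPS data, so the numbers it produces do not correspond to the figures quoted above.

```python
from collections import defaultdict


def weighted_county_shares(households, metfips, weight="HWTFINL"):
    """Return {county: (unweighted_share, weighted_share)} for one metro area.

    `households` is an iterable of dicts, one per household (hypothetical layout).
    """
    counts = defaultdict(int)     # number of sampled households per county
    weights = defaultdict(float)  # sum of household weights per county
    for row in households:
        if row["METFIPS"] != metfips:
            continue
        counts[row["COUNTY"]] += 1
        weights[row["COUNTY"]] += row[weight]
    n = sum(counts.values())
    w = sum(weights.values())
    if n == 0 or w == 0:
        return {}
    return {c: (counts[c] / n, weights[c] / w) for c in counts}


# Made-up records: county 10003 is over-represented in the sample,
# so each of its households carries a smaller weight.
toy = [
    {"METFIPS": 37980, "COUNTY": 10003, "HWTFINL": 500.0},
    {"METFIPS": 37980, "COUNTY": 10003, "HWTFINL": 500.0},
    {"METFIPS": 37980, "COUNTY": 42101, "HWTFINL": 3000.0},
    {"METFIPS": 37980, "COUNTY": 0, "HWTFINL": 3000.0},
    {"METFIPS": 25540, "COUNTY": 0, "HWTFINL": 1500.0},
]

shares = weighted_county_shares(toy, 37980)
unweighted, weighted = shares[10003]
print(f"unweighted share: {unweighted:.3f}, weighted share: {weighted:.3f}")
```

In the toy data, county 10003 accounts for half of the sampled households in the metro area but a much smaller share once the weights are applied, which is the same pattern as in the question.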

(3) This should not be the case in CPS data. Since counties are identified directly by the Census Bureau, all individuals who live in an identifiable county should be coded as such.

(4) Unfortunately, in cases where there are no identifiable counties within a given metropolitan area, the metropolitan area is the lowest geographic level identifiable in the data. As you note above, this is due to confidentiality restrictions placed on public use microdata.
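
If it helps to see which metropolitan areas are affected, a rough sketch like the following tabulates the share of records with COUNTY equal to 0 for each METFIPS value. The function name and the toy records are hypothetical; in practice the rows would come from your own extract.

```python
from collections import defaultdict


def unidentified_share_by_metro(records):
    """Return {metfips: share of records whose COUNTY code is 0}."""
    total = defaultdict(int)
    unidentified = defaultdict(int)
    for row in records:
        total[row["METFIPS"]] += 1
        if row["COUNTY"] == 0:
            unidentified[row["METFIPS"]] += 1
    return {m: unidentified[m] / total[m] for m in total}


# Made-up records, not real CPS data.
sample = [
    {"METFIPS": 37980, "COUNTY": 10003},
    {"METFIPS": 37980, "COUNTY": 0},
    {"METFIPS": 25540, "COUNTY": 0},
    {"METFIPS": 25540, "COUNTY": 0},
]

result = unidentified_share_by_metro(sample)
for metfips, share in sorted(result.items()):
    print(metfips, f"{share:.0%} of records have no identified county")
```

A share of 100 percent means no county in that metropolitan area is identified in the records you passed in, so the metropolitan area is the finest geography available for them.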

(5) Decisions about which counties are identifiable when are made by the Census Bureau and are largely determined by issues relating to confidentiality. Changes in which counties are available when can be driven by factors such as population growth, changing geographic boundaries, other associated geographic identification, etc.



Hi Jeff:

Thank you so much for your answer! All my questions are answered now.