I am exploring demographics by occupation and industry in Chicago and Illinois using both IPUMS USA and IPUMS CPS between 1990 and 2025, and am finding strange inconsistencies. Comparing the overall demographics of the city and state by birthplace (BPL), for example, using IPUMS CPS data, 85.93% of IL was born in the US in 2010, whereas using IPUMS USA that number is 88.44% in 2010. The stats seem even further apart looking at employment stats: looking at employed workers (using EMPSTAT) in Illinois in 2021, 78.70% are US born according to IPUMS CPS data, but 84.23% are US born according to IPUMS USA. Mexican-born employed workers in IL according to IPUMS CPS have risen from 6.7% in 2010 to 7.96% in 2023, but in IPUMS USA data, they have fallen from 4.87% in 2010 to 4.34% in 2023. Any sense of what might be happening here?
In general, we caution users against analyzing very small populations. Your analysis is restricted to a single metropolitan area, and then further by occupation, industry, birthplace, and employment status. Some estimates for the Chicago metropolitan area may be reliable using CPS data, but when the population of interest is very small, issues can occur. The sample sizes in some of your analyses are likely too small to produce representative estimates with an acceptable degree of certainty. The decennial census and ACS are better suited for most state- and sub-state-level analysis, especially if your population of interest is small (e.g., a specific occupation or people with a specific birthplace). Using ACS and census data, however, you still should pay close attention to the sample sizes and avoid interpreting estimates that have very thin cells, i.e., the estimate comes from a very small unweighted number of observations.
Small sample sizes aside, even when analyzing larger populations using CPS and IPUMS USA data, we would not generally expect these estimates to match perfectly for a number of reasons. The CPS, census, and ACS are different surveys with different sampling frames, survey designs (e.g., the CPS is a panel while the ACS is a single cross-sectional survey), collection method (e.g., the census and ACS are mandatory, while participation in the CPS is voluntary; CPS data are collected by a trained interviewer, while most households fill in the census and ACS questionnaires themselves), question wording, sample sizes, and editing procedures undertaken by the Census Bureau. You can read about the sampling design of IPUMS USA samples on this page; you can read about the CPS sample design, including differences between IPUMS CPS and IPUMS USA samples, on this page. This fact sheet from the Census Bureau provides a helpful overview of some key differences between the ACS and the CPS Annual Social and Economic Supplement (ASEC).
Note that in the CPS, the METFIPS code (metropolitan statistical area of residence) is directly reported by the Census Bureau. However, in IPUMS USA data from 1990 and later, neither city nor metropolitan area is directly reported in the original data. IPUMS geographers use other geographic information, like PUMA of residence, to identify cities and metro areas when possible. These methods result in some level of error in assigning respondents to geographic areas. This is another reason I would not expect metro-area-level estimates from IPUMS CPS to match those from IPUMS USA. You can read more about the identification of cities and metro areas in the CITY, MET2013, and METAREA variable descriptions.
For some of your analyses, you may prefer to use data from IPUMS NHGIS, which provides data from the decennial census, ACS, and other sources, aggregated at a variety of geographic levels, including metropolitan statistical area. These data are summary tables which report information about individual geographic areas, such as specific metro areas, like total population, employment status by sex, and so on. NHGIS does not provide any tables on occupation by birthplace at the metropolitan statistical area level, but you can find tables on population by date of birth, population by occupation, population by industry, and others that you may find useful. See this video tutorial on how to use the NHGIS data finder to search for and find the data tables you are looking for.
Thank you, Isabel, I appreciate this help and context for the datasets. But for the statistics I shared in my original question, I was using the state of Illinois. Even for the last comparison, looking at the birthplace of just employed workers, the smallest sample size (for the year 2025) was still over 11,000 in the CPS dataset, and much greater using IPUMS USA.
I understand that these are different surveys, but the difference between 7.96% Mexican-born workers and 4.34% Mexican-born workers in IL in 2023 is pretty dramatic. Could this difference really just be different survey designs?
Thanks,
Tina
Without knowing more about your analysis, it’s hard to say with certainty what is causing the discrepancies you are seeing. The differences between the two surveys will certainly cause differences, and analytical decisions could cause additional differences. Below I will show how I would approach one of the analyses you mentioned in your first post using the online data analysis systems for IPUMS USA and IPUMS CPS.
What share of employed individuals in Illinois were born in the U.S. in 2021?
IPUMS CPS data: I used all Basic Monthly Survey (BMS) samples from 2021 (I did not use ASEC, since the reference period in the ASEC is the previous calendar year, while the reference period in the BMS more closely matches that in the ACS). I tabulated BPL and applied the following filters:
- year(2021)
- statefip(17)
- empstat(10-12)
This filter includes people who are employed (at work) and employed (has job but not at work last week). It does not include people in the armed forces. Members of the armed forces have zero weights in the CPS, and I am excluding them in both analyses for comparability.
- age(16-85)
I filtered on age because the universe of EMPSTAT in the ACS is 16+, while the universe of EMPSTAT in recent CPS samples is 15+. This filter makes the universe the same for both datasets.
- gqtype(0-8)
This filter excludes individuals in group quarters.
I selected WTFINL from the Weight dropdown menu.
Here is the input into the online analysis tool:
Here is a snippet of the output:
I see that 78.5 percent of employed individuals in Illinois were born in the U.S., according to the 2021 BMS samples.
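For anyone working from a downloaded extract rather than the online analysis system, the same filter-then-weight logic can be sketched in plain Python. This is only an illustration under stated assumptions: the rows are made-up toy records rather than real CPS data, the helper names (`weighted_share`, `cps_keep`, `born_in_us`) are hypothetical, and the birthplace cutoff of 15000 is an assumption about how the BPL codes are laid out that should be checked against the BPL codes page before use.

```python
def weighted_share(records, keep, in_group, weight_key):
    """Weighted share of the kept records for which in_group(record) is true."""
    total = 0.0
    group = 0.0
    for rec in records:
        if not keep(rec):
            continue
        weight = rec[weight_key]
        total += weight
        if in_group(rec):
            group += weight
    if total == 0:
        raise ValueError("no weighted records pass the filters")
    return group / total


def cps_keep(rec):
    """Mirror the filters listed above: year, state, employment, age, group quarters."""
    return (
        rec["YEAR"] == 2021
        and rec["STATEFIP"] == 17
        and 10 <= rec["EMPSTAT"] <= 12
        and 16 <= rec["AGE"] <= 85
        and 0 <= rec["GQTYPE"] <= 8
    )


def born_in_us(rec, first_foreign_code=15000):
    # ASSUMPTION: BPL codes below this cutoff denote U.S. states and territories.
    # Verify against the BPL codes page, and decide how to treat the territories.
    return rec["BPL"] < first_foreign_code


# Toy rows for illustration only (not real CPS records).
toy_cps = [
    {"YEAR": 2021, "STATEFIP": 17, "EMPSTAT": 10, "AGE": 40, "GQTYPE": 0, "BPL": 1700, "WTFINL": 3000.0},
    {"YEAR": 2021, "STATEFIP": 17, "EMPSTAT": 12, "AGE": 30, "GQTYPE": 0, "BPL": 20000, "WTFINL": 1000.0},
    {"YEAR": 2021, "STATEFIP": 17, "EMPSTAT": 10, "AGE": 15, "GQTYPE": 0, "BPL": 1700, "WTFINL": 2000.0},  # dropped: age
    {"YEAR": 2021, "STATEFIP": 18, "EMPSTAT": 10, "AGE": 50, "GQTYPE": 0, "BPL": 1800, "WTFINL": 2500.0},  # dropped: state
    {"YEAR": 2021, "STATEFIP": 17, "EMPSTAT": 21, "AGE": 45, "GQTYPE": 0, "BPL": 1700, "WTFINL": 1500.0},  # dropped: empstat
]

share = weighted_share(toy_cps, cps_keep, born_in_us, "WTFINL")
print(f"U.S.-born share of employed: {share:.1%}")
```

Because the result is a ratio of weighted sums, pooling several monthly samples does not require rescaling WTFINL first; a weighted count (rather than a share) from pooled months would.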
IPUMS USA data: I used the 2021 ACS. I tabulated a recoded version of BPL (see the online analysis system instructions for how to recode variables) that classifies respondents as born in the U.S. or born somewhere else. I applied the following filters:
- year(2021)
- statefip(17)
- empstatd(10-12)
This filter includes people who are employed (at work) and employed (has job but not at work last week). It does not include people in the armed forces. Members of the armed forces have zero weights in the CPS, and I am excluding them in both analyses for comparability.
- gq(1-2)
Just like above, this filter excludes individuals in group quarters.
I selected PERWT from the Weight dropdown menu.
Here is the input:
Here is the output:
I see that 80.8 percent of employed individuals in Illinois were born in the U.S. according to the 2021 ACS sample. This is very close to the estimate I obtained using the 2021 BMS data.
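The recode-and-tabulate step for the ACS can also be reproduced on a downloaded extract. The sketch below is a hedged illustration, not the online analysis system's own procedure: the rows are toy values, the function names are hypothetical, and the cutoff of 150 for the first foreign birthplace code is an assumption about the IPUMS USA general BPL codes that should be confirmed on the variable's codes page.

```python
from collections import defaultdict


def recode_bpl(bpl, first_foreign_code=150):
    # ASSUMPTION: general BPL codes below the cutoff are U.S. states and territories.
    return "Born in U.S." if bpl < first_foreign_code else "Born elsewhere"


def acs_keep(rec):
    """Mirror the ACS filters listed above: year, state, employment, group quarters."""
    return (
        rec["YEAR"] == 2021
        and rec["STATEFIP"] == 17
        and 10 <= rec["EMPSTATD"] <= 12
        and rec["GQ"] in (1, 2)
    )


def weighted_percentages(records, keep, recode, weight_key):
    """Weighted percentage distribution of the recoded birthplace categories."""
    totals = defaultdict(float)
    for rec in records:
        if keep(rec):
            totals[recode(rec["BPL"])] += rec[weight_key]
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no weighted records pass the filters")
    return {label: 100.0 * value / grand_total for label, value in totals.items()}


# Toy rows for illustration only (not real ACS records).
toy_acs = [
    {"YEAR": 2021, "STATEFIP": 17, "EMPSTATD": 10, "GQ": 1, "BPL": 17, "PERWT": 60.0},
    {"YEAR": 2021, "STATEFIP": 17, "EMPSTATD": 12, "GQ": 1, "BPL": 55, "PERWT": 20.0},
    {"YEAR": 2021, "STATEFIP": 17, "EMPSTATD": 10, "GQ": 2, "BPL": 200, "PERWT": 20.0},
    {"YEAR": 2021, "STATEFIP": 17, "EMPSTATD": 10, "GQ": 3, "BPL": 17, "PERWT": 40.0},  # dropped: outside gq(1-2)
    {"YEAR": 2021, "STATEFIP": 17, "EMPSTATD": 14, "GQ": 1, "BPL": 17, "PERWT": 30.0},  # dropped: outside empstatd(10-12)
]

percentages = weighted_percentages(toy_acs, acs_keep, recode_bpl, "PERWT")
for label, pct in sorted(percentages.items()):
    print(f"{label}: {pct:.1f}%")
```

Keeping the filter, the recode, and the weight as separate pieces makes it easier to check that the ACS and CPS versions of an estimate differ only where the surveys themselves differ.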
I estimated that 8.0 percent of employed individuals in Illinois were born in Mexico in 2023 using CPS data. Using 2023 ACS data, I estimated 6.4 percent. The unweighted sample size of Mexican-born workers is 1,221 in the CPS data and 2,692 in the ACS data. These are not tiny sample sizes, but they are small enough that I’m not surprised to see the difference of 1.6 percentage points, especially considering the differences between the two surveys.
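Unweighted counts are only a rough guide to precision, because unequal weights reduce the amount of information in a sample. One common rule of thumb, which is not something the analysis above used, is Kish's effective sample size. The sketch below applies it to toy weights; the function names are hypothetical, and the resulting standard error ignores clustering, stratification, and the fact that pooled monthly CPS samples can include the same respondents in more than one month, so it will tend to understate the true uncertainty. Replicate weights, where available, are the more rigorous route.

```python
import math


def kish_effective_n(weights):
    """Kish approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0:
        raise ValueError("weights are empty or all zero")
    return total * total / total_sq


def approx_se_of_share(share, weights):
    """Rough standard error of a weighted share; ignores the complex survey design."""
    n_eff = kish_effective_n(weights)
    return math.sqrt(share * (1.0 - share) / n_eff)


# Toy weights for illustration only.
equal_weights = [1.0, 1.0, 1.0, 1.0]
unequal_weights = [1.0, 1.0, 1.0, 5.0]

n_equal = kish_effective_n(equal_weights)      # equals the unweighted count
n_unequal = kish_effective_n(unequal_weights)  # smaller than the unweighted count
se_equal = approx_se_of_share(0.5, equal_weights)

print(n_equal, round(n_unequal, 2), se_equal)
```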
I would suggest checking the weights, universe, reference period, and other details of the variables and your analyses to ensure you are producing estimates that are as similar as possible conceptually using the two surveys. The estimates will not match exactly, but they will likely be closer when you ensure that your analytical decisions take into account the differences between the surveys.
Isabel, thank you so much for this! I really appreciate your time walking me through this, it was very helpful.