Understanding Migration Data


I apologize in advance for the long question. I am conducting an analysis in which I would like to compare the characteristics of people moving into Guadalupe County, TX with those of people moving out of it. I have identified the PUMA as 05700, and fortunately the MIGPUMA shares the same boundary and the same value, 05700.

Hopefully this isn’t too convoluted a question, but it concerns how to interpret the data when using, for instance, the 1-year ACS versus the 5-year ACS.

For example:
Using the tidycensus package in R, suppose I pull the entire 2021 1-year ACS Public Use Microdata Sample and keep all records where MIGPUMA is equal to 05700 and MIGSP is equal to Texas, but where the current PUMA is not 05700 in Texas. My understanding is that, once I total the person weights, this would represent all people who migrated out of Guadalupe County in 2021. Is that correct?

My follow-up: if I apply the same set of operations to the 2021 5-year data, is the interpretation that the resulting data (e.g., income, age, educational attainment) describe, on average over the 5-year period, the characteristics of people leaving the county?

If I may indulge one final question: I conducted this analysis and, out of curiosity, compared it to the Population Estimates Program county-level migration data. I found that, for instance, the domestic migration statistic for 2021 does not match the net domestic migration I compute from the 2021 ACS 1-year data following the logic above. I assume this is because different estimation methods are used, such as the IRS data in the Population Estimates Program, whereas the PUMS is a survey?

Any clarification would be greatly appreciated!

You should use the variable MIGPUMA1 (PUMA of residence 1 year ago); MIGPUMA (PUMA of residence 5 years ago) is only available for the 1990 and 2000 samples. You should also combine it with MIGPLAC1, which reports the state that a respondent who moved in the previous year moved from. You are correct that respondents with MIGPUMA1 = 05700, MIGPLAC1 = 048, AND (PUMA != 05700 OR STATEFIP != 048) are those who lived in Guadalupe County, Texas one year before the survey but no longer did so at the time of the survey. Another way to define these respondents is as those with MIGPUMA1 = 05700, MIGPLAC1 = 048, AND MIGRATE1D != 23. By excluding respondents who moved within the MIGPUMA, you are left only with respondents who moved homes by leaving Guadalupe County/MIGPUMA. The ACS is fielded throughout the year, so these respondents may have migrated out of the county/MIGPUMA in either 2020 or 2021. You may also be interested in the recent release of the 2022 ACS 1-year file, where Guadalupe County is coterminous with PUMA and MIGPUMA 05700.
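As a rough illustration of the two definitions above, here is a minimal Python sketch (the thread itself uses R, so treat this as a language-neutral outline). The records are invented toy data, codes are compared as integers (05700 becomes 5700, 048 becomes 48), and the MIGRATE1D values other than 23 are placeholders rather than documented codes.

```python
# Toy person records. Variable names follow the answer above; all values are made up.
TX = 48      # state code for Texas, compared as an integer (048 in the text)
AREA = 5700  # PUMA / MIGPUMA1 code for Guadalupe County (05700 in the text)

people = [
    # moved within the PUMA (MIGRATE1D 23, per the answer above)
    {"PUMA": 5700, "STATEFIP": 48, "MIGPUMA1": 5700, "MIGPLAC1": 48, "MIGRATE1D": 23, "PERWT": 100},
    # left for another Texas PUMA (MIGRATE1D value is a placeholder)
    {"PUMA": 4600, "STATEFIP": 48, "MIGPUMA1": 5700, "MIGPLAC1": 48, "MIGRATE1D": 24, "PERWT": 80},
    # left for another state whose PUMA happens to share the number 5700
    {"PUMA": 5700, "STATEFIP": 6, "MIGPUMA1": 5700, "MIGPLAC1": 48, "MIGRATE1D": 32, "PERWT": 50},
    # did not move (zeros stand in for "not applicable"; an assumption of this sketch)
    {"PUMA": 5700, "STATEFIP": 48, "MIGPUMA1": 0, "MIGPLAC1": 0, "MIGRATE1D": 10, "PERWT": 120},
    # moved into the PUMA from elsewhere in Texas
    {"PUMA": 5700, "STATEFIP": 48, "MIGPUMA1": 5900, "MIGPLAC1": 48, "MIGRATE1D": 24, "PERWT": 70},
]


def lived_in_area_last_year(p):
    """True if the respondent lived in MIGPUMA1 5700, Texas one year ago."""
    return p["MIGPUMA1"] == AREA and p["MIGPLAC1"] == TX


def is_out_migrant(p):
    """First definition: lived there a year ago, lives somewhere else now."""
    return lived_in_area_last_year(p) and (p["PUMA"] != AREA or p["STATEFIP"] != TX)


def is_out_migrant_alt(p):
    """Second definition: lived there a year ago and did not move within the PUMA."""
    return lived_in_area_last_year(p) and p["MIGRATE1D"] != 23


def weighted_total(records, predicate):
    """Sum the person weights of the records that satisfy the predicate."""
    return sum(p["PERWT"] for p in records if predicate(p))


out_total = weighted_total(people, is_out_migrant)
out_total_alt = weighted_total(people, is_out_migrant_alt)
print(out_total, out_total_alt)
```

The third record shows why the state check matters: PUMA codes are only unique within a state, so comparing PUMA alone would miss someone who left Texas for a PUMA with the same number.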

The 2021 5-year data append each of the five 1-year data files (2017-2021) into a single file. Your estimates using this file will therefore include all respondents who reported migrating within this 5-year period. The 5-year ACS file divides PERWT and HHWT from the 1-year files by five in order to make sums more easily interpretable as annual estimates during the five-year period. If you sum PERWT in the 5-year file for the respondents you identified as having moved out of Guadalupe County, your estimate will be the average annual out-migration from this county over this period.
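To make the weight arithmetic concrete, here is a small Python sketch under the description above: five 1-year files are stacked and each weight is divided by five. The records and the AGE values are invented, and each list is assumed to already contain only the out-migrants for that year.

```python
# Toy out-migrant records per survey year: (AGE, one-year PERWT). All values are made up.
one_year_files = {
    2017: [(25, 120), (40, 80)],
    2018: [(30, 300)],
    2019: [(22, 150), (60, 100)],
    2020: [(35, 150)],
    2021: [(45, 100)],
}

# Annual out-migration estimates from each 1-year file.
annual_totals = {year: sum(w for _, w in recs) for year, recs in one_year_files.items()}

# Build a pooled "5-year" file as described above: stack the years, divide weights by five.
pooled = [(age, w / 5) for recs in one_year_files.values() for age, w in recs]

# Summing the pooled weights gives the average annual out-migration.
pooled_total = sum(w for _, w in pooled)
average_of_annual = sum(annual_totals.values()) / len(annual_totals)

# A weighted mean of a characteristic is unaffected by the rescaling,
# because the factor of five cancels in numerator and denominator.
pooled_mean_age = sum(age * w for age, w in pooled) / pooled_total

print(pooled_total, average_of_annual, pooled_mean_age)
```

In this sketch the pooled total equals the simple average of the five annual totals, while the mean age describes out-migrants across the whole period, with each year contributing in proportion to its weighted count.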

I’m not sure which Population Estimates Program you are referring to. Note, though, that the county-to-county migration flow data released by Census are in fact derived from ACS data. Even so, estimates using the Public Use Microdata Sample (PUMS) file are not expected to perfectly replicate official estimates, because the PUMS data are subject to additional sampling error and further data processing operations (see the 2022 ACS PUMS Accuracy of the Data report).


Thank you so much for the response! This is very helpful.

To follow up on the last point with a little more info, I have provided the links to a file layout from some recent data and the Population Estimates Program (PEP) data.


I recognize that these data sets are derived using different methodologies, but I was just curious if there was any documentation that describes how one could understand the relationship between the two (I had trouble finding anything specific).

To say this another way: net migration statistics for a county created using the person weights, PUMA and MIGPUMA won’t match the county-level net migration estimates from the PEP data because the two are produced using different methodologies. I was curious, though, about the kinds of interpretations you could make by comparing the two to better understand migration.

If, for example, the 2021 1-year ACS Public Use Microdata Sample estimated a net positive domestic migration of 1,000 into County X, but the PEP estimated County X’s net domestic migration for 2021 to be 1,500, could that be understood as a difference in survey methods, timing of surveys, etc.? And if they diverge even more heavily, like 1,000 vs 10,000 or 1,000 vs -500, could that be understood as a matter of margins of error and, again, the result of different methodologies?

I apologize if my question isn’t clear. I am effectively trying to understand how, if you analyze migration trends using the two data sets, you could explain or better understand why differences in results occur. Are you aware of any documentation or journal articles that interrogate the relationship between the two, or am I making this more complicated than it needs to be haha?

Thanks again for all your help!

Eric, thank you for sharing the links to the Population Estimates Program. Your question on the PEP is a bit outside of my purview not only because IPUMS doesn’t provide any documentation on the PEP, but also because the interpretation of how differences in methodology affect estimates is an analytical question that researchers can approach in different ways.

Based on this Census training workshop presentation, estimates from the PEP and ACS have different uses. The recommendation is for the PEP to be used for “population and housing unit totals [and means] and demographic characteristics” while the ACS is for “social and economic characteristics” and “conducting statistical testing”. Unlike the ACS, the PEP does not include microdata or any social or economic detail. However, if your goal is simply to obtain a more accurate estimate of total net migration, it appears that the recommendation is to use the PEP. Since the ACS is a survey, the data include a margin of error (see the accuracy of the ACS data documentation). It is still possible, however, that the PEP’s estimate will fall outside the margin of error for the ACS. As you suggest, this divergence could be due to differing methodologies, which you may want to investigate further.
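One way to check whether a PEP figure falls inside the ACS margin of error is to compute that margin from the 80 person replicate weights that accompany the ACS PUMS. The sketch below assumes the standard ACS replicate-weight formula (standard error equals the square root of 4/80 times the sum of squared deviations of the replicate estimates from the full-sample estimate) and a 90 percent margin of error of 1.645 standard errors. The replicate estimates are invented, and the 1,000 and 1,500 figures are the hypothetical numbers from the question, not real results.

```python
import math


def replicate_se(estimate, replicate_estimates):
    """Standard error from replicate estimates.

    With the 80 ACS replicate weights the factor 4 / n is 4/80.
    """
    n = len(replicate_estimates)
    sum_sq = sum((r - estimate) ** 2 for r in replicate_estimates)
    return math.sqrt(4.0 * sum_sq / n)


def margin_of_error(se, z=1.645):
    """90 percent margin of error, the confidence level used in ACS publications."""
    return z * se


# Hypothetical full-sample net migration estimate (sum of the main person weight).
acs_estimate = 1000.0

# Made-up replicate estimates: the same tabulation repeated with each of the
# 80 replicate weights. Here they simply alternate 50 above and 50 below.
replicates = [acs_estimate + (50.0 if r % 2 == 0 else -50.0) for r in range(80)]

se = replicate_se(acs_estimate, replicates)
moe = margin_of_error(se)
lower, upper = acs_estimate - moe, acs_estimate + moe

pep_estimate = 1500.0  # hypothetical PEP figure from the question
pep_within_moe = lower <= pep_estimate <= upper

print(se, moe, (lower, upper), pep_within_moe)
```

In this toy case the hypothetical PEP value lies outside the ACS interval, which by itself would not say which source is closer to the truth, only that sampling error alone does not account for the gap.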


Thank you so much for taking the time and all your help! I appreciate the additional resources, this has helped tremendously.