Hello IPUMS community,
I am a student, and I am using the IPUMS database for my bachelor's thesis. I am only using the 2018 data. Unfortunately, I am quite overwhelmed by how exactly to use the variables (PUMA, STATEFIP, COUNTYFIP & City…).
If I want to know, e.g., how many people of one race live in a certain state or city, how do I do that in Stata? I don't understand how to combine the variables or which one exactly to use.
I hope someone can help in any way possible. Thank you!
These tutorials and exercises should help you practice using IPUMS in Stata. The exercises provide sample code that you can tailor to your specific needs; Exercise 1 and Exercise 2 for the USA project using Stata will be most useful for you. Make sure you use sample weights for population estimates, since the ACS is a sample survey (more on using sample weights in this blog post).
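The "how many people of one race live in a state" question comes down to summing the person weight within each state-race cell rather than counting records. A minimal Python sketch, assuming an extract already loaded as a list of records with the IPUMS variables STATEFIP, RACE and PERWT; the records are invented toy values and the helper names are hypothetical:

```python
from collections import defaultdict

# Toy person records standing in for a loaded 2018 ACS extract (values are invented).
# STATEFIP: state FIPS code (6 = California, 36 = New York)
# RACE: IPUMS general race code (1 = White, 2 = Black/African American)
# PERWT: person weight, i.e. roughly how many people each record represents
persons = [
    {"STATEFIP": 6, "RACE": 1, "PERWT": 95.0},
    {"STATEFIP": 6, "RACE": 2, "PERWT": 110.0},
    {"STATEFIP": 6, "RACE": 2, "PERWT": 80.0},
    {"STATEFIP": 36, "RACE": 2, "PERWT": 120.0},
    {"STATEFIP": 36, "RACE": 1, "PERWT": 60.0},
]


def weighted_counts(records, group_vars, weight_var="PERWT"):
    """Estimated population per group: the sum of weights, not the number of records."""
    totals = defaultdict(float)
    for rec in records:
        key = tuple(rec[v] for v in group_vars)
        totals[key] += rec[weight_var]
    return dict(totals)


def sample_counts(records, group_vars):
    """Raw number of sample records per group (not a population estimate)."""
    counts = defaultdict(int)
    for rec in records:
        counts[tuple(rec[v] for v in group_vars)] += 1
    return dict(counts)


est = weighted_counts(persons, ["STATEFIP", "RACE"])
raw = sample_counts(persons, ["STATEFIP", "RACE"])
print(est[(6, 2)], raw[(6, 2)])  # 190.0 2
```

The two records for Black residents of California stand for an estimated 190 people; reporting the raw count of 2 would understate the population badly. The same grouping works for other geographies by swapping the group variables.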
Also, if you’re only looking for basic summary statistics, you might not need to use microdata from IPUMS USA. You can find a wide range of summary tables for states, counties, cities or PUMAs from IPUMS NHGIS.
Note that to get summary tables for cities, you should select the “Place” geographic level (which includes both incorporated and unincorporated places).
And if you want to load IPUMS NHGIS data into Stata, select the “fixed width” format when you make your data request. That format comes with a control file to support loading the data into Stata.
I want the average population characteristics of the neighborhoods of 2019 ACS respondents. Is PUMA the lowest-level geography available for ACS respondents? Is there a place to get average PUMA characteristics to match on, or is it best to just summarize over the sample by PUMA?
Yes, the PUMA is the lowest-level geography identified in public-use ACS microdata. To get PUMA-level characteristics, you can summarize microdata by PUMA, or you can get PUMA-level data from ACS summary tables, which are available through IPUMS NHGIS. The summary tables would be slightly more accurate because they are based on a somewhat larger sample.
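Summarizing the microdata by PUMA and attaching the result back to each respondent might look like the sketch below. One detail worth knowing: PUMA codes are only unique within a state, so the grouping key combines STATEFIP and PUMA. The records are invented, AGE stands in for whatever characteristic you need, and the function and output field names are hypothetical:

```python
from collections import defaultdict

# Toy person records (invented values). PUMA 101 in state 6 and PUMA 101
# in state 36 are different areas, so PUMA alone is not a unique ID.
persons = [
    {"STATEFIP": 6, "PUMA": 101, "AGE": 30, "PERWT": 100.0},
    {"STATEFIP": 6, "PUMA": 101, "AGE": 50, "PERWT": 300.0},
    {"STATEFIP": 36, "PUMA": 101, "AGE": 20, "PERWT": 200.0},
    {"STATEFIP": 36, "PUMA": 102, "AGE": 40, "PERWT": 100.0},
]


def weighted_mean_by_puma(records, var, weight_var="PERWT"):
    """Weighted mean of `var` for each (STATEFIP, PUMA) pair."""
    num = defaultdict(float)
    den = defaultdict(float)
    for rec in records:
        key = (rec["STATEFIP"], rec["PUMA"])
        num[key] += rec[var] * rec[weight_var]
        den[key] += rec[weight_var]
    return {key: num[key] / den[key] for key in num}


puma_age = weighted_mean_by_puma(persons, "AGE")

# Attach each respondent's PUMA-level average back onto their record.
for rec in persons:
    rec["PUMA_MEAN_AGE"] = puma_age[(rec["STATEFIP"], rec["PUMA"])]

print(puma_age[(6, 101)])  # 45.0
```

Note that each respondent's own record is included in the PUMA average here; with samples of thousands of records per PUMA, this rarely matters.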
IPUMS USA provides one variable, DENSITY, that describes the average of local tract characteristics (population densities) among each PUMA's residents. You could apply a similar strategy to get PUMA-level summaries of other tract characteristics, but I'm not sure how often that would be more relevant than a simpler whole-PUMA summary. Another strategy is offered in a research article (Leyk, Nagle & Buttenfield 2013), which allocates household records in public microdata among the census tracts within each PUMA. Alternatively, you could apply for access to complete geographic information about ACS respondents through a Federal Statistical Research Data Center.
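The "average among each PUMA's residents" idea can be sketched as a population-weighted average of tract values: each tract's density counts once per resident, so the result reflects the density a typical resident experiences, which can differ sharply from the PUMA's overall people-per-area ratio. A minimal sketch with invented tract figures for a single hypothetical PUMA; it illustrates the general approach described above and is not IPUMS's exact procedure for computing DENSITY:

```python
# Invented tracts for one hypothetical PUMA: (population, land area in sq. miles).
tracts = [
    (6000, 2.0),   # dense tract: 3000 people per sq. mile
    (2000, 8.0),   # 250 people per sq. mile
    (2000, 10.0),  # 200 people per sq. mile
]

total_pop = sum(pop for pop, _ in tracts)
total_area = sum(area for _, area in tracts)

# Overall PUMA density: total people divided by total area.
overall = total_pop / total_area

# Resident-weighted average: weight each tract's density by its population.
resident_weighted = sum(pop * (pop / area) for pop, area in tracts) / total_pop

print(overall, resident_weighted)  # 500.0 1890.0
```

Because most residents live in the dense tract, the resident-weighted figure (1890) is far above the overall ratio (500). The same weighting could be applied to other tract-level characteristics, for example from NHGIS tract tables.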
Thanks, Jonathan! Are 2019-ish PUMA summary files available through NHGIS? I think I saw only 2010.
All of the ACS datasets in NHGIS include PUMA-level data, from the 2009 5-year dataset through 2021. The PUMAs may be identified as "2010 PUMAs" because PUMA definitions are benchmarked to a particular census year.