Is there an easy way to pull county data out of the new 2012 5-year PUMS?

I just began working with the 2012 5-year PUMS for the State of California. There is a single PUMA variable: records from years before 2012 carry the 2000 PUMA codes, and records from 2012 carry the “2012” PUMA codes. In the 2012 codes, the first 2 digits refer to the county; the pre-2012 codes do not work that way. There is also no longer a county variable.

So pulling county data out of the state file is impossible unless I know all of the 2000 PUMAs in each county.

Likewise, PUMA-level analysis on the 5-year data is impossible, because the 2000 and 2012 PUMAs do not map cleanly onto each other.

Am I right in my assumptions, or am I missing something?


All of your assumptions are correct. IPUMS-USA is currently working on ways to overcome some of these limitations; however, as you pointed out, the imperfect relationship between the PUMAs used in 2012 and those used in earlier years is a serious constraint.

You may be able to consistently identify some counties in the 2012 5-year file, provided that those counties are identifiable using both the 2000 and the “2012” PUMAs. I would recommend extracting the 2012 1-year file as well as a 1-year file that uses the 2000 PUMAs (say, 2009), making sure to select both the PUMA and COUNTY variables. You can then create a PUMA-to-COUNTY crosswalk for each set of PUMAs. Since you are only interested in California, you can drop all cases outside California; if you were interested in other states, you would also need STATEFIP or STATEICP in the crosswalk, because PUMA codes are only unique within a state. Next, apply these crosswalks to the multi-year file, using MULTYEAR to separate respondents from 2012, who should get the 2012 crosswalk, from everyone else. Finally, keep only the counties that are identified throughout the entire sample.
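For illustration, here is a minimal Python sketch of that crosswalk approach. It assumes each extract has already been read into a list of dicts keyed by the IPUMS variable names (PUMA, COUNTY, MULTYEAR), that COUNTY is 0 where the county is not identifiable, and that the data are California-only, so the crosswalk is keyed on PUMA alone (for several states you would key on (STATEFIP, PUMA) instead). The function names, data layout, and codes in the example are hypothetical; check the codebook for your own extract.

```python
from collections import defaultdict


def build_crosswalk(records):
    """Build a PUMA -> COUNTY crosswalk from a 1-year extract.

    Hypothetical layout: each record is a dict with integer "PUMA" and
    "COUNTY" keys, where COUNTY == 0 means the county is not identifiable.
    """
    counties_by_puma = defaultdict(set)
    for rec in records:
        counties_by_puma[rec["PUMA"]].add(rec["COUNTY"])
    # Keep only PUMAs that fall entirely within one identified county.
    return {
        puma: next(iter(counties))
        for puma, counties in counties_by_puma.items()
        if len(counties) == 1 and 0 not in counties
    }


def assign_counties(multiyear_records, xwalk_2000, xwalk_2012):
    """Attach a county to 5-year records, keeping only consistently identified counties."""
    # Counties identifiable under both the 2000 and the "2012" PUMAs.
    stable = set(xwalk_2000.values()) & set(xwalk_2012.values())
    kept = []
    for rec in multiyear_records:
        # 2012 respondents carry the new PUMA codes; everyone else the 2000 codes.
        xwalk = xwalk_2012 if rec["MULTYEAR"] == 2012 else xwalk_2000
        county = xwalk.get(rec["PUMA"])  # None if the PUMA has no unique county
        if county in stable:
            kept.append({**rec, "COUNTY": county})
    return kept


if __name__ == "__main__":
    # Made-up codes, for illustration only.
    xwalk_2000 = build_crosswalk([
        {"PUMA": 1, "COUNTY": 1}, {"PUMA": 2, "COUNTY": 1},
        {"PUMA": 3, "COUNTY": 0}, {"PUMA": 4, "COUNTY": 75},
    ])
    xwalk_2012 = build_crosswalk([
        {"PUMA": 101, "COUNTY": 1}, {"PUMA": 7500, "COUNTY": 0},
    ])
    five_year = [
        {"MULTYEAR": 2008, "PUMA": 1},
        {"MULTYEAR": 2010, "PUMA": 4},     # county 75 is not identified in 2012
        {"MULTYEAR": 2012, "PUMA": 101},
        {"MULTYEAR": 2012, "PUMA": 7500},  # no identifiable county
    ]
    kept = assign_counties(five_year, xwalk_2000, xwalk_2012)
    print([rec["COUNTY"] for rec in kept])  # [1, 1]
```

Requiring a county to appear in both crosswalks is one way to read "identified throughout the entire sample": a county identifiable under only one set of PUMAs is dropped, because some years of its respondents could not be assigned to it.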

I hope this helps.