I am trying to create a data set that will join to Los Angeles County PUMA shapefiles. To map limited English proficiency, I am working with the variables LANGUAGED and SPEAKENG.
I added the geographic variables STATEFIP, COUNTYFIP, and PUMA to my extract in order to narrow the data set down to Los Angeles County only, but I’m not sure this is the correct way to do it. When I open the data set in SPSS, the PUMA codes do not seem to match the STATEFIP or COUNTYFIP codes I found for Los Angeles County.
A few notes that may be helpful. First, both the county variable (note the new naming convention: COUNTYICP) and the PUMA variable are state-dependent: their codes are unique only within a state, so they must be read in combination with one of the state identification variables (see STATEFIP or STATEICP). Second, although not all counties are identifiable in public use microdata, Los Angeles County is identified in most samples since 1960. So, if you limit your data to STATEFIP==06 and COUNTYICP==0370, you will keep only observations within Los Angeles County. Finally, for more details about geographic identification with IPUMS USA data, see our Geographic Tools page. Specifically, if you are using the most recent ACS files, the 2010 PUMA definitions page should have a lot of information that will be helpful to you.
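As a rough illustration of the filter described above, here is a minimal Python sketch. It assumes the extract has already been read into a list of dictionaries with integer codes (so 06 becomes 6 and 0370 becomes 370). The sample rows and the helper name `in_los_angeles_county` are made up for illustration and are not real IPUMS data.

```python
# Made-up rows standing in for an IPUMS USA extract; the values are
# illustrative only and do not come from a real sample.
records = [
    {"STATEFIP": 6, "COUNTYICP": 370, "PUMA": 3701},
    {"STATEFIP": 6, "COUNTYICP": 590, "PUMA": 5901},
    # Same county code in a different state: this row must NOT be kept,
    # which is why the county code is always paired with the state code.
    {"STATEFIP": 48, "COUNTYICP": 370, "PUMA": 3701},
]


def in_los_angeles_county(row):
    """Return True when a row carries both the state and county codes
    quoted above (STATEFIP 06 and COUNTYICP 0370)."""
    return row["STATEFIP"] == 6 and row["COUNTYICP"] == 370


# Keep only the rows that pass both conditions.
la_records = [row for row in records if in_los_angeles_county(row)]
```

Testing the county code alone would also keep the third row, so both conditions need to be applied together.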
The PUMA shapefiles contain an identifier field named “GISMATCH” that is a concatenation of the STATEFIP and PUMA codes. The shapefiles contain no information about counties because PUMAs do not necessarily nest within counties; many PUMAs extend over multiple counties.
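To join the microdata to the shapefile, each record needs a key built the same way as GISMATCH. The sketch below assumes the key is a zero-padded 2-digit STATEFIP followed by a zero-padded 5-digit PUMA. Those widths are an assumption, so compare a few values against the GISMATCH field in your own shapefile before relying on them. The function name `gismatch_key` is hypothetical.

```python
def gismatch_key(statefip, puma, state_width=2, puma_width=5):
    """Concatenate STATEFIP and PUMA into one string key.

    The padding widths are assumptions (2 digits for the state,
    5 for the PUMA); adjust them if your shapefile differs.
    """
    state_part = f"{int(statefip):0{state_width}d}"
    puma_part = f"{int(puma):0{puma_width}d}"
    return state_part + puma_part


# Made-up codes for illustration: state 6, PUMA 3701.
example_key = gismatch_key(6, 3701)
```

Building the key as a string rather than a number keeps the leading zero of the state code, which is otherwise lost when the codes are stored as integers.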
You are correct, though, to use STATEFIP and COUNTYFIP (or STATEICP and COUNTYICP) to identify which records are for Los Angeles County residents. You can use that information, together with the info on the Geographic Tools pages, to determine exactly which PUMAs lie within Los Angeles County.
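One way to get a first list of candidate PUMAs from the extract itself is to collect the PUMA codes that appear on Los Angeles County records, and to flag any PUMA whose records also carry another county code. This is only a sketch under the assumption that every record has STATEFIP, COUNTYICP, and PUMA as integers. The function name `pumas_by_county_coverage` is hypothetical, and the result should still be checked against the Geographic Tools pages.

```python
from collections import defaultdict


def pumas_by_county_coverage(records, statefip=6, countyicp=370):
    """Split the PUMAs seen on the target county's records into two sets:
    those whose records all share that county code, and those that also
    appear with some other county code in the same state."""
    counties_per_puma = defaultdict(set)
    for row in records:
        # PUMA codes are state-dependent, so only look within one state.
        if row["STATEFIP"] == statefip:
            counties_per_puma[row["PUMA"]].add(row["COUNTYICP"])

    inside, straddling = set(), set()
    for puma, counties in counties_per_puma.items():
        if countyicp not in counties:
            continue  # this PUMA never appears with the target county
        if counties == {countyicp}:
            inside.add(puma)
        else:
            straddling.add(puma)
    return inside, straddling
```

The PUMAs in the first set can then be turned into shapefile keys and used to select the matching polygons. Any PUMA in the second set needs a decision about whether to map it, since it covers more than the one county.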