Hi everyone, I am interested in examining population growth in rural PUMAs over time using data from the 1990 Census, 2000 Census, and the 2010 ACS. I understand there are a lot of complications with capturing urban/rural populations in general (and there are plenty of questions on the forum about this), as well as with observing trends in PUMAs over time, since they are redrawn. My hope is to use the CONSPUMA variable to obtain consistent PUMAs over time, and then apply the 1990 URBAN variable to the same PUMAs in subsequent years (e.g., a rural PUMA in 1990 is also treated as rural in 2000 and 2010). I think this will allow me to observe population changes in 1990-defined PUMAs across time. Does anyone see any problems with this approach? Thanks so much!
You are certainly correct about the numerous challenges underlying the analysis of rural/urban population growth over time. Your approach sounds reasonable; however, I do have a few comments. First, I’m not exactly sure how you are planning on applying the 1990 URBAN variable to 2000 and 2010 samples. In 1990 the URBAN variable is defined by using the universe statement for the FARM variable, which included only rural households. This sort of universe statement is not the same for the FARM variable in subsequent years. Second, I want to highlight some limitations with the CONSPUMA variable. In particular, there is a break in the series provided by the “consistent PUMA” variables available in IPUMS. The CONSPUMA variable is best suited for consistency from 1980 through 2000. The CPUMA0010 variable is best suited for consistency from 2000 onward.
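To make the break concrete, here is a minimal sketch that checks whether a single consistent-PUMA variable spans a given set of sample years. The year ranges are taken only from the description above; the `COVERAGE` table and the `variables_covering` helper are hypothetical names for illustration, not part of any IPUMS tool.

```python
# Coverage as described in this post (an assumption to verify against the
# IPUMS variable descriptions). None means "no upper bound stated here".
COVERAGE = {
    "CONSPUMA": (1980, 2000),
    "CPUMA0010": (2000, None),
}

def variables_covering(years):
    """Return the consistent-PUMA variables whose range includes every year."""
    result = []
    for name, (start, end) in COVERAGE.items():
        if all(y >= start and (end is None or y <= end) for y in years):
            result.append(name)
    return result

two_period = variables_covering([1990, 2000])
three_period = variables_covering([1990, 2000, 2010])
print(two_period)    # ['CONSPUMA']
print(three_period)  # []
```

The empty result for 1990, 2000, and 2010 together is the break: no single variable in this table spans all three samples.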
My suggestion is to study population growth in non-metropolitan areas rather than rural areas, if you are able. You can do this with the METRO variable. Metropolitan status is a bit more comparable over time (at least in concept) than rural-urban definitions. This slight change in the aim of your project may make your analysis more straightforward and may simplify your results.
Thanks, Jeff! Super helpful. I see what you mean about the break in CONSPUMA, so I can’t rely on that to compare PUMAs over the time period from 1990 to 2010 even generally. Re: rural or nonmetro, I had considered using the metro variable, but that seems to be available for 2000 and 2010? Is there any variable for rural or nonmetro that can be used across 1990 to 2010? Just for some context, I’m trying to estimate the impact of a policy on rural populations across those time periods, which is why I’m interested in those datasets specifically. Thanks so much for your insight!
METRO is available for most years in IPUMS USA. In 1990, it’s not available for the 5% “state” sample, but it is available for the 1% “metro” sample. We have some plans to provide new variables for distinguishing rural PUMAs (e.g., population density), but it may be a year or more before we extend that info back to 1990 samples, so I agree with Jeff that METRO is the most suitable variable for distinguishing “rural” populations in microdata for 1990-2010.
Note that METRO codes “nonmetropolitan” populations in two ways. Residents of PUMAs that are entirely outside of metro areas are assigned METRO = 1. Residents of PUMAs that straddle metro area boundaries are assigned METRO = 0. I think it may be the case that half or more of the nonmetro population resides in “straddling” PUMAs, so you’d need to make a decision about whether you want to include METRO 0 codes as part of the “rural” population for your analysis.
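As a rough sketch of that decision, the function below sums person weights by year for the nonmetro population, with a switch for whether METRO = 0 records count. It assumes each record is a dict with `YEAR`, `METRO`, and `PERWT` (the person weight) keys; the function name, the `include_straddling` flag, and the sample values are made up for illustration.

```python
from collections import defaultdict

NONMETRO = 1    # PUMA entirely outside metro areas (per the note above)
STRADDLING = 0  # PUMA straddles a metro area boundary (per the note above)

def nonmetro_population(records, include_straddling=False):
    """Sum PERWT by YEAR for records treated as nonmetro."""
    codes = {NONMETRO, STRADDLING} if include_straddling else {NONMETRO}
    totals = defaultdict(float)
    for rec in records:
        if rec["METRO"] in codes:
            totals[rec["YEAR"]] += rec["PERWT"]
    return dict(totals)

# Made-up records, only to show the two definitions side by side.
sample = [
    {"YEAR": 1990, "METRO": 1, "PERWT": 100.0},
    {"YEAR": 1990, "METRO": 0, "PERWT": 80.0},
    {"YEAR": 1990, "METRO": 2, "PERWT": 300.0},
    {"YEAR": 2000, "METRO": 1, "PERWT": 120.0},
    {"YEAR": 2000, "METRO": 0, "PERWT": 90.0},
]

strict = nonmetro_population(sample)
broad = nonmetro_population(sample, include_straddling=True)
```

Running the analysis under both definitions is one way to see how sensitive the results are to the straddling PUMAs.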
Lastly, if the aims of your analysis could be achieved with census summary tables (rather than with microdata), you could use IPUMS NHGIS to obtain summary data for either rural or nonmetro populations, exactly defined, 1990-2010, without the limitations of PUMA geography.
This is great. Thanks, Jeff and Jonathan.
Jeff, I am working with the 5% sample because I need to know state (I am examining the impact of state policy on rural population growth) so now that makes sense why I didn’t see it.
Jonathan, I’m ok with the straddle factor on the metro area (not a deal breaker), and now I understand better what that is. Thank you. I also considered using the summary tables, which is a good suggestion, but I’m worried about losing a lot of power if I go that route. I really look forward to the new variables; that’s great news.
So many trade-offs to consider! One final option I am contemplating … we have an RDC here, so I think I can get that nonmetro variable at the county level over time - at least I have seen other publications doing metro/nonmetro counties, so it seems that the variable is available at that level. Is that reasonable to assume?
So given all your feedback, I think I have to make a decision of whether to go smaller (restricted data) or bigger (summary tables). Does that sound about right?
First, yes, in the RDC, you could definitely get every person’s metropolitan status and also their urban/rural status. I’m not sure if those are already coded, but I do know that the census block of residence is available in the RDC, and you can crosswalk from the block to any other geographic info.
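The crosswalk step might look roughly like the sketch below. I don’t know the actual layout of the restricted files, so every field name here (`block_id`, `metro_status`, `urban_rural`) is a hypothetical placeholder, and the crosswalk is assumed to be a plain dict keyed by block identifier.

```python
def attach_geography(person_records, block_crosswalk):
    """Join crosswalk fields onto person records by block identifier.

    Returns (matched, unmatched) so that blocks missing from the
    crosswalk are visible instead of being silently dropped.
    """
    matched, unmatched = [], []
    for rec in person_records:
        geo = block_crosswalk.get(rec["block_id"])
        if geo is None:
            unmatched.append(rec)
        else:
            # Copy the record and add the geographic attributes.
            matched.append({**rec, **geo})
    return matched, unmatched

# Made-up example data with hypothetical identifiers and labels.
crosswalk = {
    "B001": {"metro_status": "nonmetro", "urban_rural": "rural"},
    "B002": {"metro_status": "metro", "urban_rural": "urban"},
}
people = [
    {"person_id": 1, "block_id": "B001"},
    {"person_id": 2, "block_id": "B002"},
    {"person_id": 3, "block_id": "B999"},
]

matched, unmatched = attach_geography(people, crosswalk)
```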
I’d say you have more than just the two options going forward. In addition to using the RDC or summary data, there are a couple of ways you could still use public microdata:
Use the 1990 1% sample and limit your analysis to PUMAs that lie entirely in a single state, in which case the state of residence is identified. In fact, I’m pretty sure that the PUMAs that straddle state boundaries are almost entirely metro PUMAs (the point being to keep whole metro areas intact). So most, if not all, nonmetro PUMAs should have an identifiable state.
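Here is a minimal sketch of that restriction, which also counts how many nonmetro records would be lost because the state isn’t identified. It assumes records are dicts with `METRO` and `STATEFIP` keys; the placeholder code 99 for an unidentified state is an assumption to check against the STATEFIP codes for the 1990 1% sample, and the function name is hypothetical.

```python
def nonmetro_with_state(records, nonmetro_codes=frozenset({1}),
                        unidentified_state_codes=frozenset({99})):
    """Keep nonmetro records whose state is identified.

    Returns (kept, n_dropped), where n_dropped counts nonmetro records
    excluded because STATEFIP is one of the unidentified codes.
    """
    kept, n_dropped = [], 0
    for rec in records:
        if rec["METRO"] not in nonmetro_codes:
            continue  # not part of the nonmetro population
        if rec["STATEFIP"] in unidentified_state_codes:
            n_dropped += 1  # nonmetro, but state not identified
            continue
        kept.append(rec)
    return kept, n_dropped

# Made-up records for illustration only.
sample = [
    {"METRO": 1, "STATEFIP": 27},
    {"METRO": 1, "STATEFIP": 99},
    {"METRO": 2, "STATEFIP": 99},
    {"METRO": 1, "STATEFIP": 19},
]

kept, n_dropped = nonmetro_with_state(sample)
```

If `n_dropped` turns out to be zero or close to it in the real sample, that would support the guess that nearly all nonmetro PUMAs have an identifiable state.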
Jonathan, did you just do my dissertation for me? Seriously though, those suggestions are extremely helpful. Every time I think I’m getting somewhere, I come across a major obstacle. I’ll look into both of those options and see if maybe I can avoid having to go through the whole restricted data access process (at least for now). Thanks so much.