Merging NSFG and ACS PUMS data to look at how MSA economic conditions affect reproductive outcomes

I am working with the RDC on a proposal to access restricted NSFG contextual data, so that I can link the economic conditions of the Metropolitan Statistical Area (MSA) a woman lived in at the time of interview (from the ACS PUMS) with her individual-level NSFG data. I need to prepare and send the ACS PUMS file to the RDC so that they can merge the data sets. I hope to use the MET2013 ACS PUMS variable for a 2006-2015 analysis to identify and link each NSFG respondent with her corresponding MSA. Has anyone done something similar? I would appreciate any tips or suggestions. I am using Stata and planning to do a multilevel analysis of trends.

Michelle Hawks Cuellar, MSPH, PhD Candidate
Johns Hopkins Bloomberg School of Public Health Department of Population, Family and Reproductive Health

The sub-state geography variables in IPUMS USA, including MET2013 and COUNTYFIP, are derived from PUMAs (Public Use Microdata Areas, available in the variable PUMA), which are geographic areas containing at least 100,000 people. The MET2013 Description Page contains a crosswalk between PUMAs and MSAs. Since PUMAs do not always line up exactly with MSAs, there is also a file detailing the level of mismatch for each MSA. Please read the variable description for MET2013 to better understand the limitations.

Based on the information available on the CDC website, the NSFG restricted access files contain State and County FIPS codes. IPUMS USA contains the variables STATEFIP and COUNTYFIP which can be used to link to the NSFG. These can then be linked to MSA via MET2013. Be aware that since COUNTYFIP is based on PUMA, not all respondents in IPUMS USA have a valid code. It should identify the county for most people in urban areas, but in less densely populated areas, a single PUMA often contains multiple counties, so COUNTYFIP will be labeled as “000”.

An alternative to using IPUMS USA microdata is to pull county- or MSA-level summary data from a different source. For example, IPUMS NHGIS has summaries of many variables at the county and MSA level. American FactFinder, which is the Census Bureau’s data website, can also produce many tables summarizing economic information at the county or MSA level.


Thanks Matthew for the prompt reply.

I realize that MET2013 won’t cover all PUMAs, but since I am only interested in large MSAs I think this should be fine. I will make sure to use the crosswalk and error sheets to understand the limitations.

And yes, the NSFG restricted access files contain State and County FIPS codes. The IPUMS USA variables STATEFIP and COUNTYFIP will be used to link to the NSFG, and the records will then be linked to MSA via MET2013.

I first looked at American Fact Finder hoping to be able to get the summary data I needed at the MSA level without needing to extract microdata. But the measures I need are not available.

I am using the National Survey of Family Growth (NSFG) to look at trends in Hispanic fertility and reproductive behaviors from 2006-2017. To account for the impact of the Great Recession I will be merging NSFG individual-level data with MSA-level data from the ACS.

To look at the area-level economic conditions of the MSAs women lived in at the time they were interviewed, I need to create the MSA-level measures listed in the table below. NSFG staff will then merge these to the NSFG individual-level data using restricted geocode variables. I looked within American FactFinder and IPUMS and could not find an easy way to create the measures below for each MSA for 6 different time periods, so I extracted data from IPUMS USA and am now trying to figure out the most efficient way to create them. I would appreciate any tips or suggestions. Most of my work and research has been qualitative, so I am moving slowly and need to get this to them ASAP.

Period 1 (2006-2007)

Period 2 (2008)

Period 3 (2009-2010)

Period 4 (2011-2013)

Period 5 (2013-2015)

Period 6 (2015-2017)

These are some of the measures I need to create:

|Measure|Definition|
|---|---|
|Women of reproductive age (15-44 years old)||
|Hispanic women of reproductive age (15-44 years old)||
|Proportion of population that is Hispanic or Latino|Hispanic population/Total population|
|Proportion of Hispanic or Latino population that is foreign-born|Hispanic foreign-born population (calculated from place of birth)/Total Hispanic population|
|Proportion of Hispanic foreign-born population that entered the US less than 5, 5-10, or more than 10 years ago|Hispanic foreign-born population who entered the US X years ago/All Hispanic foreign-born population|
|Median household income|The median divides the income distribution into two equal parts: one-half of the cases falling below the median income and one-half above the median. For households and families, the median income is based on the distribution of the total number of households and families including those with no income.|
|Median household income of households with a Hispanic or Latino householder|Same as above but among households with a Hispanic or Latino householder|
|Proportion of population with income below poverty level|Population with income below poverty level/Total population|
|Proportion of Hispanic or Latino population with income below poverty level|Hispanic population with income below poverty level/Total Hispanic population|
|Proportion of households that received Food Stamp/SNAP benefits|Number of households that received Food Stamp/SNAP benefits/Total number of households|
|Employment/population ratio|Percentage of all working-age civilians who are employed|
|Labor force participation rate|Proportion of the population that is in the labor force|
|Unemployment rate for total population|Number of unemployed persons divided by the labor force|
|Unemployment rate for Hispanic or Latino population|Number of unemployed Hispanic persons divided by the Hispanic labor force|
|Proportion of employed civilians aged 16+ employed in construction|Employed civilians aged 16+ employed in construction/All employed civilians aged 16+|
|Fertility|Aggregate number of women who had a birth in the past 12 months in a specified category, per 1,000 women, to calculate the total fertility rate|
|Health insurance coverage|Aggregate of a dichotomous series of questions about 8 types of health insurance coverage|

Thanks!

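Regarding the periods, you can download all IPUMS ACS one-year samples from 2006 to 2017 and subset the data for the analysis of each period. For Period 4 (2011-2013) you could instead use the 3-year sample, if you prefer. The Census Bureau discontinued the 3-year ACS products after 2011-2013, so Periods 5 and 6 have to be built from pooled 1-year samples.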

To calculate aggregate statistics at the MSA level, just restrict the sample to a particular value of MET2013 (or use it as a grouping category) in your analysis.

You should be using PERWT (person weight) to weight most of your outcome measures. For household-level statistics, use HHWT. If you are using 1-yr samples and combining multiple years, you will need to divide the weights by the number of years to get correct population totals.
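The weight adjustment described above could be sketched in Stata roughly as follows. This assumes an IPUMS USA extract of 1-year samples is already in memory with the lowercase variables year, perwt and hhwt; the names perwt_pool and hhwt_pool are made up for illustration, and the 2009-2010 bounds are simply Period 3 from the list earlier in the thread.

```stata
* Sketch only: pool 1-year ACS samples into one period and rescale weights.
* Assumes an IPUMS USA extract with year, perwt and hhwt is in memory.
local first 2009
local last  2010
local nyears = `last' - `first' + 1

* Keep only the years that make up this period.
* This drops the other years from memory; reload the extract for the next period.
keep if inrange(year, `first', `last')

* Dividing by the number of pooled years keeps weighted totals on the
* scale of an average single-year population instead of a multi-year sum.
* perwt_pool and hhwt_pool are illustrative names, not IPUMS variables.
gen double perwt_pool = perwt / `nyears'
gen double hhwt_pool  = hhwt  / `nyears'

* Sanity check: the sum of the rescaled person weights should be close
* to a single year's population for the areas in the extract.
quietly summarize perwt_pool
display "Pooled weighted population: " %15.0fc r(sum)
```

Because Periods 4-6 share boundary years (2013 and 2015), subsetting by year range for each period separately, as here, is probably simpler than assigning every record to a single period.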

Regarding measures:

Take a look at the following variables in IPUMS-USA:

AGE

SEX

HISPAN

BPL

NATIVITY

HHINCOME

POVERTY

FOODSTMP

EMPSTAT

LABFORCE

IND1990

OCC2010

FERTYR

Health Insurance variables
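As a rough illustration of how a few of these variables might be turned into MSA-level proportions, the sketch below builds 0/1 indicators and takes their weighted means. The code values used (hispan 0 for not Hispanic, empstat 1 and 2 for employed and unemployed, poverty 0 for N/A, met2013 0 for areas that are not identifiable) are my reading of the IPUMS USA codebook and should be checked against the variable pages for your samples. The indicator names and the output file name msa_measures.dta are hypothetical.

```stata
* Sketch only: MSA-by-year proportions from 0/1 indicators.
* Code values are assumptions; verify them on the IPUMS USA variable pages.

* Assumed: met2013 == 0 marks people not in an identifiable metro area.
drop if met2013 == 0

* Hispanic indicator (assumed: hispan 0 = not Hispanic).
gen byte hisp = (hispan != 0) if !missing(hispan)

* Below poverty (assumed: poverty 0 = N/A, otherwise percent of threshold).
* Left missing outside the poverty universe so those people leave the denominator.
gen byte poor = (poverty < 100) if poverty > 0 & !missing(poverty)

* Unemployed, defined only for the labor force (assumed: 1 = employed, 2 = unemployed).
gen byte unemp = (empstat == 2) if inlist(empstat, 1, 2)

* The weighted mean of a 0/1 indicator is the weighted proportion.
collapse (mean) p_hisp = hisp p_poor = poor unemp_rate = unemp ///
    [pw = perwt], by(met2013 year)

* Hypothetical output file name.
save "msa_measures.dta", replace
```

The `///` line continuation works in a do-file but not when typing commands interactively. Hispanic-specific versions of the poverty and unemployment measures could be obtained by adding `if hisp == 1` to the collapse command, which would require dropping p_hisp from the list.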

I am finally picking this analysis back up. I am having a hard time calculating aggregate statistics at the MSA level for Hispanic women of reproductive age.

I am new to Stata and quantitative analysis, and it is my first time using the ACS…

One way to calculate aggregate statistics in Stata is with the collapse command. Specifically, you could use some variation of the following code.

First, generate a reproductive-age indicator (with X<Y). Note that both bounds are exclusive as written, so ages 15-44 would correspond to X=14 and Y=45:

gen rep_age = 1 if age>X & age<Y

Then collapse the data to report the mean income (INCTOT) of Hispanic (HISPAN!=0) women (SEX==2) of reproductive age (rep_age==1), weighted with PERWT, for each metropolitan area (MET2013) and year:

collapse (mean) inctot if rep_age==1 & hispan!=0 & sex==2 [w=perwt], by(met2013 year)

I’m showing the code for the mean of income as an example. In practice, you can list any number of variables in place of INCTOT in the collapse command above and calculate aggregate statistics for each of them. You can also calculate statistics other than the mean; see the collapse command documentation for the full list of capabilities.
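For the household-level measures in the list, a variation on the same idea might look like the sketch below. It assumes relate == 1 identifies the householder record, that hhincome uses 9999999 for N/A, and that met2013 == 0 marks areas that are not identifiable; all three are my reading of the IPUMS USA codebook and worth verifying. The names hhinc_hisp, med_hhinc, med_hhinc_hisp and the file msa_hhincome.dta are hypothetical.

```stata
* Sketch only: median household income by MSA and year.
* Code values are assumptions; verify them on the IPUMS USA variable pages.

* Keep one record per household: the householder (assumed: relate == 1).
keep if relate == 1

* Assumed: met2013 == 0 marks households not in an identifiable metro area.
drop if met2013 == 0

* Assumed: 9999999 is the N/A code for hhincome.
replace hhincome = . if hhincome == 9999999

* Income only for households whose householder is Hispanic (assumed: hispan 0 = not Hispanic).
gen double hhinc_hisp = hhincome if hispan != 0 & !missing(hispan)

* collapse takes several variables at once; household statistics use hhwt.
collapse (median) med_hhinc = hhincome med_hhinc_hisp = hhinc_hisp ///
    [aw = hhwt], by(met2013 year)

* Hypothetical output file name.
save "msa_hhincome.dta", replace
```

Since collapse replaces the data in memory, it is safest to run this on a fresh copy of the extract, or to wrap it in preserve and restore after saving the output.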

It also sounds like you are looking for a crosswalk table between MSAs and counties. If you are using the METAREA variable (for pre-2012 ACS samples), this page includes such a table. If you are using the MET2013 variable (for post-2011 ACS samples), this US Census Bureau page includes links to such tables. The Excel files in the top section, under “Core Based Statistical Areas…”, should be what you are looking for.

I hope this helps. If you have any additional questions, please feel free to email ipums@umn.edu.

Thanks, Jeff, for your help and the Stata code! I really appreciate your prompt response. I am working on calculating these aggregate measures this afternoon and your tips will come in very handy.

Michelle