Identifying Same-Sex Couples in ACS Data from 2005-2017

Hello. My name is Charlie. I am an undergraduate student in the department of economics at the University of Maryland, College Park. I am interested in investigating gay locational patterns from 2005-2017. I am, however, new to research and not the best at coding in Stata. I realize that from 2013 onward there is a variable SSMC that identifies the presence of a same-sex married couple. But because my interval of interest stretches back to 2005, I need a way to restrict the data set to same-sex couples, both married and unmarried, from 2005 to 2017. My goal right now is to create summary statistics depicting their household characteristics. I have used the command “keep if sex==sex_sp”, but I am uncertain whether this gets the job done or whether more coding is required to narrow the data set down further. I am also not sure how to create summary statistics and graphs in Stata for a large pooled data set like this, with many discrete observations over several years. Any help or advice on this matter would be greatly appreciated. Thanks.


It sounds like you are on the right track. By using the IPUMS family interrelationship variables, the Attach Characteristics tool, and the Stata command “keep if sex==sex_sp” (as you’ve already done), your data set will include only individuals who are linked to a same-sex partner. This includes both married and cohabiting relationships. One detail you’ll need to think about is whether you want your data to be at the individual level or at the partner-pair level. As of now, based on what you’ve described, your data will include one observation for each individual within a partner-pair (i.e., each partner-pair will show up twice in your data, once for each partner). Depending on your research questions, you may want to limit your data to only one observation per partner-pair.
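As a rough sketch of the check described above, the lines below apply the restriction and then count how many linked persons remain in each household. This assumes the extract contains year and serial alongside sex and sex_sp; n_linked is a hypothetical variable name used only for illustration.

```stata
* Assumes sex (own sex) and sex_sp (partner's sex, attached via SPLOC) exist.
* Persons without a linked partner have a missing sex_sp, so they are dropped too.
keep if sex == sex_sp

* Count the remaining persons in each household-year (assumes year and serial).
* n_linked is a hypothetical name; a value of 2 means both partners are present.
bysort year serial: gen n_linked = _N
tab n_linked
```

If most households show a value of 2, the data are still at the individual level, with each couple counted twice.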

There are a number of ways to limit your data to only include one observation for each partner-pair. One way is to create a household-family unit ID variable and only keep one partner-pair per family unit.

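First, concatenate the survey year (YEAR), household ID (SERIAL), and family unit (FAMUNIT) variables. SERIAL is unique only within a given sample, so YEAR is needed when pooling 2005-2017, and the separator keeps the pieces from running together (e.g., serial 1 with famunit 11 versus serial 11 with famunit 1).

“egen family_id = concat(year serial famunit), punct("_")”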

Next, flag the first observation within each family_id:

“by family_id, sort: gen nvals = _n == 1”

Finally, keep only one observation per partner-pair:

“keep if nvals==1”
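One possible way to confirm the result, offered as a sketch rather than a required step, is to check that family_id now identifies each row uniquely and then look at how many couples remain per year. It assumes year is in the extract.

```stata
* Run after the keep step. isid exits with an error if any
* family_id still appears more than once.
isid family_id

* Number of remaining partner-pairs by survey year (assumes year exists)
tab year
```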


Thanks so much for your help! Stata is not my friend, so I greatly appreciate your guidance on the coding. Given the scope of my analysis, I likely want to look at the household level, as I am most concerned with gay coupled households. As such, I have implemented your advice.

I do have one last question. Based on the literature I have gathered on gay demographic trends, it appears that miscoded heterosexual couples often skew the numbers for the population of gay couples, inflating them greatly. I’m wondering if you have any advice on identifying potential miscoding errors.

Thanks again,

I’m glad I can help. Regarding verifying spousal links, we make the SPRULE variable available as part of the IPUMS USA family interrelationship variables. This variable describes the nature of the link identified by SPLOC, which is the link the Attach Characteristics tool uses. You can use the SPRULE variable to clean out any potentially miscoded links.


Thanks again for your response. I would just like some quick clarification. What values would a household with a same-sex couple have? How would I go about eliminating those for which there is a discrepancy? I guess I’m trying to figure out how exactly SPRULE works. I understand that it provides the method by which a spousal link was made. I do not understand, however, which particular values of SPRULE may suggest a miscoding error. Moreover, is there a particular coding sequence I could use to prune the dataset of likely errors?

Thanks so much for your time,

None of the values of SPRULE explicitly suggest any miscoding error. Rather, this variable allows you as the researcher to make judgements based on the “quality” of the links reported by SPLOC. For example, you could decide to exclude all “fifth level links” in your data. This, of course, will further reduce your sample of same-sex couples. However, it could help your analysis in providing a useful robustness check on your results.
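A minimal sketch of such a robustness check follows. The numeric code 5 is only an assumption standing in for “fifth level links”, so check the SPRULE codebook before settling on a cutoff. The names weak_rule and weak_link are hypothetical, and the sketch assumes sprule and year are in the extract.

```stata
* See how the links in the same-sex couple sample were made, by year
tab sprule year

* Hypothetical cutoff: code 5 is an assumption; confirm it in the codebook.
local weak_rule 5
gen byte weak_link = (sprule == `weak_rule')
tab weak_link

* Robustness check: repeat the analysis on the stricter sample,
* then restore the full sample.
preserve
drop if weak_link
tab year
restore
```

Using preserve and restore keeps the full sample in memory, so results with and without the excluded links can be compared side by side.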