Panel data - MEPS (2010-2020)


I would like to use MEPS (2010-2020) data to study longitudinal associations between A and B.

I understand MEPS can be an unbalanced panel, but for the earlier years (2010-2017) it looks like it was not a panel at all, just a cross-sectional survey. The variable ‘mepsid’ shows that all participants in the earlier surveys (2010-2017) were surveyed only once, while only participants in the later surveys (2018-2019) were surveyed at multiple timepoints.

Am I missing anything, or was MEPS not a panel survey until 2018?
Please correct me if I am wrong. Thanks.


The MEPS has been a panel survey since the first cohort of participants entered the survey in 1996. You can read more about the MEPS panel design on this IPUMS user guide page. In 2018, the MEPS underwent a redesign that sought to reduce respondent burden by decreasing how often certain questions are asked. However, this redesign does not affect the ability to link participants across their appearances in the panel.

As a test, I tried and was able to link respondents in 2010 to their 2011 records using MEPSID. I reviewed your data extract and followed the same general parameters, selecting person-level variables and a Stata-formatted file where observations are rectangular on the person level. I added the 2010 and 2011 samples to my extract. After downloading the extract, decompressing it, and running the .do file, I ran the following commands to identify duplicate (i.e., linked) values of MEPSID:

sort mepsid year
by mepsid: gen dup = cond(_N==1,0,_n)

When tabulating by dup, dup = 0 refers to respondents whose MEPSID value is observed only once and dup > 0 to those observed more than once. More specifically, dup = 1 identifies the first time a specific MEPSID value is observed and dup = 2 identifies the second time that this value is observed. Tabulating by dup and year, I find 18,127 observations with dup = 0 and 14,719 with dup = 1 in 2010. These figures make sense since about half of 2010 respondents are in their second year in the panel and are not observed in 2011. The other half, with dup = 1, can be linked to the 2011 observations that have the same MEPSID and dup = 2.
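
To make the meaning of dup concrete, here is a minimal sketch of the same check on a toy dataset. The IDs and years are made up for illustration and are not real MEPS records; only the variable names mepsid and year are taken from the commands above.

```stata
* Toy data: hypothetical IDs, not real MEPS records
clear
input long mepsid int year
101 2010
101 2011
102 2010
103 2010
103 2011
104 2011
end

* Same logic as above: 0 = ID appears once; otherwise 1, 2, ... in year order
sort mepsid year
by mepsid: gen dup = cond(_N==1, 0, _n)

* Cross-tabulate to see how many records in each year can be linked
tab year dup

* Built-in cross-check: reports how many IDs appear once, twice, and so on
duplicates report mepsid
```

In this toy file, IDs 102 and 104 each appear once and get dup = 0, while IDs 101 and 103 get dup = 1 for their 2010 record and dup = 2 for their 2011 record.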

I hope this helps you figure out what is happening in the data and that you are able to link your observations. If you are still having trouble, please share more details on how you are linking your observations and I will take a closer look.


Thanks for the response. I tried your Stata code and understood what you meant.

But it’s still not clear why all the participants before 2018 were surveyed only once or twice.
For example, I ran the following code with MEPS data from 2010 to 2021:
sort mepsid year
bysort mepsid: gen num=1
bysort mepsid: gen num_cum=sum(num)
bysort mepsid: gen num_total= num_cum[_N]
tab num_total
tab year num_total

The last command indicated that all participants who were surveyed between 2007 and 2017 were surveyed only once or twice, while participants surveyed between 2018 and 2021 were surveyed multiple times (1-4 times).

Is this correct? I mean, the data can be an unbalanced panel, but it looks weird that all panel participants in 2007-2010 were surveyed only once or twice…

Each person record should typically be linkable across exactly two years of the MEPS. What you are seeing is a result of the temporary introduction of additional sequential panels in the MEPS, used to accommodate lower response rates during the COVID-19 pandemic. The panels (i.e., cohorts) that entered the survey in 2018 and 2019 remained in the MEPS for an additional two years and therefore can be linked across four different annual observations, or nine survey rounds. This change was temporary: panels entering the survey after 2019 will again be sampled over only two years, just as before the pandemic. As a result, there are three different panels present in the MEPS in 2020, four in 2021, and three again in 2022 (data yet to be released on IPUMS). With the 2023 data release, you should expect that the MEPS will return to having two panels in any given year. For more information, you may refer to this article, which further discusses the impacts of the pandemic on the MEPS.

You also mention that there are panel participants who are only surveyed once. I wanted to clarify that respondents who only appear in the data once are those who only responded to the survey in one of the two years that they were in the panel. The typical response rate to the MEPS is around 95%, though this rate fell significantly during the height of the COVID pandemic. More information about response rates and reasons for non-response is provided in the 2021 MEPS Methodology Report. I hope my explanation helps clarify what is happening in the data.
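
One way to see this pattern in an extract is to count annual records per person and tabulate that count against the first year each person is observed. The sketch below uses a toy dataset with made-up IDs, not real MEPS records. The variables first_year and n_years are names introduced here for illustration. The first observed year is only a proxy for when a panel entered, since someone who responded only in their second year would be assigned to the wrong cohort.

```stata
* Toy data: hypothetical IDs, not real MEPS records
clear
input long mepsid int year
201 2016
201 2017
202 2017
301 2018
301 2019
301 2020
301 2021
302 2019
302 2020
end

* First year each ID is observed (a proxy for when its panel entered)
bysort mepsid (year): gen first_year = year[1]

* Number of annual records per ID
by mepsid: gen n_years = _N

* Keep one row per person so each ID is counted once
by mepsid: keep if _n == 1
tab first_year n_years
```

Using _N inside a by group gives the per-person record count in one step, so no running sum is needed. With real data, a tabulation like this would be expected to show counts of three or four only for people first observed in 2018 or 2019, consistent with the extended COVID-era panels described above.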


I also wanted to mention that while respondents can typically be linked across two annual person records, they are in fact interviewed five times in a typical panel within this two-year period. By requesting a data extract that is rectangular on round, hierarchical, or wide, and adding round-level variables, respondents can be linked across up to five rounds (or across nine rounds for a COVID-era panel). You can select between these different formats on the extract options page. This guide further explains the types of data formats that users can request, while these video tutorials can be a helpful visual guide.


Thank you so much for your explanations!!!