IPUMS PMA Participant ID for Longitudinal Panel

To: IPUMS_PMA

Dear IPUMS, We would like to use the longitudinal data from Kenya PMA 2019–2022. However, after downloading the data, we can’t find the common ID variable that identifies participants across phases. Could you help us with this? Any assistance would be greatly appreciated.

There are two ways that you can use longitudinal data from IPUMS PMA. The easiest is to simply request a linked file from our website; I am including a brief video tutorial on how to get a longitudinal IPUMS PMA extract.

The linking key is FQINSTID; you can also use this variable to link women’s records across time. In the original PMA data (i.e., not the harmonized IPUMS version), this variable is called FQmetainstanceID.

Hello Isabel,

Thank you so much for this information. I also have a question related to the longitudinal ID. I downloaded PMA longitudinal data from IPUMS and all observations have FQINSTID.

How can I identify the number of times that an individual is followed? I counted the number of times that the same FQINSTID (unique within each country) is repeated. However, the ‘times’ variable I created does not correspond with the PHASE variable in the PMA longitudinal data. Do you have any insights into this issue?

Thank you!
Yingyi

Hi Yingyi,

The answer to your question depends on whether the data extract you created has a long or a wide format. In both cases, however, you may choose to visually inspect women’s participation over the three phases or create a variable denoting their participation at each survey phase. I’ll provide a few suggestions using Stata code. Let’s assume you’re limiting your sample to female respondents and that you generated an extract that includes observations from all three phases of the longitudinal panel. Let’s also assume that by “the number of times that an individual is followed” you mean “the number of times an individual completes or partly completes an interview”.

If using the wide format, you may notice that variables have suffixes (_1, _2 or _3) denoting their phase. You may then consider the variables RESULTFQ_1, RESULTFQ_2 and RESULTFQ_3. For example, in Stata, you could visually inspect these variables using the commands:

sort country fqinstid
browse fqinstid resultfq_1 resultfq_2 resultfq_3

This yields a spreadsheet where each row represents a woman, with the last three columns representing her survey outcomes in all three phases. You may also choose to create your own variable denoting whether a woman was interviewed, completely or partly, for all or some of the phases. This variable would have seven categories: women interviewed in phases 1-2-3, women interviewed only in phases 1-2, women interviewed only in phases 1-3, women interviewed only in phases 2-3, women interviewed only in phase 1, women interviewed only in phase 2, and women interviewed only in phase 3. In Stata, one quick way to accomplish this would be to recode the values of RESULTFQ_1, RESULTFQ_2 and RESULTFQ_3 and then concatenate the recoded variables:

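* Codes 1 and 5 (interview completed or partly completed) become the phase number; other outcomes become missing
recode resultfq_1 (1=1) (5=1) (else=.), gen(resultfq_1_recode)
recode resultfq_2 (1=2) (5=2) (else=.), gen(resultfq_2_recode)
recode resultfq_3 (1=3) (5=3) (else=.), gen(resultfq_3_recode)
* Concatenate the recoded variables into one string; missing values appear as "."
egen phase_participation = concat(resultfq_1_recode resultfq_2_recode resultfq_3_recode)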

This yields the seven categories highlighted above in a different order, with “.” denoting phases when women weren’t interviewed. For instance, the value “123” means that the respondent was interviewed in all phases, whereas the value “1.3” means the respondent was interviewed only in phases 1 and 3. Of course, you may choose to label this variable and its values, or change their order, as suits you.
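If what you need is a simple count of phases rather than the participation pattern, a minimal sketch along the same lines follows. It runs on a small made-up dataset (the fqinstid values and outcomes are invented for illustration, not real PMA records) and assumes, as above, that RESULTFQ codes 1 and 5 mark completed or partly completed interviews. The variable name n_phases is hypothetical.

```stata
* Toy wide-format data (invented values, not real PMA records)
clear
input str4 fqinstid resultfq_1 resultfq_2 resultfq_3
"w001" 1 1 1
"w002" 1 . 5
"w003" 1 2 .
"w004" . 1 1
end

* inlist() returns 1 when the outcome is 1 or 5 and 0 otherwise
* (including when the outcome is missing), so the sum is the
* number of phases with a completed or partly completed interview
generate byte n_phases = inlist(resultfq_1, 1, 5) + inlist(resultfq_2, 1, 5) + inlist(resultfq_3, 1, 5)

list fqinstid resultfq_1 resultfq_2 resultfq_3 n_phases
```

On your own extract, only the generate line would be needed; tabulating n_phases then gives the distribution of the number of interviews per woman.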

If using the long format, you’ll need to proceed differently because variables don’t have a suffix denoting which phase they belong to. This information can instead be found in another variable, PHASE. You could choose to simply inspect the survey outcomes visually by sorting the dataset and viewing the variable RESULTFQ in tandem with PHASE and FQINSTID. In Stata, this could look like:

sort country fqinstid phase
browse country fqinstid phase resultfq

This yields a spreadsheet where each row represents an observation of a woman at a given survey phase. You could also try to create a variable denoting the seven categories of participation I highlighted above. In Stata, one quick way to do this would be to concatenate a string version of the variable PHASE:

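* Create a string copy of PHASE
tostring phase, gen(phasestring)
* Within each woman (FQINSTID is unique within country), join the phase numbers of her rows in phase order
bysort country fqinstid (phase): generate phase_participation = phasestring[1] + phasestring[2] + phasestring[3]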

This concatenated variable has a slightly different format than the one obtained using the wide format. In this case, a value of “123” would still mean that the woman was interviewed in all three phases, but the value “13” would now denote that the woman was only interviewed in phases 1 and 3: that is, phases for which women weren’t interviewed simply do not appear. Once more, you may choose to label this variable and its values, or change the ordering, as suits you.
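One caveat, stated as an assumption rather than a fact about your extract: if the long file contains rows for phases in which a woman was not interviewed, then counting rows per FQINSTID (or concatenating PHASE) reflects the phases present in the file rather than completed interviews. A minimal sketch on made-up data that restricts the count to RESULTFQ codes 1 and 5 follows. The country code, the fqinstid values and the variable names interviewed, n_rows, n_phases, phasepart and phase_pattern are all invented for illustration.

```stata
* Toy long-format data (invented values, not real PMA records)
clear
input country str4 fqinstid phase resultfq
1 "w001" 1 1
1 "w001" 2 1
1 "w001" 3 5
1 "w002" 1 1
1 "w002" 2 2
1 "w002" 3 1
1 "w003" 1 1
end

* Flag rows with a completed (1) or partly completed (5) interview
generate byte interviewed = inlist(resultfq, 1, 5)

* Rows per woman: counts every phase present in the file
bysort country fqinstid: generate n_rows = _N

* Phases per woman in which she was actually interviewed
bysort country fqinstid: egen n_phases = total(interviewed)

* Phase pattern built only from interviewed phases, e.g. "13"
generate str1 phasepart = cond(interviewed, string(phase), "")
bysort country fqinstid (phase): generate phase_pattern = phasepart[1] + phasepart[2] + phasepart[3]

list country fqinstid phase resultfq n_rows n_phases phase_pattern, sepby(fqinstid)
```

In this toy example the second woman has three rows but only two interviews, so n_rows and n_phases differ for her; comparing the two on a real extract is a quick way to check whether the assumption applies.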

Hello Etienne,

Thank you so much for your detailed response! I was able to replicate your Stata example in R (with PMA long data format) and solved the question I had.

Thanks a ton, again!
Yingyi