IPUMS PMA Participant ID for Longitudinal Panel

William_Rudgard · February 23, 2024, 10:31am

To:IPUMS_PMA



Dear IPUMS, We would like to use the longitudinal data from Kenya PMA 2019–2022. However, once we download the data, we can’t find the common ID variable that identifies participants across phases. Is there a way you can help with this? Any help would be greatly appreciated.

Isabel_Pastoor · February 23, 2024, 10:51pm

There are two ways that you can use longitudinal data from IPUMS PMA. The easiest is to simply request a linked file from our website; I am including a brief video tutorial on how to get a longitudinal IPUMS PMA extract.

The second is to link the records yourself: the linking key is FQINSTID, and you can use this variable to link women’s records across time. In the original PMA data (i.e., not the harmonized IPUMS version), this variable is called FQmetainstanceID.

Yingyi_Lin · June 3, 2024, 7:22pm

Hello Isabel,

Thank you so much for this information. I also have a question related to the longitudinal ID. I downloaded PMA longitudinal data from IPUMS and all observations have FQINSTID.

How can I identify the number of times that an individual is followed? I counted the number of times that each FQINSTID (unique within each country) is repeated. However, the ‘times’ variable I created does not correspond with the PHASE variable in the PMA longitudinal data. Do you have any insights into this issue?

Thank you!
Yingyi

Etienne_Breton · June 5, 2024, 2:18pm

Hi Yingyi,

The answer to your question depends on whether the data extract you created has a long or a wide format. In both cases, however, you may choose to visually inspect women’s participation over the three phases or create a variable denoting their participation at each survey phase. I’ll provide a few suggestions using Stata code. Let’s assume you’re limiting your sample to female respondents and that you generated an extract that includes observations from all three phases of the longitudinal panel. Let’s also assume that by “the number of times that an individual is followed” you mean “the number of times an individual completes or partly completes an interview”.

If using the wide format, you may notice that variables have suffixes (_1, _2 or _3) denoting their phase. You may then consider the variables RESULTFQ_1, RESULTFQ_2 and RESULTFQ_3. For example, in Stata, you could inspect these visually using the commands:

sort country fqinstid
browse fqinstid resultfq_1 resultfq_2 resultfq_3

This yields a spreadsheet where each row represents a woman, with the last three columns representing her survey outcomes in all three phases. You may also choose to create your own variable denoting whether a woman was interviewed, completely or partly, in all or some of the phases. This variable would have seven categories: women interviewed in phases 1-2-3, women interviewed only in phases 1-2, women interviewed only in phases 1-3, women interviewed only in phases 2-3, women interviewed only in phase 1, women interviewed only in phase 2, and women interviewed only in phase 3. In Stata, one quick way to accomplish this would be to recode the values of RESULTFQ_1, RESULTFQ_2 and RESULTFQ_3 and then concatenate the recoded variables:

recode resultfq_1 (1=1) (5=1) (else=.), gen(resultfq_1_recode)
recode resultfq_2 (1=2) (5=2) (else=.), gen(resultfq_2_recode)
recode resultfq_3 (1=3) (5=3) (else=.), gen(resultfq_3_recode)
egen phase_participation=concat(resultfq_1_recode resultfq_2_recode resultfq_3_recode)

This yields the seven categories highlighted above in a different order, with “.” denoting phases when women weren’t interviewed. For instance, the value “123” means that the respondent was interviewed in all phases, whereas the value “1.3” means the respondent was interviewed only in phases 1 and 3. Of course you may choose to label this variable and its values, or change their order, however suits you.
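
If what you need is a plain count of interviews per woman rather than the pattern, here is a minimal sketch for the wide format. It assumes, as in the recode above, that RESULTFQ codes 1 and 5 are the completed and partly completed outcomes; the variable name n_phases is hypothetical.

```stata
* Count the phases in which each woman completed (1) or partly completed (5)
* the female questionnaire. inlist() returns 0 when RESULTFQ is missing,
* so phases without an interview simply add nothing to the count.
generate byte n_phases = inlist(resultfq_1, 1, 5) + inlist(resultfq_2, 1, 5) + inlist(resultfq_3, 1, 5)
label variable n_phases "Number of phases interviewed"
tabulate n_phases
```

Because each row is one woman in the wide format, the tabulation counts women directly.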

If using the long format, you’ll need to operate differently because variables don’t have a suffix denoting which phase they belong to. This information can instead be found in another variable, PHASE. You could choose to simply inspect the survey outcomes visually by sorting the dataset and viewing the variable RESULTFQ in tandem with PHASE and FQINSTID. In Stata, this could look like:

sort country fqinstid phase
browse country fqinstid phase resultfq

This yields a spreadsheet where each row represents an observation of a woman at a given survey phase. You could also try to create a variable denoting the seven categories of participation I highlighted above. In Stata, one quick way to do this would be to concatenate a string version of the variable PHASE:

tostring phase, gen(phasestring)
bysort fqinstid (phase): generate phase_participation = phasestring[1] + phasestring[2] + phasestring[3]

This concatenated variable has a slightly different format from the concatenated variable obtained using the wide format. In this case, a value of “123” would still mean that the woman was interviewed in all three phases, but the value “13” would now denote that the woman was only interviewed in phases 1 and 3: that is, phases for which women weren’t interviewed simply do not appear. Once more, you may choose to label this variable and its values, or change the ordering, however suits you.
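
Note that concatenating PHASE counts every row a woman has, whatever the interview outcome. If your long extract contains rows for phases where the interview was not completed (an assumption worth checking in your own data), a count restricted to RESULTFQ codes 1 and 5 may be closer to the definition used above. A minimal sketch, with hypothetical variable names interviewed, n_phases and one_per_woman:

```stata
* Flag rows where the interview was completed (1) or partly completed (5).
generate byte interviewed = inlist(resultfq, 1, 5)

* Sum the flags within each woman. COUNTRY is included in the grouping
* in case FQINSTID is only unique within a country.
bysort country fqinstid: egen n_phases = total(interviewed)

* Tag one row per woman so the tabulation counts women,
* not woman-phase observations.
egen byte one_per_woman = tag(country fqinstid)
tabulate n_phases if one_per_woman
```

Comparing n_phases with the number of rows per woman is one way to see why a simple count of repeated FQINSTID values may not line up with PHASE.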

Yingyi_Lin · June 6, 2024, 5:42pm

Hello Etienne,

Thank you so much for your detailed response! I was able to replicate your Stata example in R (with PMA long data format) and solved the question I had.

Thanks a ton, again!
Yingyi
