Hi Yingyi,
The answer to your question depends on whether the data extract you created has a long or a wide format. In both cases, however, you may choose either to visually inspect women’s participation over the three phases or to create a variable denoting their participation at each survey phase. I’ll provide a few suggestions using Stata code. Let’s assume you’re limiting your sample to female respondents and that you generated an extract that includes observations from all three phases of the longitudinal panel. Let’s also assume that by “the number of times that an individual is followed” you mean “the number of times an individual completes or partly completes an interview”.
If using the wide format, you may notice that variables have suffixes (_1, _2 or _3) denoting their phase. The variables to consider are then RESULTFQ_1, RESULTFQ_2 and RESULTFQ_3. For example, in Stata, you could visually inspect these survey outcomes using the commands:
sort country fqinstid
browse fqinstid resultfq_1 resultfq_2 resultfq_3
This yields a spreadsheet where each row represents a woman, with the last three columns representing her survey outcomes in each of the three phases. You may also choose to create your own variable denoting whether a woman was interviewed, completely or partly, in all or some of the phases. This variable would have seven categories: women interviewed in phases 1-2-3, women interviewed only in phases 1-2, women interviewed only in phases 1-3, women interviewed only in phases 2-3, women interviewed only in phase 1, women interviewed only in phase 2, and women interviewed only in phase 3. In Stata, one quick way to accomplish this would be to recode the values of RESULTFQ_1, RESULTFQ_2 and RESULTFQ_3 (here codes 1 and 5, for completed and partly completed interviews, are kept and everything else is set to missing) and then concatenate the recoded variables:
recode resultfq_1 (1=1) (5=1) (else=.), gen(resultfq_1_recode)
recode resultfq_2 (1=2) (5=2) (else=.), gen(resultfq_2_recode)
recode resultfq_3 (1=3) (5=3) (else=.), gen(resultfq_3_recode)
egen phase_participation=concat(resultfq_1_recode resultfq_2_recode resultfq_3_recode)
This yields the seven categories highlighted above in a different order, with “.” denoting phases when women weren’t interviewed. For instance, the value “123” means that the respondent was interviewed in all phases, whereas the value “1.3” means the respondent was interviewed only in phases 1 and 3. If your extract includes any women with no completed or partly completed interview in any phase, they would show up with the value “...”. Of course, you may choose to label this variable and its values, or change their order, however suits you.
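If what you ultimately need is a count rather than a pattern, the same logic can be turned into a number of interviews per woman. The following is only a minimal sketch for the wide format: it assumes, as in the recode above, that RESULTFQ codes 1 and 5 mark completed and partly completed interviews, and n_interviews is just a variable name I made up for illustration.

```stata
* Sketch only: assumes a wide extract and that RESULTFQ codes 1 and 5
* mean completed and partly completed (as in the recode above).
* n_interviews is a hypothetical variable name.
* inlist() returns 1 when the value matches and 0 otherwise (including missing).
generate n_interviews = inlist(resultfq_1, 1, 5) + inlist(resultfq_2, 1, 5) + inlist(resultfq_3, 1, 5)

* Distribution of the number of interviews per woman (0 to 3)
tabulate n_interviews
```

Because inlist() treats missing values as "no match", women with no record in a given phase simply contribute 0 for that phase.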
If using the long format, you’ll need to proceed differently because variables don’t have a suffix denoting which phase they belong to. This information can instead be found in another variable, PHASE. You could choose to simply visually inspect the survey outcomes by sorting the dataset and viewing the variable RESULTFQ in tandem with PHASE and FQINSTID. In Stata, this could look like:
sort country fqinstid phase
browse country fqinstid phase resultfq
This yields a spreadsheet where each row represents an observation of a woman at a given survey phase. You could also try to create a variable denoting the seven categories of participation I highlighted above. In Stata, one quick way to do this would be to concatenate a string version of the variable PHASE:
tostring phase, gen(phasestring)
bysort fqinstid (phase): generate phase_participation = phasestring[1] + phasestring[2] + phasestring[3]
This concatenated variable has a slightly different format than the concatenated variable obtained using the wide format. In this case, a value of “123” would still mean that the woman was interviewed in all three phases, but the value “13” would now denote that the woman was interviewed only in phases 1 and 3: that is, phases for which women weren’t interviewed simply do not appear. Note that this code records every phase for which a woman has a row in the dataset, so it matches the definition of participation above only if rows with RESULTFQ values other than 1 or 5 have been dropped first. Once more, you may choose to label this variable and its values, or change the ordering, however suits you.
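If you’d rather not drop any rows, a variation on the commands above can restrict the pattern to completed and partly completed interviews and also give you a simple count. This is only a sketch under a few assumptions: the extract is in long format with at most one row per woman per phase, RESULTFQ codes 1 and 5 mean completed and partly completed, and the names interviewed, n_interviews, phasestring_int and phase_participation_int are ones I invented for illustration.

```stata
* Sketch only: assumes a long extract with at most three rows per woman
* (one per phase) and that RESULTFQ codes 1 and 5 mean completed and
* partly completed. All new variable names are hypothetical.

* Flag rows that correspond to a completed or partly completed interview
generate interviewed = inlist(resultfq, 1, 5)

* Number of interviews per woman, repeated on each of her rows
bysort country fqinstid: egen n_interviews = total(interviewed)

* String version of PHASE, left empty for rows without an interview
generate phasestring_int = string(phase) if interviewed

* Concatenate within woman; empty strings and out-of-range rows add nothing
bysort country fqinstid (phase): generate phase_participation_int = phasestring_int[1] + phasestring_int[2] + phasestring_int[3]
```

With this version, a woman who has rows for all three phases but was interviewed only in phases 1 and 3 would get the value “13” and an n_interviews of 2.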