I have created a data extract and am reading it into Stata. However, I do not understand the meaning of the “_1” and “_2” suffixes on variables such as “age_1” and “age_2”. The values differ between the two columns, and I don’t think they are simply lagged values: some age values differ by more than 1, and some are the same.
It looks like you’ve downloaded a longitudinal file of ASEC data from IPUMS CPS. A longitudinal file includes responses from two ASEC samples for each respondent. The ASEC happens every year, and CPS respondents can appear in a maximum of two different ASEC samples during their observation periods in the CPS. You can see our documentation about the rotation pattern and sample design of the CPS for more information. The longitudinal ASEC files are designed to help researchers study changes within individuals and families over time.
In the longitudinal file, each respondent’s variables from their first ASEC response have a _1 suffix in the variable name, and their variables from their second ASEC response have a _2 suffix. This means, for example, that AGE_1 reports the respondent’s age at the time of their first ASEC interview, and AGE_2 reports the respondent’s age at the time of their second ASEC interview. You’ll notice that these values are usually 0 to 2 years apart: the two interviews take place roughly a year apart, and each one can fall before or after the respondent’s birthday. Some variables, such as race and sex, aren’t expected to change during the CPS observation period, so you’ll see fewer differences over time in these variables.
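If it helps, here is a minimal Stata sketch of how you might work with the paired variables. It uses a small made-up dataset rather than your extract, and the identifier name `id` is hypothetical; substitute whichever person identifier your extract contains. It first tabulates the gap between the two reported ages, then reshapes the data so that each row is one person in one ASEC sample.

```stata
* Toy data for illustration only: three made-up respondents.
* "id" is a hypothetical person identifier, not an IPUMS variable name.
clear
input id age_1 age_2
1 34 35
2 50 50
3 27 29
end

* Difference between the second and first ASEC interview ages
gen age_diff = age_2 - age_1
tabulate age_diff

* Reshape from wide (one row per person) to long
* (one row per person per ASEC sample); "sample" takes the values 1 and 2
drop age_diff
reshape long age_, i(id) j(sample)
rename age_ age
list, clean
```

The wide layout is convenient for comparing the two responses directly, while the long layout is usually easier for panel commands. Neither is required; it depends on the analysis you have in mind.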
Thank you so much for your help. This clears up my confusion!