I successfully downloaded and decompressed my data. When I run the do file, I get an error code because one of the variables in the dataset is defined twice.
I checked the data dictionary and it looks like this variable appears in two different columns. How should I correct this?
The first recommendation when encountering this kind of problem is to recreate the extract from scratch. Occasionally, revising and resubmitting fairly old extracts has led to doubled variables, and creating a new extract has solved the problem. If you continue to experience this problem, please send us an email at ipums@umn.edu.
I am having a similar issue. I just downloaded my IPUMS CPS extract, and all of the variables are doubled (e.g. if I selected YEAR and STATEFIP to download, it’s creating a dataset with YEAR_1 and YEAR_2 that are the same and STATEFIP_1 and STATEFIP_2 that are the same). I downloaded it again, and it is the same.
It sounds like you have downloaded a longitudinal extract of CPS data from IPUMS CPS. In longitudinal files, respondents are linked over a period of one year, which allows users to easily conduct analyses of changes over time. Each variable is measured twice: once in time period 1, and again in time period 2, one year later. For example, EMPSTAT_1 is the respondent’s employment status in time period 1, and EMPSTAT_2 is the respondent’s employment status in time period 2.
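If it helps to see the naming convention concretely, here is a minimal Python sketch that splits one wide record into its two time periods based on the _1 and _2 suffixes. The function name `wide_to_long` and the sample values are hypothetical and only for illustration; they are not taken from any actual extract.

```python
import re

# Matches names such as EMPSTAT_1 or YEAR_2 and captures the base name and period.
SUFFIX = re.compile(r"^(?P<base>.+)_(?P<period>[12])$")

def wide_to_long(row):
    """Split one wide record into {period: {base_name: value}}."""
    periods = {1: {}, 2: {}}
    for name, value in row.items():
        match = SUFFIX.match(name)
        if match is None:
            continue  # skip anything without a _1 or _2 suffix
        period = int(match.group("period"))
        periods[period][match.group("base")] = value
    return periods

# Made-up record: the values are placeholders, not real CPS data.
row = {
    "CPSIDP_1": 1001, "CPSIDP_2": 1001,
    "YEAR_1": 2018, "YEAR_2": 2019,
    "EMPSTAT_1": 10, "EMPSTAT_2": 21,
}

long_form = wide_to_long(row)
print(long_form[1])  # variables as measured in time period 1
print(long_form[2])  # variables as measured in time period 2
```

The same idea applies in any statistical package: the suffix identifies the time period, and the part before it is the usual IPUMS variable name.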
Some variables are expected to change for many respondents over the one-year period, such as employment status, hours worked, wage variables, or marital status. YEAR will change for all individuals in the longitudinal file (I took a look at your most recent longitudinal extract from IPUMS CPS and confirmed that YEAR_1 is not equal to YEAR_2 for any individual).
Other variables won’t change for many people. STATEFIP, for example, will never be different in time period 1 and time period 2 (I confirmed this in your most recent extract as well). This is because the CPS samples dwellings, rather than individuals or households. Whichever individuals live in the sampled dwelling during the CPS observation period are included in the CPS. If someone moves during the CPS, they are not followed by the CPS to a different dwelling. The new residents of the sampled dwelling take on the CPS identifiers of the previous residents. These identifiers are CPSID (for households) and CPSIDP (for individuals). CPSIDP is used to link individuals over time in the longitudinal CPS data files provided by IPUMS.
In your most recent longitudinal extract from IPUMS CPS, I am not seeing the issue you describe, where the value of each variable in time period 1 is the same as in time period 2. I am only seeing this occur for variables where this is expected (e.g., STATEFIP, CPSIDP, CPSID, ASECFLAG, and others). I am seeing changes in variables between time periods when expected (e.g., AGE, NCHILD, LABFORCE, and others).
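If you would like to run this kind of check yourself, here is a rough Python sketch that reports, for each variable, the share of records whose value differs between the two time periods. It assumes the extract has already been read into a list of dictionaries keyed by the suffixed variable names; the function name `share_changed` and the three sample records are made up for illustration.

```python
def share_changed(rows):
    """Return {base_name: share of rows where the _1 and _2 values differ}."""
    first = rows[0]
    # Base names that have both a _1 and a _2 column.
    bases = sorted(
        name[:-2]
        for name in first
        if name.endswith("_1") and name[:-2] + "_2" in first
    )
    result = {}
    for base in bases:
        changed = sum(1 for r in rows if r[base + "_1"] != r[base + "_2"])
        result[base] = changed / len(rows)
    return result

# Made-up records: placeholders only, not real CPS data.
rows = [
    {"YEAR_1": 2018, "YEAR_2": 2019, "STATEFIP_1": 27, "STATEFIP_2": 27, "AGE_1": 34, "AGE_2": 35},
    {"YEAR_1": 2018, "YEAR_2": 2019, "STATEFIP_1": 6, "STATEFIP_2": 6, "AGE_1": 50, "AGE_2": 51},
    {"YEAR_1": 2018, "YEAR_2": 2019, "STATEFIP_1": 48, "STATEFIP_2": 48, "AGE_1": 27, "AGE_2": 27},
]

shares = share_changed(rows)
for base, share in shares.items():
    print(base, round(share, 2))
```

A share of 0 for a variable like STATEFIP and a share of 1 for YEAR would match what is described above; a share of 0 for every variable would suggest something is wrong with the extract.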