Is it possible to extract IPUMS-CPS data with original CPS technical variables such as "phseq" and "pppos"?

To match IPUMS-CPS with other data, I need the original “pppos” and “phseq” variables. Is it possible to extract those variables? If not, are any IPUMS-CPS variables identical to “pppos” and “phseq”?

These original CPS variables are not available in the IPUMS-CPS data. However, because IPUMS and NBER files share the same sort order, you can merge your IPUMS-CPS extract sequentially (first observation to first observation, second to second, and so on) with the corresponding unsorted CPS data file from NBER. This attaches the variables ph_seq and pppos to your IPUMS-CPS extract. Please verify the merge by comparing the values of sex, age, and race from the two files.
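A minimal Python sketch of the positional merge and check described above, assuming both files have already been read into lists of dicts in their original order. The function name `sequential_merge` and the NBER-side names `A_SEX` and `A_AGE` are placeholders, not confirmed file layouts; check your own codebook. Race is left out of the toy example because IPUMS RACE codes may not equal the Census codes without recoding one side first.

```python
def sequential_merge(ipums_rows, nber_rows, check_pairs, carry=("PH_SEQ", "PPPOS")):
    """Attach NBER columns to IPUMS rows by position: row i goes with row i.

    check_pairs lists (ipums_var, nber_var) pairs that should agree on every
    row if the two files really share the same sort order.
    Returns (merged_rows, mismatches).
    """
    if len(ipums_rows) != len(nber_rows):
        raise ValueError(
            f"row counts differ: {len(ipums_rows)} IPUMS vs {len(nber_rows)} NBER"
        )
    merged, mismatches = [], []
    for i, (ip, nb) in enumerate(zip(ipums_rows, nber_rows)):
        # Record every verification variable that disagrees on this row.
        for ip_var, nb_var in check_pairs:
            if ip[ip_var] != nb[nb_var]:
                mismatches.append((i, ip_var, ip[ip_var], nb[nb_var]))
        # Copy the IPUMS row so the input is not modified, then attach NBER columns.
        row = dict(ip)
        for var in carry:
            row[var] = nb[var]
        merged.append(row)
    return merged, mismatches


# Toy rows; the NBER names A_SEX and A_AGE are hypothetical placeholders.
ipums = [{"SEX": 1, "AGE": 34}, {"SEX": 2, "AGE": 31}]
nber = [
    {"A_SEX": 1, "A_AGE": 34, "PH_SEQ": 1, "PPPOS": 41},
    {"A_SEX": 2, "A_AGE": 31, "PH_SEQ": 1, "PPPOS": 42},
]
merged, mismatches = sequential_merge(ipums, nber, [("SEX", "A_SEX"), ("AGE", "A_AGE")])
print(mismatches)  # []
```

A non-empty mismatch list means the files are not in the same order (or the variables are coded differently), so the attached columns should not be trusted.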

Hope this helps.

I also need “pppos” and “phseq” or their equivalents, but was surprised to see this response saying that they aren’t available.

I was under the impression that the IPUMS-CPS variable “hseq” is the same as the original “ph_seq”, and that the IPUMS-CPS variable “pernum” is equivalent to the original “pppos”. Can you confirm whether that is accurate?

Thanks for the help,
Adam

There are no IPUMS variables that directly correspond to PPPOS or PH_SEQ. However, assuming you are interested in these variables for linking purposes, there are other linking keys in the original data that do link to IPUMS variables. This blog post details how you can link NBER/Census microdata files with IPUMS CPS files using the variables HRHHID, HRHHID2, HUHHNUM, HRSAMPLE, HRSERSUF, and LINENO. Using these, you can also pull in PPPOS and PH_SEQ from the Census files if you need these for other purposes.
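The key-based alternative can be sketched as an exact-match join on the listed variables. This is a hypothetical illustration, not the blog post's code: it assumes both files have been read into lists of dicts that use the Census variable names, and it takes the key list as a parameter because which keys apply may depend on the survey year (the blog post is the place to check that).

```python
# Key names from the thread; which subset applies may vary by year (assumption).
LINK_KEYS = ("HRHHID", "HRHHID2", "HUHHNUM", "HRSAMPLE", "HRSERSUF", "LINENO")


def key_merge(ipums_rows, census_rows, keys=LINK_KEYS, carry=("PH_SEQ", "PPPOS")):
    """Attach Census columns to IPUMS rows by exact match on the key variables.

    Returns (merged_rows, unmatched_keys). Raises if a key repeats in the
    Census file, since the link would then be ambiguous.
    """
    index = {}
    for row in census_rows:
        k = tuple(row[name] for name in keys)
        if k in index:
            raise ValueError(f"duplicate Census key {k}")
        index[k] = row
    merged, unmatched = [], []
    for row in ipums_rows:
        k = tuple(row[name] for name in keys)
        out = dict(row)  # copy so the input row is not modified
        match = index.get(k)
        if match is None:
            unmatched.append(k)  # leave this row without PH_SEQ/PPPOS
        else:
            for var in carry:
                out[var] = match[var]
        merged.append(out)
    return merged, unmatched
```

Inspecting the unmatched list, and comparing a few demographic variables on matched rows, is a cheap way to catch a wrong choice of keys.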

Thanks for the quick response, I really appreciate it. This is relevant for a project I’m currently working on (and changes the workload involved significantly) so I’m trying to make sure I understand what is happening in the data.

With that goal, I just downloaded the 1996 ASEC sample from both NBER and IPUMS-CPS, merged them sequentially, and verified the merge with age, race, and sex. At this point, I compared the IPUMS variable “hseq” to the NBER “ph_seq”, and they were an exact match. Comparing the IPUMS variable “pernum” to the NBER “pppos” variable, they also match, with pernum = pppos - 40, for every single observation.
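The comparison described here can be expressed as collecting the set of row-by-row differences between each pair of variables: a single difference of 0 means the variables are identical, and any other single value means a constant shift. The helpers `offsets` and `describe` are hypothetical, and they assume merged rows stored as dicts with uppercase variable names; the toy rows are illustrative only. A constant shift in one sample does not by itself show that one variable is derived from the other.

```python
def offsets(rows, ipums_var, nber_var):
    # Set of distinct (IPUMS value - NBER value) differences across all rows.
    return {row[ipums_var] - row[nber_var] for row in rows}


def describe(diffs):
    # Summarize an offsets() result in words.
    if not diffs:
        return "no rows"
    if diffs == {0}:
        return "identical"
    if len(diffs) == 1:
        (shift,) = diffs
        return f"constant shift of {shift}"
    return f"no fixed shift ({len(diffs)} distinct offsets)"


# Toy merged rows (illustrative, not real CPS data).
merged = [
    {"HSEQ": 1, "PH_SEQ": 1, "PERNUM": 1, "PPPOS": 41},
    {"HSEQ": 1, "PH_SEQ": 1, "PERNUM": 2, "PPPOS": 42},
]
print(describe(offsets(merged, "HSEQ", "PH_SEQ")))   # identical
print(describe(offsets(merged, "PERNUM", "PPPOS")))  # constant shift of -40
```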

I am confused about why you are saying there are no equivalent variables in IPUMS. Can you share the source of that information? Maybe this only works for some years, and 1996 (which I picked at random) just happens to be one of them? If that is the case, is there any existing documentation on which years can be merged this way?

Thanks again for all the help, I’m just trying to get to the bottom of this since I will (hopefully) be using this for a project very soon.

I’ve discussed this with the IPUMS CPS team, and you’re right that HSEQ corresponds directly to PH_SEQ. Sorry for the confusion; I wasn’t familiar with this variable. On the other hand, although PERNUM may match PPPOS (with a shift) in that particular sample, and possibly in other samples as well, PERNUM is not in fact based on PPPOS. It is a unique (within-household) person number generated by IPUMS CPS. If you need PPPOS for linking, I still suggest pulling it in from the NBER data, as described in my earlier post.