Referring to your answer about merging IPUMS to raw March 2011 CPS data - problem identified


Relying on your answer to the questions referenced above, I tried to merge the IPUMS variables relating to the HIU onto the CPS data (the Census version with variable names as specified by Census, not the NBER version). I sorted the IPUMS file by serial number and person number and the CPS file by HSEQ and PPPOS, then used the person weight to validate the merge. The match stops working correctly starting with HSEQ=350 (record number 479).

Since the sum of the weights and the total number of records in the two files match, the sort used to assign serial numbers in the NBER/IPUMS file must be on a variable other than HSEQ.

Given that I can’t merge the data, is there a data dictionary that explains the code you provide for creating the HIUD variable by mapping the IPUMS variables to the CPS codebook? Thank you in advance.



In the question you reference, when I say that the two files share a sort order, I mean that when you first download the files, the records are in the same order. Sorting the records according to h_seq and pppos disrupts the original sort order, so the two files are no longer congruent. When I sort the data as you suggest, I see many mismatched records. However, if I perform the sequential merge without sorting, the records all match.
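As a rough illustration, here is a minimal Python sketch of a sequential (row-by-row) merge that uses the person weight as the check. The records are made-up toy values, not real CPS or IPUMS data, and the `weight` field name and the `sequential_merge` helper are hypothetical; substitute the actual person-weight variable from each file. The toy CPS file is deliberately not in h_seq order, to show how sorting can break the alignment.

```python
from math import isclose


def sequential_merge(cps_rows, ipums_rows, weight_key="weight", tol=1e-6):
    """Pair records by file position and flag pairs whose weights differ."""
    if len(cps_rows) != len(ipums_rows):
        raise ValueError("files have different record counts")
    merged, mismatches = [], []
    for i, (cps, ipums) in enumerate(zip(cps_rows, ipums_rows)):
        # The person weight should be identical for the same person in both files.
        if not isclose(cps[weight_key], ipums[weight_key], abs_tol=tol):
            mismatches.append(i)
        # Prefix IPUMS fields so they do not overwrite the CPS fields.
        merged.append({**cps, **{f"ipums_{k}": v for k, v in ipums.items()}})
    return merged, mismatches


# Toy records in their original download order (all values are invented).
cps = [
    {"h_seq": 1, "pppos": 1, "weight": 100.0},
    {"h_seq": 1, "pppos": 2, "weight": 110.0},
    {"h_seq": 3, "pppos": 1, "weight": 120.0},
    {"h_seq": 2, "pppos": 1, "weight": 130.0},
]
ipums = [
    {"serial": 1, "pernum": 1, "weight": 100.0},
    {"serial": 1, "pernum": 2, "weight": 110.0},
    {"serial": 2, "pernum": 1, "weight": 120.0},
    {"serial": 3, "pernum": 1, "weight": 130.0},
]

# Merge in download order: every weight pair agrees.
merged, bad = sequential_merge(cps, ipums)

# Sort each file by its own identifiers first: the rows no longer line up.
cps_sorted = sorted(cps, key=lambda r: (r["h_seq"], r["pppos"]))
ipums_sorted = sorted(ipums, key=lambda r: (r["serial"], r["pernum"]))
_, bad_sorted = sequential_merge(cps_sorted, ipums_sorted)

print("mismatches, original order:", bad)
print("mismatches, after sorting:", bad_sorted)
```

In this toy case the unsorted merge has no mismatched weights, while the sorted merge mismatches on the last two rows. The same weight comparison can be run on the real files to confirm that every record pairs correctly before relying on the merged HIU variables.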

I hope this helps.