First, thank you for the IPUMS CPS samples. We have always found both the data and the support from IPUMS to be excellent.
We are attempting to merge IPUMS CPS data with the Computer and Internet Supplement from the NTIA in order to bring in additional (computer/internet) variables not included in the IPUMS sample. We are following the advice given in response to a previous post (below) to use the variables HRHHID, HRHHID2, and PULINENO (LINENO in IPUMS) as linking keys. We are doing this in Stata.
We were surprised to notice that not all observations from the NTIA sample match the observations in the CPS monthly sample. We believe the reverse is also true. We initially found this to be the case for November 2017, November 2019, November 2021 and November 2023. We also verified that observations fail to match when only a single year (2023) is used. Perhaps we are not doing the merge correctly? Or perhaps there is a good reason the number of observations should be different?
I suspect the non-merged records you see are non-interviewed household records. The NTIA files are person-level, meaning each row or observation is a person record. However, these files also include “empty” records for non-interviewed households. All of these records have PULINENO=-1 in the NTIA files. You can identify non-interviewed households using HRINTSTA in the NTIA files (a value other than 1 indicates the household was not interviewed). See the NTIA documentation on these data for more information.
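As a rough illustration of the filter described above, here is a minimal Python sketch (the thread itself works in Stata, so treat this as a language-neutral outline rather than the exact procedure). The rows are made-up toy values, not real NTIA records; only the variable names HRHHID, HRHHID2, PULINENO and HRINTSTA come from the post, and `is_non_interview` is a hypothetical helper name.

```python
# Toy NTIA-style rows (made-up values, for illustration only).
ntia_rows = [
    {"HRHHID": "000000000000001", "HRHHID2": "00001", "PULINENO": 1, "HRINTSTA": 1},
    {"HRHHID": "000000000000001", "HRHHID2": "00001", "PULINENO": 2, "HRINTSTA": 1},
    {"HRHHID": "000000000000002", "HRHHID2": "00001", "PULINENO": -1, "HRINTSTA": 2},
]

def is_non_interview(row):
    # Per the reply above: any HRINTSTA value other than 1 marks a
    # household that was not interviewed.
    return row["HRINTSTA"] != 1

# Split the file into real person records and "empty" household records.
person_rows = [r for r in ntia_rows if not is_non_interview(r)]
empty_rows = [r for r in ntia_rows if is_non_interview(r)]

print(len(person_rows), "person records;", len(empty_rows), "non-interview records")
```

Dropping `empty_rows` before merging should leave only records that can have a counterpart in a rectangular person-level extract.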
IPUMS CPS extracts can be rectangular on person or hierarchical. In a rectangular person-level extract, each row or observation represents a person. IPUMS CPS does not include person records for non-interviewed households in this record format because they do not contain any person-level data. In a hierarchical extract, there are person records nested under household records. Non-interviewed households will be included in the file, but there will be no person records nested below them. I assume you merged the NTIA file with a rectangular IPUMS CPS extract.
I merged the 2023 Computer and Internet Use Supplement data from IPUMS CPS (using a rectangular on person extract) with the 2023 NTIA data. There were 99,634 successfully merged records, and there were 27,283 records that were in the NTIA file but not in the IPUMS CPS file. These non-merged records from the NTIA file all represent non-interviewed households (indicated by HRINTSTA!=1 in the NTIA files). There are exactly 27,283 non-interviewed household records in the IPUMS CPS hierarchical file for the November 2023 sample. The non-interviewed households in the IPUMS CPS file and the non-interviewed households in the NTIA file are the same households (you can see that they match on HRHHID and HRHHID2).
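The check described above (count matched and unmatched records, then confirm that every unmatched NTIA record is a non-interviewed household) can be sketched roughly as follows. This is a Python outline under stated assumptions, not the code used for the counts in this post: the rows are made-up toy data, `merge_key` and `merge_report` are hypothetical helper names, and the IDs are assumed to be purely numeric so that converting them to integers is lossless.

```python
from collections import Counter

def merge_key(row, line_var):
    # Python integers have arbitrary precision, so int() keeps every digit
    # whether the ID arrives as a string or as a number.
    return (int(row["HRHHID"]), int(row["HRHHID2"]), int(row[line_var]))

def merge_report(ntia_rows, ipums_rows):
    ipums_keys = {merge_key(r, "LINENO") for r in ipums_rows}
    ntia_keys = set()
    counts = Counter()
    unmatched_ntia = []
    for r in ntia_rows:
        key = merge_key(r, "PULINENO")
        ntia_keys.add(key)
        if key in ipums_keys:
            counts["matched"] += 1
        else:
            counts["ntia_only"] += 1
            unmatched_ntia.append(r)
    counts["ipums_only"] = len(ipums_keys - ntia_keys)
    return counts, unmatched_ntia

# Toy data (made up): NTIA stores IDs as strings, IPUMS as numbers.
ntia_rows = [
    {"HRHHID": "000000000000001", "HRHHID2": "00001", "PULINENO": 1, "HRINTSTA": 1},
    {"HRHHID": "000000000000001", "HRHHID2": "00001", "PULINENO": 2, "HRINTSTA": 1},
    {"HRHHID": "000000000000002", "HRHHID2": "00001", "PULINENO": -1, "HRINTSTA": 2},
]
ipums_rows = [
    {"HRHHID": 1, "HRHHID2": 1, "LINENO": 1},
    {"HRHHID": 1, "HRHHID2": 1, "LINENO": 2},
]

counts, unmatched_ntia = merge_report(ntia_rows, ipums_rows)
print(dict(counts))
# Every NTIA-only record should be a non-interviewed household.
print(all(r["HRINTSTA"] != 1 for r in unmatched_ntia))
```

If any NTIA-only record has HRINTSTA equal to 1, or if `ipums_only` is not zero, that would suggest a problem with the keys rather than with non-interviewed households.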
Most researchers don’t have a need to use the non-interviewed household records.
Your post helped me narrow down the mistake I made. IPUMS CPS stores HRHHID and HRHHID2 as doubles, while the NTIA data stores them as strings. One has to be careful when destringing to avoid loss of precision. Once I corrected for that, I was able to get exactly the same match statistics as you.
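For anyone hitting the same problem, here is a small Python sketch of why the storage type matters. It is only an illustration: the ID below is a hypothetical 15-digit value, not a real HRHHID, and `as_float32` is a helper written for this example. An 8-byte double represents every integer up to 2^53 exactly, which covers 15-digit IDs, while a 4-byte float is exact only up to 2^24. In Stata terms these correspond to the `double` and `float` storage types.

```python
import struct

def as_float32(value):
    # Round-trip a number through a 4-byte IEEE 754 float.
    return struct.unpack("f", struct.pack("f", value))[0]

hrhhid_text = "123456789012345"  # hypothetical 15-digit ID, not a real HRHHID
exact = int(hrhhid_text)

# 8-byte double: the ID survives the conversion unchanged.
survives_double = int(float(hrhhid_text)) == exact

# 4-byte float: the ID is rounded to a nearby representable value.
survives_float32 = int(as_float32(float(hrhhid_text))) == exact

print("double keeps the ID:", survives_double)
print("float32 keeps the ID:", survives_float32)
```

When in doubt, comparing the keys as strings, or converting both sides to the same 8-byte numeric type before merging, avoids silent rounding.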