It seems to me that there are more observations in the raw basic monthly files than in the data provided by IPUMS. Is there a simple explanation, or am I making a mistake or misunderstanding my data? Any help would be greatly appreciated.
There should be no difference in the number of observations between the public-use CPS data from the Census Bureau and the CPS data from IPUMS. The Census files are our starting point, and while we rename variables and update codes to facilitate comparability across time, we do not drop observations. If you share the specific year(s) of ASEC or other supplement data, or the month(s) and year(s) of basic monthly data where you are seeing discrepancies, along with the number of observations you see in each file, I can provide more information about why you might be seeing a difference.
Thanks a lot for your quick response. For example, for August 2017 I see 127,424 observations in the IPUMS data, while the raw basic monthly file has 147,025 observations …
The difference you are describing comes from records for non-interviewed households. If you tabulate the frequencies of the original CPS variable HRINTSTA in the raw August 2017 basic monthly file from the Census Bureau, you will see that the number of records that report an interview (HRINTSTA == 1) equals the number of records in the IPUMS CPS file (N = 127,424). IPUMS data extracts are person-level by default, so non-interviewed households, which contain no persons, do not appear in them. To see non-interviewed households in your extract, you must choose a hierarchical format.
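For anyone who wants to reproduce that check, here is a minimal sketch in plain Python. It assumes you have already read the raw Census file into records that expose HRINTSTA as an integer. The helper `read_hrintsta` takes the variable's 1-based column positions as parameters because those must come from the data dictionary for that month's file; they are not hard-coded here. All function names are hypothetical, and the only coding assumed is the one stated above: 1 means interviewed, and any other value is a non-interview.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence

# HRINTSTA == 1 marks an interviewed household; any other value is treated
# here as a non-interview (no person records in a person-level extract).
INTERVIEWED = 1


def read_hrintsta(line: str, start: int, end: int) -> int:
    """Extract HRINTSTA from one fixed-width record.

    `start` and `end` are 1-based, inclusive column positions (hypothetical
    parameters: look them up in the data dictionary for the file's month).
    """
    return int(line[start - 1:end])


def tabulate_interview_status(records: Iterable[Mapping[str, int]]) -> Counter:
    """Frequency table of HRINTSTA values across all records in a raw file."""
    return Counter(rec["HRINTSTA"] for rec in records)


def person_level_count(records: Sequence[Mapping[str, int]]) -> int:
    """Count records from interviewed households.

    Only these records carry persons, so this count should match the number
    of records in a default (person-level) IPUMS extract.
    """
    return sum(1 for rec in records if rec["HRINTSTA"] == INTERVIEWED)
```

Applied to the August 2017 figures above, the interviewed count would be 127,424, and the remaining 147,025 − 127,424 = 19,601 records would be the non-interviewed households that only a hierarchical extract shows.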