Difference between SERIAL and CLUSTER for household-level clustering?


I’m currently using IPUMS data for some analysis, accounting for stratification and clustering in the samples to correctly estimate standard errors. I notice that the variable CLUSTER is “an integrated variable which uniquely identifies each household record in a given sample” (https://usa.ipums.org/usa/complex_sur…) and in turn is recommended for analysis. However, my understanding is that SERIAL also uniquely identifies each household in the sample. Are these variables functionally equivalent, or do they differ? Thanks for your guidance on this.

SERIAL is the true household ID. CLUSTER identifies the cluster used for variance estimation, and that cluster sits at a higher level than the household, whichever level is appropriate for the survey design (block group or something like that). So I don’t think that


will work (in Stata lingo), but I imagine that


would work.

CLUSTER is theoretically equivalent to SERIAL in that it uniquely identifies each household in a given sample. However, IPUMS-USA recommends using CLUSTER in conjunction with STRATA, because CLUSTER also includes year and datanum components.
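If you want to confirm this on your own extract, a quick check along the following lines might help. It is only a sketch: it assumes the extract has been read into a list of dictionaries with the keys YEAR, DATANUM, SERIAL and CLUSTER, the helper name `cluster_matches_household` is made up for illustration, and the records are invented placeholders rather than real IPUMS values.

```python
from collections import defaultdict


def cluster_matches_household(rows):
    """Return True if CLUSTER and (YEAR, DATANUM, SERIAL) define the same groups."""
    cluster_to_households = defaultdict(set)
    household_to_clusters = defaultdict(set)
    for row in rows:
        household = (row["YEAR"], row["DATANUM"], row["SERIAL"])
        cluster_to_households[row["CLUSTER"]].add(household)
        household_to_clusters[household].add(row["CLUSTER"])
    # Each cluster must hold exactly one household, and each household
    # must belong to exactly one cluster.
    one_household_per_cluster = all(
        len(households) == 1 for households in cluster_to_households.values()
    )
    one_cluster_per_household = all(
        len(clusters) == 1 for clusters in household_to_clusters.values()
    )
    return one_household_per_cluster and one_cluster_per_household


# Invented person records pooled from two samples (placeholder values only).
rows = [
    {"YEAR": 2010, "DATANUM": 1, "SERIAL": 1, "CLUSTER": 1001},
    {"YEAR": 2010, "DATANUM": 1, "SERIAL": 1, "CLUSTER": 1001},
    {"YEAR": 2010, "DATANUM": 1, "SERIAL": 2, "CLUSTER": 1002},
    {"YEAR": 2011, "DATANUM": 1, "SERIAL": 1, "CLUSTER": 2001},
]

same_grouping = cluster_matches_household(rows)
distinct_serials = len({row["SERIAL"] for row in rows})
distinct_clusters = len({row["CLUSTER"] for row in rows})
print(same_grouping, distinct_serials, distinct_clusters)
```

In this toy data SERIAL repeats across the two samples while CLUSTER does not, which is the practical difference once samples are pooled. If the check came back False on a real extract, that would suggest CLUSTER groups households at a higher level in that sample.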

I hope this helps.