Difference between SERIAL and CLUSTER for household-level clustering?

kw31 · September 30, 2014, 8:21pm

Greetings,

I’m currently using IPUMS data for some analysis, accounting for stratification and clustering in the samples to correctly estimate standard errors. I notice that the variable CLUSTER is “an integrated variable which uniquely identifies each household record in a given sample” (https://usa.ipums.org/usa/complex_sur…) and in turn is recommended for analysis. However, my understanding is that SERIAL also uniquely identifies each household in the sample. Are these variables functionally equivalent, or do they differ? Thanks for your guidance on this.

skolenik · September 30, 2014, 8:30pm

SERIAL is the true household ID. CLUSTER is the cluster identifier used for variance estimation, and it groups households at a higher level, whichever one is appropriate for the survey design (block group or something like that). So I don’t think that

isid CLUSTER

will work (in Stata lingo), but I imagine that

isid CLUSTER SERIAL

would work.
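
One way to settle this empirically on a given extract is to count how many distinct SERIAL values fall inside each CLUSTER value. This is only a rough sketch: it assumes the variables are named in upper case as written in this thread (Stata is case-sensitive, so match whatever case your extract uses), and `hh_tag` and `n_hh` are made-up helper names.

```stata
* Tag one observation per distinct (CLUSTER, SERIAL) pair
egen byte hh_tag = tag(CLUSTER SERIAL)

* Count the distinct SERIAL values within each CLUSTER
bysort CLUSTER: egen n_hh = total(hh_tag)

* If every tagged observation shows n_hh == 1, CLUSTER is household-level in this extract
tabulate n_hh if hh_tag
```

If values larger than 1 show up, CLUSTER groups several households together in that sample.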

grover · October 1, 2014, 4:00pm

CLUSTER is theoretically equivalent to SERIAL, in that it does uniquely identify each household in a given sample. However, IPUMS-USA recommends using CLUSTER in conjunction with STRATA, because CLUSTER includes year and datanum components.
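
For reference, a minimal sketch of how CLUSTER and STRATA would typically be declared in Stata. The weight PERWT and the outcome INCTOT are assumptions for illustration only; substitute the weight and variables that fit your own analysis, and match the variable-name case used in your extract.

```stata
* Declare the survey design: CLUSTER as the sampling unit, STRATA as the strata
* PERWT (person weight) is an assumption; use the weight that matches your unit of analysis
svyset CLUSTER [pweight=PERWT], strata(STRATA)

* svy-prefixed commands then report design-based standard errors
* INCTOT is just a placeholder outcome variable
svy: mean INCTOT
```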

I hope this helps.
