I am looking for clarification on the precise nature of “strata” in most of the IPUMS-International samples. The page on the comparability of this variable across samples describes it as follows:
“In most samples, the STRATA variable captures implicit geographic stratification and is created by assigning a unique identifier to groups of between 10 and 19 adjacent households within low level.”
My confusion is about what this means given that an IPUMS sample is, for instance, a 5% sample of all households. Consider a hypothetical sample that contains 5% of all households and in which each value of the strata variable covers 10 households. Does this mean:
A. The data include 5% of the households in each stratum. In this case, each stratum in reality contains 200 households, but data users see only 5% (10) of them, presumably selected randomly.
B. The data include all households in each stratum, but only 5% of all strata. In this case, each stratum contains 10 households, but there are 20 times as many strata in reality as we see in the data.
I am confused because the description calls households in the same stratum “adjacent,” which implies, to me, that a stratum consists of ten contiguous households rather than 10 households selected from a contiguous set of 200.
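To make the two readings concrete, here is a rough Python sketch. It is purely illustrative: the population of 4,000 households, the deterministic "every 20th" selection rule, and the function names are my own assumptions, not anything documented by IPUMS.

```python
# Hypothetical illustration only: neither function reflects a documented IPUMS procedure.
POPULATION = 4000     # assumed number of households, listed in geographic order
SAMPLE_EVERY = 20     # 1 in 20 = 5% sample
STRATUM_SIZE = 10     # sampled households per stratum, as in the question


def interpretation_a(population, sample_every, stratum_size):
    """Reading A: sample every k-th household, then group consecutive
    sampled households into strata of `stratum_size`."""
    sampled = list(range(0, population, sample_every))
    return [sampled[i:i + stratum_size]
            for i in range(0, len(sampled), stratum_size)]


def interpretation_b(population, sample_every, stratum_size):
    """Reading B: cut the population into blocks of contiguous households,
    then keep every k-th block in full."""
    blocks = [list(range(start, start + stratum_size))
              for start in range(0, population, stratum_size)]
    return blocks[::sample_every]


strata_a = interpretation_a(POPULATION, SAMPLE_EVERY, STRATUM_SIZE)
strata_b = interpretation_b(POPULATION, SAMPLE_EVERY, STRATUM_SIZE)

for label, strata in (("A", strata_a), ("B", strata_b)):
    households = sum(len(s) for s in strata)
    gap = strata[0][1] - strata[0][0]   # distance between neighbours within a stratum
    print(f"Reading {label}: {len(strata)} strata, {households} households, "
          f"neighbour gap = {gap}")
```

Under these assumptions, both readings yield 200 sampled households in 20 strata of 10, so the released data look the same in terms of stratum size. They differ in what a stratum represents: under A the sampled households in a stratum are 20 positions apart and the full population has 20 strata of 200 households each, while under B they are truly contiguous and the full population has 400 strata of 10.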
Does anyone have any insight into this question? It is probably not important for most uses, but it is critical for my purposes.