How do you deal with svyset's STRATA when using decennial Census data and ACS data together?

I am working with the 1% IPUMS samples from 1970 to 2010, which include census data and ACS data. When I svyset the data for “better” variance estimation, the STRATA variable represents two different things prior to and after the year 2000 (different geographic regions and units). Do I need to do something to account for this, or do PERWT and CLUSTER together take care of it? I am asking because I have not been able to run a multilevel linear model: Stata’s meglm command tells me the data are not clustered correctly, and I have noticed that some strata values are repeated across samples. Thanks in advance for helping me with this matter.

While the design variables, such as STRATA, are harmonized over time, they are not adjusted for pooling multiple years. In order to not underestimate your standard errors, it is recommended that you treat your pooled data as one large sample. Specifically, this means you should adjust your STRATA variable so that values are unique across samples (e.g. you could append YEAR to the STRATA value).
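As a rough illustration of the suggestion above, the following Stata sketch builds a strata identifier that is unique within each sample. It assumes an IPUMS USA extract loaded with the default lowercase variable names (year, strata, cluster, perwt); strata_yr is a hypothetical variable name. It uses egen's group() function rather than string concatenation, so the result does not depend on how numeric values are converted to strings.

```stata
* Sketch only: assumes an IPUMS USA extract with year, strata, cluster,
* and perwt under their default lowercase names.

* One integer ID per distinct (year, strata) pair, so that strata values
* never repeat across samples. "strata_yr" is a hypothetical name.
egen long strata_yr = group(year strata)

* Declare the pooled survey design using the sample-specific strata.
svyset cluster [pweight = perwt], strata(strata_yr)

* Optional check: summarize strata and PSU counts under this design.
svydescribe
```

If svydescribe still reports problems such as strata with a single sampling unit, those would need to be handled separately; this sketch only addresses repeated strata values across years.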

Hope this helps.


I am new to IPUMS USA. I am using a sample similar to the one in the initial post, namely the 2000 Census plus ACS 2001-2014. I did what was suggested, but it seems there is something wrong. For my analysis I have thrown out all observations with age<18 in order to have a more manageable dataset. Then if I run the following commands

gen firstgen = .

replace firstgen = 1 if bpld~=.&bpld>=10000

gen mystrata = string(strata)+string(year)

svyset cluster [pweight=perwt], strata(mystrata)

svy, subpop(if firstgen ==1&age>=25&agearrive<=5): mean age sex marst

I get output (originally attached as a screenshot) that reports Population size = 2,642,064,339! This would suggest the US has more people than China and India… what is going on? Can I trust the Subpop. size of 43,181,154? Any idea what is causing this? Thanks for the help.

Economix, the way I understand it, the population and subpopulation sizes reported by svy commands are the result of summing the sampling weights in PERWT over all observations in the pooled sample. Since each of your samples is weighted to represent the full population of its own year, as if there were no repeated observations, you are adding the full population of 2000 to the full population of 2001, to the full population of 2002, and so on, which results in the astronomical number in your output. To get the subpopulation size for each year, you could use the following line in Stata (and remember to avoid “if” conditioning with survey-declared data; use the subpop() option of svy instead):

. svy, subpop(if age >= 18): tab year, count format(%14.3gc)
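To see the arithmetic behind the pooled total directly, a minimal sketch like the one below sums PERWT within each sample. It assumes the default lowercase IPUMS names year and perwt; weighted_pop is a hypothetical variable name. Each yearly total should be roughly the population that sample represents (adults only, if records under 18 were dropped), and adding the yearly totals together gives the pooled figure.

```stata
* Sketch only: assumes lowercase IPUMS variable names year and perwt.
preserve

* Sum the person weights within each sample year.
* "weighted_pop" is a hypothetical variable name.
collapse (sum) weighted_pop = perwt, by(year)

* Show the totals with thousands separators.
format weighted_pop %14.0fc
list year weighted_pop, noobs

* Return to the original person-level data.
restore
```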

I hope this helps, and I will let the experts at MPC correct me if I’m wrong.

It makes perfect sense to me. I had imagined something like that could be the case. Now, should one also adjust the data in some other way? Originally I was planning to use xtreg with the fe option for the analysis, but the svy prefix in Stata does not accept xt commands. Thus, I was thinking I would use areg with [fweight=perwt], clustering at the FE level. I am not sure whether the SEs will be correct in this case, though.