How do you deal with svyset's STRATA when using Census decennial data and ACS data together?

cabecerra · February 4, 2016, 5:31am

I am working with the 1% IPUMS samples from 1970 to 2010, which include census data and ACS data. When I svyset the data for “better” variance estimation, do I need to do something to account for the fact that the STRATA variable represents two different things before and after the year 2000 (different geographic regions and units)? Or does the PERWT variable account for this together with CLUSTER? I am asking because I have not been able to run a multilevel linear model (using Stata’s meglm command): it tells me the data are not clustered correctly, and I have noticed that some strata values are repeated across data samples. Thanks in advance for helping me with this matter.

Tim_Moreland · February 12, 2016, 7:39pm

While the design variables, such as STRATA, are harmonized over time, they are not adjusted for pooling multiple years. In order to not underestimate your standard errors, it is recommended that you treat your pooled data as one large sample. Specifically, this means you should adjust your STRATA variable so that values are unique across samples (e.g. you could append YEAR to the STRATA value).
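A minimal Stata sketch of that suggestion, on made-up toy data rather than a real IPUMS extract. The variable name strata_year and all the values are assumptions for illustration. egen's group() function is used here in place of string concatenation so that the combined stratum identifier stays numeric:

```stata
* Toy data standing in for a pooled extract (all values are invented)
clear
input int year long strata double cluster double perwt
2000 1 101 100
2000 1 102 120
2000 2 103  90
2000 2 104 110
2010 1 201 105
2010 1 202  95
2010 2 203 130
2010 2 204  70
end

* One numeric ID per (year, strata) pair, so stratum 1 in 2000 and
* stratum 1 in 2010 are no longer treated as the same stratum
egen long strata_year = group(year strata)

* Declare the design with the year-specific strata, then inspect it
svyset cluster [pweight = perwt], strata(strata_year)
svydes
```

With real data, svydes should report more strata after this step than before, since repeated STRATA codes are split by year.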

Hope this helps.

Economix · February 24, 2016, 5:26am

Hi,

I am new to IPUMS USA. I am using a sample similar to the one in the initial post, namely the 2000 Census + ACS 2001-2014. I did what was suggested, but something seems to be wrong. For my analysis I have dropped all observations with age<18 in order to have a more manageable dataset. Then I run the following commands:

gen firstgen = .

replace firstgen = 1 if bpld~=.&bpld>=10000

gen mystrata = string(strata)+string(year)

svyset cluster [pweight=perwt], strata(mystrata)

svy, subpop(if firstgen ==1&age>=25&agearrive<=5): mean age sex marst

In the output I get, notice that Population size = 2,642,064,339! This would suggest the US has more people than China and India… what is going on? Can I trust the Subpop. size of 43,181,154? Any idea as to what is causing this? Thanks for the help.

cabecerra · February 24, 2016, 6:21am

Economix, the way I understand it, the population and subpopulation sizes reported by svy commands are the sum of the sampling weights in PERWT over all observations in the pooled sample. Each of your samples is weighted to represent the full population of its own year, so pooling them adds the full population of 2000 to the full population of 2001, to the full population of 2002, and so on, which results in the astronomical number in your output. To get the subpopulation size for each year you could use the following line in Stata (don’t forget to avoid “if” conditioning on survey-declared data and to use the subpop() option of svy instead):

. svy, subpop(if age >= 18): tabulate year, count format(%14.3gc)
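The arithmetic behind the inflated total can be checked with a small sketch on invented weights (the values below are assumptions, not IPUMS data): the sum of PERWT within one year approximates that year's population, while the sum over the pooled file adds every year's population together.

```stata
* Toy pooled file: two years, two records each (weights are invented)
clear
input int year double perwt
2000 100
2000 120
2001 110
2001 115
end

* Sum of weights within each year: roughly that year's population
tabstat perwt, by(year) statistics(sum) format(%14.0fc)

* Sum over the pooled file: all the yearly populations added together
quietly summarize perwt
display %14.0fc r(sum)
```

On a real pooled extract, the pooled sum is what svy reports as Population size, which is why it grows with the number of samples included.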

I hope this helps, and I will let the experts at MPC correct me if I’m wrong.

Economix · February 24, 2016, 4:47pm

It makes perfect sense to me. I had imagined something like that could be the case. Now, should one also adjust the data in some other way? Originally I was planning to use xtreg, fe for the analysis, but svy in Stata does not accept xt commands. So I was thinking of using areg with [fweight=perwt] and clustering at the FE level. I am not sure the SEs will be correct in that case, though.
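For reference, a hedged sketch of what that areg call could look like, run on simulated data. The names y, x and statefip are placeholders, not variables from this thread. It uses pweight rather than fweight, on the assumption that PERWT should be treated as a sampling weight: fweight tells Stata that each record stands for that many identical observations. This sketch ignores STRATA entirely, so it does not settle whether the resulting SEs are appropriate for the full design.

```stata
* Simulated data; y, x and statefip are hypothetical placeholder names
clear
set seed 12345
set obs 200
generate int statefip = ceil(_n/20)      // 10 groups used as fixed effects
generate double perwt = 50 + 100*runiform()
generate double x = rnormal()
generate double y = 1 + 0.5*x + rnormal()

* Fixed effects absorbed, sampling weights, SEs clustered at the FE level
areg y x [pweight = perwt], absorb(statefip) vce(cluster statefip)
```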
