Dear Team IPUMS,
I am trying to set up the survey design for Argentina 1980 to estimate a household-level variable like this (in R):
svydesign(strata=~STRATA, ids=~1, weights=~HHWT, data=data)
However, I am getting this error: Error in h(simpleError(msg, call)) :
error in evaluating the argument ‘x’ in selecting a method for function ‘rowSums’: attempt to make a table with >= 2^31 elements
Am I using the STRATA variable correctly?
Any help on this would be extremely appreciated.
Thank you very much!
The 1980 Argentina sample includes 672,062 households across 66,972 unique strata. It appears that the svydesign() function in R cannot handle this many strata at once. The error message is consistent with this: a cross-tabulation of households by strata would have roughly 4.5 × 10^10 cells, well above R's limit of 2^31 (about 2.1 × 10^9) table elements. I was able to include STRATA in Stata's svy command without any issues, so this appears to be specific to R's survey package. I've shared this issue with my colleagues on the IPUMS International team, who will review whether STRATA can be constructed to be more user-friendly.
While IPUMS International recommends using STRATA to produce more precise standard error estimates, for the great majority of studies there is little risk of drawing invalid inferences because of underestimated variance. Since the Argentina 1980 sample is systematic, STRATA captures implicit geographic stratification rather than an explicit feature of the sample design. You could therefore substitute a less granular geographic identifier such as GEO2_AR1980, which identifies 346 departments, in place of STRATA. Alternatively, you could drop stratification from the survey design entirely. Your standard errors will be slightly larger without these survey design parameters, so your inferences will be somewhat conservative rather than invalid. The difference matters more for studies of weak relationships or small population subgroups, and should be minor when studying large groups.
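As a rough sketch of the two alternatives, the design could be declared as follows. The toy data frame hh and the variable rooms are hypothetical stand-ins for a real extract, and the sketch assumes the data already has one row per household and that the survey package is installed:

```r
library(survey)

# Hypothetical stand-in for an extract with one row per household.
hh <- data.frame(
  GEO2_AR1980 = c(1, 1, 1, 2, 2, 2),
  HHWT        = c(10, 10, 10, 20, 20, 20),
  rooms       = c(2, 3, 4, 3, 5, 4)   # hypothetical household-level variable
)

# Option 1: stratify on the coarser geographic identifier instead of STRATA.
des_geo <- svydesign(ids = ~1, strata = ~GEO2_AR1980, weights = ~HHWT, data = hh)

# Option 2: drop stratification entirely and keep only the weights.
des_none <- svydesign(ids = ~1, weights = ~HHWT, data = hh)

# Both designs give the same weighted point estimate; only the
# standard errors differ.
svymean(~rooms, des_geo)
svymean(~rooms, des_none)
```

Either design can then be passed to the usual survey functions (svymean, svytotal, svyglm, and so on).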
Additionally, note that for household-level analyses using HHWT, you will want to first filter your dataset to persons with PERNUM = 1 so that each household is counted only once.
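A minimal sketch of that filtering step in base R is shown here. The data frame persons is a hypothetical toy extract, and the household identifier SERIAL is an assumption about which variables the extract contains:

```r
# Hypothetical person-level extract: several persons per household.
persons <- data.frame(
  SERIAL = c(1, 1, 1, 2, 2, 3),   # household identifier (assumed present)
  PERNUM = c(1, 2, 3, 1, 2, 1),   # person number within household
  HHWT   = c(10, 10, 10, 20, 20, 15)
)

# Keep only the first person in each household, giving one row per household.
households <- persons[persons$PERNUM == 1, ]

nrow(households)       # 3 households
sum(households$HHWT)   # 45, the weighted household count
```

The resulting households data frame, rather than the full person-level file, is what would be passed to svydesign() for a household-level estimate.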