Hi all, (Or Hi IPUMS Staff :D)
I’m stuck on something and am wondering whether anyone has conceptual and/or applied feedback.
For context: I use R to analyze ACS extracts with the survey package. Given the structure of ACS data (weighted using either PERWT or REPWTP), is it possible to take random selections of the population?
For purely illustrative purposes, let’s say I have an ACS extract of respondents with STATEICP and INCTOT. Instead of just aggregating income by state (mean, median, etc.), what if I wanted to take a random selection of people per state and then show summary stats and/or run an analysis on them? The example itself doesn't matter, but I do need a way to take the sample provided by an ACS extract and randomly select X% of it for an analysis.
The trick here is how to randomly select a portion of your sample (as a subset) to run a new analysis on. The problem I can’t seem to wrap my head around is this: given the nature of the data, where one row may represent 3 or 80 people, how can you take a random selection?
My intuition is to create a binary variable called “randomlySelected” in the data object (before converting it to a survey object), with some predefined probability of being 1 or 0. Then, once I create a survey object, I can subset to only the rows where randomlySelected == 1. The problem is this: say I want to randomly select 50% of the sample. The variable may flag roughly 50% of the rows in the data object, but once I convert it to a survey object, those rows may not represent 50% of the sample after the weights are applied.
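To make the idea concrete, here is a minimal sketch of what I have in mind. The toy data frame, the seed, and the 50% probability are all made up for illustration (this is not a real extract), and I only use PERWT here, not the REPWTP replicate weights. It flags rows at random, builds the survey object, subsets it, and then compares the share of rows flagged with the share of total weight those rows carry.

```r
library(survey)

set.seed(42)

# Hypothetical toy data standing in for an ACS extract (not real data)
acs <- data.frame(
  STATEICP = sample(c(1, 2, 3), 1000, replace = TRUE),
  INCTOT   = round(rlnorm(1000, meanlog = 10, sdlog = 1)),
  PERWT    = sample(3:80, 1000, replace = TRUE)
)

# Flag each row with probability 0.5, before building the survey object
acs$randomlySelected <- rbinom(nrow(acs), size = 1, prob = 0.5)

# Simplified design using person weights only (assumption: no replicate weights)
des <- svydesign(ids = ~1, weights = ~PERWT, data = acs)

# Subset the survey object to the flagged rows
sub <- subset(des, randomlySelected == 1)

# Share of rows flagged vs. share of total weight carried by those rows
row_share <- mean(acs$randomlySelected)
wt_share  <- sum(acs$PERWT[acs$randomlySelected == 1]) / sum(acs$PERWT)
print(c(row_share = row_share, wt_share = wt_share))

# Example summary on the random subset: weighted mean income by state
print(svyby(~INCTOT, ~STATEICP, sub, svymean))
```

The two shares generally won't match exactly, which is the discrepancy I’m describing above.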
Any insight would be super appreciated!