Hi,
I want to download all surveys in which three variables are available (edattaind, incwage and labforce). I’m trying to find a way to exclude all missing values, as well as the surveys in which these variables are not available. When I click on “select cases”, it seems that this option is not available for continuous variables (I can select cases on labforce and edattaind but not on incwage). As a result, the extract is far too big (over 45GB). What should I do?
Thank you.
You are correct that “select cases” is generally not available for continuous variables.
Regarding samples in which the variables are not available, the quickest way to check is the availability section on each variable's page (for example, the LABFORCE variable page). You should also check the universe section, which tells you which people (in particular, which ages) will have nonmissing data for each variable in each sample. Because these criteria vary by sample, you'll need to drop the missing cases after downloading your extract.
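If you want to double-check availability against the data itself after downloading, one option is to count, for each sample, how many records carry a valid (non-missing, in-universe) value of each variable. A sample where some variable has zero valid values can then be dropped. This is a minimal sketch, assuming the extract was downloaded as CSV with a SAMPLE column, and assuming that records outside the universe or in samples lacking the variable carry one of the listed codes. The MISSING_CODES values and the function names are hypothetical placeholders: look up the actual missing/NIU codes in each variable's codes page before using it.

```python
import csv
from collections import defaultdict

# HYPOTHETICAL missing/NIU codes per variable. Placeholders only:
# replace them with the codes listed in each variable's codebook.
MISSING_CODES = {
    "EDATTAIND": {0, 999},
    "LABFORCE": {8, 9},
    "INCWAGE": {9999998, 9999999},
}


def is_valid(value, codes):
    """A value is valid if it is non-blank and not one of the missing/NIU codes."""
    value = value.strip()
    return value != "" and int(value) not in codes


def nonmissing_counts(csv_file, missing_codes=MISSING_CODES, sample_col="SAMPLE"):
    """Return {sample: {variable: number of records with a valid value}}."""
    counts = defaultdict(lambda: {var: 0 for var in missing_codes})
    # DictReader streams one row at a time, so the file is never fully loaded.
    for row in csv.DictReader(csv_file):
        sample_counts = counts[row[sample_col]]
        for var, codes in missing_codes.items():
            if is_valid(row[var], codes):
                sample_counts[var] += 1
    return dict(counts)


def samples_missing_a_variable(counts):
    """Return the samples in which at least one variable has no valid values at all."""
    return sorted(s for s, by_var in counts.items() if min(by_var.values()) == 0)
```

For a real file, open it with `open("extract.csv", newline="")` and pass the file object to `nonmissing_counts`. This complements, rather than replaces, the availability and universe sections, which remain the authoritative description of each sample.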
To reduce your file size, the simplest approach here is to create several smaller extracts, each containing fewer samples. You can then load each smaller extract into a stats package, drop the cases with missing INCWAGE values, and resave the dataset. Once every extract has been trimmed, recombine the samples into one dataset. Alternatively, depending on your analysis needs, you can calculate summary measures of interest for each sample before combining the samples, without preserving all the microdata.
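The split, filter and recombine workflow can be sketched in Python as well. This version streams each smaller extract row by row, drops records with a missing value in any of the three variables, appends the remaining rows to one combined CSV, and accumulates an unweighted mean of INCWAGE per sample as an example of a summary measure. It assumes every extract is a CSV with the same columns, including a SAMPLE column; the missing codes and function names are hypothetical placeholders, not IPUMS-defined values.

```python
import csv
from collections import defaultdict

# HYPOTHETICAL missing/NIU codes; check each variable's codebook for the real ones.
MISSING_CODES = {
    "EDATTAIND": {0, 999},
    "LABFORCE": {8, 9},
    "INCWAGE": {9999998, 9999999},
}


def keep_row(row, missing_codes):
    """Keep a record only if every listed variable has a valid value."""
    for var, codes in missing_codes.items():
        value = row[var].strip()
        if value == "" or int(value) in codes:
            return False
    return True


def filter_and_combine(extract_files, out_file, missing_codes=MISSING_CODES,
                       sample_col="SAMPLE"):
    """Stream several extracts into one filtered CSV.

    Returns {sample: unweighted mean INCWAGE} over the records that were kept.
    Assumes all extracts share the same column layout.
    """
    writer = None
    totals = defaultdict(lambda: [0, 0])  # sample -> [record count, INCWAGE sum]
    for f in extract_files:
        reader = csv.DictReader(f)
        if writer is None:
            # Take the header from the first extract and write it once.
            writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
            writer.writeheader()
        for row in reader:
            if not keep_row(row, missing_codes):
                continue
            writer.writerow(row)
            entry = totals[row[sample_col]]
            entry[0] += 1
            entry[1] += int(row["INCWAGE"])
    return {s: total / n for s, (n, total) in totals.items()}
```

Streaming keeps memory use small even when the combined data would be tens of gigabytes, and it skips the intermediate resave step. When working with real files, open inputs and the output with `newline=""`, as the `csv` module expects. The means are unweighted for illustration only; estimates meant to represent a population would normally apply the sample's person weights.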