I am trying to write a few “first cut” functions to convert variables that ipumsr supplies as haven-style labeled vectors into pairs of vectors, as follows:
First, certain data entries denote values that are in some sense missing, but there are many types of missingness, and the type is often important. I propose to preserve that information in a second vector of the same length that records the form of missingness wherever the value is missing and otherwise holds a standard placeholder, probably an empty string. In some cases this will also incorporate quality-flag information. In the original vector, the values at these locations will be replaced by NAs.
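A minimal sketch of this split, assuming the haven package is installed and that the codes to treat as missing are supplied by hand. The helper name `split_missing`, the example codes, and the label text are hypothetical and not taken from any IPUMS variable:

```r
library(haven)

# Hypothetical helper: split a labelled vector into (values, missingness type).
# `missing_codes` is an assumption: the caller decides which codes mean "missing".
split_missing <- function(x, missing_codes) {
  labs  <- attr(x, "labels")          # named vector: names = label text, values = codes
  codes <- as.vector(zap_labels(x))   # underlying codes without label attributes
  is_missing <- codes %in% missing_codes

  # Placeholder "" where the value is present, label text where it is missing.
  # A missing code that has no label ends up as NA_character_ here.
  miss_type <- rep("", length(codes))
  miss_type[is_missing] <- names(labs)[match(codes[is_missing], labs)]

  codes[is_missing] <- NA
  list(value = codes, missing_type = miss_type)
}

# Made-up example data
inc <- labelled(
  c(1200, 999998, 0, 999999),
  labels = c("N/A" = 999999, "Missing" = 999998)
)
out <- split_missing(inc, missing_codes = c(999998, 999999))
```

Here `out$value` holds the income values with NAs at the missing positions, and `out$missing_type` holds the matching label text. Quality-flag information could be merged into `missing_type` in a later step.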
Second, once the missing values are removed, I want to do my best to distinguish:
- binary-valued variables (e.g. received child support);
- numerical-valued vectors (income components);
- unordered-factor-like vectors (race, geography); and
- ordered factor-like variables (age, and maybe income ranges rather than values for some of the older data?).
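As a starting point for sorting variables into these groups, here is a rough heuristic sketch that looks only at the value labels. It assumes the haven package; the function name `guess_kind`, the 0.5 threshold, and the example data are all hypothetical. It cannot tell ordered from unordered categories, so that part would still be done by hand:

```r
library(haven)

# Hypothetical heuristic: guess a variable's kind from its non-missing codes.
guess_kind <- function(x, missing_codes = numeric(0)) {
  labs  <- attr(x, "labels")
  codes <- as.vector(zap_labels(x))
  codes <- unique(codes[!is.na(codes) & !(codes %in% missing_codes)])

  if (length(codes) == 0) return("unknown")
  if (length(codes) == 2) return("binary")

  # Share of distinct observed codes that carry a value label.
  # The 0.5 cutoff is an arbitrary assumption, not an IPUMS convention.
  share_labelled <- mean(codes %in% labs)
  if (share_labelled < 0.5) "numeric" else "categorical"
}

# Made-up example data
childsup <- labelled(c(1, 2, 2, 9), c(No = 1, Yes = 2, Unknown = 9))
inc      <- labelled(c(1200, 0, 350, 999999), c("N/A" = 999999))
race     <- labelled(c(1, 2, 3, 1), c(White = 1, Black = 2, Other = 3))

kinds <- c(
  childsup = guess_kind(childsup, missing_codes = 9),
  inc      = guess_kind(inc, missing_codes = 999999),
  race     = guess_kind(race)
)
```

A numeric variable that happens to take only two distinct values would be misclassified as binary, so the result is best treated as a first guess to review against the metadata.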
I’ll be looking at the meta-data files, and I assume that I will need to do some of this by hand. But if you are able to give me a leg up on whether there are meta-data descriptors that point toward particular data types, I’d be grateful for whatever insight you might be able to share.
I also have a bunch of questions about missing data descriptions, which I will post as a separate question.
Warmest regards, Andrew Hoerner