Hello,
I am using VALUEH for the ACS 2009, 2010, and 2011 microdata. The variable’s description, codes, and questionnaire text seem to be missing some context. For example, the questionnaire text page shows the 2006 questionnaire text and then the 2011 text. Elsewhere, the comparability notes refer to 2008 as a change in how the variable was constructed.
My reading is that:
- Pre-2011, the question was categorical, so all values for 2009 and 2010 should be midpoints of the 2006 categories.
- From 2011 onward, the question became free response, so 2011 should have free-response values.
However, all three years contain a mix of free-response and categorical values.
Can you confirm that the variable for 2009-2011 is a mix of free response and categories?
Thank you,
Luke
An excerpt from tab valueh if year==2009:
valueh | Freq. Percent Cum.
448000 | 26 0.00 64.66
449000 | 88 0.00 64.66
$400,000 - 499,999 | 25,667 1.14 65.80
451000 | 86 0.00 65.80
452000 | 27 0.00 65.81
453000 | 39 0.00 65.81
An excerpt from tab valueh if year==2011:
valueh | Freq. Percent Cum.
448000 | 31 0.00 64.18
449000 | 60 0.00 64.18
$400,000 - 499,999 | 22,559 0.99 65.17
451000 | 62 0.00 65.17
452000 | 17 0.00 65.17
453000 | 21 0.00 65.18
I can understand your confusion. It sounds like your core question is about the code labels in the VALUEH variable; I will respond to that and then share some additional information that may be useful. Providing informative and intuitive labels for (largely) continuous variables when harmonizing so many years of data is often more challenging than harmonizing the underlying codes.
The Codes tab of the VALUEH variable notes that “for 1930, 1940, and 2008 onward samples, VALUEH is a continuous variable. Other years report the midpoint of an interval.” IPUMS applies labels to the midpoints of the intervalled VALUEH responses in these “other years” (i.e., 1850-1920 and 1950-2007). For example, code 450000 is labeled $400,000-$499,999. That label describes the interval captured by code 450000 in the interval samples; in 1930, 1940, and 2008 onward, the same code simply means $450,000. Unfortunately, we cannot assign different labels to different samples. This should not affect your results, because the codes are still correct; the labels are simply misleading in years with continuous values.
In years with continuous responses for VALUEH, you would expect to see codes such as 449000 and 451000, which would not appear in years that report VALUEH in broader intervals. Your excerpts from 2009 and 2011 show observations in these codes, as you would expect given that the documentation describes VALUEH as continuous from 2008 onward. When I look at the 2007 data, there are no cases in these adjacent codes, and code 450000 has approximately four times as many observations in 2007 as in 2008 (when the adjacent codes do have observations).
You may also choose to drop the labels entirely, particularly if you are working only with 2008-onward data (though note the missing and not-in-universe codes listed on each variable page).
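If you want to run the adjacent-code check described above on your own extract, it can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not an IPUMS tool: the function names are hypothetical, the input is assumed to be a plain list of VALUEH codes for one sample year, and the sentinel codes 9999998 and 9999999 are placeholders for the missing and not-in-universe codes (confirm the real values on the VALUEH Codes tab). The toy lists in the usage lines are made up for illustration, not real data.

```python
from collections import Counter
from typing import Iterable, List, Optional

# ASSUMPTION: hypothetical placeholders for the missing / not-in-universe
# codes; check the VALUEH Codes tab for the actual values.
SENTINEL_CODES = frozenset({9999998, 9999999})


def has_adjacent_codes(codes: Iterable[int], midpoint: int = 450000,
                       step: int = 1000) -> bool:
    """Return True if any observation sits one step away from `midpoint`.

    Continuous years (2008 onward) contain codes such as 449000 and 451000;
    interval years use only the midpoint code itself.
    """
    counts = Counter(codes)
    # Counter returns 0 for codes that never occur.
    return counts[midpoint - step] > 0 or counts[midpoint + step] > 0


def to_dollars(codes: Iterable[int],
               sentinels: frozenset = SENTINEL_CODES) -> List[Optional[int]]:
    """Read each code as a dollar amount, mapping sentinel codes to None.

    This ignores the interval-style labels, which is only appropriate
    for samples where VALUEH is continuous.
    """
    return [None if c in sentinels else c for c in codes]


# Made-up toy data, not real ACS values.
interval_year = [450000, 450000, 350000]
continuous_year = [448000, 449000, 450000, 451000, 9999999]
print(has_adjacent_codes(interval_year))    # False
print(has_adjacent_codes(continuous_year))  # True
print(to_dollars(continuous_year))  # [448000, 449000, 450000, 451000, None]
```

Checking only the neighbours of a single midpoint keeps the heuristic simple; a year could in principle pass it by chance, so it is a quick sanity check rather than a substitute for the Codes tab.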
Additionally, you may be interested in the source variables for VALUEH. Source variables provide an unharmonized version of the underlying data that IPUMS integrates into our harmonized variables.
Finally, you noted that the questionnaire text display jumped from 2006 to 2011. By default, our website displays questionnaire text only for the samples you have selected for your custom data file. The default selections for IPUMS USA are one sample per decennial census, the most recent one-year ACS samples (typically the last three to five years), and every fifth ACS sample before that. Currently, the default ACS selections are 2006, 2011, 2016, and 2021 onward. The display of some variable-level metadata (e.g., Codes, Questionnaire Text, and Source Variables) is restricted to the samples you have selected.
I hope this helps. Please follow up with any questions.