I am running a regression that includes the race variable, which indicates whether the individual is White. The issue is that when I compute summary statistics, the coding seems to be switched: it does not match the coding of the source variable. Is there anything I can do to fix this?
You are correct that the coding for RACWHT and its 2022 source variable US2022A_RACWHT differ. Respondents who are White alone or in combination are assigned a value of 1 in the source variable (and 0 otherwise). RACWHT harmonizes the source variable by assigning respondents who are White alone or in combination a value of 2 with the label of “Yes” (and otherwise a value of 1 with a label of “No”). You can see this in the Codes tab for each variable.
This is common across many other harmonized binary variables in IPUMS USA. Harmonized variables are provided for users who want data that are comparable across ACS and decennial census samples, while source variables are provided for those who want to view the original data and apply their own coding structure. If you’d like to recode RACWHT so that it follows the source variable’s coding pattern, you can replace RACWHT values of 1 (“No”) with 0 and values of 2 (“Yes”) with 1.
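A minimal sketch of that recode in Python, assuming the RACWHT values are already loaded into a plain list of integers. The helper name `recode_racwht` and the sample values are hypothetical, used only for illustration. It maps 1 to 0 and 2 to 1, and raises an error on any other code so unexpected values are not silently carried into the regression.

```python
# Hypothetical helper: recode IPUMS RACWHT (1 = "No", 2 = "Yes")
# to the 0/1 pattern used by the source variable US2022A_RACWHT.
RACWHT_TO_SOURCE = {1: 0, 2: 1}


def recode_racwht(values):
    """Return a list of 0/1 indicators built from RACWHT codes.

    Raises ValueError on any code other than 1 or 2, so unexpected
    values are caught instead of passed through unchanged.
    """
    recoded = []
    for v in values:
        if v not in RACWHT_TO_SOURCE:
            raise ValueError(f"unexpected RACWHT code: {v!r}")
        recoded.append(RACWHT_TO_SOURCE[v])
    return recoded


if __name__ == "__main__":
    sample = [2, 1, 2, 2, 1]  # made-up codes, not real IPUMS data
    print(recode_racwht(sample))  # [1, 0, 1, 1, 0]
```

If the extract is loaded into a pandas DataFrame instead, the same mapping can be applied with `df["RACWHT"].map({1: 0, 2: 1})`; note that `Series.map` turns any code missing from the dictionary into NaN rather than raising an error.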