I have looked at the 2010 large place to PUMA crosswalk for Dallas (Dallas city). Do you have any recommended procedures for deciding which PUMAs to use? Many include suburban areas. I realize this is a complicated question, but I am relatively new to this site and may not be finding existing guidance.
In the place where you found the crosswalk between large places and PUMAs (possibly the bottom of the CITY comparability page), there’s another file called the “PUMA Match Summary by Large Place”. That file includes a column called “Best-Matching PUMAs,” which lists the codes of all PUMAs where a majority of the PUMA population resides within the corresponding city. For Dallas, that list includes 10 PUMAs: “02304, 02305, 02306, 02307, 02311, 02312, 02313, 02314, 02315, 02316”.
Another column indicates the omission error (how much of the city’s population is not in this set of PUMAs) and another indicates the commission error (how much of the population in this set of PUMAs is not in the city). For Dallas, unfortunately, both the omission error (17.8%) and the commission error (11.3%) are substantial. You might nevertheless choose to simply associate all of the microdata records in these PUMAs with the city of Dallas, accepting a summed mismatch error of about 29% (17.8% + 11.3%).
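As a rough illustration of this first option, here is a minimal Python sketch that keeps only the records falling in the best-matching PUMAs. The PUMA codes and the state code 48 come from the discussion in this thread; the record layout (dicts with "statefip", "puma", and "perwt" keys) and the sample records are assumptions made up for the example, so adjust the names to match your extract.

```python
# Best-matching PUMAs for Dallas, as listed in the
# "PUMA Match Summary by Large Place" file.
TEXAS_FIPS = "48"
DALLAS_BEST_PUMAS = {
    "02304", "02305", "02306", "02307", "02311",
    "02312", "02313", "02314", "02315", "02316",
}

def in_dallas_best_match(record):
    """Return True if the record is in one of the best-matching PUMAs.

    Assumes (hypothetically) that each record is a dict with
    'statefip' and 'puma' keys, possibly stored as integers.
    """
    state = str(record["statefip"]).zfill(2)  # restore leading zeros
    puma = str(record["puma"]).zfill(5)
    return state == TEXAS_FIPS and puma in DALLAS_BEST_PUMAS

# Made-up sample records, for illustration only.
records = [
    {"statefip": 48, "puma": 2304, "perwt": 100},  # best-matching PUMA
    {"statefip": 48, "puma": 2310, "perwt": 80},   # Texas, not in the list
    {"statefip": 6, "puma": 2304, "perwt": 50},    # same PUMA code, other state
]

dallas_records = [r for r in records if in_dallas_best_match(r)]
dallas_pop_estimate = sum(r["perwt"] for r in dallas_records)
```

Comparing the codes as zero-padded strings avoids mismatches when the PUMA variable is stored as a number and the leading zero is lost.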
Alternatively, you could use information from the crosswalk to apply a PUMA-based weight to each microdata record, equal to the portion of that PUMA’s population that resides in Dallas. E.g., for Texas PUMA 02310, 33.31% of the population lived in Dallas (in 2010). You could assign a 0.3331 weight to all microdata records in that PUMA, representing their likelihood of actually being in Dallas. (This would be applied in addition to the person or household weight.) There are 5 PUMAs that lie entirely within Dallas, and for those this custom “Dallas weight” would be 1.0; any PUMA with no Dallas population would get a weight of 0. In this way, you’d produce a better model of the actual Dallas population than if you assigned each PUMA either wholly to Dallas or wholly outside it.
NOTE: To uniquely identify a PUMA, you also need to include a state code (e.g., 48 for Texas).
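To make the weighting option concrete, here is a minimal Python sketch that multiplies each record's person weight by its PUMA's Dallas share, keyed on state and PUMA together. Only the 0.3331 share for Texas PUMA 02310 comes from this thread; the record layout, the "perwt" key name, and the sample records are assumptions for illustration, and the remaining shares would need to be filled in from the crosswalk.

```python
# (state FIPS, PUMA) -> share of the PUMA's 2010 population living in Dallas.
# Only the 02310 entry is taken from the discussion; the other shares
# (including 1.0 for PUMAs entirely within Dallas) must come from the crosswalk.
DALLAS_SHARE = {
    ("48", "02310"): 0.3331,
}

def dallas_weight(record, weight_key="perwt"):
    """Scale the record's existing weight by its PUMA's Dallas share.

    Assumes (hypothetically) dict records with 'statefip', 'puma', and a
    weight field. PUMAs missing from the table get a share of 0.
    """
    key = (str(record["statefip"]).zfill(2), str(record["puma"]).zfill(5))
    return record[weight_key] * DALLAS_SHARE.get(key, 0.0)

# Made-up sample records, for illustration only.
records = [
    {"statefip": 48, "puma": 2310, "perwt": 100},  # 33.31% in Dallas
    {"statefip": 48, "puma": 2501, "perwt": 60},   # not in the table
    {"statefip": 6, "puma": 2310, "perwt": 40},    # same code, other state
]

weighted = [dallas_weight(r) for r in records]
dallas_total = sum(weighted)  # weighted count attributed to Dallas
```

Keying the lookup on the (state, PUMA) pair is what keeps a record from another state with the same PUMA code from picking up a Dallas share.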
Thank you very much for the information and suggestions. I will have to mull over the two main options. I looked at the table that you reference and that helps. It does sound like the weighting option would be more accurate.