Hello - I’m interested in how economic conditions have changed over time from 2000 to 2015, so I’m using the geographic crosswalk files as specified with GISJOINER and the interpolation weights. The instructions are pretty straightforward when my variable of interest is a count (e.g., # of foreign born individuals in the zone of interest), but what about medians (e.g., median household income, median home value)? Can I still use the weights in the same way as I would for counts? Thank you!
Using the NHGIS crosswalks for medians is not as simple. (This is the main reason we haven’t yet extended NHGIS standardized time series to include medians.) E.g., if you applied the crosswalk weights to median household income in exactly the same way as with household counts, then if a target unit is estimated to contain 5% of a source unit’s households, you’d end up allocating 5% of the source median income to that target unit, say, $2,500 out of $50,000. But of course, that’s a bit nonsensical! It’d be better to assume those 5% of households have the same median income as the whole source unit, rather than 5% of the median. The crosswalks, however, don’t yet support making that kind of assumption!
The simplest alternatives right now are the following:
- Use means instead of medians, and apply crosswalks separately for each mean’s numerator and denominator. E.g., to compute per capita income, you could estimate “aggregate income” and “total population” separately using the crosswalk weights, and then divide one by the other.
- Start with a table of counts broken down by the value of interest, e.g., housing units by home value, and use the crosswalk to estimate each count in the table. Then estimate the median from the frequency distribution, for which there are various online guides.
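To illustrate the first half of the second option, here is a minimal Python sketch of allocating binned counts from source units to target units. The unit IDs, bin names, weights, and counts are all made up for illustration, and the crosswalk is simplified to (source, target, weight) rows rather than the actual NHGIS file layout.

```python
from collections import defaultdict

# Hypothetical crosswalk rows: (source unit ID, target unit ID, weight).
# Each weight is the estimated share of the source unit's housing units
# that fall in the target unit; weights for one source unit sum to 1.
crosswalk = [
    ("S1", "T1", 0.75),
    ("S1", "T2", 0.25),
    ("S2", "T2", 1.0),
]

# Hypothetical source counts of housing units by home-value bin.
source_counts = {
    "S1": {"under_100k": 40.0, "100k_to_200k": 80.0, "200k_plus": 20.0},
    "S2": {"under_100k": 10.0, "100k_to_200k": 30.0, "200k_plus": 60.0},
}


def crosswalk_counts(crosswalk, source_counts):
    """Allocate every binned count to target units, then sum by target."""
    target_counts = defaultdict(lambda: defaultdict(float))
    for source_id, target_id, weight in crosswalk:
        for bin_name, count in source_counts[source_id].items():
            target_counts[target_id][bin_name] += weight * count
    return {t: dict(bins) for t, bins in target_counts.items()}


target_counts = crosswalk_counts(crosswalk, source_counts)
print(target_counts["T2"])
```

Each bin is treated as its own count variable, so the same weight applies to every bin in a source unit. The result is an estimated frequency distribution per target unit, from which a median can then be estimated.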
Thanks so much, makes sense!
Following up on this conversation - to do the average approach, I assume that the weight for the numerator would be the same for the denominator? i.e., for per capita income, I would use the population weight for both aggregate income and total population? Thank you!
Yes, I’d use population weights for both. Of the available weights, I’d expect aggregate income to have a spatial distribution most similar to the total population’s (not to families’, households’ or housing units’), and total population of course matches best with population weights, too.
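As a rough Python sketch of that calculation: the same population weight is applied to both aggregate income and total population, each is summed by target unit, and only then is the ratio taken. All IDs and numbers below are invented, and the crosswalk is simplified to (source, target, weight) rows.

```python
from collections import defaultdict

# Hypothetical crosswalk rows: (source ID, target ID, population weight).
crosswalk = [
    ("S1", "T1", 0.5),
    ("S1", "T2", 0.5),
    ("S2", "T2", 1.0),
]

# Hypothetical source data: aggregate income (dollars) and total population.
source = {
    "S1": {"agg_income": 40_000_000.0, "population": 1_000.0},
    "S2": {"agg_income": 90_000_000.0, "population": 3_000.0},
}


def per_capita_income(crosswalk, source):
    """Crosswalk numerator and denominator separately, then divide."""
    sums = defaultdict(lambda: {"agg_income": 0.0, "population": 0.0})
    for source_id, target_id, weight in crosswalk:
        for key in ("agg_income", "population"):
            sums[target_id][key] += weight * source[source_id][key]
    # Skip targets with no estimated population to avoid dividing by zero.
    return {
        target_id: totals["agg_income"] / totals["population"]
        for target_id, totals in sums.items()
        if totals["population"] > 0
    }


pci = per_capita_income(crosswalk, source)
print(pci)
```

Dividing after summing matters: averaging the source units' per capita incomes directly would ignore how many people each source unit contributes to the target.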
Hello Jonathan! Commenting on Step #2–
Start with a table of counts broken down by the value of interest, e.g., housing units by home value, and use the crosswalk to estimate each count in the table.
The ACS team confirmed that I should use table B25063 for Gross Rent count values.
- Download tables B25063_003 through B25063_026 for years 2010-2020 at the block_group level to get a frequency table per year.
- Standardize every 2010-2019 value to its 2020 counterpart using the block_group crosswalk file and the household weights (wt_hh).
- Once every table is standardized, estimate the median based on the frequency distribution of the new standardized values.
Are those the steps and tables you would recommend?
Yes, that’s the right idea. A couple notes:
In the parlance of census summary data, B25063 is a single table, and B25063_003 through B25063_026 are variables within that table. Through IPUMS NHGIS you would select the B25063 table once (for each source ACS dataset) and that would get you all of the variables in the table… no need for requesting each variable 003 through 026 separately. In the Census API, I think tables are referred to as groups, but you could still request just a single group.
As suggested in my reply to your question about the limited selection of ACS year ranges in time series tables, it’s problematic to use 5-year ACS data to construct annual estimates. It sounds like you’re still planning to include all 5-year ranges in your analysis, which would include many largely overlapping samples. For most applications, I would recommend instead analyzing only non-overlapping 5-year periods if possible.
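For example, one simple way to pick non-overlapping periods is to start from the most recent 5-year range and step backward, as in this small Python sketch. The function name and the choice to anchor on the latest period are my own assumptions, not an NHGIS convention.

```python
def non_overlapping_periods(end_years, span=5):
    """Pick multi-year periods, latest first, that share no years.

    Each period is a (start_year, end_year) tuple covering `span` years.
    """
    chosen = []
    for end in sorted(end_years, reverse=True):
        start = end - span + 1
        # Keep a period only if it ends before the last chosen one starts.
        if not chosen or end < chosen[-1][0]:
            chosen.append((start, end))
    return sorted(chosen)


# 5-year ACS datasets ending in 2010 through 2020.
periods = non_overlapping_periods(range(2010, 2021))
print(periods)
```

With end years 2010 through 2020, this keeps 2006-2010, 2011-2015, and 2016-2020.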
A related point: the 5-year estimates do not represent characteristics for any single year. E.g., the 2016-2020 5-year estimates represent characteristics throughout that 5-year period, not just 2020.
Got it thank you so much!
Hi @JonathanSchroeder! Your response was really helpful. Thank you! I’m following your #2 suggestion for estimating the 2006-2010 ACS median household income for 2020 geographies.
- Start with a table of counts broken down by the value of interest, e.g., housing units by home value, and use the crosswalk to estimate each count in the table. Then estimate the median from the frequency distribution, for which there are various online guides.
I was able to apply the household weights to crosswalk the hh counts for each income bin to 2020 geographies and was using the formula from this page to estimate median hh income. This formula requires the width of the median group to estimate the median.
As you can guess, there are geographies (not many, but do exist) where the median falls in the highest bin (i.e., $200K or above), of which the Census does not specify the upper bound. Do you have any advice on how to deal with these cases?
In tables where the Census Bureau reports medians, they run into a similar problem. Their approach is to limit the maximum reported median to a specific value and report no medians above that. You can find the list of maximum values on page 18 (PDF page 24) of the ACS Summary File Handbook (which NHGIS provides through its Tabular Data Sources page). Depending on the table, they may use either 200,001 or 250,001 for the highest reported median income.
My suggestion is to mimic their approach. If the median falls in the highest bin, then I would assign a median equal to the lowest value for that bin. I don’t think there’s a practical way to do better, and that should suffice for mapping and many other applications. I hope that’s adequate for your purposes!
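Here is a minimal Python sketch of that rule. It assumes linear interpolation within the median bin, which is a common grouped-data formula but may differ from the one you are using, and the bin boundaries and counts are hypothetical. An open-ended top bin is marked with an upper bound of `None`.

```python
def grouped_median(bins):
    """Estimate a median from binned counts.

    `bins` is a list of (lower, upper, count) tuples sorted by `lower`.
    The open-ended top bin has upper=None. If the median falls in that
    bin, the bin's lower bound is returned (a top-coded median).
    """
    total = sum(count for _, _, count in bins)
    if total <= 0:
        return None  # no households, so no median
    half = total / 2.0
    cumulative = 0.0
    for lower, upper, count in bins:
        if count > 0 and cumulative + count >= half:
            if upper is None:
                return float(lower)  # top-coded: report the bin floor
            # Linear interpolation within the median bin.
            return lower + (half - cumulative) / count * (upper - lower)
        cumulative += count
    return None


# Hypothetical crosswalked household counts by income bin.
typical = [(0, 50_000, 20.0), (50_000, 100_000, 50.0),
           (100_000, 200_000, 20.0), (200_000, None, 10.0)]
wealthy = [(0, 100_000, 10.0), (100_000, 200_000, 20.0),
           (200_000, None, 70.0)]

median_typical = grouped_median(typical)
median_wealthy = grouped_median(wealthy)
print(median_typical, median_wealthy)
```

Crosswalked counts are usually fractional, so the function works with floats throughout. A top-coded result should be read as "this value or more" rather than as a point estimate.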
I think it will be. Super helpful! Thank you so much for your help.