Hi folks. Documentation for COSTELEC is quite thorough, but doesn’t seem to include any information about how ACS ELEP numbers (“LAST MONTH, what was the cost of electricity for this house, apartment, or mobile home?”) are annualized to COSTELEC. Same question for the annualization of GASP to COSTGAS.
For context, we’ve been able to match COSTWATR and COSTFUEL household-weighted means precisely to Census aggregate tables for WATP and FULP, respectively, but the corresponding annualized values for COSTELEC and COSTGAS do not bear any resemblance to ELEP * 12 or GASP * 12, for example.
Extra credit: we’re also looking for any documentation of whether and how the Census might adjust “last month” values of ELEP and GASP pre-release to data.census.gov and to IPUMS. Is there any attempt to wash out seasonality or within-year inflation prior to release? If so, what’s the method?
If no one here knows or cares, rest assured that we will doggedly pursue this until we can answer. I’ll be back!
COSTELEC is calculated by multiplying the monthly values reported in the ACS Public Use Microdata Sample (PUMS) by 12, with two exceptions: (1) the variable-specific special codes 9993 (“No charge or no electricity used”) and 9997 (“Electricity included in rent or in condo fee”), which are assigned where no monthly electricity cost is reported, and (2) top-coded values.
To preserve respondent confidentiality, the Census Bureau edits the top 0.5% of reported values in each state/year pair and reallocates them to the median value of this top-coded group. For example, the original 2023 PUMS from the Census Bureau sets the top-coding threshold for monthly electricity costs in Alaska at $1,000 and assigns all households exceeding this cost a value of $2,400 (the group’s median). The maximum top code for IPUMS COSTELEC, however, is currently fixed at $9,990. This means that any state/sample-specific top code for monthly electricity expenditures that exceeds $9,990 when multiplied by 12 is recoded to $9,990 in COSTELEC, even though this is lower than the top code reported in the original data. I have a message out to my colleagues to have this issue addressed. That said, it only affects roughly the top 1% of values in the 2023 ACS. Note that COSTGAS has the same issue of a fixed top code being enforced by the IPUMS recoding protocol.
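As a rough illustration of the rule described above, here is a minimal Python sketch. It is not the actual IPUMS recoding code: the function name and the way the special codes are passed in are hypothetical, and only the codes 9993 and 9997, the multiplication by 12, and the fixed $9,990 maximum come from the description above.

```python
# Sketch only: names are hypothetical and this is not IPUMS production code.
NO_CHARGE = 9993          # "No charge or no electricity used"
INCLUDED_IN_RENT = 9997   # "Electricity included in rent or in condo fee"
IPUMS_TOP_CODE = 9990     # fixed COSTELEC maximum described above


def annualize_monthly_cost(monthly_cost, special_code=None):
    """Return a COSTELEC-style annual value for one household.

    monthly_cost: monthly dollar amount from the PUMS (may already be
        top-coded by the Census Bureau), or None if no amount was reported.
    special_code: 9993 or 9997 when no monthly amount exists, else None.
        (How the PUMS itself flags these cases is not modeled here.)
    """
    if special_code in (NO_CHARGE, INCLUDED_IN_RENT):
        # Special codes pass through unchanged; they are not dollar amounts.
        return special_code
    # Annualize, then apply the fixed maximum.
    return min(monthly_cost * 12, IPUMS_TOP_CODE)


print(annualize_monthly_cost(150))         # ordinary household
print(annualize_monthly_cost(2400))        # Alaska 2023 top-coded value
print(annualize_monthly_cost(None, 9997))  # electricity included in rent
```

In the Alaska example, $2,400 x 12 = $28,800, which is above the fixed maximum, so the sketch returns 9990. This is the case where COSTELEC differs from the monthly value times 12.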
It’s possible to run a direct comparison with published monthly figures using the unedited PUMS monthly value that we provide in the source variable US2023A_ELEP. I’m unsure which Census aggregate tables you are comparing your estimates to, but I was able to find this one that provides estimates of the number of households across six brackets of monthly electricity expenditures.
Below, I’ve shared my estimates using IPUMS data for each of these brackets. I weight the analysis with HHWT, restrict to a single observation per household using PERNUM == 1, and drop households with the variable-specific codes 9993 and 9997. These estimates are overall very close to the figures reported in the table (~2% off). They are also identical whether I estimate using the monthly source variable or COSTELEC/12: the highest bracket of monthly expenditures is $250 or more, and 250 is below the IPUMS-imposed top code of 9,990, so in this specific case the two are equivalent.
Although these estimates fall outside the published margins of error, they closely resemble the published figures. I suspect that any remaining difference arises because the table is produced from the internal restricted data file, which includes the entire ACS sample (the PUMS includes about 2/3 of the entire sample) as well as unrounded values (the PUMS rounds values to the nearest $10).
Less than $50: 6,081,040
$50 to $99: 24,265,416
$100 to $149: 27,716,062
$150 to $199: 21,306,611
$200 to $249: 16,104,809
$250 or more: 29,737,541
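For anyone who wants to reproduce a tabulation like this, here is a minimal sketch in plain Python of the steps described above (one record per household via PERNUM == 1, dropping codes 9993 and 9997, weighting by HHWT, and bracketing COSTELEC/12). The function names and the list-of-dicts input are assumptions made for illustration, and any not-in-universe records are assumed to have been removed beforehand.

```python
from collections import defaultdict

SPECIAL_CODES = {9993, 9997}  # no charge / included in rent or condo fee

# (lower bound inclusive, upper bound exclusive, label), in monthly dollars
BRACKETS = [
    (0, 50, "Less than $50"),
    (50, 100, "$50 to $99"),
    (100, 150, "$100 to $149"),
    (150, 200, "$150 to $199"),
    (200, 250, "$200 to $249"),
    (250, float("inf"), "$250 or more"),
]


def bracket_label(monthly_cost):
    """Map a monthly dollar amount to its published bracket label."""
    for low, high, label in BRACKETS:
        if low <= monthly_cost < high:
            return label
    raise ValueError(f"monthly cost out of range: {monthly_cost}")


def weighted_bracket_counts(rows):
    """Sum HHWT by monthly-cost bracket, one record per household.

    rows: iterable of dicts with keys PERNUM, HHWT, and COSTELEC
    (a hypothetical in-memory form of an IPUMS extract).
    """
    counts = defaultdict(float)
    for row in rows:
        if row["PERNUM"] != 1:
            continue  # keep a single observation per household
        if row["COSTELEC"] in SPECIAL_CODES:
            continue  # no dollar amount to classify
        monthly = row["COSTELEC"] / 12
        counts[bracket_label(monthly)] += row["HHWT"]
    return dict(counts)


# Made-up records for illustration only; these are not real ACS data.
sample = [
    {"PERNUM": 1, "HHWT": 100, "COSTELEC": 480},   # $40 per month
    {"PERNUM": 2, "HHWT": 100, "COSTELEC": 480},   # same household, skipped
    {"PERNUM": 1, "HHWT": 50, "COSTELEC": 9997},   # included in rent, skipped
    {"PERNUM": 1, "HHWT": 30, "COSTELEC": 3000},   # $250 per month
    {"PERNUM": 1, "HHWT": 20, "COSTELEC": 9990},   # at the fixed top code
]
print(weighted_bracket_counts(sample))
```

Because the top bracket is open-ended, a record at the fixed top code still lands in "$250 or more", which is why the bracket counts do not depend on whether the monthly source variable or COSTELEC/12 is used.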
Beyond what is documented in the editing procedure tab for COSTELEC, I am not aware of any other adjustments, such as ones for seasonality. The Census Bureau provides a within-year inflation factor that researchers may use, but the decision to apply it is left to the discretion of each researcher.