Hello,
I am working with datasets that contain information about housing values, and I'm encountering some challenges in identifying the appropriate geographic level for each dataset and matching them with the correct crosswalk. For example, if I use the 1990_STF1 dataset and select the "Occupancy" data table, I would like to know at which geographic level this data table is available.
I understand that I can select a dataset, choose a specific geographic level, and then request certain data tables using code like the following:
library(ipumsr)
library(dplyr)
library(stringr)

# ds_meta holds dataset metadata from a prior call, e.g.:
# ds_meta <- get_metadata_nhgis(dataset = "2013_2017_ACS5a")
ds_meta$data_tables |>
  filter(str_detect(description, "Tenure"))

nhgis_ext <- define_extract_nhgis(
  description = "2017 ACS Tenure by race and ethnicity",
  datasets = ds_spec(
    "2013_2017_ACS5a",
    data_tables = c("B25003B", "B25003D", "B25003H", "B25003I"),
    geog_levels = "county"
  )
)
nhgis_ext
However, I am looking for a more systematic way to achieve two things:
- Identify all data tables in a dataset that are related to housing.
- Determine the geographic levels at which these data tables are available, ideally identifying the lowest geographic level for each.
This information is crucial for selecting the correct crosswalk level for interpolation. For example, if a data table is available at the census block level, I want to ensure I use the appropriate crosswalk for that level.
Is there a way, using the IPUMS API and following best practices, to find out the lowest geographic level at which a specific data table (like the “Occupancy” data table in the 1990_STF1 dataset) is available?
Thank you for your help!
First, to answer your specific question about 1990_STF1: all of the tables in that dataset are available at the block level.
Generally, NHGIS datasets are designed to provide a consistent set of geographic levels for all tables in the dataset. If a source Census Summary File provides tables that aren’t all available for the same set of levels, NHGIS splits the source dataset up into multiple NHGIS datasets, each containing a set of tables that are available for a consistent set of geographic levels. For example, most recently, when we added the 2020 Census Demographic and Housing Characteristics File (DHC), we split it up into 3 NHGIS datasets:
- 2020_DHCa: DHC - P & H Tables [Blocks & Larger Areas] – 141 tables
- 2020_DHCb: DHC - PCT & HCT Tables [Tracts & Larger Areas] – 98 tables
- 2020_DHCc: DHC - PCO Tables (Group Quarters Population Only) [Counties & Larger Areas] – 10 tables
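You can see this split directly in the dataset metadata. A minimal sketch (assuming an IPUMS API key has already been registered with set_ipums_api_key(), and that the datasets listing includes a name column, as in current ipumsr):

```r
library(ipumsr)
library(dplyr)
library(stringr)

# List all NHGIS datasets, then narrow to the three 2020 DHC datasets
all_ds <- get_metadata_nhgis("datasets")
all_ds |>
  filter(str_detect(name, "2020_DHC"))
```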
With ipumsr and the IPUMS API, you can discover which levels are available for each dataset by using the get_metadata_nhgis() function. There's an example in the NHGIS API Requests vignette for ipumsr. For 1990_STF1, the code would be:
stf90_meta <- get_metadata_nhgis(dataset = "1990_STF1")
stf90_meta$geog_levels
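The geog_levels element is a tibble of available levels; assuming it carries a name column (as in current ipumsr), you can also check programmatically for a specific level such as blocks:

```r
library(ipumsr)

stf90_meta <- get_metadata_nhgis(dataset = "1990_STF1")

# Check whether the block level is among the available geographic levels
"block" %in% stf90_meta$geog_levels$name
```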
That same vignette provides other examples you may find useful for exploring NHGIS metadata. For example, in the Summary metadata section, there's an example of how you might use dplyr utilities to filter a list of NHGIS dataset descriptions to find those pertaining to 'Agriculture'. You could apply a similar strategy to filter the list of 1990_STF1 tables to find those with a description or universe containing the words 'Occupied' or 'Occupancy'.
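Putting those pieces together, a sketch of that filtering strategy for 1990_STF1 (column names follow current ipumsr metadata; the table code passed to data_table below is hypothetical and should be taken from the filtered list):

```r
library(ipumsr)
library(dplyr)
library(stringr)

stf90_meta <- get_metadata_nhgis(dataset = "1990_STF1")

# Find tables whose description mentions occupancy ("occup" matches
# both "Occupied" and "Occupancy", case-insensitively)
stf90_meta$data_tables |>
  filter(str_detect(description, regex("occup", ignore_case = TRUE)))

# For a single table's universe statement, request table-level metadata
# (replace "NH2" with a code from the filtered results above)
occ_meta <- get_metadata_nhgis(dataset = "1990_STF1", data_table = "NH2")
```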
Apologies for my late response, but your response was very helpful, thank you so much!