Hello,
I am working with datasets that contain information about housing values, and I'm encountering some challenges in identifying the appropriate geographic level for each dataset and matching them with the correct crosswalk. For example, if I use the 1990_STF1 dataset and select the "Occupancy" data table, I would like to know at which geographic level this data table is available.
I understand that I can select a dataset, choose a specific geographic level, and then request certain data tables using code like the following:
library(ipumsr)
library(dplyr)
library(stringr)

# ds_meta holds dataset metadata from a prior call, e.g.:
# ds_meta <- get_metadata_nhgis(dataset = "2013_2017_ACS5a")
ds_meta$data_tables |>
  filter(str_detect(description, "Tenure"))

nhgis_ext <- define_extract_nhgis(
  description = "2017 ACS Tenure by race and ethnicity",
  datasets = ds_spec(
    "2013_2017_ACS5a",
    data_tables = c("B25003B", "B25003D", "B25003H", "B25003I"),
    geog_levels = "county"
  )
)
nhgis_ext
However, I am looking for a more systematic way to achieve two things:
- Identify all data tables in a dataset that are related to housing.
- Determine the geographic levels at which these data tables are available, ideally identifying the lowest geographic level for each.
This information is crucial for selecting the correct crosswalk level for interpolation. For example, if a data table is available at the census block level, I want to ensure I use the appropriate crosswalk for that level.
Is there a way, using the IPUMS API and following best practices, to find out the lowest geographic level at which a specific data table (like the “Occupancy” data table in the 1990_STF1 dataset) is available?
Thank you for your help!
First, to answer your specific question about 1990_STF1: all of the tables in that dataset are available at the block level.
Generally, NHGIS datasets are designed to provide a consistent set of geographic levels for all tables in the dataset. If a source Census Summary File provides tables that aren’t all available for the same set of levels, NHGIS splits the source dataset up into multiple NHGIS datasets, each containing a set of tables that are available for a consistent set of geographic levels. For example, most recently, when we added the 2020 Census Demographic and Housing Characteristics File (DHC), we split it up into 3 NHGIS datasets:
- 2020_DHCa: DHC - P & H Tables [Blocks & Larger Areas] – 141 tables
- 2020_DHCb: DHC - PCT & HCT Tables [Tracts & Larger Areas] – 98 tables
- 2020_DHCc: DHC - PCO Tables (Group Quarters Population Only) [Counties & Larger Areas] – 10 tables
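You can see this split directly in the dataset metadata. A minimal sketch (assuming an IPUMS API key has already been registered with set_ipums_api_key(), and that the datasets listing includes a name column, as in current ipumsr):

```r
library(ipumsr)
library(dplyr)
library(stringr)

# List all NHGIS datasets, then narrow to the three 2020 DHC datasets
all_ds <- get_metadata_nhgis("datasets")
all_ds |>
  filter(str_detect(name, "2020_DHC"))
```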
With ipumsr and the IPUMS API, you can discover which levels are available for each dataset by using the get_metadata_nhgis() function. There's an example in the NHGIS API Requests vignette for ipumsr. For 1990_STF1, the code would be:
stf90_meta <- get_metadata_nhgis(dataset = "1990_STF1")
stf90_meta$geog_levels
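The geog_levels element is a tibble of available levels; assuming it carries a name column (as in current ipumsr), you can also check programmatically for a specific level such as blocks:

```r
library(ipumsr)

stf90_meta <- get_metadata_nhgis(dataset = "1990_STF1")

# Check whether the block level is among the available geographic levels
"block" %in% stf90_meta$geog_levels$name
```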
That same vignette provides other examples you may find useful for exploring NHGIS metadata. For example, in the Summary metadata section, there's an example of how you might use dplyr utilities to filter a list of NHGIS dataset descriptions to find those pertaining to 'Agriculture'. You could apply a similar strategy to filter the list of 1990_STF1 tables to find those with a description or universe containing the words 'Occupied' or 'Occupancy'.
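Putting those pieces together, a sketch of that filtering strategy for 1990_STF1 (column names follow current ipumsr metadata; the table code passed to data_table below is hypothetical and should be taken from the filtered list):

```r
library(ipumsr)
library(dplyr)
library(stringr)

stf90_meta <- get_metadata_nhgis(dataset = "1990_STF1")

# Find tables whose description mentions occupancy ("occup" matches
# both "Occupied" and "Occupancy", case-insensitively)
stf90_meta$data_tables |>
  filter(str_detect(description, regex("occup", ignore_case = TRUE)))

# For a single table's universe statement, request table-level metadata
# (replace "NH2" with a code from the filtered results above)
occ_meta <- get_metadata_nhgis(dataset = "1990_STF1", data_table = "NH2")
```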
Apologies for my late response, but your response was very helpful, thank you so much!