This is my first time using the IPUMS API; I have only used the Census API before

Hieu_Tran · February 13, 2025, 9:20pm

I’m migrating from the Census API to the IPUMS API. I have looked all over the documentation, but I found it quite confusing. I want to extract vacancy data for Puerto Rico for 2010 and 2020, with geometry, so that I can plot it later on.

Please send help and thanks in advance!

JonathanSchroeder · February 14, 2025, 8:09pm

The IPUMS API will support your project. IPUMS has census data about housing vacancy in Puerto Rico, along with shapefiles representing the census areas in Puerto Rico, and you can access this data through the API.

IPUMS provides different types of data through several different projects. The project that provides geographic census data for the U.S. and Puerto Rico is IPUMS NHGIS. (You could also get census microdata for Puerto Rico from IPUMS USA, but the microdata have limited geographic detail.)

To learn more about the types of data in IPUMS NHGIS and the different ways to access it, I recommend viewing this webinar, or you could explore various other resources on the NHGIS website.

The IPUMS Developer Portal provides complete details about the IPUMS API, including instructions for getting started and example workflows and code for accessing NHGIS data and metadata. For Python and R users, the easiest way to interface with the API is to use one of our client SDK libraries: ipumspy for Python and ipumsr for R. For ipumsr, there’s a blog post and webinar that introduce how to access NHGIS in R. We haven’t yet added similar resources for ipumspy, but this page gives examples of ipumspy code accessing NHGIS.

If you’ve reviewed these materials and you still have questions about how to access the data you’re looking for, please feel free to reply here.

A couple notes about Puerto Rico data on NHGIS:

NHGIS includes no data for Puerto Rico in its tables for 2000 or earlier years. Puerto Rico is also not available in NHGIS GIS files from before 2000, but it is available in the 2000 GIS files that are based on 2010 TIGER/Line files.
Currently, most NHGIS data files cover all areas in the U.S., including Puerto Rico (for the years described in the previous note). We expect to release an update very soon that will enable NHGIS users to limit most table data requests to selected states or state equivalents.

Hieu_Tran · February 21, 2025, 8:44pm

I’ve now found a way to extract and read the data from the API, but there is one more thing I’d like to clarify. I want to extract the vacancy variable for Puerto Rico from the “nhgis” collection. I know that the Census vacancy variable is H3003, but in ipumspy I can’t extract only H3003; I have to extract the whole H3 data table.

JonathanSchroeder · February 21, 2025, 9:32pm

That’s correct. NHGIS delivers whole tables only, unlike the Census API. It isn’t currently possible to request only a specific variable from NHGIS. We hope to add that option in a future update, but we have no timeline for that.

Note that one advantage of the NHGIS approach is that it does support obtaining multiple tables with a single request, whereas, if I recall correctly, the Census API delivers only one variable or one table (i.e., “group”) per request.
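Since the extract always contains the whole table, one workaround is to drop the columns you don't need after downloading. Here is a minimal sketch using Python's csv module. The column codes COL001 to COL003 and all values are made-up placeholders, not real NHGIS names; the codebook delivered with the extract lists the actual column names.

```python
import csv
import io

# Stand-in for a downloaded NHGIS table file. The column codes and values
# are hypothetical placeholders, not real NHGIS names or data.
SAMPLE_CSV = """GISJOIN,COL001,COL002,COL003
G7200010,100,80,20
G7200030,50,45,5
"""


def select_columns(csv_text, keep):
    """Return each row as a dict restricted to the columns named in `keep`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{name: row[name] for name in keep} for row in reader]


# Keep the join key plus the single (hypothetical) variable of interest.
rows = select_columns(SAMPLE_CSV, ["GISJOIN", "COL003"])
```

Keeping GISJOIN alongside the variable preserves the key needed to link the rows to a shapefile later.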

Hieu_Tran · February 27, 2025, 9:56pm

Hi Jonathan.

I’m stuck on specifying the shapefile’s name. I tried the same name as the shapefile I downloaded from the Data Finder portal, and I also tried following the naming convention, but had no luck either way. Where can I find the file names for the shapefiles?

Thanks in advance.

Hieu_Tran · February 27, 2025, 10:31pm

I’m trying to get tables H1 and H3 for dataset 2010_SF1a, but I’m getting this error: BadIpumsApiRequest: Data tables invalid for dataset 2010_SF1a: H1, H3. Please advise!

JonathanSchroeder · February 27, 2025, 10:57pm

I assume that you’re still working in Python with the ipumspy package.

If so, please see the section in the ipumspy documentation on metadata for Aggregate Data Collections. There’s a “shapefiles” endpoint you can use to get metadata about NHGIS shapefiles, which should include valid names for API requests.
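As a rough sketch of how that metadata could be searched for a valid name: the helper below filters shapefile records by year and geographic level. It assumes the metadata arrives as pages shaped like {"data": [...]} and that each record has "name", "year", and "geographicLevel" keys; those field names, the helper find_shapefiles, and the sample records are assumptions for illustration, so please check them against what the endpoint actually returns.

```python
def find_shapefiles(pages, year=None, level=None):
    """Collect shapefile names from metadata pages that match the filters.

    Assumes each page is a dict with a "data" list and each record has
    "name", "year", and "geographicLevel" keys (an assumption, not verified).
    """
    matches = []
    for page in pages:
        for record in page["data"]:
            # Compare as strings so year=2020 and year="2020" both work.
            if year is not None and str(record.get("year")) != str(year):
                continue
            # Case-insensitive exact match on the geographic level label.
            record_level = str(record.get("geographicLevel", "")).lower()
            if level is not None and record_level != level.lower():
                continue
            matches.append(record["name"])
    return matches


# Illustrative records only; real names come from the metadata endpoint.
sample_pages = [
    {"data": [
        {"name": "example_state_2010", "year": "2010", "geographicLevel": "State"},
        {"name": "example_state_2020", "year": "2020", "geographicLevel": "State"},
    ]},
    {"data": [
        {"name": "example_block_2020", "year": "2020", "geographicLevel": "Block"},
    ]},
]

names = find_shapefiles(sample_pages, year=2020, level="block")
```

With a real client, the pages would presumably come from the metadata call described in that documentation section rather than from a hand-built list.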

JonathanSchroeder · February 27, 2025, 11:00pm

If you’re listing multiple table names in Python, make sure to enclose each individual name in double quotes.

E.g., this code should work:

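from ipumspy import AggregateDataExtract, Dataset

# Each table name is its own string in the data_tables list
extract = AggregateDataExtract(
    collection="nhgis",
    description="An NHGIS example extract",
    datasets=[
        Dataset(name="2010_SF1a", data_tables=["H1", "H3"], geog_levels=["state"])
    ]
)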

This code could cause the error that you got:

# Here "H1, H3" is one string, i.e. a single table name, rather than two names
extract = AggregateDataExtract(
    collection="nhgis",
    description="An NHGIS example extract",
    datasets=[
        Dataset(name="2010_SF1a", data_tables=["H1, H3"], geog_levels=["state"])
    ]
)
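The difference is easy to check in plain Python, without calling the API at all. A small sketch follows; split_table_names is a hypothetical helper of my own, not part of ipumspy, for the case where the names start out as one comma-separated string.

```python
separate = ["H1", "H3"]  # two list items, one per table
combined = ["H1, H3"]    # one list item that merely contains a comma


def split_table_names(text):
    """Turn a comma-separated string such as "H1, H3" into a list of names."""
    return [name.strip() for name in text.split(",") if name.strip()]


# Repair the single-string form before building the request.
fixed = split_table_names(combined[0])
```

Here len(separate) is 2 while len(combined) is 1, which is why the second request names a table that does not exist.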

MPC_vanriper · February 27, 2025, 11:15pm

Could you share the code blocks that are throwing the errors? That would help us diagnose the issues faster, and let others chime in! Just seeing the error messages makes it hard for us to assess what could be causing the issue.

Yours,
Dave Van Riper

Hieu_Tran · March 18, 2025, 3:37pm

Yes, my mistake was just what Jonathan described. I now have a new issue matching the TIGER boundary data with the census data: for 2020, the census data table does not match the TIGER files in number of rows.

Is there any way to retrieve the shapefile with the tabular data already merged?

Thanks in advance.

MPC_vanriper · March 18, 2025, 4:31pm

We don’t provide shapefiles merged with tabular data. We try to make the merge as easy as possible with our GISJOIN field, which is common between shapefiles and tabular data.
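To illustrate the idea, here is a minimal sketch of a GISJOIN-keyed join in plain Python. The rows, GISJOIN values, and the vacant_units field are hypothetical stand-ins for records read from a shapefile's attribute table and from a table file.

```python
# Hypothetical stand-ins for shapefile attribute rows and table rows.
shape_rows = [
    {"GISJOIN": "G001", "geometry": "polygon-1"},
    {"GISJOIN": "G002", "geometry": "polygon-2"},
]
table_rows = [
    {"GISJOIN": "G001", "vacant_units": 3},
    {"GISJOIN": "G002", "vacant_units": 0},
    {"GISJOIN": "G003", "vacant_units": 0},  # no matching shape
]


def join_on_gisjoin(shape_rows, table_rows):
    """Attach table fields to each shape row that shares a GISJOIN value."""
    table_by_id = {row["GISJOIN"]: row for row in table_rows}
    joined = []
    for shape in shape_rows:
        match = table_by_id.get(shape["GISJOIN"])
        if match is not None:
            # Combine the table fields and the shape fields into one record.
            joined.append({**match, **shape})
    return joined


joined = join_on_gisjoin(shape_rows, table_rows)
```

If the shapefile is loaded as a geopandas GeoDataFrame gdf and the table as a pandas DataFrame df, the same join can be written as gdf.merge(df, on="GISJOIN").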

I need more details so that I can help answer your question about the mismatch in record counts. What geographic level (e.g., county, census tract, place) are you looking at? What is stored in “df_2020.shape” and “gdf.shape”?

Dave

Hieu_Tran · March 18, 2025, 4:41pm

This is my extraction code

They’re both in block units.

MPC_vanriper · March 18, 2025, 5:42pm

That helps a lot!

NHGIS erases the coastal water from shapefiles before publishing them. As part of that erasure, we remove entire census blocks from the shapefile. Those erased census blocks are retained in the data files, however, and that results in a mismatch between the number of records in the shapefile vs. the tabular data.

We make sure to check that the erased blocks contain zero persons and zero housing units before releasing the shapefiles. So, although the record counts differ, any tabular data record without a matching shapefile record is for a census block with zero people and zero housing units.
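If you want to confirm this on your own extract, a check along these lines should do it. This is only a sketch: the GISJOIN values and the persons and housing_units field names are hypothetical, so substitute the actual column codes from your table's codebook.

```python
def unmatched_table_rows(shape_ids, table_rows):
    """Return table rows whose GISJOIN has no counterpart in the shapefile."""
    known = set(shape_ids)
    return [row for row in table_rows if row["GISJOIN"] not in known]


def all_empty(rows, count_fields):
    """True if every listed count field is zero in every row."""
    return all(int(row[field]) == 0 for row in rows for field in count_fields)


# Hypothetical data: one block (G003) appears in the table but not the shapefile.
shape_ids = ["G001", "G002"]
table_rows = [
    {"GISJOIN": "G001", "persons": 10, "housing_units": 4},
    {"GISJOIN": "G002", "persons": 0, "housing_units": 0},
    {"GISJOIN": "G003", "persons": 0, "housing_units": 0},
]

extra = unmatched_table_rows(shape_ids, table_rows)
extras_are_empty = all_empty(extra, ["persons", "housing_units"])
```

If extras_are_empty comes back False on real data, the mismatch has some other cause and is worth reporting.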

Dave
