This is my first time using the IPUMS API; I have only used the Census API before

I’m migrating from the Census API to the IPUMS API. I looked all over the documentation, but I found it quite confusing. I want to extract vacancy data for Puerto Rico for 2010 and 2020, with geometry, so that I can plot it later.

Please send help and thanks in advance!

The IPUMS API will support your project. IPUMS has census data about housing vacancy in Puerto Rico, along with shapefiles representing the census areas in Puerto Rico, and you can access this data through the API.

IPUMS provides different types of data through several different projects. The project that provides geographic census data for the U.S. and Puerto Rico is IPUMS NHGIS. (You could also get census microdata for Puerto Rico from IPUMS USA, but the microdata have limited geographic detail.)

To learn more about the types of data in IPUMS NHGIS and the different ways to access it, I recommend viewing this webinar, or you could explore various other resources on the NHGIS website.

The IPUMS Developer Portal provides complete details about the IPUMS API, including instructions for getting started and example workflows and code for accessing NHGIS data and metadata. For Python and R users, the easiest way to interface with the API is to use one of our client SDK libraries: ipumspy for Python and ipumsr for R. For ipumsr, there’s a blog post and webinar that introduce how to access NHGIS in R. We haven’t yet added similar resources for ipumspy, but this page gives examples of ipumspy code accessing NHGIS.

If you’ve reviewed these materials and you still have questions about how to access the data you’re looking for, please feel free to reply here.

A couple notes about Puerto Rico data on NHGIS:

  • NHGIS includes no data for Puerto Rico in its tables for 2000 or earlier years. Puerto Rico is also not available in NHGIS GIS files from before 2000, but it is available in the 2000 GIS files that are based on 2010 TIGER/Line files.
  • Currently, most NHGIS data files cover all areas in the U.S., including Puerto Rico (for the years described in the previous note). We expect to release an update very soon that will enable NHGIS users to limit most table data requests to selected states or state equivalents.
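Until that option is available, one workaround is to filter a nationwide table down to Puerto Rico after you download it. Here is a minimal sketch with pandas. It assumes the table has a GISJOIN column and that Puerto Rico's GISJOIN values start with "G720" (the state FIPS code 72 plus a trailing 0), so please verify that against your own file. The sample rows, the `HOUSING_UNITS` column, and the `keep_puerto_rico` helper are made up for illustration.

```python
import io
import pandas as pd

# Made-up stand-in for a nationwide NHGIS CSV; real files have many more columns.
sample_csv = io.StringIO(
    "GISJOIN,HOUSING_UNITS\n"
    "G0100010,100\n"
    "G7200010,40\n"
    "G7200030,25\n"
)

def keep_puerto_rico(table: pd.DataFrame, prefix: str = "G720") -> pd.DataFrame:
    """Return only the rows whose GISJOIN starts with the given prefix."""
    mask = table["GISJOIN"].astype(str).str.startswith(prefix)
    return table.loc[mask].reset_index(drop=True)

# Read GISJOIN as text so the identifier is never converted to a number.
nationwide = pd.read_csv(sample_csv, dtype={"GISJOIN": str})
puerto_rico = keep_puerto_rico(nationwide)
print(puerto_rico)
```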

I’ve now found a way to extract and read the data from the API, but there is one more thing I’d like to clarify. I want to extract the vacancy variable for Puerto Rico from the “nhgis” collection. I know that the Census vacancy variable is H3003, but I can’t find a way to extract only H3003 with ipumspy instead of the whole H3 table.

That’s correct. NHGIS delivers whole tables only, unlike the Census API. It isn’t currently possible to request only a specific variable from NHGIS. We hope to add that option in a future update, but we have no timeline for that.

Note that one advantage of the NHGIS approach is that it does support obtaining multiple tables with a single request, whereas, if I recall correctly, the Census API delivers only one variable or one table (i.e., “group”) per request.
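If you only need one variable, you can drop the other columns when you read the downloaded table. Here is a minimal sketch with pandas. The column names `H3_TOTAL`, `H3_OCCUPIED`, and `H3_VACANT` are placeholders: NHGIS assigns its own column codes, so check the codebook that comes with your extract for the real ones.

```python
import io
import pandas as pd

# Made-up stand-in for a downloaded table; column names are placeholders.
sample_csv = io.StringIO(
    "GISJOIN,H3_TOTAL,H3_OCCUPIED,H3_VACANT\n"
    "G7200010,40,30,10\n"
    "G7200030,25,20,5\n"
)

# Keep the join key plus the single variable of interest.
wanted = ["GISJOIN", "H3_VACANT"]
vacancy = pd.read_csv(sample_csv, usecols=wanted, dtype={"GISJOIN": str})
print(vacancy)
```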

Hi Jonathan.

I’m stuck on specifying the shapefile’s name. I tried the name of the shapefile I downloaded from the Data Finder portal, and I also tried following the naming convention, but had no luck with either. Where can I find the valid shapefile names?

Thanks in advance.

I’m trying to get tables H1 and H3 for dataset 2010_SF1a, but I’m getting this error: BadIpumsApiRequest: Data tables invalid for dataset 2010_SF1a: H1, H3. Please advise!

I assume that you’re still working in Python with the ipumspy package.

If so, please see the section in the ipumspy documentation on metadata for Aggregate Data Collections. There’s a “shapefiles” endpoint you can use to get metadata about NHGIS shapefiles, which should include valid names for API requests.
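Once you have that metadata, you still need to find the right record in it. As a rough sketch, the helper below filters a list of shapefile records by field values. The record fields (`name`, `year`, `geographicLevel`, `extent`) and the sample values are my assumptions about the shape of the metadata, not actual API output, so compare them with what the endpoint returns. `find_shapefiles` is a hypothetical helper, not part of ipumspy.

```python
def find_shapefiles(records, **criteria):
    """Return records whose fields contain each given substring (case-insensitive)."""
    matches = []
    for record in records:
        if all(
            str(wanted).lower() in str(record.get(field, "")).lower()
            for field, wanted in criteria.items()
        ):
            matches.append(record)
    return matches

# Illustrative records only; real names come from the shapefiles metadata endpoint.
sample_records = [
    {"name": "example_state_2020", "year": "2020",
     "geographicLevel": "State", "extent": "United States"},
    {"name": "example_block_2020", "year": "2020",
     "geographicLevel": "Block", "extent": "Puerto Rico"},
    {"name": "example_block_2010", "year": "2010",
     "geographicLevel": "Block", "extent": "Puerto Rico"},
]

hits = find_shapefiles(
    sample_records, year="2020", geographicLevel="block", extent="Puerto Rico"
)
print([record["name"] for record in hits])
```

The `name` value of a matching record is what you would then try in your extract request.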

If you’re listing multiple table names in Python, make sure that each name is its own quoted string, with the commas between the strings rather than inside one of them.

E.g., this code should work:

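from ipumspy import AggregateDataExtract, Dataset

# Each table name is a separate string in the list.
extract = AggregateDataExtract(
   collection="nhgis",
   description="An NHGIS example extract",
   datasets=[
      Dataset(name="2010_SF1a", data_tables=["H1", "H3"], geog_levels=["state"])
   ]
)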

This code could cause the error that you got:

# Problem: "H1, H3" is a single string, so it is treated as one table name.
extract = AggregateDataExtract(
   collection="nhgis",
   description="An NHGIS example extract",
   datasets=[
      Dataset(name="2010_SF1a", data_tables=["H1, H3"], geog_levels=["state"])
   ]
)
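If your table names arrive as a single comma-separated string (for example, from a config file or user input), you could split them before building the request. This is only a small sketch; `split_table_names` is a hypothetical helper, not part of ipumspy.

```python
def split_table_names(names):
    """Split any comma-separated entries into individual, stripped table names."""
    cleaned = []
    for entry in names:
        for part in entry.split(","):
            part = part.strip()
            if part:  # skip empty pieces left by stray commas
                cleaned.append(part)
    return cleaned

print(split_table_names(["H1, H3"]))
print(split_table_names(["H1", "H3"]))
```

Both calls give a list with two separate names, which is the form that `data_tables` expects in the working example.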

Could you share the code blocks that are throwing the errors? That would help us diagnose the issues faster, and let others chime in! Just seeing the error messages makes it hard for us to assess what could be causing the issue.

Yours,
Dave Van Riper

Yes, my mistake was just what Jonathan described. I now have a new issue matching the TIGER boundary data with the census data: for 2020, the census data table and the TIGER files don’t have the same number of rows.

Is there any way to retrieve the shapefile with the tabular data already merged?

Thanks in advance.

We don’t provide shapefiles merged with tabular data. We try to make the merge as easy as possible with our GISJOIN field, which is common between shapefiles and tabular data.
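Here is a minimal sketch of that merge with pandas, using made-up rows. The names `shapes`, `table`, `NAME`, and `VACANT` are placeholders. If you read the shapefile with geopandas, calling `merge` on the GeoDataFrame works the same way and keeps the geometry column.

```python
import pandas as pd

# Stand-in for the shapefile's attribute table (geometry omitted here).
shapes = pd.DataFrame({"GISJOIN": ["G7200010", "G7200030"], "NAME": ["A", "B"]})

# Stand-in for the tabular data from the same extract.
table = pd.DataFrame(
    {"GISJOIN": ["G7200010", "G7200030", "G7200050"], "VACANT": [10, 5, 0]}
)

# Left join from the shapes: every mapped area keeps its row and gains the
# table columns. validate raises an error if GISJOIN is duplicated on either side.
merged = shapes.merge(table, on="GISJOIN", how="left", validate="one_to_one")
print(merged)
```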

I need more details so that I can help answer your question about the mismatch in record counts. What geographic level (e.g., county, census tract, place) are you looking at? What is stored in “df_2020.shape” and “gdf.shape”?

Dave

This is my extraction code

They’re both in block units.

That helps a lot!

NHGIS erases the coastal water from shapefiles before publishing them. As part of that erasure, we remove entire census blocks from the shapefile. Those erased census blocks are retained in the data files, however, and that results in a mismatch between the number of records in the shapefile vs. the tabular data.

We make sure to check that the erased blocks contain zero persons and zero housing units before releasing the shapefiles. So, although the record counts differ, any tabular data record without a matching shapefile record is for a census block with zero people and zero housing units.
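If you want to confirm this in your own extract, here is a sketch with pandas and made-up rows. A left merge from the table with `indicator=True` flags the table records that have no shapefile match, and you can then check that their counts are all zero. `POP` and `HOUSING` are placeholder column names, and the GISJOIN values are shortened for readability.

```python
import pandas as pd

# Stand-ins for the shapefile attributes and the block-level table.
shapes = pd.DataFrame({"GISJOIN": ["B1", "B2"]})
table = pd.DataFrame(
    {"GISJOIN": ["B1", "B2", "B3"], "POP": [12, 0, 0], "HOUSING": [5, 1, 0]}
)

# indicator=True adds a "_merge" column saying where each row was found.
joined = table.merge(shapes, on="GISJOIN", how="left", indicator=True)
unmatched = joined[joined["_merge"] == "left_only"]
print(len(unmatched), "table records have no shapefile match")

# True only if every unmatched record has zero people and zero housing units.
all_empty = bool((unmatched[["POP", "HOUSING"]] == 0).all().all())
print("All unmatched records are empty:", all_empty)
```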

Dave