I downloaded 1980 census tract data for Suffolk County, Massachusetts, which gives me 183 records. I am trying to match that data to HMDA data (i.e., mortgage application data), which lists a tract ID for each loan for the same time period. However, the HMDA data shows 265 unique tract IDs for Suffolk County, MA for that period. All of the IPUMS tract records have a matching tract ID in the HMDA data, but there are 82 tract IDs in the HMDA data that don’t match up with the IPUMS data.
Does IPUMS NHGIS census data for 1980 use the actual tract ids from that time period? Or is the problem with the HMDA data?
Dear Marcos,
We used the digital demographic data from the 1980 decennial census when creating NHGIS, so we do have the original census tract IDs. We built our mapping files from a combination of TIGER/Line 1992 data and scanned maps that the Bureau published for the 1980 census.
I am not familiar with the 1980s-era HMDA data. Do you have some examples of the census tract IDs that do not match the 1980 census data? If you can provide state, county, tract, and MSA codes from the HMDA data that do not match, we can try to see what’s causing the mismatch.
Dave Van Riper
Here are the unmatched HMDA records. The HMDAcensus_tract field holds the original tract IDs, and HMDAcensus_tractB holds the same IDs with the period stripped out for joining with the IPUMS records. Sorry, I can’t upload a proper CSV as a new user.
HMDAcensus_tractB HMDAstate_code HMDAcounty_code HMDAcensus_tract ipumsGISJOIN
1 000000 25 025 0000.00 NA
2 000001 25 025 0000.01 NA
3 000003 25 025 0000.03 NA
4 000006 25 025 0000.06 NA
5 000007 25 025 0000.07 NA
6 000009 25 025 0000.09 NA
7 000010 25 025 0000.10 NA
8 000017 25 025 0000.17 NA
9 000115 25 025 0001.15 NA
10 000200 25 025 0002.00 NA
11 000401 25 025 0004.01 NA
12 000600 25 025 0006.00 NA
13 000640 25 025 0006.40 NA
14 000700 25 025 0007.00 NA
15 001700 25 025 0017.00 NA
16 002000 25 025 0020.00 NA
17 003372 25 025 0033.72 NA
18 006553 25 025 0065.53 NA
19 011501 25 025 0115.01 NA
20 013011 25 025 0130.11 NA
1 014801 25 025 0148.01 NA
2 015100 25 025 0151.00 NA
3 019000 25 025 0190.00 NA
4 020500 25 025 0205.00 NA
5 021400 25 025 0214.00 NA
6 050000 25 025 0500.00 NA
7 068500 25 025 0685.00 NA
8 071300 25 025 0713.00 NA
9 075300 25 025 0753.00 NA
10 090201 25 025 0902.01 NA
11 090991 25 025 0909.91 NA
12 100401 25 025 1004.01 NA
13 100600 25 025 1006.00 NA
14 101000 25 025 1010.00 NA
15 101100 25 025 1011.00 NA
16 101120 25 025 1011.20 NA
17 110100 25 025 1101.00 NA
18 110201 25 025 1102.01 NA
19 110400 25 025 1104.00 NA
20 110500 25 025 1105.00 NA
1 110600 25 025 1106.00 NA
2 112000 25 025 1120.00 NA
3 115010 25 025 1150.10 NA
4 120501 25 025 1205.01 NA
5 130101 25 025 1301.01 NA
6 130110 25 025 1301.10 NA
7 130400 25 025 1304.00 NA
8 140100 25 025 1401.00 NA
9 140401 25 025 1404.01 NA
10 150000 25 025 1500.00 NA
11 150101 25 025 1501.01 NA
12 150200 25 025 1502.00 NA
13 150901 25 025 1509.01 NA
14 160211 25 025 1602.11 NA
15 160290 25 025 1602.90 NA
16 170120 25 025 1701.20 NA
17 176100 25 025 1761.00 NA
18 180105 25 025 1801.05 NA
19 210100 25 025 2101.00 NA
20 250100 25 025 2501.00 NA
1 250286 25 025 2502.86 NA
2 301100 25 025 3011.00 NA
3 303000 25 025 3030.00 NA
4 313700 25 025 3137.00 NA
5 337300 25 025 3373.00 NA
6 339100 25 025 3391.00 NA
7 340100 25 025 3401.00 NA
8 341119 25 025 3411.19 NA
9 341400 25 025 3414.00 NA
10 342200 25 025 3422.00 NA
11 350115 25 025 3501.15 NA
12 352400 25 025 3524.00 NA
13 353400 25 025 3534.00 NA
14 358300 25 025 3583.00 NA
15 368191 25 025 3681.91 NA
16 370140 25 025 3701.40 NA
17 373148 25 025 3731.48 NA
18 403100 25 025 4031.00 NA
19 408000 25 025 4080.00 NA
20 422500 25 025 4225.00 NA
1 703000 25 025 7030.00 NA
2 710600 25 025 7106.00 NA
3 811100 25 025 8111.00 NA
4 914000 25 025 9140.00 NA
5 999900 25 025 9999.00 NA
6 999999 25 025 9999.99 NA
Here is a link to the csv
Thanks for sharing that information, Marco! I did a quick pass through the list, comparing it to the Census data. I’m not an expert on HMDA data, but I found a few things that may be helpful to you.
1-to-many relationships
For a subset of the tract IDs in the HMDA data, there are two records in the Census data with different suffixes (the values after the period). For example, this HMDA record:
10 000200 25 025 0002.00 NA
does not match anything in the census data, which instead has records for 0002.01 and 0002.02. The Census Bureau delineated two census tracts (2.01 and 2.02), but the person capturing census tracts in HMDA only typed in 2.00.
In cases like this, you could aggregate the census data together to create a record for tract 2.00, which would then match the HMDA value.
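If it helps, here is a minimal Python sketch of that aggregation. It assumes the census rows are dictionaries with a six-digit `tract` code (period already stripped) and that the fields being combined are plain counts. The field name `population` and the numbers are made up for illustration and are not real census values.

```python
from collections import defaultdict

def aggregate_to_base_tract(census_rows, count_fields):
    """Sum count fields across tracts that share the same 4-digit base code."""
    totals = defaultdict(lambda: dict.fromkeys(count_fields, 0))
    for row in census_rows:
        # "000201" and "000202" both roll up to "000200"
        base = row["tract"][:4] + "00"
        for field in count_fields:
            totals[base][field] += row[field]
    return dict(totals)

# Hypothetical rows for illustration only (not real census counts)
census_rows = [
    {"tract": "000201", "population": 100},
    {"tract": "000202", "population": 250},
    {"tract": "000100", "population": 75},
]
aggregated = aggregate_to_base_tract(census_rows, ["population"])
print(aggregated["000200"])
```

Only additive counts (persons, housing units, and so on) can be summed this way. Medians, rates, and percentages would need to be recomputed from their underlying counts.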
HMDA county ID is wrong
The Boston SMSA contains multiple counties, and the Census Bureau uses a different coding scheme for each county’s tracts. In 1980, Middlesex County’s census tracts start with “3301” and go up to “3881”. Looking at some of the unmatched HMDA codes (e.g., 3524.00, 3414.00, 3422.00), I think they should have been assigned the code for Middlesex County and not the Suffolk County code. The final tract ID in Suffolk County in the 1980 census was 1805, so any tract with an ID larger than 1805 is probably in a different county.
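A rough way to flag these records is to compare the four digits before the period against the ranges mentioned above. The sketch below uses only the two ranges from this reply (Suffolk up to 1805, Middlesex 3301 to 3881). The function name and labels are my own, and a result is only a hint about where to look, not a confirmed county assignment.

```python
SUFFOLK_MAX_BASE = 1805          # final Suffolk County tract ID in 1980, per this reply
MIDDLESEX_RANGE = (3301, 3881)   # Middlesex County tract range in 1980, per this reply

def guess_county_range(tract_id):
    """Classify a tract ID like '3524.00' or '352400' by its 4-digit base."""
    base = int(tract_id.replace(".", "")[:4])
    if base <= SUFFOLK_MAX_BASE:
        return "suffolk_range"
    if MIDDLESEX_RANGE[0] <= base <= MIDDLESEX_RANGE[1]:
        return "middlesex_range"
    return "outside_both"

for tract in ["3524.00", "3414.00", "0002.00", "9999.99"]:
    print(tract, guess_county_range(tract))
```

IDs that land in "outside_both" (for example 9999.99) would still need to be checked by hand, since they may be placeholder values or belong to another county in the SMSA.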
Incorrect values after the period
Based on the HMDA tract IDs, it looks as if some of the suffixes (the values after the period) may have been entered incorrectly. For example, 0006.40 in HMDA has no close value in the census data (6.01 or 6.02). As far as I can tell, there is no easy solution for these mismatches.
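This will not fix anything automatically, but listing the census tracts that share the same four-digit base can make a manual review faster. The sketch below is a hypothetical helper. The census IDs in the example are just the ones mentioned in this reply plus one made-up neighbor, not the full Suffolk County list.

```python
def same_base_candidates(hmda_id, census_ids):
    """Return census tract IDs whose 4-digit base matches the HMDA tract ID."""
    base = hmda_id.replace(".", "")[:4]
    return sorted(c for c in census_ids if c.replace(".", "")[:4] == base)

# Illustrative census IDs only; "0007.01" is a made-up neighbor
census_ids = ["0006.01", "0006.02", "0007.01"]
candidates = same_base_candidates("0006.40", census_ids)
print(candidates)
```

An empty result would suggest that the digits before the period are also wrong, or that the tract belongs to a different county.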
Dave Van Riper
Thank you so much! That is really helpful.