I have a dataset from HUD that uses FIPS Code (2010) for census tracts. This code is typically 10-11 digits long. The SES data from NHGIS provides GISJOIN and I am aware of the relation between the two.
My question is how to convert the FIPS Codes into GISJOIN codes so that later on two datasets can be joined.
First, you say you’re aware of the relationship between FIPS and GISJOIN, but to be sure, here’s some more background…
A FIPS-based tract ID should have exactly 11 digits concatenating a state code (2 digits), county code (3 digits) and tract code (6 digits). If your dataset contains codes that are 10 digits long, that probably means that your software (or another source) has converted the text codes to numeric codes, eliminating all leading zeros, which is a problem in states where the FIPS code begins with a zero. E.g., California has FIPS code 06, but many software applications convert that 2-character text string to a number 6, eliminating the ability to join it directly to proper FIPS codes.
NHGIS GISJOINs prevent this misinterpretation by always beginning with a text character ‘G’. NHGIS also appends an extra digit to both the state and county codes so that we can uniquely identify historical entities that didn’t exist when FIPS codes were first created. For all current states and counties, the NHGIS codes add a ‘0’ to the FIPS codes, so the California code becomes ‘060’.
The best way to convert from one to the other depends on what software you’re using. Most data processing software includes a concatenation function (e.g., Excel’s “CONCAT”) and string parsing functions (e.g., Excel’s “RIGHT” and “LEFT”). You should be able to apply these functions to one code to make it agree with the other code.
Thanks Jonathan,
Unfortunately, the HUD database on LIHTC is prone to the issue of some omitting the first 0 for a number of states and that is why it is a bit messy to work with. What you’ve demonstrated with California is the rightly identified issue with the dataset in question and hence Excel is not allowing it to be padded it 0 for states with single-digit code. Furthermore, the issue remains constant when the county codes are considered.
I will try to look for a solution with contactenation in R where padding is possible.
Thanks for your help!
In Excel, if you have a number 6 in cell A1, you should be able to convert it to a text string ‘06’ with this equation:
=RIGHT(CONCAT(“0”, A1), 2)
For a full tract code, you’d change the last number there from 2 to 11 (the proper length of the text code):
=RIGHT(CONCAT(“0”, A1), 11)
Thanks @JonathanSchroeder!
I thought I should create a code for the purpose of reproducibility.
Finally I was able to successfully resolve the issue in R!
Here is the code in case you or anyone else is interested.
Considering, I have a column named fips2010
in data
dataframe which either contains 10 and/or 11 digits strings which should be converted as per the GISJOIN identifiers specified by NHGIS.
data <- data.frame(State = c("Alabama", "Alabama", "Delaware",
"Texas", "Wisconsin"),
County = c("Jefferson County", "Montgomary County", "Kent County",
"Travis County", "Milwaukee County"),
Tract = c("118.03", "1.00", "433.00", "13.07", "86.00"),
fips2010 = c("1073011803", "1101000100", "10001043300",
"48453001307", "55079008600"))
print(data)
##Output
State County Tract fips2010
1 Alabama Jefferson County 118.03 1073011803
2 Alabama Montgomary County 1.00 1101000100
3 Delaware Kent County 433.00 10001043300
4 Texas Travis County 13.07 48453001307
5 Wisconsin Milwaukee County 86.00 55079008600
Following the logic established in the NHGIS documentation, the code below converts the fipscode column to appropriate GISJOIN standard.
for (i in 1:nrow(data)) {
fips2010 <- data$fips2010[i]
if (nchar(fips2010) == 10) {
data$fips2010[i] <- paste0("G0", substr(fips2010, 1, 1), "0", substr(fips2010, 2, 4), "0", substr(fips2010, 5, 10))
} else if (nchar(fips2010) == 11) {
data$fips2010[i] <- paste0("G", substr(fips2010, 1, 2), "0", substr(fips2010, 3, 5), "0", substr(fips2010, 6, 11))
}
}
print(data)
##Output
State County Tract fips2010
1 Alabama Jefferson County 118.03 G0100730011803
2 Alabama Montgomary County 1.00 G0101010000100
3 Delaware Kent County 433.00 G1000010043300
4 Texas Travis County 13.07 G4804530001307
5 Wisconsin Milwaukee County 86.00 G5500790008600
A side by side comparison:
State County Tract fips2010 GISJOIN
1 Alabama Jefferson County 118.03 1073011803 G0100730011803
2 Alabama Montgomary County 1.00 1101000100 G0101010000100
3 Delaware Kent County 433.00 10001043300 G1000010043300
4 Texas Travis County 13.07 48453001307 G4804530001307
5 Wisconsin Milwaukee County 86.00 55079008600 G5500790008600
I hope this helps!