Consistency of DEPARTS, TRANTIME, and ARRIVES

I’m analyzing how work start times have changed over time, using 1990-2022 data. The 1990 and 2000 samples do not have the ARRIVES variable, so I compute it from DEPARTS and TRANTIME. To validate my method, I ran the same computation on ACS data to check that my results roughly match the ARRIVES variable in those years (roughly, because ARRIVES is binned and so may be off by a few minutes). For the most part they do, but there appears to be a discrepancy for topcoded TRANTIME values: the ARRIVES values are consistently earlier than DEPARTS + TRANTIME. Based on the documentation, it seems that topcoded TRANTIME values are represented as the state mean of all topcoded TRANTIMEs. I would have expected ARRIVES to be calculated either (1) using this mean value or (2) using the un-topcoded values, but it appears to have been calculated with some other number, maybe the minimum of the topcoded values. For example:

| SAMPLE | DEPARTS | TRANTIME | ARRIVES | calculated_arrives |
| --- | --- | --- | --- | --- |
| 2005-2009, ACS 5-year | 4:05 | 184 | 6:34 | 7:09 |
| 2005-2009, ACS 5-year | 7:32 | 184 | 10:04 | 10:36 |
| 2005-2009, ACS 5-year | 3:05 | 184 | 5:34 | 6:09 |
| 2005-2009, ACS 5-year | 6:32 | 184 | 9:04 | 9:36 |

In the first row, we have someone who departs at 4:05 and has a topcoded TRANTIME of 184 minutes (3 hours 4 minutes). By my calculation that implies an arrival time of 7:09, but the data has an ARRIVES value of 6:34, only 2 hours 29 minutes after departure. The topcoded cases seem to consistently report an ARRIVES time that is earlier than the topcode would imply. That wouldn’t make sense if ARRIVES were based on the original value and the topcode were a mean, because some of the ARRIVES times would then have to be later than my calculated value. Maybe the ARRIVES values are calculated based on the minimum topcoded value rather than the mean?
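The comparison described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code actually used for the analysis: the helper names (`to_minutes`, `to_clock`, `compare`) are hypothetical, and the times are treated as the `H:MM` strings shown in the table rather than as the codes stored in the data.

```python
# Hypothetical helpers: clock strings like "4:05" are an assumption based on
# how the example rows are displayed, not on the variables' stored coding.
def to_minutes(clock: str) -> int:
    """Convert an 'H:MM' clock string to minutes after midnight."""
    hours, minutes = clock.split(":")
    return int(hours) * 60 + int(minutes)


def to_clock(total_minutes: int) -> str:
    """Convert minutes after midnight back to 'H:MM', wrapping at 24 hours."""
    total_minutes %= 24 * 60
    return f"{total_minutes // 60}:{total_minutes % 60:02d}"


def compare(departs: str, trantime: int, arrives: str):
    """Return (DEPARTS + TRANTIME as a clock time, minutes from DEPARTS to ARRIVES)."""
    calculated = to_clock(to_minutes(departs) + trantime)
    implied_gap = to_minutes(arrives) - to_minutes(departs)
    return calculated, implied_gap


# The four example rows: (DEPARTS, TRANTIME, ARRIVES)
rows = [
    ("4:05", 184, "6:34"),
    ("7:32", 184, "10:04"),
    ("3:05", 184, "5:34"),
    ("6:32", 184, "9:04"),
]

results = [compare(*row) for row in rows]
for row, (calculated, implied_gap) in zip(rows, results):
    print(row, "calculated:", calculated, "gap in data (min):", implied_gap)
```

For these rows the gap between DEPARTS and ARRIVES in the data is 149 or 152 minutes, well short of the 184-minute topcode.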

This spreadsheet has all of the examples I’ve found that are not related to allocation in the ACS samples I’m working with.

I suspect the origin of the discrepancy is in the difference between the topcode threshold value and the topcode value itself. In the ACS, topcodes are typically applied by the Census Bureau to encompass the highest 0.5% of values in each state for each year. Any case with a value above the threshold has its value replaced with the group’s mean value (i.e., all cases above the threshold are assigned the same value, which is the mean of all their original values). Both the threshold and topcode are reported in the original PUMS topcode documentation linked to in our topcode user guide (see the second column in the table).
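To make the distinction between the threshold and the topcode value concrete, here is a minimal sketch. The travel times are made up for illustration, the function name `apply_topcode` is hypothetical, and treating "above the threshold" as strictly greater than is an assumption; this is not the Census Bureau's actual procedure.

```python
from statistics import mean


def apply_topcode(values, threshold):
    """Replace every value above `threshold` with the mean of those values.

    Assumption: "above" means strictly greater than the threshold.
    Returns (topcoded_values, topcode_value).
    """
    above = [v for v in values if v > threshold]
    if not above:
        return list(values), None
    topcode_value = round(mean(above))
    topcoded = [topcode_value if v > threshold else v for v in values]
    return topcoded, topcode_value


# Made-up travel times in minutes for one hypothetical state-year.
travel_times = [20, 45, 150, 160, 192, 200]
threshold = 150  # the cutoff; cases above it get topcoded

topcoded, topcode_value = apply_topcode(travel_times, threshold)
print(topcoded)       # values above 150 all share one value
print(topcode_value)  # mean of 160, 192, and 200
```

In this toy example the threshold is 150 but every topcoded case is assigned 184, so the assigned value sits well above the cutoff, and the minimum possible original value among topcoded cases is much closer to the threshold than to the mean.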

For TRANTIME specifically, JWMNTPCT provides the threshold value and JWMNP is the value assigned to topcoded cases. Using your example of a TRANTIME value of 184 minutes, I see two states in the 2009 1-year file (MN and MO) that have JWMNP = 184 and JWMNTPCT = 150. The cases you include report arriving about 2.5 hours (150 minutes) after their departure time. This suggests to me that, while TRANTIME has been topcoded with the mean value, the arrival and departure times have instead been masked using the threshold value, though I was unable to find confirmation of this in the documentation. Note that TRANTIME counts the initial minute as time spent traveling. For example, if someone departs at 4:05 and travels for 5 minutes, they will arrive at 4:09 (minute 1 is 4:05, minute 2 is 4:06, minute 3 is 4:07, minute 4 is 4:08, and minute 5 is 4:09). By the same counting, 150 minutes of travel starting at 4:05 ends at 6:34, which matches the ARRIVES value in your first row.
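As a rough check of this reading, the inclusive counting rule can be written out as below. This is a sketch under the assumption that the 150-minute threshold was used for the masked arrival times, which the documentation does not confirm; the function name `arrival_inclusive` is hypothetical.

```python
from datetime import datetime, timedelta


def arrival_inclusive(departs: str, travel_minutes: int) -> str:
    """Arrival time when the departure minute counts as the first minute of travel.

    Under that rule the clock advances by travel_minutes - 1.
    """
    start = datetime.strptime(departs, "%H:%M")
    end = start + timedelta(minutes=travel_minutes - 1)
    return f"{end.hour}:{end.minute:02d}"


# The 5-minute example from the text.
short_trip = arrival_inclusive("4:05", 5)

# Assumed threshold of 150 minutes applied to the four example departures.
threshold_arrivals = [arrival_inclusive(d, 150) for d in ("4:05", "7:32", "3:05", "6:32")]
print(short_trip)
print(threshold_arrivals)
```

With a 150-minute travel time, the 4:05 and 3:05 departures give 6:34 and 5:34, matching the reported ARRIVES values exactly. The 7:32 and 6:32 departures give 10:01 and 9:01, three minutes earlier than the reported 10:04 and 9:04, which would be consistent with the binning of ARRIVES mentioned in the question.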