I’m analyzing how work start times have changed over time, using 1990-2022 data. The 1990 and 2000 samples do not have the ARRIVES variable, so I compute it as DEPARTS + TRANTIME. To validate my methodology, I did the same computation for the ACS samples to check that my results roughly match the ARRIVES variable in those years (roughly, because ARRIVES is binned and so may be off by a few minutes). For the most part they do, but there appears to be a discrepancy for records where TRANTIME is topcoded: the ARRIVES values are consistently lower than DEPARTS + TRANTIME. Based on the documentation, it seems that topcoded TRANTIME values are represented as the state mean of all topcoded TRANTIMEs. I would have expected ARRIVES to be calculated either (1) using this mean value or (2) using the un-topcoded values, but it appears to have been calculated with some other number, maybe the minimum of the topcoded values. For example:
SAMPLE | DEPARTS | TRANTIME | ARRIVES | calculated_arrives |
---|---|---|---|---|
2005-2009, ACS 5-year | 4:05 | 184 | 6:34 | 7:09 |
2005-2009, ACS 5-year | 7:32 | 184 | 10:04 | 10:36 |
2005-2009, ACS 5-year | 3:05 | 184 | 5:34 | 6:09 |
2005-2009, ACS 5-year | 6:32 | 184 | 9:04 | 9:36 |
In the first row, we have someone who departs at 4:05 and has a topcoded TRANTIME of 184 minutes (3 hours 4 minutes). I calculate that this should give an ARRIVES time of 7:09, but the data has an ARRIVES time of 6:34, only 2 hours 29 minutes after departure. The topcoded records seem to consistently report an ARRIVES time that is earlier than what the topcode value implies. That wouldn’t make sense if ARRIVES were based on the original value and the topcode value is a mean, since some of the ARRIVES times would then have to be later than my calculation. Maybe the ARRIVES values are calculated based on the minimum topcode rather than the mean?
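For anyone who wants to reproduce the arithmetic, here is a minimal Python sketch of the check on the four rows above. It assumes the times are plain `H:MM` strings as displayed in the table (the raw extract may encode them differently), and the helper names `to_minutes`, `to_clock`, `calculated_arrives`, and `implied_trantime` are just illustrative names, not IPUMS functions.

```python
MINUTES_PER_DAY = 24 * 60


def to_minutes(clock: str) -> int:
    """Convert an 'H:MM' clock string to minutes after midnight."""
    hours, minutes = clock.split(":")
    return int(hours) * 60 + int(minutes)


def to_clock(total_minutes: int) -> str:
    """Convert minutes after midnight back to 'H:MM', wrapping past midnight."""
    total_minutes %= MINUTES_PER_DAY
    return f"{total_minutes // 60}:{total_minutes % 60:02d}"


def calculated_arrives(departs: str, trantime: int) -> str:
    """Arrival time implied by DEPARTS + TRANTIME (TRANTIME in minutes)."""
    return to_clock(to_minutes(departs) + trantime)


def implied_trantime(departs: str, arrives: str) -> int:
    """Travel time in minutes implied by the reported DEPARTS and ARRIVES."""
    return (to_minutes(arrives) - to_minutes(departs)) % MINUTES_PER_DAY


# (DEPARTS, TRANTIME, ARRIVES) copied from the example rows above.
rows = [
    ("4:05", 184, "6:34"),
    ("7:32", 184, "10:04"),
    ("3:05", 184, "5:34"),
    ("6:32", 184, "9:04"),
]

for departs, trantime, arrives in rows:
    print(
        departs,
        trantime,
        arrives,
        calculated_arrives(departs, trantime),
        implied_trantime(departs, arrives),
    )
```

For these four rows the travel time implied by the reported ARRIVES comes out to 149 or 152 minutes, compared with the topcode value of 184.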
This spreadsheet has all of the examples I’ve found that are not related to allocation in the ACS samples I’m working with.