-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
r5r benchmarks for new gtfs_traveltimes fn #73
Comments
@mpadge, thanks for the update on congrats on the progress with I think you're right, most of the difference is probably due different algorithm heuristics under the hood of I've tried running your example on a sample GTFS feed that comes within
|
@rafapereirabr sorry for delayed response: No, it is not possible to run And by the way, the |
Hi @mpadge . Thanks for the update! I guess the lack of a |
I think another factor that influences the difference in results and in computation times is that R5 is really focused on multi-modal travel where you first travel to a public transport stop before you start your public transport trip. I might be off at some points, and don't know exactly to what extent some parts of the computations are pre-computed when building the network, but the routing process of R5 should go something like this: When you provide the coordinates of a trip origin, it will first find the nearest point on the street network and use a straight-line distance between the actual origin and the snapped point in combination with a walking speed to estimate the time it takes you to reach the street. It will always do this, even when you set This procedure repeats at the end of the trip. It will use a straight-line distance and walking speed to estimate the travel time between the actual public transport stop location and its snapped location on the street network, then use the given egress mode to travel to the snapped location of the trip destination, and finally use a straight-line distance and walking speed to estimate the travel time between the snapped location of the trip destination and the actual location of the trip destination. In the benchmark, we have a case where the origin location of the trip is equal to the actual location of a public transport stop. But R5 does not recognize that and uses its normal procedure. Hence, it estimates the travel time from the trip origin to the street network, then attemps to route to the snapped location of the public transport stop (which is the same in this case, i.e. travel time of this leg equals 0), and finally estimates the travel time from the street network to the public transport stop (which in this case is the travel time from the street network back again to the origin location). This repeats at the end of the trip. This behaviour is explained briefly in the R5 docs, see here
|
Thanks for the input @luukvdmeer - that is in fact also what The one caveat with the above results is that they were generated using the default I presume that once |
I tried to understand their methodology a bit better and found that R5 does actually mention they use "Walking via the street network between transit stops for transfers". I did not look at their source code to check their exact implementation, but from their documentation I get that at each stop they consider transfers to all other stops within a given walking time (default 20 minutes), constrained by a maximum walking distance of 2km. See here. But apart from that, I was mainly trying to emphasize that even for the exact same journey between two stops, R5 might already return higher travel time estimates and have longer computation time simply because in contradiction to gtfsrouter it doesn't recognize that the start and end points of the journey are actual transport stop platforms already. Of course I am not sure for how much of the difference in times that accounts, but it might play a role in there. |
Re-opened in response to the |
Hi everyone. I just commented on the linked r5r issue at ipeaGIT/r5r#183. I understand that a lot of what you are discussing here about R5 is not well documented, so I will try to provide some clarification. I'll cover some high-level points first, and then we can go into detail if you have more specific questions. There are two main issues here: 1) speed of execution and 2) inflation of travel times relative to gtfsrouter. 1) Speed of executionR5 is completely multimodal (chaining walking and biking or car with transit, routing from and to arbitrary geographic locations). From an origin point, it will go to the closest street and follow that to every reachable transit station, then ride transit to every other reachable transit station, then follow the street network to all specified destination points. In everyday use, we perform searches with up to 2-3 million destination points arranged on a regular grid, which could be very distant from any transit station. To each one of these destinations, it tries to find a large number of different travel times (generally between 200 and 1500), including waiting time, over a large number of departure times (and realizations of frequency-based schedules), in order to construct a statistical distribution of the travel times that a rider may experience under different conditions. Constructing this large set of optimal arrival times at every destination, for every different departure time, is rather time consuming. This process (which we call "propagation", i.e. propagation of travel times out from transit stops, through the street network and "empty space" to final destinations) is usually the majority of run time. It appears that you're benchmarking against only the public transit part of the search. Even in very large networks like that of the whole Netherlands, I would expect R5 to perform such a transit-only search in a few hundred milliseconds. That involves finding hundreds of separate paths to each of over 40k transit stops. The rest of the time is spent on propagation, managing and creating statistical summaries of the very large data tables that result. And also on the initial search through the streets, from the origin to all reachable transit stops, which can take longer than the transit search itself. The "general case" routing methods in R5 do not have any special considerations for routing from and to transit stops (instead of arbitrary geographic points). It would be possible for a system that uses R5 (such as R5R) to call only the transit portion and bypass the rest, but I imagine that's not how it works. Although we have spent quite a bit of time optimizing this (as it has a direct impact on our operating costs and ability to provide results with rapid turnaround) speed is not our highest priority. We are more focused on the stability and maintainability of the code, and the ability of future maintainers to comprehend it. 2) Inflation of travel timesAs we discussed in our recent call with @mpadge, a good comparison here is going to require a clear and rigorous definition of what is meant by "optimal", ensuring both systems are seeking the same results. After reading your issue on the r5r repository describing the kind of solutions gtfsrouter seeks, I believe that r5's output for a given origin-destination pair is a superset of that produced by gtfsrouter: the result gtfsrouter is looking for can be selected from among the N results produced by r5. At the lowest level R5 is optimizing arrival time given a single departure time; at a higher level it is optimizing end-to-end travel time for departure at any time within a window. Or more accurately, it it it is finding a distribution of travel times, including all waiting time, for uniformly distributed departure times within a given window, and you can choose the shortest one or any other other one you like. However, it looks like you're benchmarking R5 via R5R, and I do not know how R5R is using R5, or what results it is selecting from R5's output. The output will depend heavily on exactly which routing method is called in R5 and how its output is interpreted. We'll need to examine that more closely. My sense is that R5R is using R5 in a specific, constrained way, and we just need to better document the results of those choices. Note that I did not work on R5R and have only briefly looked at its source code so this will require a bit of time and coordination with its developers. |
@luukvdmeer wrote:
Correct, in normal usage we pre-compute transfers from each stop to all other stops within a certain radius. (Some simplification is applied there, eliminating transfers that are not expected to ever contribute to an optimal path.) These transfers are stored in a table though, not recreated at each call, so this process does not have any speed impact at runtime. Unlike some other data formats exported from scheduling software, in GTFS the transfers table usually only states which transfers are preferred, constrained, or forbidden. All "normal" transfers are expected to be inferred from the street network (or failing that, straight line distance). We typically use networks built from OpenStreetMap, which in many places include detailed pedestrian walkways and micro-mapped train stations.
This is correct. It would be possible to directly call the transit router which thinks in terms of stop platforms, but it would be a different code path. In the case where you are focused only on transit stops, the "general case" code path does incur a lot of overhead. But in our standard use case (urban planning analysis), the origin and destination points are homes, workplaces, public facilities, or sample points representing small areas. The special case where you are only interested in the transit stops themselves is not common. Indeed we adopt a pretty strong position that transportation infrastructure should be evaluated in terms of its impact on people's access to opportunities, which are mostly situated outside the infrastructure. |
@luukvdmeer wrote:
This is correct, passing via the street network will increase the travel time a bit. The documentation you cited was just to confirm the existence of this discrepancy and explain it, not to justify it - clearly it is not ideal, and we eventually want to introduce a special case for connecting the origin point directly to a transit station. But as mentioned in another comment, most of our use cases do not involve routing from stations, so we want to carefully consider impacts on code complexity or interference with other functionality. Besides the increase in reported travel times, also note the significant impact on execution time. This initial search does not only go to the nearest station, but through the street network to all other reachable transit stations. Unlike the transfers, this street search is not precomputed and not cached - this takes a significant amount of time on every call. It wouldn't be surprising if this initial street search takes significantly longer than the transit search itself. |
Thank you very much @abyrd for these detailed and helpful considerations. I'll respond in more detail in a couple of weeks (currently on leave), and feed a lot of that back in to documentation, including detailed comparisons of these kinds of differences between |
@rafapereirabr
gtfsrouter
now has a new turbo-chargedgtfs_traveltimes
function, largely motivated by the support of the Mobility Institute Berlin, and the outstanding input of @AlexandraKapp. This issue is intended to document first comparisons against equivalent times generated from your amazingr5r
package.Disclaimer:
r5r
is a truly general package able to do what it says, "Realistic Routing on Real-world and Reimagined networks."gtfsrouter
in comparison is just a routing package for themode = "TRANSIT"
component only, and so has comparably much more restricted functionality thanr5r
.Additional notes:
r5r
results in the followingreprex
were pre-generated using the same GTFS feed for the "Verkehrsverbund Berlin Brandenburg" (VBB), Germany. Both queries were run for a Tuesday (8th Dec 2020).gtfsrouter
has a function to expand transfer tables provided with feeds, to enable more realistic, and more extended, pedestrian transfers between stops, analogous to the additional pedestrian routing offered byr5
. This is implemented below in the lines starting withgtfs_transfer_table()
, which has a default upper time limit of 120 seconds.Created on 2021-01-27 by the reprex package (v0.3.0)
Conclusions
gfsrouter
calculates full travel times to > 13,000 stations in < 1 second, whereasr5r
takes around 8 seconds to reach around 11,000 of those stations.gtfsrouter
generates travel times that are around 13% quicker thanr5r
.Why might the travel times be so much faster? My suspicion at present would be because
r5
, and sor5r
by direct extension, uses an implementation of theraptor
algorithm . As discussed in #61, the raptor algorithm only finds routes between successive "optimal" end-points, and so excludes the possibility of the optimal route to some end-point actually requiring intermediate travel along sub-optimal routes. In contrast, the new algorithm implemented here traces all possible routes between all pairs of stops, whether optimal or not, and only determines optimal routes once the algorithm has finished tracing the entire timetable. My hypothesis would be that the differences observed above primarily reflect this difference, and particularly the inability ofraptor
to use sub-optimal intermediate routes.The text was updated successfully, but these errors were encountered: