Application for creating a GTFS-RT full dataset from GTFS-RT TripUpdate, ServiceAlert or VehiclePosition messages published in Pulsar. The full dataset is then uploaded to a remote file server at regular intervals. Currently, Azure Blob Storage is supported. There is also an option to save the file locally for easier debugging.
Currently the application has separate logic for publishing ServiceAlerts, TripUpdates and VehiclePositions:
- With ServiceAlerts we can just publish the latest FeedMessage since it contains the whole state
- With TripUpdates we need to cache all incoming FeedMessages which contain information about a single trip.
  - When publishing we aggregate FeedEntities from each FeedMessage and publish them under one FeedMessage
  - We also want to filter past events from the output feed (and the cache)
    - Currently we do this by filtering the entire FeedEntity once all events are old or for cancellations when the trip start time is too old
- VehiclePositions are cached using unique vehicle ID as a key
  - Vehicles that have not published data recently are filtered
  - All cached vehicle positions are published in one FeedMessage
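
The caching and filtering rules above can be illustrated with a minimal sketch. It is not the project's actual implementation: `FeedCacheSketch`, `TripEntry` and all method names are hypothetical, the GTFS-RT protobuf types are replaced by plain Java fields, and timestamps and ages are assumed to be in epoch seconds and seconds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the caches described above; not the project's real classes.
class FeedCacheSketch {

    // Reduced view of a cached TripUpdate entity (assumed fields, epoch seconds).
    static final class TripEntry {
        final boolean cancelled;              // cancellations carry no stop estimates
        final long tripStartEpochSec;         // scheduled departure time of the trip
        final List<Long> eventTimesEpochSec;  // times of the stop time events

        TripEntry(boolean cancelled, long tripStartEpochSec, List<Long> eventTimesEpochSec) {
            this.cancelled = cancelled;
            this.tripStartEpochSec = tripStartEpochSec;
            this.eventTimesEpochSec = eventTimesEpochSec;
        }
    }

    // Trip updates keyed by trip ID, vehicle timestamps keyed by unique vehicle ID.
    private final Map<String, TripEntry> tripUpdates = new ConcurrentHashMap<>();
    private final Map<String, Long> vehicleLastSeen = new ConcurrentHashMap<>();

    void putTripUpdate(String tripId, TripEntry entry) {
        tripUpdates.put(tripId, entry);
    }

    void putVehiclePosition(String vehicleId, long timestampEpochSec) {
        vehicleLastSeen.put(vehicleId, timestampEpochSec);
    }

    // A cancellation expires once the trip start is too old;
    // any other entry expires once all of its events are too old.
    static boolean isExpired(TripEntry e, long now, long maxAge, long maxAgeAfterStart) {
        if (e.cancelled) {
            return now - e.tripStartEpochSec > maxAgeAfterStart;
        }
        return e.eventTimesEpochSec.stream().allMatch(t -> now - t > maxAge);
    }

    // Drops expired entries from the cache and returns the sorted IDs that would be published.
    List<String> publishTripIds(long now, long maxAge, long maxAgeAfterStart) {
        tripUpdates.entrySet().removeIf(en -> isExpired(en.getValue(), now, maxAge, maxAgeAfterStart));
        return new ArrayList<>(new TreeSet<>(tripUpdates.keySet()));
    }

    // Drops vehicles that have not published recently and returns the sorted remaining IDs.
    List<String> publishVehicleIds(long now, long maxAge) {
        vehicleLastSeen.values().removeIf(ts -> now - ts > maxAge);
        return new ArrayList<>(new TreeSet<>(vehicleLastSeen.keySet()));
    }
}
```

In this sketch the filtering happens at publish time, so the cache and the output feed stay in sync. The real application may prune at a different point or keep partially expired entities in a different form.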
This project depends on the transitdata-common project.
Either use the released versions from the GitHub Packages repository (Maven) or build your own and install it to your local Maven repository:
cd transitdata-common && mvn install
mvn compile
mvn package
- Run this script to build the Docker image
- Pulsar
- Azure Storage account (if publishing to Azure)
  - Azurite emulator can be used for testing locally
- `OUTPUT_DESTINATION`: where to publish the file, either `local` or `azure`
- `DATA_TYPE`: type of the data, either `TripUpdate`, `ServiceAlert` or `VehiclePosition`
- `DUMP_INTERVAL`: interval for publishing the data
- `UNHEALTHY_TIMEOUT`: timeout when to consider the service unhealthy if no data has been published
- `OUTPUT_LOCAL_PATH`: path where to publish the file if using local output destination
- `AZURE_ACCOUNT_NAME`: Azure Storage account name
- `AZURE_ACCOUNT_KEY_PATH`: path to the file that contains Azure Storage account connection string
- `CACHE_MAX_AGE`: value to use for HTTP caching header
- `TRIP_UPDATE_MAX_AGE`: maximum age for trip update
- `TRIP_UPDATE_MAX_AGE_AFTER_START`: maximum age for trip update starting from its scheduled departure time. This option is used to filter cancellation messages which don't have any stop estimates
- `TRIP_UPDATE_TIMEZONE`: timezone used in the trip updates
- `TRIP_UPDATE_FULL_DATASET_CONTAINER`: name of the container where to publish trip updates (Blob Container or local directory)
- `TRIP_UPDATE_GOOGLE_DATASET_CONTAINER`: name of the container where to publish Google-specific trip updates
- `VEHICLE_POSITION_MAX_AGE`: maximum age for vehicle position
- `VEHICLE_POSITION_FULL_DATASET_CONTAINER`: name of the container where to publish full vehicle position feed
- `VEHICLE_POSITION_BUSTRAM_DATASET_CONTAINER`: name of the container where to publish vehicle positions for buses and trams
- `VEHICLE_POSITION_TRAINMETRO_DATASET_CONTAINER`: name of the container where to publish vehicle positions for trains and metros
- `SERVICE_ALERT_CONTAINER`: name of the container where to publish service alerts
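
As an illustration of how the two selector variables relate to the rest, here is a minimal sketch of validating them at startup. `PublisherConfigSketch` and its methods are hypothetical and do not reflect how the application actually loads its configuration. Requiring the Azure variables only when `OUTPUT_DESTINATION` is `azure`, and failing fast on missing values, are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical config holder; the real application may load and default values differently.
class PublisherConfigSketch {
    // Allowed values as listed in the variable descriptions above.
    static final List<String> DESTINATIONS = Arrays.asList("local", "azure");
    static final List<String> DATA_TYPES = Arrays.asList("TripUpdate", "ServiceAlert", "VehiclePosition");

    final String outputDestination;
    final String dataType;

    PublisherConfigSketch(String outputDestination, String dataType) {
        this.outputDestination = outputDestination;
        this.dataType = dataType;
    }

    // Returns the value of a variable or throws if it is missing or empty.
    static String requirePresent(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("Missing environment variable: " + name);
        }
        return value;
    }

    // Returns the value of a variable or throws if it is not one of the allowed values.
    static String requireOneOf(Map<String, String> env, String name, List<String> allowed) {
        String value = requirePresent(env, name);
        if (!allowed.contains(value)) {
            throw new IllegalArgumentException(name + " must be one of " + allowed + ", got: " + value);
        }
        return value;
    }

    // Validates the selector variables and the variables assumed to be needed by the chosen destination.
    static PublisherConfigSketch fromEnv(Map<String, String> env) {
        String destination = requireOneOf(env, "OUTPUT_DESTINATION", DESTINATIONS);
        String dataType = requireOneOf(env, "DATA_TYPE", DATA_TYPES);
        if (destination.equals("local")) {
            requirePresent(env, "OUTPUT_LOCAL_PATH");
        } else {
            requirePresent(env, "AZURE_ACCOUNT_NAME");
            requirePresent(env, "AZURE_ACCOUNT_KEY_PATH");
        }
        return new PublisherConfigSketch(destination, dataType);
    }
}
```

With the real environment this would be called as `PublisherConfigSketch.fromEnv(System.getenv())`. Taking a `Map` rather than reading the environment directly keeps the validation easy to exercise in a unit test.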