Proposal to change guardian logging from Google Cloud Logs and metrics storage #3132
Replies: 3 comments 3 replies
-
A few comments from my team member: I would try to set a standard for all operators regarding:
The parsing of the logs should be defined by them and not be left up to whoever implements it.
-
+1 to this proposal. It is better, cheaper, and more scalable, and it gives multiple contributors equal access during development and incident response. Loki is a great UI long term. One thing to consider is whether we need all of the current roughly 30TB of logs in a system like Loki, and whether we want to store them for more than 30 days. A large chunk of that 30TB was added in the last year to aid security incident response, which involves one-time queries that generally require intensive analysis on a subset of logs 'offline' (vs queries in a web UI like Loki). This probably mostly boils down to cost for the Foundation, but one could imagine routing all logs to a much more scalable and cheaper Kafka -> GCS/S3 archive and sending only the subset we want to show in graphs to Loki. I think that decision could be made in the Kafka layer, so changes to it would not require Guardian code changes.
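A minimal sketch of the routing decision described above, assuming each log record arrives as a flat dictionary of fields. The destination names, the `level` and `security` fields, and the policy of which levels go to Loki are all hypothetical; nothing here reflects an existing Kafka setup.

```python
from typing import Dict, List

# Hypothetical destination names; the thread does not name concrete topics or buckets.
ARCHIVE = "archive"  # e.g. a consumer that writes everything to GCS/S3
LOKI = "loki"        # the smaller subset shown in graphs

# Assumed policy: log levels worth indexing in Loki.
LOKI_LEVELS = {"warn", "error"}


def route(record: Dict[str, str]) -> List[str]:
    """Return the destinations for one log record.

    Every record goes to the archive. Records at an indexed level, or
    carrying a (hypothetical) security="true" field, also go to Loki.
    """
    destinations = [ARCHIVE]
    level = record.get("level", "").lower()
    if level in LOKI_LEVELS or record.get("security") == "true":
        destinations.append(LOKI)
    return destinations
```

Because the policy lives in one function at the routing layer, changing which logs reach Loki would be a change to that layer only, which is the property the comment is after.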
-
We are enthusiastic about log aggregation and analysis; however, it is crucial to acknowledge that logs may eventually contain sensitive information. In our infrastructure, we use both Gcloud and Loki for various tasks. Gcloud provides the typical benefits of a cloud service (faster provisioning, no need to set up or maintain anything), while self-hosted Loki offers compelling advantages such as very low costs. To ensure the utmost flexibility for everyone, we vote for migrating to Loki for cost reduction.
-
Guardian Logging and Metrics Change Proposal
Currently, guardians can opt in to send their guardian logs centrally via the following flags:
--telemetryKey
--telemetryServiceAccountFile
--telemetryProject
This sends all guardian logs to Google Cloud Logs, where any guardian in the protocol can use them for troubleshooting and improving the network. The protocol generates approximately 30 TB of logs monthly and would like to keep them for at least 30 days.
Additionally, there is no central location for guardian metrics; each guardian maintains its own. It is often useful for Wormhole contributors to view metrics for a mainnet guardian to ensure things are running correctly network-wide.
Problem
Solution
The Wormhole Foundation uses Grafana Enterprise and sends logs to Loki. The metrics could (in theory) be sent to a Prometheus Pushgateway, or guardians could use the remote write feature for Grafana Enterprise Metrics.
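To make the Loki side concrete, here is a rough sketch of the JSON body that Loki's HTTP push endpoint (`/loki/api/v1/push`) accepts: a list of streams, each with a label set and `[timestamp_ns, line]` pairs. The helper name and the label values in the usage note are assumptions for illustration, not what the guardian node does or would do.

```python
import json
import time
from typing import Dict, Optional


def build_loki_push_body(
    labels: Dict[str, str], line: str, ts_ns: Optional[int] = None
) -> str:
    """Build the JSON body for a single log line in Loki's push format.

    Loki expects timestamps as nanosecond Unix epoch values encoded as strings.
    """
    if ts_ns is None:
        ts_ns = time.time_ns()
    payload = {
        "streams": [
            {
                "stream": labels,                  # label set identifying the stream
                "values": [[str(ts_ns), line]],    # one [timestamp, log line] pair
            }
        ]
    }
    return json.dumps(payload)
```

The resulting string would be sent with `Content-Type: application/json`; for example, `build_loki_push_body({"job": "guardian"}, "watcher started")`, where the `job` label is a hypothetical choice.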
Benefits:
Deliverables
security label to all security-relevant logs). The goal for this deliverable is consistency in the labels.
1.1 Ensuring the various watchers use similar labels and logs
3.1 Get guardians who opt in to sending telemetry to migrate to Loki
3.2 Remove GCP log support from the guardian node
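The label-consistency goal in the deliverables could be checked mechanically. The following is only a sketch: the required label names and the watcher names in the usage note are assumptions, since the proposal does not fix a label set.

```python
from typing import Dict, Set

# Assumed baseline; the proposal does not specify which labels are required.
REQUIRED_LABELS: Set[str] = {"component", "level"}


def missing_labels(watchers: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each watcher to the required labels it does not emit.

    Watchers that already emit every required label are omitted from the result.
    """
    report: Dict[str, Set[str]] = {}
    for name, labels in watchers.items():
        gap = REQUIRED_LABELS - labels
        if gap:
            report[name] = gap
    return report
```

For instance, `missing_labels({"evm": {"component", "level"}, "solana": {"level"}})` reports only the hypothetical `solana` watcher, with `component` missing.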