0050 XLS-50d: Aiming for a healthy distribution of (validator) infrastructure #145
Replies: 5 comments 14 replies
-
Much needed. Consider adding an entry for who/which entity has physical custody of the server too. (Can be simply listed as "private individual" if no company.)
-
Since the network is based on overlapping trust, the addition of a section for "UNL management" (for validators) would improve this proposal.
-
Bithomp now shows those values in the "Developer mode" (the checkbox on the top) https://bithomp.com/validators
-
We are currently evaluating cloud providers to host a validator. Re "1. Validator operator resource location selection": what would be the best approach to build the current list of locations? Is there some public work on the main net topology? Do we need all validators to publish this information, or can we build what we need from existing data? Thanks for your help.
-
I think all information except the ASN is really useful for building an automated topology map of the ledger, if the majority of validators adopt it. But I do not think that including the ASN is a good choice. Revealing the ASN of giants like GCP or Azure is maybe not much of a threat, but smaller hosting companies might have just a few of them and, depending on configuration, the direct IP of a validator might be leaked. At least in our case it is: querying https://hackertarget.com/as-ip-lookup/ with our ASN shows the validator IP.
-
Abstract
The network is at risk of a centralised consensus failure when too many validators (especially dUNL validators) run on the same cloud provider/network. While Negative UNL helps, the network is still at risk of halting when a large infra/cloud provider has an outage.
This proposal introduces best practices for the geographical & provider distribution of validators, something those composing dUNL contents & those manually adding trusted validators to their rippled & xahaud config should take into account.
ℹ️ Note: this proposal focuses only on validators, as their (sudden lack of) availability could harm the network's forward progress. All arguments (motivation) apply to RPC nodes and hubs as well, but e.g. RPC nodes can benefit from live HTTP failover, and their being offline doesn't directly harm the network's forward progress.
Motivation
With high-performing cloud servers (VPS, dedicated) becoming more and more common, and with cloud providers often beating self-hosting / colocation on convenience and price, several validator operators run their validators at the same cloud providers. Independently executed network crawls show large clusters of validators at only a handful of cloud providers. Most common providers are:
Traceroutes paint an even worse picture once the datacenter/POP of these cloud providers is taken into account: through availability and pricing, cloud providers steer customers to specific preferred datacenter locations (and thus routes).
As a result of a significant number of validators running on cloud infra at a small number of cloud providers, an outage at one cloud provider (datacenter, power, infra, routing, BGP, ...) can potentially drop the network below the consensus threshold. With Negative UNL taking 256 ledgers to kick in, this is unwanted.
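To make the outage risk concrete, here is a minimal sketch that reads a list of UNL validators with their ASN and reports what happens if the most-used ASN becomes unreachable. The function name check_asn_concentration, the two-column input format and the AS numbers used to try it out are hypothetical. The default of 80 assumes a consensus threshold of 80% of the UNL and is a parameter so it can be changed.

```shell
# Reads "validator_name asn" lines on stdin (hypothetical input format).
# Prints the share of the most-used ASN and whether the validators left after
# losing that ASN still meet the threshold (in percent, assumed default: 80).
check_asn_concentration() {
  local threshold="${1:-80}"
  awk -v threshold="$threshold" '
    NF >= 2 { count[$2]++; total++ }          # tally validators per ASN
    END {
      if (total == 0) { print "no validators"; exit 1 }
      for (asn in count) {
        if (count[asn] > max) { max = count[asn]; worst = asn }
      }
      remaining = total - max                  # validators left after the outage
      verdict = (remaining * 100 >= threshold * total) ? "above" : "below"
      printf "AS%s hosts %d of %d validators; losing it leaves %d (%s %d%% threshold)\n",
             worst, max, total, remaining, verdict, threshold
    }'
}
```

Feeding it ten validators of which four share one ASN reports six remaining, which is below an 80% threshold. Note that this only groups by ASN: a single provider may announce several ASNs, and several ASNs may share a datacenter, so the real exposure can be larger than this count suggests.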
Proposal
Things to take into account when picking a suitable provider (cloud, infrastructure):
1. Validator operator resource location selection
Validator operators (even when not on the UNL, if they prefer to be on dUNL / in each other's validator lists) should always:
2. TOML contents
A typical validator section in the xrp-ledger.toml and xahau.toml preferably contains a server_country property:
The following properties must be added:
network_asn (integer) containing the ASN (autonomous system number) of the IP address the validator uses to connect out (see Appendix A & Appendix B)
⚖️ CONSIDERATION: publishing the ASN could be considered an attack vector, as it would expose the provider to DDoS attacks aimed at taking down connectivity to certain validators. However, if the network is sufficiently decentralised/distributed from an infrastructure point of view, taking down one or two routes wouldn't harm the network.
The following properties should be added:
server_location (string) containing a written explanation of the location details provided by the provider (see Appendix A)
server_cloud (boolean) whether the server is running at a cloud provider. A server you will never physically see and wouldn't know how to find is a cloud server, so VPS / rented dedicated machine: cloud = true. Your own bare metal, colocated: cloud = false.
3. UNL publishers & validator operators
UNL publishers & validator operators should start to obtain ASN & geographical location from trusted validator operators, and take them into account when composing the UNL list contents.
UNL publishers should not add validators to the UNL without the above TOML properties.
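As a rough illustration of how a UNL publisher could screen a TOML file for these properties, the sketch below does a crude line-based check. It is not a TOML parser: it only looks for "key = ..." lines anywhere in the file and does not verify that they sit inside the validator section or have the right type. The function name check_validator_toml is hypothetical; the split between required and recommended keys follows the "must" and "should"/"preferably" wording above.

```shell
# Crude check of a TOML file for the validator properties named in this proposal.
# Returns non-zero only when the "must" property (network_asn) is missing;
# missing "should"/"preferably" properties are reported as warnings.
check_validator_toml() {
  local file="$1" status=0 key
  for key in network_asn; do
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
      echo "ERROR: missing required property ${key}"
      status=1
    fi
  done
  for key in server_country server_location server_cloud; do
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
      echo "WARN: missing recommended property ${key}"
    fi
  done
  return "$status"
}

# Usage, assuming the TOML file was downloaded to the current directory:
#   check_validator_toml xrp-ledger.toml
```

A real implementation would more likely parse the file with a proper TOML library and inspect each validator entry separately; the grep approach is only meant to show which properties are being asked for.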
Call for immediate action
Validator operators
UNL publishers
Appendix A - sample TOML values
Example: Cloud: VPS / Rented dedicated
Example: Non-Cloud: Self hosted or own hardware, colocated
Appendix B - obtaining the ASN (autonomous system number)
To obtain the ASN for your IP address (usually your provider's or your provider's upstream provider's), you will need the IP address your validator uses to connect out to others. Provided there are no proxy settings & the machine has native IP connectivity to the outside world, you can find/confirm your outgoing IP address by executing (bash):
You can find the ASN for the IP range this IP is allocated from using:
A one-line command to obtain both the IP and the ASN:
whois -h whois.cymru.com $(curl --silent https://myip.wtf/text)
Sample output:
In this case 14061 is the integer value for the network_asn property in your TOML file.
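If you want only the integer rather than the full lookup result, the output can be filtered. The sketch below assumes the lookup prints a pipe-separated table with a header row and the AS number in the first column; check this against your own output before relying on it. The function name extract_asn is hypothetical.

```shell
# Print the AS number from pipe-separated lookup output read on stdin.
# Assumption: the first column holds the AS number and non-numeric rows
# (header, banners, error messages) can be skipped.
extract_asn() {
  awk -F'|' '
    {
      asn = $1
      gsub(/[ \t]/, "", asn)        # strip the padding around the first column
      if (asn ~ /^[0-9]+$/) {       # first row with a purely numeric first column
        print asn
        exit
      }
    }'
}

# Usage (needs network access, same lookup as the one-liner above):
#   whois -h whois.cymru.com "$(curl --silent https://myip.wtf/text)" | extract_asn
```

When no numeric row is found the function prints nothing, so an empty result is a signal to inspect the raw lookup output by hand.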