Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

propose: js active conn mgr #81

Closed
wants to merge 2 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
125 changes: 125 additions & 0 deletions proposals/js-connection-manager.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,125 @@
# js active connection management

Authors: @vasco-santos <!-- List authors' GitHub or other handles -->

Initial PR: https://github.com/protocol/web3-dev-team/pull/81 <!-- Reference the PR first proposing this document. Oooh, self-reference! -->

<!--
This template is for a proposal/brief/pitch for a significant project to be undertaken by a Web3 Dev project team.
The goal of project proposals is to help us decide which work to take on, which things are more valuable than other things.
-->
<!--
A proposal should contain enough detail for others to understand how this project contributes to our team’s mission of product-market fit
for our unified stack of protocols, what is included in scope of the project, where to get started if a project team were to take this on,
and any other information relevant for prioritizing this project against others.
It does not need to describe the work in much detail. Most technical design and planning would take place after a proposal is adopted.
Good project scope aims for ~3-5 engineers for 1-3 months (though feel free to suggest larger-scoped projects anyway).
Projects do not include regular day-to-day maintenance and improvement work, e.g. on testing, tooling, validation, code clarity, refactors for future capability, etc.
-->
<!--
For ease of discussion in PRs, consider breaking lines after every sentence or long phrase.
-->

## Purpose &amp; impact
#### Background &amp; intent
_Describe the desired state of the world after this project? Why does that matter?_
<!--
Outline the status quo, including any relevant context on the problem you’re seeing that this project should solve. Wherever possible, include pains or problems that you’ve seen users experience to help motivate why solving this problem works towards top-line objectives.
-->

With the existing protocols in libp2p, as well as IPFS subsystems built on top of libp2p, such as Pubsub and the DHT, the need for a connection manager overhaul becomes an import work stream, so that these protocols operate as expected by the users, i.e. out of the box solution.

Currently, js nodes have a reactive connection manager that can be decoupled into two parts: establishing new connections and trimming connections. The former relies on an `autoDial` approach, where each time a new node is discovered it will try to establish a connection with that peer. The later consists on blindly trimming connections when reaching a configurable threshold.
vasco-santos marked this conversation as resolved.
Show resolved Hide resolved

From the current solution, there is a lot of space for improvement with tremendous value for the users. Either evolving the current connection manager to the state of the go-implementation or implementing a fully fledged [Connection Manager v2](https://github.com/libp2p/specs/pull/161) (+ [more notes](https://github.com/libp2p/notes/issues/13)).

With the connection manager overhaul in JS we aim to:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd distinguish between connection management as in "managing/closing existing connections" and "peer/service discovery". This proposal currently discusses both but they're pretty separate issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm seriously misunderstanding this proposal, this isn't the right approach.

Probably the misunderstanding with the proposal comes from this. You are probably right that the connection manager should be essentially responsible for "managing/closing existing connections". However, the role of peer/service discovery in my mind is as simple as discovering a given peer (with its announced multiaddrs) and store them in the PeerStore. This also relates on how the peer discovery interface is defined (I think this also is the case in go?).

After adding to the PeerStore, another libp2p "component" should act to figure out if we should try to connect to this peer or not, taking into account the topologies needs as well as the general needs of libp2p, such as relays...

This proposal currently discusses both but they're pretty separate issues.

I think this is the gap here. I am seeing the connection manager to be responsible for two main things:

  • Proactively establishing connections
  • Trimming connections

Proactively establishing connections is divided into two other things:

  • Act on peer discovery
  • Trigger peer discovery per needs (pubsub topics, relays, etc)

So, we essentially need to have this entity who is responsible to act as an orchestrator by leveraging Peer Discovery + Peer Store to fulfil the needs of topologies + libp2p. I see some overlap in this entity with the connection manager, such as decisions on wether we should close a connection in favour of establishing a connection with a peer that has more value for the node's needs, as well as to free connections that are not needed in the long run (like bootstrap nodes).

One way to solve this is some form of service where you can say "I need to be connected to X-Y peers that provide X (content, pubsub topic, etc.)" and this service is responsible for forming and maintaining those connections. But, my experience, it's rarely that simple. Usually, every service will need to manage its peers (e.g., because some peers might not actually offer the services they claim to offer, etc.).

Libp2p topology is probably the service we are talking here. I agree that it is a simpler option to just let services be responsible for forming the connections. However, this also has some implications that should be considered if we want to create a libp2p spec out of this. On top of my mind, we will not take efficient decisions and try to leverage connections that offer us more (decide between a peer running pubsub or a peer running pubsub + part of the dapp context), a peer part of more than one topology will be counted several times and make better decisions when trimming connections.

There is the Connection Overhaul Issue libp2p/js-libp2p#744 with a lot of information on how this would work, as well as an initial draft on how each component would interact. It is a pretty extensive issue, but perhaps it is worth reading.

Let me know what you think, if we can get to a better place to have this logic than the connection manager it might be good to separate the logic between proactive connections vs trimming. But, I overall think there will be overlap on the decision making logic.

- Find our closest peers on the network, and attempt to stay connected to n to them
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Closest in terms of geography, ping latency or Kademlia XOR distance?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kademlia distance so that we increase the probability that peers searching for our PeerId are able to find us. In Go this is done as part of the DHT refresh interval, but for JS, since we have multiple discovery mechanisms (delegate routers) this happens up a layer. It's worth noting that this behavior does now exist in JS, but reliability is questionable.

The browser doesn't benefit from this, because it won't be able to connect to those peers in most cases. This is where having an active relay (dial the peer for me), or a rendezvous style service would be beneficial.

vasco-santos marked this conversation as resolved.
Show resolved Hide resolved
- Finding, connecting to and protecting our gossipsub peers (same topics search)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does gossipsub have peer exchange yet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has, but disabled by default. Anyway, the point here is mostly regarding the initial bootstrap of the pubsub overlay

- Finding and binding to relays with AutoRelay
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go-libp2p now has a list of known-good "relays". We've found that random relays on the network just don't behave well enough to be useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not binding to relays automatically yet, just have things in place if people want to enable it. However, I agree that this should probably be provided (but perhaps in the bootstrap list?)

- Finding and binding to application protocol peers (as needed via MulticodecTopology) -- We should clarify what libp2p will handle intrinsically and what users need to do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you expand on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in js-libp2p, we have introduced the concept of Topologies. It has some thoughts from libp2p/notes#13

In summary what happens is:

  • Pubsub, DHT and any other libp2p protocol/subsystem can register themselves as a topology in libp2p registrar
  • Once a connection is established, identify protocol kicks in and running protocols of the peer are stored in the ProtoBook
  • When a peer is added to the protoBook, a topology callback is called and we verify if that peer is running the topology protocol. If so, for instance pubsub subsystem is notified and it can open a pubsub stream from that connection

If a dapp creates a protocol (let's consider Slate example again), they will likely want to have a Slate topology where once a new peer in slate is running they can be notified and establish an "application overlay". Another thing to consider here would be the concept of MetadataTopology together with MulticodecTopology as we should be able to create topologies per metadata as well to enable some scenarios.

Note that the topology receives a minPeer, maxPeer but we are not using them yet. This is related to the goal of this proposal. The idea here is to inform libp2p of the requirements of the topology in terms of numbers of needed peers to operate reliably. With this, libp2p can act to fulfil the needs of each requirements and properly manage the connections that are needed.


All the above scenarios need proper setups from our users at the moment, which means they need to manually find the nodes they want to connect and connect to them, as libp2p connection manager will not be actively looking at meaningful connection to establish.

An active connection manager will enable out of the box creation of meaningful overlay networks, as well as enable browser nodes to bind to AutoRelay nodes, so that they can be dialed from other nodes in the network. Moreover, JS nodes keep the discovered peers stored in the PeerStore. On subsequent node starts, these nodes should prioritize their previous established topology over the bootstrap nodes if possible.

#### Assumptions &amp; hypotheses
_What must be true for this project to matter?_
<!--(bullet list)-->

- Web3 developers want to have a reliable pubsub topology out of the box without relying on star servers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vasco-santos @aschmahmann @jacobheun can you help flesh this one out a little more (here in comments is probably fine)?

What can we assert today about the demand for solid pubsub in the browser? Do we have good use-cases that we know users are attempting to rely on today? And what signals (maybe in the form of possible use-cases they are/have been attempting to use) should we be looking for when talking to users to back up this assumption?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flow here is that when I subscribe to a topic, libp2p should automatically find peers for me. In Go, this involves advertising and querying the DHT for topics I am subscribed too. In order for JS to effectively discover those peers, regardless of their implementation language, we need to advertise and discover topics on the same service (the public DHT). So the dependency here is being able to effectively query the DHT to discover and advertise our topics.

Re: the browser, pubsub is one of the more effective forms of communicating with peers browser to browser, as I can leverage indirect links to communicate, which is ideal due to the connection limits of browsers. Usage here is mostly anecdotal at this point, there was a lot of interest here during HackFS last year that Vasco and I spent quite a bit of time supporting people on. Matrix would be a strong potential use case here, and would be great to interview for gaps in the stack.

I don't think this matters for the connection manager though. The key thing here is to be able to tag/weight connections to protect valuable ones, and ensure we're trimming stale/less useful connections:

  • Ability to prioritize connections to close Kademlia space peers to improve peers' ability to discover us
  • Ability to prioritize connections to pubsub peers with n+1 common topics to us to maintain valuable meshes
  • Ability to decay connections so that stale ones are pruned to focus resources on valuable connections

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jacobheun I think you are focused on the flows after we have the connections. The first step should be to have an active connection manager to replace our current autoDial approach.

The connection manager should actively:

  • trigger peer discovery mechanisms for peers of interest
    • if they run pubsub, trigger a discovery query for peers with same subscriptions and establish a connection
    • trigger queries to closest peers to establish connections
    • ...
  • on restart check PeerStore to establish a meaningful set of connections

After the connection manager is able to actively establish important connection rather than let's connect every peer discovered unless we already have too many connections, the second part would be the trimming scope

- Web3 developers want to find and connect to other peers in a given scope
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this related to connection gating? This sounds like it's trying to account for discovery mechanisms which would be out of scope for this. Being able to restrict who I am connecting to/is connecting to me would be in this scope though. Although it could be done as a separate piece of work. Priority here being:

  1. Get me more connections to peers so I can effectively use the network (JS is currently weak here, especially in browser)
  2. Allow me to prioritize connections to optimize my resources (tagging)
  3. Allow me to restrict outbound/inbound connections based on definable criteria (custom connection gaters for allow/deny listing)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

related to: #81 (comment)

- Browser developers want to have their nodes reachable via more transports than `webrtcSTAR` out of the box
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this fits here, how does this apply to connection management?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what I was trying to get from this, so I will remove it for now

- PL wants to decrease the load on bootstrap nodes
vasco-santos marked this conversation as resolved.
Show resolved Hide resolved

#### User workflow example
_How would a developer or user use this new capability?_
<!--(short paragraph)-->

Out of the box usage with sane defaults. However, the developer should be enabled to configure the proactive dial strategies (power users)

#### Impact
_How would this directly contribute to web3 dev stack product-market fit?_

<!--
Explain how this addresses known challenges or opportunities.
What awesome potential impact/outcomes/results will we see if we nail this project?
-->

This solves (together with browsers being able to dial go nodes) most of the common issues of our users when trying to use features like IPNS and Pubsub and hitting problems.

#### Leverage
_How much would nailing this project improve our knowledge and ability to execute future projects?_

<!--
Explain the opportunity or leverage point for our subsequent velocity/impact (e.g. by speeding up development, enabling more contributors, etc)
-->

#### Confidence
_How sure are we that this impact would be realized? Label from [this scale](https://medium.com/@nimay/inside-product-introduction-to-feature-priority-using-ice-impact-confidence-ease-and-gist-5180434e5b15)_.

<!--Explain why this rating-->


## Project definition
#### Brief plan of attack
<!--Briefly describe the milestones/steps/work needed for this project-->

There is a proposal already for a [full connection manager overhaul](https://github.com/libp2p/js-libp2p/issues/744). There are a few items that should not be considered into this scope. Connection Gating, Retries and Disconnect are good candidates for a follow up proposal. Decaying tags might have their value, but depend on how far we aim to reach as a first step. Proactive dial, Trim connections, Keep Alive and being able to protect important connections are essential for the scope of this proposal.

#### What does done look like?
_What specific deliverables should completed to consider this project done?_

#### What does success look like?
_Success means impact. How will we know we did the right thing?_

<!--
Provide success criteria. These might include particular metrics, desired changes in the types of bug reports being filed, desired changes in qualitative user feedback (measured via surveys, etc), etc.
-->

#### Counterpoints &amp; pre-mortem
_Why might this project be lower impact than expected? How could this project fail to complete, or fail to be successful?_

#### Alternatives
_How might this project’s intent be realized in other ways (other than this project proposal)? What other potential solutions can address the same need?_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatives is probably not the best place for this, but as prior art/previous discussions here are some previous notes on a topology/mesh approach to connection management libp2p/notes#13. Tagging is likely simpler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more a question of solution design right? The issue where we have been discussing the general solution for ConnectionManager already includes reference for it: libp2p/js-libp2p#744


#### Dependencies/prerequisites
<!--List any other projects that are dependencies/prerequisites for this project that is being pitched.-->

#### Future opportunities
<!--What future projects/opportunities could this project enable?-->

## Required resources

#### Effort estimate
<!--T-shirt size rating of the size of the project. If the project might require external collaborators/teams, please note in the roles/skills section below).
For a team of 3-5 people with the appropriate skills:
- Small, 1-2 weeks
- Medium, 3-5 weeks
- Large, 6-10 weeks
- XLarge, >10 weeks
Describe any choices and uncertainty in this scope estimate. (E.g. Uncertainty in the scope until design work is complete, low uncertainty in execution thereafter.)
-->

#### Roles / skills needed
<!--Describe the knowledge/skill-sets and team that are needed for this project (e.g. PM, docs, protocol or library expertise, design expertise, etc.). If this project could be externalized to the community or a team outside PL's direct employment, please note that here.-->