Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backport: Bring litep2p fixes and latest version to stable2409 #6497

Open
wants to merge 8 commits into
base: stable2409
Choose a base branch
from

Conversation

lexnv
Copy link
Contributor

@lexnv lexnv commented Nov 15, 2024

I've noticed that litep2p had an older version for the stable released branches.

This PR backports litep2p related changes to the latest release, since litep2p is not enabled by default it should not affect inflight testing so far:

This PR is affecting libp2p functionality as well, however worst case it should not affect current behavior and network disoverability of validators:

This commit has been added for ease of cherry-picking: e606a38

cc @paritytech/networking

lexnv and others added 8 commits November 15, 2024 14:13
This release introduces several new features, improvements, and fixes to
the litep2p library. Key updates include enhanced error handling,
configurable connection limits, and a new API for managing public
addresses.

For a detailed set of changes, see [litep2p
changelog](https://github.com/paritytech/litep2p/blob/master/CHANGELOG.md#070---2024-09-05).

This PR makes use of:
- connection limits to optimize network throughput
- better errors that are propagated to substrate metrics
- public addresses API to report healthy addresses to the Identify
protocol

Measuring warp sync time is a bit inaccurate since the network is not
deterministic and we might end up using faster peers (peers with more
resources to handle our requests). However, I did not see warp sync
times of 16 minutes, instead, they are roughly stabilized between 8 and
10 minutes.

For measuring warp-sync time, I've used
[sub-trige-logs](https://github.com/lexnv/sub-triage-logs/?tab=readme-ov-file#warp-time)

Phase | Time
 -|-
Warp  | 426.999999919s
State | 99.000000555s
Total | 526.000000474s

Phase | Time
 -|-
Warp  | 731.999999837s
State | 71.000000882s
Total | 803.000000719s

Closes: #4986

After exposing the `litep2p::public_addresses` interface, we can report
to litep2p confirmed external addresses. This should mitigate or at
least improve: #4925.
Will keep the issue around to confirm this.

We are one step closer to exposing similar metrics as libp2p:
#4681.

cc @paritytech/networking

- [x] Use public address interface to confirm addresses to identify
protocol

---------

Signed-off-by: Alexandru Vasile <[email protected]>
…5998)

This PR ensures that the `litep2p.public_addresses()` never grows
indefinitely.
- effectively fixes subtle memory leaks
- fixes authority DHT records being dropped due to size limits being
exceeded
- provides a healthier subset of public addresses to the `/identify`
protocol

This PR adds a new `ExternalAddressExpired` event to the
litep2p/discovery process.

Substrate uses an LRU `address_confirmations` bounded to 32 address
entries.
The oldest entry is propagated via the `ExternalAddressExpired` event
when a new address is added to the list (if capacity is exceeded).

The expired address is then removed from the
`litep2p.public_addresses()`, effectively limiting its size to 32
entries (the size of `address_confirmations` LRU).

cc @paritytech/networking @alexggh

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
This PR updates litep2p to the latest release.

- `KademliaEvent::PutRecordSucess` is renamed to fix word typo
- `KademliaEvent::GetProvidersSuccess` and
`KademliaEvent::IncomingProvider` are needed for bootnodes on DHT work
and will be utilized later

- kad: Providers part 8: unit, e2e, and `libp2p` conformance tests
([#258](paritytech/litep2p#258))
- kad: Providers part 7: better types and public API, public addresses &
known providers ([#246](paritytech/litep2p#246))
- kad: Providers part 6: stop providing
([#245](paritytech/litep2p#245))
- kad: Providers part 5: `GET_PROVIDERS` query
([#236](paritytech/litep2p#236))
- kad: Providers part 4: refresh local providers
([#235](paritytech/litep2p#235))
- kad: Providers part 3: publish provider records (start providing)
([#234](paritytech/litep2p#234))

- transport_service: Improve connection stability by downgrading
connections on substream inactivity
([#260](paritytech/litep2p#260))
- transport: Abort canceled dial attempts for TCP, WebSocket and Quic
([#255](paritytech/litep2p#255))
- kad/executor: Add timeout for writting frames
([#277](paritytech/litep2p#277))
- kad: Avoid cloning the `KademliaMessage` and use reference for
`RoutingTable::closest`
([#233](paritytech/litep2p#233))
- peer_state: Robust state machine transitions
([#251](paritytech/litep2p#251))
- address_store: Improve address tracking and add eviction algorithm
([#250](paritytech/litep2p#250))
- kad: Remove unused serde cfg
([#262](paritytech/litep2p#262))
- req-resp: Refactor to move functionality to dedicated methods
([#244](paritytech/litep2p#244))
- transport_service: Improve logs and move code from tokio::select macro
([#254](paritytech/litep2p#254))

- tcp/websocket/quic: Fix cancel memory leak
([#272](paritytech/litep2p#272))
- transport: Fix pending dials memory leak
([#271](paritytech/litep2p#271))
- ping: Fix memory leak of unremoved `pending_opens`
([#274](paritytech/litep2p#274))
- identify: Fix memory leak of unused `pending_opens`
([#273](paritytech/litep2p#273))
- kad: Fix not retrieving local records
([#221](paritytech/litep2p#221))

See release changelog for more details:
https://github.com/paritytech/litep2p/releases/tag/v0.8.0

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
Signed-off-by: Alexandru Vasile <[email protected]>
…mmand (#6016)

Previously, when receiving the `SetReservedPeers { reserved }` all peers
not in the `reserved` set were removed.

This is incorrect, the intention of `SetReservedPeers` is to change the
active set of reserved peers and disconnect previously reserved peers
not in the new set.

While at it, have added a few other improvements to make the peerset
more robust:
- `SetReservedPeers`: does not disconnect all peers
- `SetReservedPeers`: if a reserved peer is no longer reserved, the
peerset tries to move the peers to the regular set if the slots allow
this move. This ensures the (now regular) peer counts towards slot
allocation.
- every 1 seconds: If we don't have enough connect peers, add the
reserved peers to the list that the peerstore ignores. Reserved peers
are already connected and the peerstore might return otherwise a
reserved peer


### Next Steps

- [x] More testing

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
Co-authored-by: Michal Kucharczyk <[email protected]>
…6380)

This PR ensures that external addresses with different PeerIDs are not
propagated to the higher layer of the network code.

While at it, this ensures that libp2p only adds the `/p2p/peerid` part
to the discovered address if it does not contain it already.

This is a followup from:
- #6298

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
Signed-off-by: Alexandru Vasile <[email protected]>
#6298)

This PR's main goal is to add public listen addresses to the DHT
authorities records. This change improves the discoverability of
validators that did not provide the `--public-addresses` flag.

This PR populates the authority DHT records with public listen addresses
if any.

The change effectively ensures that addresses are added to the DHT
record in following order:
 1. Public addresses provided by CLI `--public-addresses`
 2. Maximum of 4 public (global) listen addresses (if any)
3. Any external addresses discovered from the network (ie from
`/identify` protocol)


While at it, this PR adds the following constraints on the number of
addresses:
- Total number of addresses cached is bounded at 16 (increased from 10).
- A maximum number of 32 addresses are published to DHT records
(previously unbounded).
  - A maximum of 4 global listen addresses are utilized.
      
This PR also removes the following warning:
`WARNING: No public address specified, validator node may not be
reachable.`

### Next Steps
- [ ] deploy and monitor in versi network

Closes: #6280
Part of: #5266

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>
@lexnv lexnv added the A3-backport Pull request is already reviewed well in another branch. label Nov 15, 2024
@lexnv lexnv self-assigned this Nov 15, 2024
@lexnv lexnv requested a review from koute as a code owner November 15, 2024 13:33
@lexnv lexnv changed the title Lexnv/backport litep2p 0.8.1 backport: Bring litep2p fixes and latest version to stable2409 Nov 15, 2024
Copy link

This pull request is amending an existing release. Please proceed with extreme caution,
as to not impact downstream teams that rely on the stability of it. Some things to consider:

  • Backports are only for 'patch' or 'minor' changes. No 'major' or other breaking change.
  • Should be a legit fix for some bug, not adding tons of new features.
  • Must either be already audited or not need an audit.
Emergency Bypass

If you really need to bypass this check: add validate: false to each crate
in the Prdoc where a breaking change is introduced. This will release a new major
version of that crate and all its reverse dependencies and basically break the release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A3-backport Pull request is already reviewed well in another branch.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant