
[WIP] Intrusive shamap inner final #5152

Draft
wants to merge 7 commits into develop
Conversation

vlntb (Collaborator) commented Oct 3, 2024

High Level Overview of Change

This PR finalises the work authored by Scott Determan (https://github.com/seelabs) and is based on the original PR (#4815).

Context of Change

There are two goals:

  • Synchronise this change with the most recent develop branch.
  • Address outstanding questions raised in the original PR.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (non-breaking change that only restructures code)
  • Performance (increase or change in throughput and/or latency)
  • Tests (you added tests for code that already exists, or your new feature included in this PR)
  • Documentation update
  • Chore (no impact to binary, e.g. .gitignore, formatting, dropping support for older tooling)
  • Release

Scott Determan added 4 commits October 2, 2024 17:25
This branch has a long history. About two years ago I wrote a patch to
remove the mutex from shamap inner nodes (ref:
https://github.com/seelabs/rippled/tree/lockfree-tagged-cache). At the
time I measured a large memory savings of about 2 gig. Unfortunately,
the code required using the `folly` library, and I was hesitant to
introduce such a large dependency into rippled (especially one that was
so hard to build). This branch resurrects that old work and removes the
`folly` dependency.

The old branch used a lockless atomic shared pointer. This new branch
introduces an intrusive pointer type. Unlike boost's intrusive pointer,
this intrusive pointer can handle both strong and weak pointers (needed
for the tagged cache). Since this is an intrusive pointer type, in order
to support weak pointers the object is not destroyed when the strong
count goes to zero. Instead, it is "partially destroyed" (for example,
inner nodes will reset their children). This intrusive pointer uses
16 bits for the strong count and 14 bits for the weak count, plus
one 64-bit pointer to point at the object. This is much smaller than a
std::shared_ptr, which needs a control block to hold the strong and
weak counts (and potentially other data), as well as an extra pointer
to point at the control block.
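To make the scheme concrete, here is a minimal sketch of the packed-count idea described above. All names and bit positions are illustrative; the actual implementation lives in include/xrpl/basics/IntrusiveRefCounts.h and differs in detail.

```cpp
#include <cstdint>

// Illustrative sketch only; names and bit layout are hypothetical.
struct RefCountedSketch
{
    // Both counts packed into one word embedded in the object:
    // bits 0-15 hold the strong count (max 65535),
    // bits 16-29 hold the weak count (max 16383).
    static constexpr std::uint32_t strongMask = 0x0000FFFFu;
    static constexpr std::uint32_t weakShift = 16;
    static constexpr std::uint32_t weakMask = 0x3FFFu << weakShift;

    std::uint32_t counts = 1;  // created with one strong reference

    std::uint32_t strongCount() const { return counts & strongMask; }
    std::uint32_t weakCount() const { return (counts & weakMask) >> weakShift; }

    void releaseStrong()
    {
        --counts;  // the strong count occupies the low bits
        if (strongCount() == 0)
        {
            if (weakCount() == 0)
                delete this;          // no references left: full destruction
            else
                partialDestructor();  // weak refs remain: release what can
                                      // be released (an inner node resets
                                      // its children) but keep the object
        }
    }

    virtual void partialDestructor() = 0;
    virtual ~RefCountedSketch() = default;
};
```

The intrusive pointer itself is then a single raw pointer into such an object, which is what keeps the per-pointer footprint so small.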

The intrusive shared pointer can be modified to support atomic
operations (there is a branch that adds this support). These atomic
operations can be used instead of the lock when changing inner node
pointers in the shamap.

Note: The space savings are independent of removing the locks from
shamap inner nodes. Therefore this work is divided into two phases. In
the first phase, a non-atomic intrusive pointer is introduced and the
locks are kept. In a second phase, the atomic intrusive pointer could be
introduced and the locks removed. Some of the code in this patch is
written with the upcoming atomic work in mind (for example, using
exchange in places). The atomic intrusive pointer also requires the C++
library to support `atomic_ref`. Both gcc and msvc support this, but at
the time of this writing clang's library does not.
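For a sense of what phase two could look like, here is a hedged sketch of an `atomic_ref`-based increment on the embedded count word; this illustrates the technique only and is not the PR's code:

```cpp
#include <atomic>
#include <cstdint>

// Sketch: atomically take a strong reference on an object's embedded
// count word without declaring the member itself std::atomic.
// Assumes the low 16 bits hold the strong count (as sketched earlier).
inline bool tryAddStrongRef(std::uint32_t& counts)
{
    std::atomic_ref<std::uint32_t> ref(counts);
    auto cur = ref.load(std::memory_order_relaxed);
    do
    {
        if ((cur & 0xFFFFu) == 0)
            return false;  // strong count already zero: the object is
                           // partially destroyed and cannot be revived
    } while (!ref.compare_exchange_weak(
        cur, cur + 1, std::memory_order_acq_rel, std::memory_order_relaxed));
    return true;
}
```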

Note: The intrusive pointer will be 12 bytes. A std::shared_ptr will be
around 40 bytes, depending on the implementation.
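A quick way to sanity-check these figures on a 64-bit platform; the ~24-byte control block is typical of mainstream standard libraries, not guaranteed:

```cpp
#include <cstdint>
#include <cstdio>
#include <memory>

int main()
{
    struct Node { std::uint32_t counts; };  // hypothetical intrusive base

    // Intrusive pointer: one raw pointer (8 bytes) plus the 4-byte
    // count word embedded in the object itself -> ~12 bytes.
    std::printf("raw pointer:     %zu bytes\n", sizeof(Node*));

    // std::shared_ptr: two pointers on the stack (16 bytes) plus a
    // separately allocated control block (~24 bytes) -> ~40 bytes.
    std::printf("std::shared_ptr: %zu bytes\n", sizeof(std::shared_ptr<Node>));
}
```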

When measuring memory usage on a validator, this patch resulted in a
memory savings of between 10 and 15%.

codecov bot commented Oct 14, 2024

Codecov Report

Attention: Patch coverage is 85.39604% with 118 lines in your changes missing coverage. Please review.

Project coverage is 76.2%. Comparing base (bf4a7b6) to head (19464a0).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| include/xrpl/basics/IntrusivePointer.ipp | 85.1% | 40 Missing ⚠️ |
| include/xrpl/basics/TaggedCache.ipp | 85.2% | 39 Missing ⚠️ |
| include/xrpl/basics/SharedWeakCachePointer.ipp | 75.9% | 13 Missing ⚠️ |
| include/xrpl/basics/IntrusiveRefCounts.h | 91.0% | 8 Missing ⚠️ |
| src/xrpld/shamap/detail/SHAMapInnerNode.cpp | 68.8% | 5 Missing ⚠️ |
| src/xrpld/shamap/detail/SHAMapTreeNode.cpp | 62.5% | 3 Missing ⚠️ |
| src/xrpld/shamap/detail/TaggedPointer.ipp | 80.0% | 3 Missing ⚠️ |
| src/xrpld/shamap/SHAMapTxPlusMetaLeafNode.h | 0.0% | 2 Missing ⚠️ |
| src/xrpld/shamap/detail/SHAMap.cpp | 96.2% | 2 Missing ⚠️ |
| include/xrpl/basics/IntrusivePointer.h | 90.9% | 1 Missing ⚠️ |

... and 2 more
Additional details and impacted files

Impacted file tree graph

```
@@            Coverage Diff            @@
##           develop   #5152     +/-   ##
=========================================
+ Coverage     76.2%   76.2%   +0.1%
=========================================
  Files          760     765      +5
  Lines        61568   62015    +447
  Branches      8126    8149     +23
=========================================
+ Hits         46898   47283    +385
- Misses       14670   14732     +62
```
| Files with missing lines | Coverage Δ |
|---|---|
| include/xrpl/basics/TaggedCache.h | 100.0% <100.0%> (+14.3%) ⬆️ |
| include/xrpl/protocol/AccountID.h | 61.5% <ø> (ø) |
| src/xrpld/app/ledger/ConsensusTransSetSF.cpp | 0.0% <ø> (ø) |
| src/xrpld/app/ledger/LedgerHistory.cpp | 50.2% <ø> (ø) |
| src/xrpld/app/ledger/detail/LedgerMaster.cpp | 41.6% <ø> (ø) |
| src/xrpld/app/ledger/detail/TransactionMaster.cpp | 71.2% <ø> (ø) |
| src/xrpld/app/main/Application.cpp | 67.9% <ø> (+0.1%) ⬆️ |
| src/xrpld/app/main/Application.h | 100.0% <ø> (ø) |
| src/xrpld/app/misc/NetworkOPs.cpp | 68.3% <ø> (ø) |
| src/xrpld/app/misc/SHAMapStoreImp.h | 96.3% <ø> (ø) |

... and 26 more

... and 3 files with indirect coverage changes


vlntb (Collaborator, Author) commented Nov 6, 2024

Analysis of Reference Count Ranges for Intrusive Smart Pointers

Background

Following the conversation in the original PR (Intrusive shamap inner (SHAMapTreeNode memory reduction) by seelabs · Pull Request #4815 · XRPLF/rippled), it was noted that, unlike the standard library's shared_ptr and weak_ptr, the newly introduced intrusive versions have narrower ranges for storing reference counts. The proposed change sets the ranges as:

  • For strong references: 65535 (a 16-bit count)
  • For weak references: 16383 (a 14-bit count)
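These limits follow directly from the bit widths given in the commit message (16-bit strong count, 14-bit weak count):

```cpp
static_assert((1u << 16) - 1 == 65535, "16-bit strong count ceiling");
static_assert((1u << 14) - 1 == 16383, "14-bit weak count ceiling");
```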

Questions

  • Perform a code audit and prepare tests to check the maximum reference counts that can occur in the current version of rippled.
  • Decide whether the proposed ranges are sufficient for the current version and the near future. (The ranges can be increased later; the move to intrusive smart pointers remains beneficial either way.)

Code audit

Strong references

From analyzing the code:
Theoretical Maximum = (shareChild calls) × (number of ledgers containing the same node)

where:

  • shareChild calls — the number of shareChild calls made during tree traversal (walkSubTree).
  • number of ledgers containing the same node — while generating a ledger, the same transaction might be added to several candidate versions of the ledger until one is accepted by consensus; therefore, the same node may be referenced from multiple trees representing different ledger versions.

Worst-case scenario:

  • shareChild calls during tree traversal = 2
  • A network of 35 validators
  • A 5-second interval to reach consensus
  • A 15-second deadline before network reset

Theoretical Maximum value: 2 × (15 / 5) × 35 = 210
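The same arithmetic as a compile-time check, with constants taken from the worst-case assumptions above:

```cpp
constexpr int shareChildCalls = 2;       // per traversal (walkSubTree)
constexpr int consensusRounds = 15 / 5;  // 15 s deadline / 5 s rounds = 3
constexpr int validators = 35;
static_assert(shareChildCalls * consensusRounds * validators == 210);
```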

Weak references

The WeakIntrusive class is not currently used, either explicitly or implicitly. The only place a weak pointer arises is the strong-to-weak conversion performed when sweeping the TaggedCache. This means the weak reference count can never exceed the strong reference count.
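For illustration, a hedged sketch of that sweep-time demotion; the entry interface (isStrong, useCount, convertToWeak) is hypothetical, not the PR's actual API:

```cpp
// Sketch: during a cache sweep, an expired entry that the cache holds
// the only strong reference to is demoted to a weak reference. The
// strong count drops to zero (triggering partial destruction), while
// the weak count keeps the entry discoverable until it is erased.
template <class Entry>
void sweepEntry(Entry& entry)
{
    if (entry.isStrong() && entry.useCount() == 1)
        entry.convertToWeak();
}
```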

Tests

Temporary code changes
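The diff itself is not reproduced here; instrumentation of roughly this shape (hypothetical names) would capture the high-water marks reported below:

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical high-water-mark tracker for observed reference counts.
inline std::atomic<std::uint32_t> maxStrongSeen{0};

inline void noteStrongCount(std::uint32_t current)
{
    auto prev = maxStrongSeen.load(std::memory_order_relaxed);
    // Retry until we either publish `current` or see a larger recorded maximum.
    while (prev < current &&
           !maxStrongSeen.compare_exchange_weak(
               prev, current, std::memory_order_relaxed))
    {
    }
}
```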

Test runs

  • 12 rippled sessions ranging in duration from 1 hr to 24 hrs
  • Network: livenet
  • State: proposing

Test results

  • Maximum number of strong references: 387
  • Maximum number of weak references: 1

The observed maximum of 387 references is much higher than the theoretical maximum. This suggests:

  • There may be excessive copying of nodes during the initialization phase or traversal.
  • Caching mechanisms (such as TaggedCache) can affect the reference count.

Conclusion

  • Strong Reference Count Range: The proposed limit of 65535 is more than adequate, and the logic for calculating the theoretical maximum (210) is sound. The observed discrepancy (387) highlights a need to investigate potential inefficiencies in node copying or caching.
  • Weak Reference Count Range: The proposed limit of 16383 is also sufficient, and the observed maximum (1) confirms that weak references are minimal under current usage patterns.
  • Actionable Insight: The excessive copying or caching logic leading to 387 references warrants further investigation to improve efficiency.

HowardHinnant (Contributor) left a comment

The theoretical maximal value for strong references is calculated to be 210. Experimental evidence, also from the readme, detects a value above the theoretical maximum: 387.
I ran a server for about an hour and detected a max of 1908.

These are all well below the limits of 65535, so this limit is probably safe. But it wouldn't hurt to revisit the theoretical maximum and discover why it is incorrect.

vlntb (Collaborator, Author) commented Nov 13, 2024

> The theoretical maximal value for strong references is calculated to be 210. Experimental evidence, also from the readme, detects a value above the theoretical maximum: 387. I ran a server for about an hour and detected a max of 1908.
>
> These are all well below the limits of 65535, so this limit is probably safe. But it wouldn't hurt to revisit the theoretical maximum and discover why it is incorrect.

I did additional digging following the comment from @HowardHinnant. What I didn't take into account is that rippled processes transaction and ledger data in a concurrent environment. I identified four types of routines that can happen in parallel:

  • InboundLedgersImp::gotLedgerData
  • SHAMap::walkTowardsKey
  • SHAMap::flushDirty 
  • LedgerMaster::gotFetchPack

Two of those routines are executed from the JobQueue and can be parallelized further based on the node_size configuration parameter. The difference in this parameter explains the difference between the results Howard and I obtained. Howard had his node_size set to huge, resulting in 8 threads in the JobQueue pool. In my setup, it was defined as medium, resulting in 4 threads.

Based on those findings, we should update the Theoretical Maximum value.

  1. Theoretical Maximum value for a single thread: 2 × (15 / 5) × 35 = 210.
  2. InboundLedgersImp::gotLedgerData can be executed across a maximum of 8 threads.
  3. SHAMap::walkTowardsKey - 1 thread.
  4. SHAMap::flushDirty - 1 thread.
  5. LedgerMaster::gotFetchPack can be executed across a maximum of 8 threads.

This gives an overall Theoretical Maximum value of:
(8 × 210) + 210 + 210 + (8 × 210) = 3780.
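Restated as a compile-time check, with thread counts as enumerated above:

```cpp
constexpr int perThread = 2 * (15 / 5) * 35;  // = 210, from the earlier analysis
constexpr int bound =
    8 * perThread     // InboundLedgersImp::gotLedgerData (up to 8 job threads)
    + perThread       // SHAMap::walkTowardsKey (1 thread)
    + perThread       // SHAMap::flushDirty (1 thread)
    + 8 * perThread;  // LedgerMaster::gotFetchPack (up to 8 job threads)
static_assert(bound == 3780);
```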

This is still significantly lower than the allocated 65535 range.
