Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase Tree Count to 50 & Update Memory Calculation for RCF models #1181

Merged
merged 2 commits into from
Apr 3, 2024

Conversation

kaituo
Copy link
Collaborator

@kaituo kaituo commented Mar 30, 2024

Description

This PR accomplishes two main objectives:

  • Tree Count Enhancement: Increases the number of trees to 50, aiming to improve RCF models' accuracy and robustness.
  • Memory Calculation Update: Adjusts the memory calculation formula due to upgrades in the RCF library and changes in parameters.

Methodology for Memory Calculation:
The updated memory calculation follows the methodology in PR #222. It involves running TRCF or RCFCaster with one million data points and measuring object sizes using jmap memory dumps. This white-box approach examines all fields listed under the heap dump to account for various scenarios, such as fluctuations in node store size due to parameter changes. Additionally, the point store ratio adjustment is based on shingle size, using a heuristic constant derived from empirical data.

Experimental Validation:
The memory size formula's accuracy was validated through experiments, with results showing a close match between estimated and actual memory usage, within a tolerable variance. Detailed experiment data can be found here:

Testing done:

  • added new test cases.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Kaituo Li <[email protected]>
Copy link

codecov bot commented Mar 30, 2024

Codecov Report

Attention: Patch coverage is 88.88889% with 5 lines in your changes are missing coverage. Please review.

Project coverage is 80.53%. Comparing base (3034784) to head (007a5b8).

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##               main    #1181      +/-   ##
============================================
- Coverage     80.54%   80.53%   -0.01%     
- Complexity     4599     4601       +2     
============================================
  Files           336      336              
  Lines         19091    19119      +28     
  Branches       1987     1993       +6     
============================================
+ Hits          15377    15398      +21     
- Misses         2769     2772       +3     
- Partials        945      949       +4     
Flag Coverage Δ
plugin 80.53% <88.88%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
.../java/org/opensearch/ad/caching/PriorityCache.java 68.54% <ø> (ø)
...ava/org/opensearch/ad/task/ADTaskCacheManager.java 89.56% <100.00%> (+0.02%) ⬆️
...search/timeseries/settings/TimeSeriesSettings.java 94.73% <100.00%> (ø)
.../java/org/opensearch/timeseries/MemoryTracker.java 79.83% <88.09%> (+0.48%) ⬆️

... and 6 files with indirect coverage changes

@kaituo kaituo changed the title Parameter tuning Increase Tree Count to 50 & Update Memory Calculation for RCF models Apr 1, 2024
* numberOfTrees * dimension * 4 * averagePointStoreUsage + 77192);
long thresholdSize = 6 * (dimension * 8 + 16) + shingleSize * 8 + 624;
return compactRcfSize + thresholdSize;
int pointStoreCapacity = Math.max(sampleSize * numberOfTrees + 1, 2 * sampleSize);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: sampleSize * numberOfTrees are calculated multiple times, maybe consider storing this in a variable if it doesn't change

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

jackiehanyang
jackiehanyang previously approved these changes Apr 3, 2024
@kaituo kaituo merged commit e9a3782 into opensearch-project:main Apr 3, 2024
25 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 3, 2024
…1181)

* Parameter tuning

Signed-off-by: Kaituo Li <[email protected]>

* reuse capacity calculation

Signed-off-by: Kaituo Li <[email protected]>

---------

Signed-off-by: Kaituo Li <[email protected]>
(cherry picked from commit e9a3782)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
kaituo pushed a commit that referenced this pull request Apr 5, 2024
…1181) (#1186)

* Parameter tuning



* reuse capacity calculation



---------


(cherry picked from commit e9a3782)

Signed-off-by: Kaituo Li <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants