
[Enhancement] add more memory usage profiles for spill (#24353) #26371

Merged
merged 11 commits into from
Jul 4, 2023

Conversation

@silverbullet233 (Contributor) commented Jul 3, 2023

backport #24353, #24644, #24831, #25657, #25191, #25234, #25577, #24148, #25853, #21834

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr will affect users' behaviors
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.1
    • 3.0
    • 2.5
    • 2.4

silverbullet233 and others added 11 commits July 3, 2023 08:35
1. Add more metrics related to memory usage to the query profile, to guide
subsequent optimization of the spill strategy.

2. Move the spiller-related metrics under the same parent node, and delete
the useless prefix.

Signed-off-by: silverbullet233 <[email protected]>
A concept of reserved memory is introduced.
Before executing push_chunk/set_finishing, we reserve memory in advance
for operators that request a large amount of memory. If the reservation
fails, the operator will try to spill.

Signed-off-by: stdpain <[email protected]>
…tarRocks#25657)

## Background

When testing auto spill mode with the TPC-H 1T dataset on a 16c64g instance,
I found that Q18 would be canceled due to OOM, and the consumption values of
some mem trackers were wrong after the query was canceled:

1.  the consumption of the chunk_allocator tracker becomes a negative number

2.  the consumption of the process mem tracker becomes a very large number

These issues were introduced by
StarRocks#24644

### For the first issue

After reserved_bytes was introduced, the mem tracker is updated in advance,
and the unused reserved memory is returned after the push_chunk/set_finishing
call ends. So while push_chunk/set_finishing is executing, if the allocated
memory size does not exceed the reserved bytes in the current thread, the mem
tracker is not updated.

If tls_mem_tracker does not switch during execution, everything is fine.
Otherwise there will be problems. Take MemChunkAllocator as an example:

![image](https://github.com/StarRocks/starrocks/assets/3675229/1b7de188-7958-48f4-99c9-1beafb2fdfcb)


Consider the following operations:
1. First, memory is reserved in PipelineDriver::_adjust_memory_usage while
tls_mem_tracker is the fragment instance tracker. If we reserve 10 bytes,
fragment->query->wg->query_pool->process are all updated.
2. Then MemChunkAllocator::allocate executes and tls_mem_tracker switches to
the chunk_allocator tracker. If we allocate 5 bytes, the chunk_allocator
tracker is not really updated, because reserved bytes remain in the current
thread.
3. In the defer op, because of those reserved bytes, we cannot know how much
the consumption of the chunk_allocator tracker was actually updated. We still
release 5 bytes, which is wrong.

The solution is to release the reserved memory every time the tracker is
switched.


### For the second issue


![image](https://github.com/StarRocks/starrocks/assets/3675229/ac4f77bc-61b4-49d4-a1fd-4defa3733198)

The type of request_reserved is wrong: here `auto` deduces a 32-bit int, so we
should use `int64_t` explicitly instead. Otherwise, if the reserved bytes
exceed the int32 range, the value overflows and things go wrong.

Signed-off-by: silverbullet233 <[email protected]>
…5191)

We used to reserve a large amount of memory (1G per operator) when restoring
the partition hash table. In this PR, we cap the overall memory table size at
memory_table_num * memory_table_bytes_size. If each partition is currently
small but the overall memory table exceeds memory_table_bytes_size, we flush
half of the memory table to ensure that it stays smaller than the expected
size.

Signed-off-by: stdpain <[email protected]>
…more accurate (StarRocks#25234)

In the past, we took a conservative strategy: we assumed the hashmap/hash_set
might expand on every insertion. Now we only reserve memory in advance when
the phmap really needs to expand.

Signed-off-by: stdpain <[email protected]>
Introduced by StarRocks#25234:
a capacity of 0 causes an assertion failure
```
*** Aborted at 1687172299 (unix time) try "date -d @1687172299" if you are using GNU date ***
starrocks_be: /root/starrocks/be/src/util/phmap/phmap.h:493: size_t phmap::priv::CapacityToGrowth(s
ize_t): Assertion `IsValidCapacity(capacity)' failed.
PC: @     0x7fef66b85a7c pthread_kill
*** SIGABRT (@0x3eb001c2fd0) received by PID 1847248 (TID 0x7fee45771640) from PID 1847248; stack t
race: ***
    @         0x1493b06a google::(anonymous namespace)::FailureSignalHandler()
    @     0x7fef66b31520 (unknown)
    @     0x7fef66b85a7c pthread_kill
    @     0x7fef66b31476 raise
    @     0x7fef66b177f3 abort
    @     0x7fef66b1771b (unknown)
    @     0x7fef66b28e96 __assert_fail
    @          0x9885764 phmap::priv::CapacityToGrowth()
    @          0xb905075 starrocks::AggHashMapVariant::need_expand()
    @          0xb05ade3 starrocks::pipeline::AggregateStreamingSinkOperator::_push_chunk_by_auto()
    @          0xb0565c7 starrocks::pipeline::AggregateStreamingSinkOperator::push_chunk()
    @          0x9dca867 starrocks::pipeline::PipelineDriver::process()
    @         0x135ecf73 starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @         0x135eb3fa _ZZN9starrocks8pipeline20GlobalDriverExecutor10initializeEiENKUlvE2_clEv
    @         0x135f58fa _ZSt13__invoke_implIvRZN9starrocks8pipeline20GlobalDriverExecutor10initializeEiEUlvE2_JEET_St14__invoke_otherOT0_DpOT1_
    @         0x135f4d55 _ZSt10__invoke_rIvRZN9starrocks8pipeline20GlobalDriverExecutor10initializeEiEUlvE2_JEENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EES6_E4typeEOS7_DpOS8_
    @         0x135f3fba _ZNSt17_Function_handlerIFvvEZN9starrocks8pipeline20GlobalDriverExecutor10initializeEiEUlvE2_E9_M_invokeERKSt9_Any_data
    @          0x99b783a std::function<>::operator()()
    @         0x121717f8 starrocks::FunctionRunnable::run()
    @         0x1216e02c starrocks::ThreadPool::dispatch_thread()
    @         0x1218c818 std::__invoke_impl<>()
    @         0x1218c0d5 std::__invoke<>()
    @         0x1218b3d0 _ZNSt5_BindIFMN9starrocks10ThreadPoolEFvvEPS1_EE6__callIvJEJLm0EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
    @         0x12189a3f std::_Bind<>::operator()<>()
    @         0x121867ec std::__invoke_impl<>()
    @         0x121838a4 _ZSt10__invoke_rIvRSt5_BindIFMN9starrocks10ThreadPoolEFvvEPS2_EEJEENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EESA_E4typeEOSB_DpOSC_
    @         0x1217f413 std::_Function_handler<>::_M_invoke()
    @          0x99b783a std::function<>::operator()()
    @         0x12153d16 starrocks::Thread::supervise_thread()
    @     0x7fef66b83b43 (unknown)
    @     0x7fef66c15a00 (unknown)
    @                0x0 (unknown)
```
… times (StarRocks#24148)

The root cause is a problem with the maintenance of min_level. When a
partition is split repeatedly, a middle level may become empty while the
levels above and below it still hold partitions. Here is an example:

1. At first, all partitions are in level 2:
level 2: [4, 5, 6, 7]
2. Then partition 4 splits into 8 and 12:
level 2: [5, 6, 7]
level 3: [8, 12]
3. Then partition 8 splits into 16 and 24, and partition 12 splits into 20 and 28:
level 2: [5, 6, 7]
level 3: []
level 4: [16, 20, 24, 28]

So we can only move min_level forward when the current level has no
partitions.

Signed-off-by: silverbullet233 <[email protected]>
@sonarcloud
sonarcloud bot commented Jul 3, 2023

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

0.0% Coverage
0.0% Duplication

@stdpain stdpain enabled auto-merge (rebase) July 4, 2023 03:50
@stdpain stdpain merged commit d8d66fe into StarRocks:branch-3.0 Jul 4, 2023
34 of 37 checks passed