[Enhancement] add more memory usage profiles for spill (#24353) #26371
Merged
Conversation
1. Add more metrics related to memory usage to the query profile, to guide subsequent optimization of the spill strategy. 2. Move the spiller-related metrics under the same parent node, and delete the useless prefix. Signed-off-by: silverbullet233 <[email protected]>
A concept of reserved memory is introduced. Before executing push_chunk/set_finishing, we reserve memory in advance for operators that request a large amount of memory. If the reservation fails, the operator will try to spill. Signed-off-by: stdpain <[email protected]>
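The reserve-then-spill flow described above can be sketched roughly as follows. All names here (`MemTracker::try_reserve`, `prepare_operator`, `Action`) are hypothetical illustrations of the idea, not the actual StarRocks API:

```cpp
#include <cassert>
#include <cstdint>

// Minimal sketch of a memory tracker that supports up-front reservation.
struct MemTracker {
    int64_t limit;
    int64_t consumption = 0;

    // Reserve memory in advance; fail instead of overcommitting.
    bool try_reserve(int64_t bytes) {
        if (consumption + bytes > limit) return false;
        consumption += bytes;
        return true;
    }
    void release(int64_t bytes) { consumption -= bytes; }
};

enum class Action { kExecute, kSpill };

// Before push_chunk/set_finishing, an operator that may request a large
// amount of memory first reserves it; if the reservation fails, the
// operator falls back to spilling instead of risking an OOM.
Action prepare_operator(MemTracker& tracker, int64_t reserve_bytes) {
    if (!tracker.try_reserve(reserve_bytes)) {
        return Action::kSpill;   // reservation failed -> spill first
    }
    return Action::kExecute;     // safe to run push_chunk/set_finishing
}
```

The key design point is that the reservation fails instead of overcommitting, so the operator can switch to spilling before it actually allocates.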
…sh join (StarRocks#24831) Signed-off-by: stdpain <[email protected]>
…tarRocks#25657) ## Background

When I tested auto spill mode under the tpch-1T dataset on a 16c64g instance, I found that Q18 would be cancelled due to OOM, and that the consumption values of some mem trackers were wrong after the query was cancelled:

1. the consumption of chunk_allocator became a negative number
2. the consumption of the process mem tracker became a very large number

These issues were introduced by StarRocks#24644.

### For the first issue

After reserved_bytes was introduced, the mem tracker is updated in advance, and the unused reserved memory is returned after the push_chunk/set_finishing call ends. So while push_chunk/set_finishing executes, if the allocated memory size does not exceed the reserved bytes in the current thread, the mem tracker will not be updated. If the tls_mem_tracker doesn't switch during execution, everything is fine; otherwise there will be problems. Take MemChunkAllocator as an example:

![image](https://github.com/StarRocks/starrocks/assets/3675229/1b7de188-7958-48f4-99c9-1beafb2fdfcb)

Consider the following operations:

1. First, memory is reserved in PipelineDriver::_adjust_memory_usage, where tls_mem_tracker is the fragment instance tracker. If we reserve 10 bytes, the whole chain fragment->query->wg->query_pool->process is updated.
2. Then MemChunkAllocator::allocate executes and tls_mem_tracker switches to chunk_allocator. If we allocate 5 bytes, the chunk_allocator tracker is not really updated, because reserved_bytes still exists in the current thread.
3. In the defer op, because of the remaining reserved bytes, we cannot know by how much the chunk_allocator tracker's consumption was actually updated. We still release 5 bytes, which is wrong.

The solution is to release the reserved memory every time the tracker is switched.

### For the second issue

![image](https://github.com/StarRocks/starrocks/assets/3675229/ac4f77bc-61b4-49d4-a1fd-4defa3733198)

The type of request_reserved is wrong: auto deduces int32_t here, and we should use int64_t instead. Otherwise, if the reserved bytes are larger than an int32 can hold, things go wrong. Signed-off-by: silverbullet233 <[email protected]>
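The `auto` deduction pitfall behind the second issue is easy to reproduce in isolation. The function names below are hypothetical, not the actual StarRocks code:

```cpp
#include <cstdint>

// Buggy shape: the reservation size is computed with `auto` from 32-bit
// operands, so the arithmetic stays 32-bit and totals past INT32_MAX
// wrap (signed overflow is undefined behavior in C++).
int64_t reserved_buggy(int32_t per_chunk_bytes, int32_t chunk_count) {
    auto reserved = per_chunk_bytes * chunk_count;  // deduced as int (32-bit)
    return reserved;  // widened to int64_t only after the damage is done
}

// The fix: spell out int64_t so the multiplication is done in 64-bit.
int64_t reserved_fixed(int32_t per_chunk_bytes, int32_t chunk_count) {
    int64_t reserved = static_cast<int64_t>(per_chunk_bytes) * chunk_count;
    return reserved;
}
```

For small totals both versions agree, which is why such bugs only surface once a query actually reserves more than 2 GiB.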
…5191) We try to reserve a large amount of memory (1 GB per operator) when restoring the partition hash table. In this PR, we change the memory table budget to memory_table_num * memory_table_bytes_size. If each partition is currently small, but the overall memory table exceeds memory_table_bytes_size, we flush half of the memory tables to ensure that the total stays smaller than the expected size. Signed-off-by: stdpain <[email protected]>
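A rough sketch of this flush policy, assuming the overall budget is memory_table_num * memory_table_bytes_size and that the largest tables are flushed first (the selection order is an assumption, not stated in the commit):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical sketch: each partition has a memory table of some size.
// When the combined size exceeds the overall budget, flush the largest
// half of the tables (their in-memory sizes drop to zero).
// Returns the indices of the tables chosen for flushing.
std::vector<size_t> maybe_flush_half(std::vector<int64_t>& table_bytes,
                                     int64_t per_table_budget) {
    int64_t total = std::accumulate(table_bytes.begin(), table_bytes.end(), int64_t{0});
    int64_t budget = per_table_budget * static_cast<int64_t>(table_bytes.size());
    std::vector<size_t> flushed;
    if (total <= budget) return flushed;  // everything still fits

    // Sort indices by descending size so flushing frees the most memory.
    std::vector<size_t> order(table_bytes.size());
    std::iota(order.begin(), order.end(), size_t{0});
    std::sort(order.begin(), order.end(),
              [&](size_t a, size_t b) { return table_bytes[a] > table_bytes[b]; });
    for (size_t i = 0; i < order.size() / 2; ++i) {
        flushed.push_back(order[i]);
        table_bytes[order[i]] = 0;  // simulated flush to disk
    }
    return flushed;
}
```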
…more accurate (StarRocks#25234) In the past we took a more conservative strategy: we assumed the hashmap/hash_set might expand on every insert. Now we only reserve memory in advance when the phmap really needs to be expanded. Signed-off-by: stdpain <[email protected]>
Introduced by StarRocks#25234: a capacity of 0 will cause an assert failure.

```
*** Aborted at 1687172299 (unix time) try "date -d @1687172299" if you are using GNU date ***
starrocks_be: /root/starrocks/be/src/util/phmap/phmap.h:493: size_t phmap::priv::CapacityToGrowth(size_t): Assertion `IsValidCapacity(capacity)' failed.
PC: @ 0x7fef66b85a7c pthread_kill
*** SIGABRT (@0x3eb001c2fd0) received by PID 1847248 (TID 0x7fee45771640) from PID 1847248; stack trace: ***
    @ 0x1493b06a google::(anonymous namespace)::FailureSignalHandler()
    @ 0x7fef66b31520 (unknown)
    @ 0x7fef66b85a7c pthread_kill
    @ 0x7fef66b31476 raise
    @ 0x7fef66b177f3 abort
    @ 0x7fef66b1771b (unknown)
    @ 0x7fef66b28e96 __assert_fail
    @ 0x9885764 phmap::priv::CapacityToGrowth()
    @ 0xb905075 starrocks::AggHashMapVariant::need_expand()
    @ 0xb05ade3 starrocks::pipeline::AggregateStreamingSinkOperator::_push_chunk_by_auto()
    @ 0xb0565c7 starrocks::pipeline::AggregateStreamingSinkOperator::push_chunk()
    @ 0x9dca867 starrocks::pipeline::PipelineDriver::process()
    @ 0x135ecf73 starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @ 0x135eb3fa _ZZN9starrocks8pipeline20GlobalDriverExecutor10initializeEiENKUlvE2_clEv
    @ 0x135f58fa _ZSt13__invoke_implIvRZN9starrocks8pipeline20GlobalDriverExecutor10initializeEiEUlvE2_JEET_St14__invoke_otherOT0_DpOT1_
    @ 0x135f4d55 _ZSt10__invoke_rIvRZN9starrocks8pipeline20GlobalDriverExecutor10initializeEiEUlvE2_JEENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EES6_E4typeEOS7_DpOS8_
    @ 0x135f3fba _ZNSt17_Function_handlerIFvvEZN9starrocks8pipeline20GlobalDriverExecutor10initializeEiEUlvE2_E9_M_invokeERKSt9_Any_data
    @ 0x99b783a std::function<>::operator()()
    @ 0x121717f8 starrocks::FunctionRunnable::run()
    @ 0x1216e02c starrocks::ThreadPool::dispatch_thread()
    @ 0x1218c818 std::__invoke_impl<>()
    @ 0x1218c0d5 std::__invoke<>()
    @ 0x1218b3d0 _ZNSt5_BindIFMN9starrocks10ThreadPoolEFvvEPS1_EE6__callIvJEJLm0EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
    @ 0x12189a3f std::_Bind<>::operator()<>()
    @ 0x121867ec std::__invoke_impl<>()
    @ 0x121838a4 _ZSt10__invoke_rIvRSt5_BindIFMN9starrocks10ThreadPoolEFvvEPS2_EEJEENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EESA_E4typeEOSB_DpOSC_
    @ 0x1217f413 std::_Function_handler<>::_M_invoke()
    @ 0x99b783a std::function<>::operator()()
    @ 0x12153d16 starrocks::Thread::supervise_thread()
    @ 0x7fef66b83b43 (unknown)
    @ 0x7fef66c15a00 (unknown)
    @ 0x0 (unknown)
```
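The fix amounts to guarding the empty-table case before asking for the growth threshold. The standalone sketch below mimics phmap's invariants (a valid capacity is 2^k - 1 and nonzero, growth at a 7/8 load factor) rather than calling the real library, and `need_expand` is a simplified stand-in for StarRocks' `AggHashMapVariant::need_expand`:

```cpp
#include <cassert>
#include <cstddef>

// Mimics phmap::priv::IsValidCapacity: capacities are 2^k - 1, nonzero.
bool is_valid_capacity(size_t n) { return ((n + 1) & n) == 0 && n > 0; }

// Mimics phmap::priv::CapacityToGrowth, which asserts on capacity 0,
// producing the abort shown in the stack trace above.
// (phmap's small-capacity special case is omitted here.)
size_t capacity_to_growth(size_t capacity) {
    assert(is_valid_capacity(capacity));
    return capacity - capacity / 8;  // grow at 7/8 load factor
}

// Guarded check: an empty table (capacity 0) always needs an
// allocation, so never ask capacity_to_growth about capacity 0.
bool need_expand(size_t size, size_t capacity) {
    if (capacity == 0) return true;  // the guard that fixes the assert
    return size + 1 > capacity_to_growth(capacity);
}
```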
…t-circuit (StarRocks#25694) Signed-off-by: stdpain <[email protected]>
… times (StarRocks#24148) The root cause is a problem with the maintenance of min_level. When a partition is continuously split, a middle level may be empty while the levels above and below it still have partitions. Here is an example:

1. At first, all partitions are in level 2. level 2: [4,5,6,7]
2. Then partition 4 splits into 8 and 12. level 2: [5,6,7] level 3: [8,12]
3. Then partition 8 splits into 16 and 24, and partition 12 splits into 20 and 28. level 2: [5,6,7] level 3: [] level 4: [16,20,24,28]

So we can only move min_level forward when there is no partition left in its level. Signed-off-by: silverbullet233 <[email protected]>
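The bookkeeping fix above can be sketched as follows. `PartitionLevels` and its members are hypothetical names, and the child-id rule (p + 2^L and p + 2^(L+1)) is inferred from the example, not taken from the StarRocks source:

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical sketch of the min_level maintenance described above.
// Splitting a partition at level L produces two children at level L+1;
// min_level may only advance past levels that are actually empty.
struct PartitionLevels {
    std::map<int, std::set<int>> levels;  // level -> partition ids
    int min_level = 0;
    int max_level = 0;
    bool initialized = false;

    void add(int level, int partition) {
        levels[level].insert(partition);
        if (!initialized) { min_level = max_level = level; initialized = true; }
        if (level > max_level) max_level = level;
    }

    void split(int level, int partition) {
        levels[level].erase(partition);
        // Child ids inferred from the example: 4 -> {8, 12}, 8 -> {16, 24}.
        add(level + 1, partition + (1 << level));
        add(level + 1, partition + (1 << (level + 1)));
        // The fix: only move min_level forward while its level is empty,
        // never just because a split created a deeper level.
        while (min_level < max_level && levels[min_level].empty()) {
            ++min_level;
        }
    }
};
```

Replaying the example from the commit: after splitting 4, 8, and 12, level 3 is empty but level 2 still holds [5,6,7], so min_level correctly stays at 2.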
…tor (StarRocks#25853) Fixes StarRocks#25868 Signed-off-by: silverbullet233 <[email protected]>
…#21834) Signed-off-by: stdpain <[email protected]>
stdpain approved these changes on Jul 4, 2023.
backport #24353, #24644, #24831, #25657, #25191, #25234, #25577, #24148, #25853, #21834
What type of PR is this:
Checklist:
Bugfix cherry-pick branch check: