Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TEST CI] Run IPC tests with the proxy library #868

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

ldorau
Copy link
Contributor

@ldorau ldorau commented Nov 5, 2024

Description

[TEST CI] Run IPC tests with the proxy library.

Ref: #864

Checklist

  • Code compiles without errors locally
  • All tests pass locally
  • CI workflows execute properly

@ldorau ldorau force-pushed the Run_IPC_tests_with_the_proxy_library branch 10 times, most recently from b26565d to 9c8c488 Compare November 12, 2024 14:14
@ldorau ldorau force-pushed the Run_IPC_tests_with_the_proxy_library branch 5 times, most recently from 5842644 to 0ed63c4 Compare November 14, 2024 11:51
@ldorau ldorau changed the title [TEST CI] Run IPC tests with the proxy library Fix trackingFree() for providers not supporting free() op and enable all IPC tests Nov 14, 2024
@ldorau ldorau force-pushed the Run_IPC_tests_with_the_proxy_library branch from 0ed63c4 to 057c819 Compare November 14, 2024 12:00
@ldorau ldorau marked this pull request as ready for review November 14, 2024 12:01
@ldorau ldorau requested a review from a team as a code owner November 14, 2024 12:01
@ldorau
Copy link
Contributor Author

ldorau commented Nov 14, 2024

@vinser52 please review

@@ -382,6 +382,12 @@ static umf_result_t trackingFree(void *hProvider, void *ptr, size_t size) {
ret = umfMemoryProviderFree(p->hUpstream, ptr, size);
if (ret != UMF_RESULT_SUCCESS) {
LOG_ERR("upstream provider failed to free the memory");
// Do not add memory back to the tracker,
// if the provider does not support the free() op.
if (ret == UMF_RESULT_ERROR_NOT_SUPPORTED) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should not add memory back to the tracker in case of memory provider does not support free.

But why is pool implementation tries to call umfMemoryProviderFree() at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean umfMemoryProviderFree() above ? - inside trackingFree() in line no. 382 ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With such fix we get a memory leak.

Imagine the pool calls umfMemoryProviderFree() for the address range [a. b). After that pool calls umfMemoryProviderAlloc() my understanding is that memory provider will never return the range [a. b) as a result of malloc, but it will return the next chunk [b,c). If the pool will call umfMemoryProviderFree() for the [b,c) range it also will be lost and never be used again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the file provider and the devdax provider work exactly as as you wrote - they were designed to work in this way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @ldorau - this is how these providers work. And because of this, in the real-life scenarios they should be used either with a pool that never calls free() or with CoarseProvider

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... but we should not just return "success" here because the memory is actually not released. Maybe this fix is wrong... We should fix only the case when the pool that uses devdax/file providers is destroyed. Only then we should remove the entries from the tracker as we also unmap the memory.

There is also an additional way fix this bug - free() in devdax/file could do the unmap but we could keep the offset unchanged - this way the free() would be supported and entries would be correctly removed from the tracker while we still keep file/devdax provider logic simple (so the memory would be still unavailable after free).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also an additional way fix this bug - free() in devdax/file could do the unmap but we could keep the offset unchanged - this way the free() would be supported and entries would be correctly removed from the tracker while we still keep file/devdax provider logic simple (so the memory would be still unavailable after free).

I agree that at least we are not wasting VA space (but it is not a problem), but since we never decrement the offset in devdax and file providers by design we still have a kind of memory leak. Because if the umfMemoryProviderAlloc and umfMemoryProviderFree are called multiple times, after some time umfMemoryProviderAlloc will return OOM because offset reached the end of file/devdax.

We discussed this with @ldorau offline. My main comment that currently this PR eliminates the issue:

  • today we have a memory leak if umfMemoryProviderFree is called. If we do not remove the range from the tracker our diagnostic in DEBUG mode will detect it and report.
  • with this PR the memory leak still there but our diagnostic is not able to detect it and report.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @ldorau - this is how these providers work. And because of this, in the real-life scenarios they should be used either with a pool that never calls free() or with CoarseProvider

Just to clarify, I am not challenging how devdax or file providers work. we agreed on that. My concern is about this particular flow when some pool calls umfmemoryProviderFree for devdax or file providers. This situation should not happens or should be handled correctly. Today looks like there is a memory leak in that case and this PR does not fix the leak.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed (code change). It will be fixed in another way.
Done.

@@ -382,6 +382,12 @@ static umf_result_t trackingFree(void *hProvider, void *ptr, size_t size) {
ret = umfMemoryProviderFree(p->hUpstream, ptr, size);
if (ret != UMF_RESULT_SUCCESS) {
LOG_ERR("upstream provider failed to free the memory");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case the free() is not supported we should call LOG_ERR. Please do not nest ifs here and use switch or if/else if

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed (code change). It will be fixed in another way.
Done.

@ldorau ldorau force-pushed the Run_IPC_tests_with_the_proxy_library branch from 057c819 to 2058774 Compare November 14, 2024 21:33
@ldorau ldorau marked this pull request as draft November 14, 2024 21:34
@ldorau ldorau changed the title Fix trackingFree() for providers not supporting free() op and enable all IPC tests [TEST CI] Run IPC tests with the proxy library Nov 14, 2024
Signed-off-by: Lukasz Dorau <[email protected]>
Remove free_not_supp from the ipcTestParams tuple.
It is not needed any more.

Signed-off-by: Lukasz Dorau <[email protected]>
Fix two error messages and add some new ones.

Signed-off-by: Lukasz Dorau <[email protected]>
@ldorau ldorau force-pushed the Run_IPC_tests_with_the_proxy_library branch from 2058774 to efef7d0 Compare November 14, 2024 22:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants