Execution of Genquery leads to increased memory allocation #216
Comments
Marking as bug due to the possibility of the GenQuery iterator leaking memory. We will investigate this.
Further analysis (tested with an iRODS 4.3.2 server) reveals that the issue stems from a deeper cause, unrelated to the iterator. Calling certain msi's from Python is enough to trigger it, as demonstrated by the code below. In contrast, calls to msiSplitPath apparently do not leak. While the impact may seem limited at first sight, our experience is that processing some 60,000 data objects, using a few queries and msi's in sequence for each data object, can make an iRODS agent process allocate over 32 GB of memory.
Source code:
Turns out the issue also occurs, to a lesser degree, when we use the iRODS rule language engine. Using this rule engine we can demonstrate impact at 500 runs of msiMakeGenQuery. There is no visible impact when we limit the number of runs to 50.
Source code:
Python rule that measures memory allocated by agent process:
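The measurement code from the comment above is not reproduced here. As a rough illustration only: on Linux, the resident memory of a process can be read from `/proc/<pid>/status`. The sketch below assumes a Linux host; the function names `parse_vm_rss_kb` and `rss_kb` are hypothetical and are not part of iRODS or its Python rule engine plugin.

```python
import os


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like: "VmRSS:      123456 kB"
            return int(line.split()[1])
    return None


def rss_kb(pid=None):
    """Read the resident set size (kB) of a process; defaults to this process."""
    if pid is None:
        pid = os.getpid()
    try:
        with open("/proc/{}/status".format(pid)) as f:
            return parse_vm_rss_kb(f.read())
    except (IOError, OSError):
        # Not a Linux host, or the process no longer exists.
        return None
```

Sampling `rss_kb()` before and after a batch of microservice calls from within the agent gives a coarse view of how much memory that agent has retained.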
Very nice. Okay, so we're eyeballing
Upon investigation, we may move this issue to irods/irods.
One thing I noticed while running the code at #216 (comment) - If I set the upper limit for the
Very weird. It's not clear to me why that happens.

@tsmeele Perhaps you can reproduce what I'm seeing on your computer.

I also tried replacing the GenQuery1 MSIs with GenQuery2 MSIs. Memory usage increases with those MSIs too, yet they do not contain memory leaks AFAIK.

*h = '';
msi_genquery2_execute(*h, 'select count(USER_ID)');
msi_genquery2_free(*h);

Asan and valgrind should help reveal what's happening. Aside from that, looks like we don't provide dedicated MSIs for closing GenQueryInput/Output objects separately.
@korydraughn Indeed we get similar results. The choppy memory allocation is most likely explained by buffering done internally by the malloc() algorithm.
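To make a choppy pattern like this visible in a series of memory samples, one option is to list only the points where the value changes. This is a minimal sketch: the helper name `growth_steps` is hypothetical and the sample values are made up, not measurements from this issue.

```python
def growth_steps(samples_kb):
    """Return (index, delta_kb) for each sample that differs from the previous one."""
    steps = []
    for i in range(1, len(samples_kb)):
        delta = samples_kb[i] - samples_kb[i - 1]
        if delta != 0:
            steps.append((i, delta))
    return steps


# Made-up RSS samples in kB, one per run (illustrative values only).
samples = [50000, 50000, 50000, 50132, 50132, 50132, 50264, 50264]
steps = growth_steps(samples)
print(steps)
```

A few equally sized jumps separated by flat stretches would be consistent with the allocator requesting memory from the operating system in chunks, while the total across many runs still shows whether the agent keeps growing.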
Some further experimentation, now using a MyRods client. This client is able to issue rcXXX API requests directly, bypassing the rule engine msi code. The reassuring result is that, even after 12000 rsGenQuery requests, the agent memory allocation has not increased. NB: In the output below, rcExecMyRule runs have been added as a contrasting alternative case, known to impact agent memory a little.
The rsGenQuery result using rcXXX API calls contrasts with the results obtained with requests issued via microservices within a rule engine environment. The rule engine based results had shown memory to increase already after 50 runs (see earlier in this issue). The new experiment narrows our search to the implementation of the GenQuery-related microservices in the native rule language engine, and possibly, on top of that, to the parameter passing back and forth in related callback requests issued by the Python rule engine.
That's very possible.
Correct. It does not execute the query. It makes at least three memory allocations, one being for the genQueryInp_t.
Environment: iRODS 4.2.12
Genquery appears to significantly leak memory when the query result includes many rows.
Code to reproduce the issue:
Example output: