Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Execution of Genquery leads to increased memory allocation #216

Open
tsmeele opened this issue Aug 8, 2024 · 8 comments
Open

Execution of Genquery leads to increased memory allocation #216

tsmeele opened this issue Aug 8, 2024 · 8 comments

Comments

@tsmeele
Copy link

tsmeele commented Aug 8, 2024

Environment: iRODS 4.2.12
Genquery appears to significantly leak memory when the query result includes many rows.

Code to reproduce the issue:

#!/usr/bin/irule -r irods_rule_engine_plugin-python-instance -F

import genquery
import os
import psutil

def report_usage(process, callback):
    stat = "Memory usage: " + str(process.memory_info().rss / 1024**2)
    callback.writeLine("stdout", stat)

def coll_query(callback):
    coll_iter = genquery.row_iterator(
    "COLL_ID",
    "",
    genquery.AS_LIST,
    callback)

    collcount = 0
    for row in coll_iter:
        collcount = collcount + 1

    return collcount

def main(rule_args, callback, rei):
    process = psutil.Process(os.getpid())
    report_usage(process, callback)
    coll_query(callback)
    report_usage(process, callback)
    coll_query(callback)
    report_usage(process, callback)
    count = coll_query(callback)
    report_usage(process, callback)
    callback.writeLine("stdout","count of collections = {}".format(count))

INPUT *inp="null"
OUTPUT ruleExecOut

Example output:

$ irule -r irods_rule_engine_plugin-python-instance -F python_rule.r
Memory usage: 79
Memory usage: 126
Memory usage: 166
Memory usage: 200
count of collections = 348737
$
@alanking
Copy link
Contributor

alanking commented Aug 8, 2024

Marking as bug due to the possibility of the GenQuery iterator leaking memory. We will investigate this.

@tsmeele
Copy link
Author

tsmeele commented Aug 13, 2024

Further analysis (tested with iRODS 4.3.2 server) reveals that the issue stems from a deeper cause, unrelated to the iterator. An interaction from Python with certain msi suffices, as demonstrated by below code. In contrast, calls to msiSplitPath apparently do not leak.

While the impact at first sight may seem limited, our experience is that to process some 60,000 data objects using a combination of a few queries and msi's in sequence for each data object can make an iRODS agent process already allocate over 32 GB memory.

 irule -r irods_rule_engine_plugin-python-instance -F py.r
64172032  bytes memory allocated
run msiMakeQuery (50 runs)
65753088  bytes memory allocated
run msiExecGenQuery (50 runs)
67076096  bytes memory allocated
run msiSplitpath (500000 runs)
67076096  bytes memory allocated

Source code:

ton@irods:~$ cat py.r
#!/usr/bin/irule -r irods_rule_engine_plugin-python-instance -F

import os
import psutil
import irods_types

def mem(callback):
    process = psutil.Process(os.getpid())
    callback.writeLine("stdout","{}  bytes memory allocated".format(str(process.memory_info().rss)))

def repeat(callback, runs, function, parms):
    for i in range(runs):
        if parms == None:
            function(callback)
        else:
            function(callback, parms)

def test(callback, runs, title, function, parms=None):
    callback.writeLine("stdout", "run {} ({} runs)".format(title, str(runs)))
    repeat(callback, runs, function, parms)
    mem(callback)



def makeQuery(callback):
    ret = callback.msiMakeGenQuery("COUNT(USER_ID)","",irods_types.GenQueryInp())
    return ret['arguments'][2]

def execQuery(callback, query):
    # NB: Must use a query that is guaranteed to return less than 256 rows
    #     so that the query will close automatically
    ret = callback.msiExecGenQuery(query, irods_types.GenQueryOut())


def splitPath(callback):
    ret = callback.msiSplitPath("collection/dataobject",0,0)

def main(rule_args, callback, rei):
    mem(callback)
    test(callback, 50, "msiMakeQuery", makeQuery)
    query = makeQuery(callback)
    test(callback, 50, "msiExecGenQuery", execQuery, query)
    # show that splitPath does not leak memory
    test(callback, 500000, "msiSplitpath", splitPath)

INPUT *inp="null"
OUTPUT ruleExecOut

@tsmeele
Copy link
Author

tsmeele commented Aug 13, 2024

Turns out the issue also occurs, to a lesser degree, when we use the iRODS rule language engine. Using this rule engine we can demonstrate impact at 500 runs of msiMakeGenQuery. There is no visible impact when we limit the number of runs to 50.

ton@irods:~$ irule -r irods_rule_engine_plugin-irods_rule_language-instance -F rule.r
Memory usage: 64176128 bytes allocated (at start)
Memory usage: 64176128 bytes allocated (after 500,000x memusage)
Memory usage: 65998848 bytes allocated (after 500x makequery)

Source code:

ton@irods:~$ cat rule.r
#!/usr/bin/irule -r irods_rule_engine_plugin-irods_rule_language-instance -F

myRule() {
   *m = ""; 
   # memusage is a Python rule that returns process mem size 
   memusage(*m);
   writeLine("stdout", "*m bytes allocated (at start)");

   # show that call to memusage does not leak 
   for (*i = 0; *i < 500000; *i = *i + 1) {
      memusage(*m);
   }
   writeLine("stdout","*m bytes allocated (after 500,000x memusage)");

   # test if call to msiMakeGenQuery leaks
   for (*i = 0; *i < 500; *i = *i + 1) {
       msiMakeGenQuery("COUNT(USER_ID)", "", *genQueryInp);	   
   }
   memusage(*m);
   writeLine("stdout","*m bytes allocated (after 500x makequery)");
}

INPUT *inp="aa"
OUTPUT ruleExecOut

Python rule that measures memory allocated by agent process:

import os
import psutil

def memusage(rule_args, callback, rei):
    process = psutil.Process(os.getpid())
    stat = "Memory usage: " + str(process.memory_info().rss)
    rule_args[0] = stat

@trel
Copy link
Member

trel commented Aug 13, 2024

Very nice.

Okay, so we're eyeballing msiMakeGenQuery and msiExecGenQuery directly now.

Upon investigation, we may move this issue to irods/irods.

@korydraughn
Copy link
Contributor

msiMakeGenQuery and msiMakeQuery will always leak unless the handles to the query objects are deallocated.

One thing I noticed while running the code at #216 (comment) - If I set the upper limit for the msiMakeGenQuery loop to 1 and then gradually increase it to 465, the memory usage remains the same. However, once the loop counter exceeds 465, the memory usage increases.

Very weird. It's not clear to me why that happens.

@tsmeele Perhaps you can reproduce what I'm seeing on your computer.

I also tried replacing the GenQuery1 MSIs with GenQuery2 MSIs. Memory usage increases with those MSIs too, yet they do not contain memory leaks AFAIK.

*h = '';
msi_genquery2_execute(*h, 'select count(USER_ID)');
msi_genquery2_free(*h);

Asan and valgrind should help reveal what's happening.

Aside from that, looks like we don't provide dedicated MSIs for closing GenQueryInput/Output objects separately.

@tsmeele
Copy link
Author

tsmeele commented Aug 14, 2024

@korydraughn Indeed we get similar results. The choppy memory allocation is most likely explained by buffering done internally by the malloc() algorithm.
AFAIK msiMakeGenQuery does not actually create a query handle or perform any rsXXX calls, it merely populates and returns a data structure?

@tsmeele
Copy link
Author

tsmeele commented Aug 14, 2024

Some further experimentation, now using a MyRods client. This client is able to issue rcXXX API requests directly, bypassing the rule engine msi code.

Reassuring result is that, even after 12000 rsGenQuery requests, the agent memory allocation has not increased.

NB: In below output, rcExecMyRule runs have been added as a contrasting alternative case, known to impact agent memory a little.

MyRods started
Native auth successful
Memory usage: 65093632
Executing rcGenQuery "SELECT COUNT(DATA_ID)" 12000 runs
104 seconds have elapsed
Memory usage: 65093632
Executing rcExecMyRule 12000 runs
128 seconds have elapsed
Memory usage: 67055616

The rsGenQuery result using rcXXX API calls contrasts with results obtained with requests issued via microservices within a rule engine environment. The rule engine based results had shown memory to increase aready after 50 runs (see earlier in this issue).

The new experiment narrows our search to the implementation of the Genquery related microservices in the native rule language engine. And possibly on top of that also parameter passing back and forth in related callback requests issued by the Python rule engine.

@korydraughn
Copy link
Contributor

The choppy memory allocation is most likely explained by buffering done internally by the malloc() algorithm.

That's very possible.

AFAIK msiMakeGenQuery does not actually create a query handle or perform any rsXXX calls, it merely populates and returns a data structure?

Correct. It does not execute the query. It makes at least three memory allocations, one being for the genQueryInp_t.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants