Assembly and Annotation pipeline: "/tmp" folder full issue with shovill #1491

Open
ajkarloss opened this issue May 22, 2023 · 5 comments

@ajkarloss

Note: I was not able to find the repository for the Assembly and Annotation pipeline, so I am creating this issue here.

Even though we specify a temporary directory for SPAdes (--tmp-dir) in "Shovill advanced options for spades" when running the Assembly and Annotation pipeline, we noticed that SPAdes still uses "/tmp", which fills up very quickly (we have allocated about 20 GB to "/tmp" in our setup).

Any tips on how to fix it?

@georgemarselis-nvi

To give more details:

SPAdes receives the second set of parameters, as defined by Jeevan, but completely ignores them and keeps the first values it receives:

[spades] Command line: /opt/galaxy/21.09/database/dependencies/_conda/envs/[email protected]/bin/spades.py --pe1-1 /opt/galaxy/21.09/database/jobs_directory/052/52207/working/out/flash.notCombined_1.fastq.gz --pe1-2 /opt/galaxy/21.09/database/jobs_directory/052/52207/working/out/flash.notCombined_2.fastq.gz --only-assembler --threads 1 --memory 4 -o /opt/galaxy/21.09/database/jobs_directory/052/52207/working/out/spades --tmp-dir /tmp -k 31,55,79,103,127 --pe1-m /opt/galaxy/21.09/database/jobs_directory/052/52207/working/out/flash.extendedFrags.fastq -- tmp-dir /opt/galaxy/tmp --threads 12 --memory 64

I am not sure who is responsible for appending the parameters this way: is it the pipeline when installed via IRIDA, is it something the Galaxy developers need to look at, or is it something we misconfigured?

For the record, the SPAdes repository is here:
https://github.com/ablab/spades

@apetkau apetkau transferred this issue from phac-nml/irida-pipeline-plugins May 23, 2023
@apetkau
Member

apetkau commented May 23, 2023

Hello @ajkarloss and @georgemarselis-nvi

The repository with the Assembly and Annotation pipeline is the main IRIDA repository (https://github.com/phac-nml/irida/tree/development/src/main/resources/ca/corefacility/bioinformatics/irida/model/workflow/analysis/type/workflows/AssemblyAnnotation/0.6). Sorry that it's hard to find. I have moved the issue to the correct repository and will respond to it in another comment.

@apetkau
Member

apetkau commented May 23, 2023

Regarding the --tmp-dir parameter and why it is not being passed to SPAdes correctly: the Shovill tool itself has a --tmpdir parameter, and Shovill passes that value on to SPAdes. If --tmpdir is not set, Shovill defaults to creating a temporary directory with File::Temp->newdir() and uses that path instead:

https://github.com/tseemann/shovill/blob/v1.1.0/bin/shovill#L119

$tmpdir ||= File::Temp->newdir(CLEANUP=>1);

Since --tmpdir is not exposed as a parameter in the Shovill Galaxy tool wrapper (https://github.com/galaxyproject/tools-iuc/blob/main/tools/shovill/shovill.xml), you cannot adjust it. Shovill therefore treats --tmpdir as unset and creates a temporary directory using the code above, which by default is located under /tmp. It then passes this directory to SPAdes, which is why it appears in the command you see above.

One way to override this would therefore be to expose Shovill's --tmpdir parameter as a configurable option in the Galaxy tool wrapper above.

Another solution is to adjust environment variables in Galaxy before the Shovill tool runs. Perl's File::Temp->newdir() uses the TMPDIR environment variable to decide where to create the temporary directory. You can verify this with the command below:

$ TMPDIR=/opt/new-temp/ perl -MFile::Temp -e '$d=File::Temp->newdir(CLEANUP=>1);print($d,"\n");'
/opt/new-temp/pWZSfF_u9T

So, if you can set the TMPDIR environment variable before the Shovill software runs (for example, in the environment_setup_file configured by Galaxy: https://docs.galaxyproject.org/en/master/admin/config.html#environment-setup-file), you can change the temporary directory without modifying the Shovill tool wrapper.
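As a minimal sketch, assuming Galaxy sources the file named by environment_setup_file in each job's shell before the tool runs (as the linked configuration docs describe), the file could export TMPDIR like this. The path /opt/galaxy/tmp is taken from the command line earlier in this thread; substitute your own. One caveat worth checking: Perl's File::Temp gets its default location from File::Spec->tmpdir, which skips a TMPDIR that is not an existing, writable directory and silently falls back to /tmp, so the sketch warns about that case.

```shell
# Hypothetical contents of the file set as Galaxy's environment_setup_file.
# /opt/galaxy/tmp is the directory mentioned in this issue; adjust for your site.
TMPDIR=/opt/galaxy/tmp
export TMPDIR

# File::Temp (via File::Spec->tmpdir) ignores a TMPDIR that is not an
# existing, writable directory and falls back to /tmp, so flag that case.
if [ ! -d "$TMPDIR" ] || [ ! -w "$TMPDIR" ]; then
    echo "warning: TMPDIR=$TMPDIR is not a writable directory; Perl's File::Temp will fall back to /tmp" >&2
fi
```

Because this file is sourced for every job, the setting applies to all tools Galaxy runs, not only Shovill. On a cluster, the directory also has to exist on the nodes where the jobs execute.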

@georgemarselis-nvi

Thanks for the information!

@georgemarselis-nvi

Hey, just in case you have a better idea than us: we are facing the same kind of issue with scheduling. We have configured Slurm with the maximum number of cores available on the machine, and I have run manual tests to check that the galaxy account can schedule 64 jobs (it can, and more).

Yet every time a user runs an analysis, each step takes up only one slot/CPU, and Galaxy schedules at most 12 jobs.

Do you have any ideas?
