genopred Dockerfile #241

Open
@ofrei ofrei commented Mar 14, 2024

This is a Dockerfile for https://github.com/opain/GenoPred. I've managed to build it and got a ~4 GB genopred.sif file as output; however, the build does generate some errors when I run it on our NREC devbox.

@espenhgn I think we need to either install apptainer and update docker to the latest version on our devbox, or possibly create a new devbox so that we don't mess with our current one (but for that we'd need to ask for more disk space, ideally a 2 TB disk on NREC).

I don't suggest merging this PR. Eventually genopred.sif should live in a separate repo, as it's quite large.

>make genopred.sif
...
Step 16/17 : RUN pip cache purge
 ---> Using cache
 ---> 7a2e46263720
Step 17/17 : WORKDIR /tools
 ---> Using cache
 ---> a4cc19ee45e9
Successfully built a4cc19ee45e9
Successfully tagged genopred:latest
Using default tag: latest
The push refers to repository [localhost:5000/genopred]
Get http://localhost:5000/v2/: EOF
make: *** [Makefile:4: genopred.sif] Error 1

WORKDIR /tools/GenoPred
RUN git clone --depth 1 --branch v2.2.0 https://github.com/opain/GenoPred.git .

RUN conda env update -f /tools/GenoPred/pipeline/envs/pipeline.yaml

📝 [hadolint] <DL3059> reported by reviewdog 🐶
Multiple consecutive RUN instructions. Consider consolidation.

RUN /bin/bash -c ". activate genopred && cd /tools/GenoPred/pipeline && snakemake --restart-times 3 -j 1 --use-conda --conda-frontend mamba get_dependencies"

# cleanup for smaller image size
RUN mamba clean -a -y

📝 [hadolint] <DL3059> reported by reviewdog 🐶
Multiple consecutive RUN instructions. Consider consolidation.


# cleanup for smaller image size
RUN mamba clean -a -y
RUN pip cache purge

📝 [hadolint] <DL3059> reported by reviewdog 🐶
Multiple consecutive RUN instructions. Consider consolidation.

@espenhgn
Contributor

Good. Maybe it's time to prune docker of old builds etc. again (sudo docker system prune). It won't interfere with anything I'm doing at the moment. We should also be able to reclaim 3.3 GB in /tmp. /dev/sda1 is now pretty full.
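
To see how much a prune actually frees, it can help to record disk usage before and after. The following is a minimal sketch, assuming bash and a POSIX `df`; the helper name `used_percent` is hypothetical and not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the percentage of space used on the filesystem
# that holds the given path (column 5 of POSIX-format df output).
used_percent() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

echo "/ is $(used_percent /)% full"

# Docker's own summary of reclaimable space, and the prune itself, need a
# running Docker daemon and root, so they are left here as comments:
#   sudo docker system df
#   sudo docker system prune
```

Calling `used_percent /` again after the prune shows the difference; `docker system df` breaks the usage down by images, containers, volumes and build cache.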

@ofrei
Contributor Author

ofrei commented Mar 18, 2024

I've got genopred up and running on the NREC dev box, but not yet when it's packaged in a container. Attached is an example of genopred's output, using test data as in the tutorial (https://opain.github.io/GenoPred/pipeline_readme.html#run-using-test-data). The most interesting file in the attached .tar.gz is the example_plink1/reports/example_plink1-report.html report; there are also nice individual-level reports, e.g. example_plink1/reports/example_plink1-11_MID.11_MID-report.html.

test_data_output.tar.gz

When run as a container I do the following:

export SIF=/nrec/space/ofrei/github/comorment/containers/singularity
cd /nrec/space/ofrei/github/opain/GenoPred/pipeline
singularity exec --home $PWD:/home $SIF/genopred.sif  bash
git config --global --add safe.directory /tools/GenoPred
conda activate /usr/local/envs/genopred
snakemake -n --configfile=example_input/config.yaml output_all
snakemake -j1 --configfile=example_input/config.yaml --use-conda output_all

The snakemake -n step works, but snakemake -j1 fails with the following error:
error.txt

I think to get past this we'll need to better understand how snakemake manages conda environments, and whether that is compatible with singularity's framework (e.g. a read-only file system).
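
One thing that may be worth trying, if the read-only image turns out to be the problem, is to point snakemake at a writable location for its conda environments. Snakemake's `--conda-prefix` option sets the directory in which those environments are created. The sketch below assumes bash; the helper name `first_writable_dir` and the candidate directories are hypothetical, and whether this resolves the failure here is untested.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first argument that is an existing,
# writable directory; return non-zero if none qualifies.
first_writable_dir() {
  for d in "$@"; do
    if [ -d "$d" ] && [ -w "$d" ]; then
      printf '%s\n' "$d"
      return 0
    fi
  done
  return 1
}

# Candidate locations are assumptions; adjust them to the actual setup.
if conda_dir="$(first_writable_dir "$PWD" "${TMPDIR:-/tmp}" /tmp)"; then
  echo "conda environments could go under: $conda_dir/snakemake-conda"
fi

# Intended use inside the container (not executed here):
#   snakemake -j1 --configfile=example_input/config.yaml --use-conda \
#     --conda-prefix "$conda_dir/snakemake-conda" output_all
```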

Also, genopred seems to be tested in environments with internet access, but we don't have that on TSD, so some features may not work (e.g. pulling score files from the PGS Catalog). It would be good to have a complete list of genopred features that require internet.

@espenhgn
Contributor

As we discussed, "libmambapy.bindings.MambaNativeException: filesystem error: temp_directory_path: No such file or directory [/nrec/projects/tmp]" occurs because TMPDIR is set to /nrec/projects/tmp in this environment, and Singularity doesn't mount that directory by default the way it mounts /tmp. So it needs to be added to the list of bind-mounted directories.
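
A minimal sketch of that fix, assuming bash and that the container is started as in the commands earlier in this thread. The helper name `tmp_bind_args` is hypothetical; `--bind` is Singularity's option for adding a host directory to the container's mounts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the extra bind arguments needed so that the
# host's TMPDIR also exists inside the container. Prints nothing when TMPDIR
# is unset or is /tmp, which Singularity mounts by default.
tmp_bind_args() {
  tmp="${TMPDIR:-/tmp}"
  if [ "$tmp" != "/tmp" ]; then
    printf '%s %s\n' "--bind" "$tmp"
  fi
}

# Show the arguments for the current environment.
echo "extra singularity args: $(tmp_bind_args)"

# Intended use (not executed here, since it needs singularity and the image):
#   singularity exec --home "$PWD":/home $(tmp_bind_args) "$SIF"/genopred.sif bash
```

With TMPDIR=/nrec/projects/tmp the helper prints `--bind /nrec/projects/tmp`. An alternative that avoids the extra mount would be to set TMPDIR to /tmp for the container session, for example via `SINGULARITYENV_TMPDIR=/tmp`.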

This pull request appears to be stale due to non-activity
