Docker permissions #25

Open
MrDotOne opened this issue Jul 28, 2022 · 18 comments

@MrDotOne

I just pulled PAA down the other day and have been running it. My run command is:

/data/PrepareAA/docker/run_paa_docker.py -o /data/output -s Colo -t 16 --bam /data/Data/Colo/cofinal.bam --run_AA --run_AC

However, after 22+ hours I get to this point and it fails miserably:

/home/programs/AmpliconClassifier-main/amplicon_classifier.py -i /home/output//Colo_classification/Colo.input --ref GRCh38 -o /home/output//Colo_classification/Colo --annotate_cycles_file --report_complexity
reading /home/data_repo/GRCh38/Genes_hg38.gff
read 22998 genes

Traceback (most recent call last):
File "/home/programs/AmpliconClassifier-main/amplicon_classifier.py", line 667, in
f2gf = open("feature_to_graph.txt", 'w')
PermissionError: [Errno 13] Permission denied: 'feature_to_graph.txt'
Traceback (most recent call last):
File "/home/programs/AmpliconClassifier-main/make_results_table.py", line 65, in
with open(args.input) as input_file, open(args.classification_file) as classification_file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/output//Colo_classification/Colo_amplicon_classification_profiles.tsv'
2022-07-27 22:49:31.494158

I am unsure where feature_to_graph.txt is supposed to be written, and Colo_amplicon_classification_profiles.tsv doesn't seem to be getting generated.

Any assistance would be appreciated.
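The traceback shows amplicon_classifier.py calling open("feature_to_graph.txt", 'w') with a bare relative name, so Python creates that file in the process's current working directory rather than under the -o output prefix. If that working directory is not writable by the user the container runs as, the call fails with Errno 13 even when the mounted output directory itself is writable. The minimal sketch below only demonstrates this standard path-resolution behavior using hypothetical temporary directories; it does not reproduce AmpliconClassifier's code.

```python
import os
import tempfile

# Hypothetical stand-ins for the container's working directory and for the
# mounted output directory (/home/output in this thread).
workdir = tempfile.mkdtemp(prefix="cwd_")
outdir = tempfile.mkdtemp(prefix="output_")

original_cwd = os.getcwd()
os.chdir(workdir)
try:
    # Same call pattern as line 667 in the traceback: a bare relative file name
    # is resolved against the current working directory, not the -o prefix.
    with open("feature_to_graph.txt", "w") as f2gf:
        f2gf.write("placeholder\n")
finally:
    os.chdir(original_cwd)

in_workdir = os.path.exists(os.path.join(workdir, "feature_to_graph.txt"))
in_outdir = os.path.exists(os.path.join(outdir, "feature_to_graph.txt"))
print(in_workdir, in_outdir)  # True False
```

Joining such names onto the output directory (for example with os.path.join) removes the dependence on whichever working directory the container starts in.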

@jluebeck
Member

Hi,

I have updated PrepareAA in 580f923 to handle issues related to permissions of the output directory, and also to consolidate a file from AmpliconClassifier that may be trying to write to a location that is not necessarily the same directory. Can you please pull the latest version of the docker image and try again? You may already have done so, but please also double-check that the location you want to save data to exists and has write permissions for root.
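The output-directory check suggested above can be scripted on the host before committing to a long run. A minimal sketch, assuming Python 3.9+ on a Linux host; output_dir_problems is a hypothetical helper, not part of PrepareAA. Because os.access() answers for the user running the script, run it as the same user the container will write as (root, with the default image).

```python
import os
import tempfile


def output_dir_problems(path: str) -> list[str]:
    """Return human-readable problems that would stop `path` being used as an output dir.

    os.access() checks permissions for the real UID/GID of the current
    process, so the answer is only meaningful for that user.
    """
    if not os.path.exists(path):
        return [f"{path} does not exist"]
    if not os.path.isdir(path):
        return [f"{path} is not a directory"]
    # Creating entries in a directory needs both write and search (execute) permission.
    if not os.access(path, os.W_OK | os.X_OK):
        return [f"{path} is not writable by UID {os.getuid()}"]
    return []


existing = tempfile.mkdtemp()
missing = os.path.join(existing, "no_such_subdir")
print(output_dir_problems(existing))  # []
print(output_dir_problems(missing))
```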

Thanks,
Jens

@MrDotOne
Author

I made a change to the run file so that, when you execute it, it looks like this:

docker run -u `id -u $USER`:`id -g $USER` --rm -e AA_DATA_REPO=/home/data_repo -e argstring="$argstring" -v $AA_DATA_REPO:/home/data_repo -v /data/Data/Colo:/home/bam_dir -v /data/Data/Colo:/home/norm_bam_dir -v :/home/bed_dir -v /data/output:/home/output -v /data/mosek/8/licenses:/home/programs/mosek/8/licenses jluebeck/prepareaa bash /home/run_paa_script.sh

So everything should be read and written as the end user running the app.

I will pull down the update(s) and give it a shot. Thank you.

@MrDotOne
Author

MrDotOne commented Jul 28, 2022

Pulled and running. It will take over 20 hours, but I will let you know. Thank you for your time.

I do find that adding the following to the run script avoids a lot of issues, since the data is then written as the caregiver and not as root:

-u `id -u $USER`:`id -g $USER`

@jluebeck
Member

jluebeck commented Jul 28, 2022 via email

@MrDotOne
Author

Someone on another repo suggested it when I was having issues with results being written as root and the person running the tool didn't have privilege escalation rights. I thought I would pass on that nugget.

@MrDotOne
Author

I am still having issues:

[root:INFO] #TIME 79252.045 Plotting SV View for amplicon7
[root:INFO] #TIME 79318.830 Total Runtime
/home/programs/AmpliconClassifier-main/make_input.sh: line 6: scf.txt: Permission denied
grep: write error: Broken pipe
/home/programs/AmpliconClassifier-main/make_input.sh: line 7: sgf.txt: Permission denied
find: 'standard output': Broken pipe
find: write error
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: scf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: sgf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: [: : integer expression expected
cat: scf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 12: san.txt: Permission denied
paste: san.txt: No such file or directory
rm: cannot remove 'san.txt': No such file or directory
rm: cannot remove 'scf.txt': No such file or directory
rm: cannot remove 'sgf.txt': No such file or directory
AmpliconClassifier 0.4.9
/home/programs/AmpliconClassifier-main/amplicon_classifier.py -i /home/output//Colo_classification/Colo.input --ref GRCh38 -o /home/output//Colo_classification/Colo --annotate_cycles_file --report_complexity
reading /home/data_repo/GRCh38/Genes_hg38.gff
read 22998 genes

Traceback (most recent call last):
File "/home/programs/AmpliconClassifier-main/amplicon_classifier.py", line 667, in
f2gf = open("feature_to_graph.txt", 'w')
PermissionError: [Errno 13] Permission denied: 'feature_to_graph.txt'
Traceback (most recent call last):
File "/home/programs/AmpliconClassifier-main/make_results_table.py", line 65, in
with open(args.input) as input_file, open(args.classification_file) as classification_file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/output//Colo_classification/Colo_amplicon_classification_profiles.tsv'
2022-07-28 23:07:27.730295
PrepareAA version 0.1203.1

Matched /home/bam_dir/cofinal.bam to reference genome GRCh38
Running PrepareAA on sample: Colo

Running CNVKit batch
python3 /home/programs/cnvkit.py batch -m wgs -r /home/data_repo/GRCh38/GRCh38_cnvkit_filtered_ref.cnn -p 16 -d /home/output/Colo_cnvkit_output/ /home/bam_dir/cofinal.bam

Running CNVKit segment
python3 /home/programs/cnvkit.py segment /home/output/Colo_cnvkit_output/cofinal.cnr -p 16 -m cbs -o /home/output/Colo_cnvkit_output/cofinal.cns

Cleaning up temporary files
rm /home/output/Colo_cnvkit_output//tmp.bed /home/output/Colo_cnvkit_output//.cnn
gzip /home/output/Colo_cnvkit_output/cofinal.cnr

Running amplified_intervals
python /home/programs/AmpliconArchitect-master/src/amplified_intervals.py --ref GRCh38 --bed /home/output/Colo_cnvkit_output/cofinal_CNV_GAIN.bed --bam /home/bam_dir/cofinal.bam --gain 4.5 --cnsize_min 50000 --out /home/output/Colo_AA_CNV_SEEDS
python /home/programs/AmpliconArchitect-master/src/AmpliconArchitect.py --ref GRCh38 --downsample 10.0 --bed /home/output/Colo_AA_CNV_SEEDS.bed --bam /home/bam_dir/cofinal.bam --runmode FULL --extendmode EXPLORE --insert_sdevs 3.0 --out /home/output//Colo_AA_results//Colo

Running AC
/home/programs/AmpliconClassifier-main/make_input.sh /home/output//Colo_AA_results/ /home/output//Colo_classification/Colo
python3 /home/programs/AmpliconClassifier-main/amplicon_classifier.py -i /home/output//Colo_classification/Colo.input --ref GRCh38 -o /home/output//Colo_classification/Colo --annotate_cycles_file --report_complexity
python3 /home/programs/AmpliconClassifier-main/make_results_table.py -i /home/output//Colo_classification/Colo.input --classification_file /home/output//Colo_classification/Colo_amplicon_classification_profiles.tsv
Completed

2022-07-29 21:26:25.262009

@MrDotOne
Author

I will run it as root, which should fix it, but ...

@MrDotOne
Author

OK, I reran it as root using the run script as provided in the repo, and it seems to have completed successfully. This is good progress. However, both times I ran it with -u id $UID:id $GID added, it failed. I need to figure out how to get the results written as the caregiver so I don't have to intervene.

@MrDotOne
Author

MrDotOne commented Aug 1, 2022

Unfortunately, that is not working. The run file works fine for root, but not for a non-escalated account. I keep getting this error when I run as a user with the id options in the run command:

[root:INFO] #TIME 79384.895 Plotting SV View for amplicon7
[root:INFO] #TIME 79452.068 Total Runtime
/home/programs/AmpliconClassifier-main/make_input.sh: line 6: scf.txt: Permission denied
grep: write error: Broken pipe
/home/programs/AmpliconClassifier-main/make_input.sh: line 7: sgf.txt: Permission denied
find: 'standard output': Broken pipe
find: write error
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: scf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: sgf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: [: : integer expression expected
cat: scf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 12: san.txt: Permission denied
paste: san.txt: No such file or directory
rm: cannot remove 'san.txt': No such file or directory
rm: cannot remove 'scf.txt': No such file or directory
rm: cannot remove 'sgf.txt': No such file or directory
AmpliconClassifier 0.4.9
/home/programs/AmpliconClassifier-main/amplicon_classifier.py -i /home/output//Colo_classification/Colo.input --ref GRCh38 -o /home/output//Colo_classification/Colo --annotate_cycles_file --report_complexity
reading /home/data_repo/GRCh38/Genes_hg38.gff
read 22998 genes

Traceback (most recent call last):
File "/home/programs/AmpliconClassifier-main/amplicon_classifier.py", line 667, in
f2gf = open("feature_to_graph.txt", 'w')
PermissionError: [Errno 13] Permission denied: 'feature_to_graph.txt'
Traceback (most recent call last):
File "/home/programs/AmpliconClassifier-main/make_results_table.py", line 65, in
with open(args.input) as input_file, open(args.classification_file) as classification_file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/output//Colo_classification/Colo_amplicon_classification_profiles.tsv'
2022-07-31 01:44:32.102144
PrepareAA version 0.1203.1

Matched /home/bam_dir/cofinal.bam to reference genome GRCh38
Running PrepareAA on sample: Colo

Running CNVKit batch
python3 /home/programs/cnvkit.py batch -m wgs -r /home/data_repo/GRCh38/GRCh38_cnvkit_filtered_ref.cnn -p 16 -d /home/output/Colo_cnvkit_output/ /home/bam_dir/cofinal.bam

Running CNVKit segment
python3 /home/programs/cnvkit.py segment /home/output/Colo_cnvkit_output/cofinal.cnr -p 16 -m cbs -o /home/output/Colo_cnvkit_output/cofinal.cns

Cleaning up temporary files
rm /home/output/Colo_cnvkit_output//tmp.bed /home/output/Colo_cnvkit_output//.cnn
gzip /home/output/Colo_cnvkit_output/cofinal.cnr

Running amplified_intervals
python /home/programs/AmpliconArchitect-master/src/amplified_intervals.py --ref GRCh38 --bed /home/output/Colo_cnvkit_output/cofinal_CNV_GAIN.bed --bam /home/bam_dir/cofinal.bam --gain 4.5 --cnsize_min 50000 --out /home/output/Colo_AA_CNV_SEEDS
python /home/programs/AmpliconArchitect-master/src/AmpliconArchitect.py --ref GRCh38 --downsample 10.0 --bed /home/output/Colo_AA_CNV_SEEDS.bed --bam /home/bam_dir/cofinal.bam --runmode FULL --extendmode EXPLORE --insert_sdevs 3.0 --out /home/output//Colo_AA_results//Colo

Running AC
/home/programs/AmpliconClassifier-main/make_input.sh /home/output//Colo_AA_results/ /home/output//Colo_classification/Colo
python3 /home/programs/AmpliconClassifier-main/amplicon_classifier.py -i /home/output//Colo_classification/Colo.input --ref GRCh38 -o /home/output//Colo_classification/Colo --annotate_cycles_file --report_complexity
python3 /home/programs/AmpliconClassifier-main/make_results_table.py -i /home/output//Colo_classification/Colo.input --classification_file /home/output//Colo_classification/Colo_amplicon_classification_profiles.tsv
Completed

2022-08-01 00:05:22.206630

@jluebeck
Member

jluebeck commented Aug 1, 2022

Hi,

Thank you for sharing. I have also now done some testing on my end and it appears that assigning a custom user for the image is non-trivial and that the above proposed solution (adding -u id $UID:id $GID) does not quite work as expected. I recommend that users run with the current default settings, generating the files as root and then users can chmod or copy the relevant files later if they need non-root ownership. I do not plan to address this issue of non-root ownership in the PrepareAA generated files at this particular time, but perhaps in the future if there is a compelling reason.

Jens

@jluebeck jluebeck closed this as completed Aug 1, 2022
@MrDotOne
Author

MrDotOne commented Aug 2, 2022

Non-root users cannot chown/chgrp files; that is a serious cybersecurity concern.

@MrDotOne
Author

MrDotOne commented Aug 2, 2022

Is there a way to implement a Python script within the run file to do something similar to this?

(base) [root@lri-uapps-2 data]# cat chown.py
import os
path = "/data/output"
for root, dirs, files in os.walk(path):
    for momo in dirs:
        os.chown(os.path.join(root, momo), 1035688, 1001025)
    for momo in files:
        os.chown(os.path.join(root, momo), 1035688, 1001025)
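A slightly more defensive version of chown.py could run as the last step inside a root-run container. This is a sketch under assumptions: chown_tree and ids_from_env are hypothetical helpers, and passing the target IDs through HOST_UID/HOST_GID environment variables is an assumed convention, not something PrepareAA is confirmed to do at this point in the thread. Compared with the script above, it also re-owns the top-level directory (os.walk() only lists entries inside it) and does not follow symlinks.

```python
import os
import tempfile


def chown_tree(path, uid, gid):
    """Set owner/group on `path` itself and on everything below it.

    Returns the number of entries changed. Symlinks are re-owned themselves
    (follow_symlinks=False) so a link cannot redirect the change outside the tree.
    """
    # The loop below only visits entries *inside* path, so handle path here.
    os.chown(path, uid, gid)
    changed = 1
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            os.chown(os.path.join(root, name), uid, gid, follow_symlinks=False)
            changed += 1
    return changed


def ids_from_env():
    """Hypothetical convention: read target IDs from HOST_UID/HOST_GID,
    falling back to the IDs of the current process."""
    return (int(os.environ.get("HOST_UID", os.getuid())),
            int(os.environ.get("HOST_GID", os.getgid())))


# Demo on a throwaway tree, chowning to our own IDs (needs no privileges).
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "sub", "nested"))
open(os.path.join(demo_root, "sub", "file.txt"), "w").close()
print(chown_tree(demo_root, os.getuid(), os.getgid()))  # 4
```

Inside the container this would be called as chown_tree("/home/output", *ids_from_env()); chowning to another user's IDs requires root there.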

Michael

@jluebeck
Member

jluebeck commented Aug 2, 2022

Hi Michael,

Without re-assigning user IDs inside the container itself, or alternatively sharing the /etc/passwd file from the host machine with the docker image, there is no way to give the docker image exactly the same user and group IDs as the accounts on the host machine. The previously proposed solution runs the image as a specific user inside the image, but that user is not mapped to the same user on the host machine. Perhaps one option is instead to have the docker script recursively chmod all the files written by the image into the mounted directory when it finishes, adding global read/write permissions. Would this solution be satisfactory for you? I can test this out in the next couple of days.
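The recursive chmod proposed above could look roughly like this when run as root inside the container after the pipeline finishes. A minimal sketch; add_world_rw is a hypothetical helper, not PrepareAA code, and it approximates `chmod -R a+rwX` (it does not add execute bits to regular files).

```python
import os
import stat
import tempfile


def _widen(entry):
    """Add read/write for everyone (plus execute on directories); skip symlinks."""
    st = os.lstat(entry)
    if stat.S_ISLNK(st.st_mode):
        return 0  # os.chmod() would follow the link to its target
    extra = 0o777 if stat.S_ISDIR(st.st_mode) else 0o666
    os.chmod(entry, stat.S_IMODE(st.st_mode) | extra)
    return 1


def add_world_rw(path):
    """Roughly `chmod -R a+rwX` on the tree rooted at `path`; returns entries updated."""
    changed = _widen(path)
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            changed += _widen(os.path.join(root, name))
    return changed


demo_root = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_root, "results"))
report = os.path.join(demo_root, "results", "report.tsv")
open(report, "w").close()
os.chmod(report, 0o600)
print(add_world_rw(demo_root), oct(stat.S_IMODE(os.stat(report).st_mode)))  # 3 0o666
```

World-writable output is a real trade-off on a shared, locked-down host, which is the concern raised in this thread.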

Jens

@MrDotOne
Author

MrDotOne commented Aug 2, 2022

That is a solution I am trying to implement. I tried to use /home/output; however, the result was "No such file or directory".

@MrDotOne
Author

MrDotOne commented Aug 5, 2022

I just pulled fc3b5e8 and will give it a try with the --run_as_user option, which already looks promising:

docker run --rm -e HOST_UID=$(id -u) -e HOST_GID=$(id -g) -u $(id -u):$(id -g) -e AA_DATA_REPO=/home/data_repo -e argstring="$argstring" -v $AA_DATA_REPO:/home/data_repo -v /data/Data/Colo:/home/bam_dir -v /data/Data/Colo:/home/norm_bam_dir -v /home/bendahm:/home/bed_dir -v /data/output:/home/output -v /data/mosek/8/licenses:/home/programs/mosek/8/licenses jluebeck/prepareaa bash /home/run_paa_script.sh

I will let you know what I find. Thank you for looking into this.
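When scripting the same invocation from Python (for example, a wrapper that launches many samples), the key point is that -u receives numeric IDs, the same values $(id -u) and $(id -g) expand to. A sketch assuming Python 3.8+ for shlex.join; build_paa_docker_cmd, the argstring value, and the /data/data_repo path are hypothetical, the norm_bam_dir and bed_dir mounts are omitted for brevity, and this is not how run_paa_docker.py or its --run_as_user option is implemented.

```python
import os
import shlex


def build_paa_docker_cmd(argstring, data_repo, bam_dir, output_dir, mosek_licenses,
                         uid=None, gid=None):
    """Assemble a `docker run` argument list mirroring the command above.

    Hypothetical helper: passes the caller's numeric IDs both as
    HOST_UID/HOST_GID and via -u, defaulting to the current process's IDs.
    """
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    return [
        "docker", "run", "--rm",
        "-e", f"HOST_UID={uid}", "-e", f"HOST_GID={gid}",
        "-u", f"{uid}:{gid}",
        "-e", "AA_DATA_REPO=/home/data_repo",
        "-e", f"argstring={argstring}",
        "-v", f"{data_repo}:/home/data_repo",
        "-v", f"{bam_dir}:/home/bam_dir",
        "-v", f"{output_dir}:/home/output",
        "-v", f"{mosek_licenses}:/home/programs/mosek/8/licenses",
        "jluebeck/prepareaa", "bash", "/home/run_paa_script.sh",
    ]


cmd = build_paa_docker_cmd("-s Colo -t 16", "/data/data_repo", "/data/Data/Colo",
                           "/data/output", "/data/mosek/8/licenses",
                           uid=1035688, gid=1001025)
print(shlex.join(cmd))  # paste-able shell form of the command
```

Passing cmd directly to subprocess.run() keeps argstring as a single argument, which avoids shell quoting problems.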

@MrDotOne
Author

MrDotOne commented Aug 5, 2022

This is perfect:

(base) [root@lri-uapps-2 data]# cd output
(base) [root@lri-uapps-2 output]# ls -la
total 20
drwxrwxrwx 3 bendahm ccdomainusers 113 Aug 5 15:03 .
drwxrwxrwx 19 root root 4096 Aug 5 15:02 ..
drwxr-xr-x 2 bendahm ccdomainusers 126 Aug 5 15:10 Colo_cnvkit_output
-rw-r--r-- 1 bendahm ccdomainusers 0 Aug 5 15:03 Colo_timing_log.txt
-rw-r--r-- 1 bendahm ccdomainusers 1931 Aug 5 15:03 docker_home_manifest.log
-rw-r--r-- 1 bendahm ccdomainusers 11525 Aug 5 15:10 PAA_stdout.log

@jluebeck jluebeck changed the title from "Error in Docker" to "Docker permissions" on Aug 5, 2022
@jluebeck
Member

jluebeck commented Aug 5, 2022

Glad to hear it is working for you. Reopening the issue for others who may run into issues despite this fix. I will note that this solution works as long as the docker daemon is configured not to offset UIDs and GIDs, which is sometimes done to improve the security of the host machine. More info about docker user namespace remapping is available here: https://docs.oracle.com/cd/E37670_01/E75728/html/ol-docker-userns-remap.html.
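To see why remapping breaks the -u $(id -u):$(id -g) approach: with userns-remap enabled, container IDs are shifted into a subordinate range listed in /etc/subuid and /etc/subgid, so container root becomes the first ID of that range on the host, and a file written as container UID N is owned on the host by start + N. A minimal sketch of that arithmetic, assuming a single contiguous range; the dockremap:100000:65536 entry is hypothetical, and UID 1035688 is taken from the chown script earlier in this thread.

```python
def parse_subid_line(line):
    """Parse one /etc/subuid or /etc/subgid entry of the form name:start:count."""
    name, start, count = line.strip().split(":")
    return name, int(start), int(count)


def host_id_for(container_id, start, count):
    """Host ID that a container UID/GID maps to under a single remap range.

    Container ID 0 maps to `start`, 1 to `start + 1`, and so on; IDs at or
    beyond `count` have no mapping, so None is returned for them.
    """
    if 0 <= container_id < count:
        return start + container_id
    return None


# Hypothetical entry; real values come from /etc/subuid on the Docker host.
name, start, count = parse_subid_line("dockremap:100000:65536")
print(host_id_for(0, start, count))        # 100000 (container root)
print(host_id_for(1000, start, count))     # 101000
print(host_id_for(1035688, start, count))  # None (outside the 65536-ID range)
```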

Jens

@jluebeck jluebeck reopened this Aug 5, 2022
@MrDotOne
Author

MrDotOne commented Aug 5, 2022

Thank you for the fixes and the link; I will check it out. There are a couple of other repos like this that could use this technique. We may be in research here, but this is not academia, and we lock stuff down pretty tightly, sometimes to the point where things are unusable. This was of great benefit. Thank you.
