if -index 1 is passed with -p flag only one split file is flagged #55

Open
jsmedmar opened this issue Aug 11, 2020 · 3 comments

@jsmedmar

If -index 1 is passed with -p flag, only one split file is flagged:

$ caveman.pl --version
VERSION: 1.16.0

$ ls -lah /tmpCaveman/
Aug 11 20:12 .
Aug 11 20:12 ..
Aug 11 20:11 alg_bean
Aug 11 20:11 caveman.cfg.ini
Aug 11 20:11 cov_arr
Aug 11 20:12 logs
Aug 11 20:11 prob_arr
Aug 11 20:12 progress
Aug 11 20:11 readpos.2
Aug 11 20:11 results
Aug 11 20:11 splitList
Aug 11 20:11 splitList.2
Aug 11 20:12 tumor_vs_normal.flagged.muts.vcf
Aug 11 20:11 tumor_vs_normal.flagged.muts.vcf.1
Aug 11 20:12 tumor_vs_normal.flagged.muts.vcf.gz
Aug 11 20:12 tumor_vs_normal.flagged.muts.vcf.gz.tbi
Aug 11 20:11 tumor_vs_normal.muts.ids.vcf
Aug 11 20:11 tumor_vs_normal.muts.ids.vcfsplit.1
Aug 11 20:11 tumor_vs_normal.muts.ids.vcfsplit.2
Aug 11 20:11 tumor_vs_normal.muts.ids.vcfsplit.3
Aug 11 20:11 tumor_vs_normal.muts.ids.vcfsplit.4
Aug 11 20:11 tumor_vs_normal.muts.ids.vcfsplit.5
Aug 11 20:11 tumor_vs_normal.muts.ids.vcfsplit.6
Aug 11 20:11 tumor_vs_normal.muts.vcf
Aug 11 20:11 tumor_vs_normal.no_analysis.bed
Aug 11 20:11 tumor_vs_normal.snps.ids.vcf
Aug 11 20:11 tumor_vs_normal.snps.vcf

I think -p flag should flag all split files regardless of -index, since you can't pass multiple -i values anyway.

@keiranmraine
Contributor

keiranmraine commented Aug 12, 2020

That is expected behaviour; it works the same way as mstep/estep. The intent is that you submit one job per index so that they can be processed in parallel.

If you want to spread the load over a known number of jobs, you can run them with the -limit option, as with mstep/estep. Executing the following 4 commands in parallel will process all flagging indexes regardless of the number of split files. Should one fail due to run time or memory limits, rerunning it will resume from the last incomplete split element.

caveman.pl -p flag -l 4 -i 1
caveman.pl -p flag -l 4 -i 2
caveman.pl -p flag -l 4 -i 3
caveman.pl -p flag -l 4 -i 4

If you want to run them all in a single thread, don't declare -i; you still retain the resume functionality.
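A minimal bash sketch of the one-job-per-index pattern described above, assuming a single host where background jobs stand in for separately submitted cluster jobs. The function name run_indexed_jobs and the CAVEMAN variable are hypothetical; any other options your run needs (not shown in this thread) are passed through unchanged. As reported further down this thread, caveman.pl 1.16.0 rejects flag indexes above 1, so for flag this only works once that bug is fixed; for mstep/estep, -limit already raises the accepted index range.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start one caveman.pl job per index (1..LIMIT) for the
# given process in the background, wait for all of them, and report failures.
# CAVEMAN defaults to caveman.pl; it can be overridden, e.g. for testing.
run_indexed_jobs() {
  local process=$1 limit=$2
  shift 2                       # anything left is passed through unchanged
  local cmd=${CAVEMAN:-caveman.pl}
  local -a pids=()
  local i
  for ((i = 1; i <= limit; i++)); do
    "$cmd" -p "$process" -l "$limit" -i "$i" "$@" &
    pids+=("$!")
  done
  local failed=0
  for i in "${!pids[@]}"; do
    if ! wait "${pids[$i]}"; then
      # Per the maintainer, rerunning a failed index resumes from the
      # last incomplete split element.
      echo "index $((i + 1)) failed; rerun it to resume" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example, run_indexed_jobs mstep 4 followed by your usual options launches the four -l 4 jobs and returns non-zero if any of them failed. On a cluster you would submit each command as its own job instead; the background-job version only mimics that on one machine.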

@jsmedmar
Author

jsmedmar commented Aug 12, 2020

If I do this, I get the following error:

ERROR: based on reference and exclude option index must be between 1 and 1

These are the commands I ran:

caveman.pl \
    -process flag \
    -threads 1 \
    -index 1 \
    -limit 2 \
    ...

caveman.pl \
    -process flag \
    -threads 1 \
    -index 2 \
    -limit 2 \
    ...

In this case, only the first command works; the second fails with the error above.

I think these are the relevant lines:

my %index_max = ( 'setup' => 1,
                  'split' => -1,
                  'split_concat' => 1,
                  'mstep' => -1,
                  'merge' => 1,
                  'estep' => -1,
                  'merge_results' => 1,
                  'add_ids' => 1,
                  'flag' => 1);

if(exists $opts{'process'}) {
  PCAP::Cli::valid_process('process', $opts{'process'}, \@VALID_PROCESS);
  if(exists $opts{'index'}) {
    my $max = $index_max{$opts{'process'}};
    if($max==-1){
      if(exists $opts{'limit'}) {
        $max = $opts{'limit'};
      }
      else {
        $max = Sanger::CGP::Caveman::Implement::valid_index(\%opts);
      }
    }
    die "ERROR: based on reference and exclude option index must be between 1 and $max\n" if($opts{'index'} < 1 || $opts{'index'} > $max);
    PCAP::Cli::opt_requires_opts('index', \%opts, ['process']);

It looks like the maximum index for flag is set to 1, so any index above 1 is rejected. Passing -limit does not help either, because -limit is only consulted when the maximum is -1 (as it is for mstep/estep). I know that if you don't specify -limit and -index, it does process the splits in parallel using multiple threads. But submitting separate jobs with -index is not working at the moment.
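To make the analysis above concrete, here is a bash re-creation of the quoted validation logic. It is an illustration only, not the actual caveman.pl code: index_max mirrors the %index_max table from the snippet, and max_index and check_index are hypothetical names. It shows that for flag the maximum stays at 1 whatever -limit is, because -limit is only read when the table entry is -1. Treating flag like mstep/estep by setting its entry to -1 is one possible fix, but that is an assumption; the thread does not say how the bug was resolved.

```shell
#!/usr/bin/env bash
# Illustrative re-creation of the quoted index check (needs bash 4+ for declare -A).
declare -A index_max=(
  [setup]=1 [split]=-1 [split_concat]=1 [mstep]=-1 [merge]=1
  [estep]=-1 [merge_results]=1 [add_ids]=1 [flag]=1
)

# max_index PROCESS [LIMIT]: print the largest -index accepted for PROCESS.
max_index() {
  local process=$1 limit=${2:-}
  local max=${index_max[$process]}
  if [[ $max -eq -1 ]]; then
    if [[ -n $limit ]]; then
      max=$limit
    else
      # The real script calls valid_index(), which derives the count from
      # the reference and exclude options; that is not modelled here.
      echo "max depends on reference/exclude options" >&2
      return 2
    fi
  fi
  echo "$max"
}

# check_index PROCESS INDEX [LIMIT]: fail like the quoted die() when out of range.
check_index() {
  local process=$1 index=$2 limit=${3:-}
  local max
  max=$(max_index "$process" "$limit") || return
  if (( index < 1 || index > max )); then
    echo "ERROR: based on reference and exclude option index must be between 1 and $max" >&2
    return 1
  fi
}
```

With the table as quoted, check_index flag 2 2 fails with "must be between 1 and 1", matching the error reported above, while check_index mstep 2 2 passes. After index_max[flag]=-1, the flag call passes as well.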

@keiranmraine
Contributor

Thanks for clarifying the issue. That is a bug. We don't use it like this internally, so it was missed.

@keiranmraine keiranmraine reopened this Aug 12, 2020
@keiranmraine keiranmraine transferred this issue from cancerit/cgpCaVEManWrapper Aug 12, 2020
@keiranmraine keiranmraine transferred this issue from cancerit/cgpCaVEManPostProcessing Aug 12, 2020