Fixed a bug in file order for plotsr
GallVp committed May 28, 2024
1 parent 18a8fb2 commit 543625a
Showing 4 changed files with 28 additions and 10 deletions.
3 changes: 3 additions & 0 deletions conf/modules.config
@@ -167,6 +167,9 @@ process {
}

withName: '.*:FASTA_SYNTENY:PLOTSR' {

ext.args = '-d 600'

publishDir = [
path: { "${params.outdir}/synteny/plotsr" },
mode: params.publish_dir_mode,
Binary file modified docs/images/plotsr.png
5 changes: 4 additions & 1 deletion docs/usage.md
@@ -7,7 +7,7 @@ You will need to create an assemblysheet with information about the assemblies y
- `tag:` A unique tag which represents the target assembly throughout the pipeline and in the final report. The `tag` and `fasta` file name should not be same, such as `tag.fasta`. This can create file name collisions in the pipeline or result in file overwrite. It is also a good-practice to make all the input files read-only.
- `fasta:` FASTA file
- `gff3 [Optional]:` GFF3 annotation file if available
- `monoploid_ids [Optional]:` A txt file listing the sequence IDs used to calculate LAI in monoploid mode if necessary. If the intent is to run LAI against all the sequences in an assembly, this file can be skipped for that assembly.
- `monoploid_ids [Optional]:` A txt file listing the sequence IDs used to calculate LAI in monoploid mode if necessary. If the intent is to run LAI against all the sequences in an assembly, this file can be skipped for that assembly. Soft masked regions are ignored when calculating LAI. The pipeline may fail if all the LTRs are already soft masked.
- `synteny_labels [Optional]:` A two column tsv file listing fasta sequence IDs (first column) and their labels for the synteny plots (second column) when performing synteny analysis. If a sequence ID is missing from this file, the corresponding sequence is excluded from the analysis. If `synteny_labels` is not provided for an assembly, that assembly is excluded from the analysis.

See the [Merqury](#merqury-k-mer-analysis) section For description of assemblysheet columns related to k-mer analysis with Merqury.
@@ -72,6 +72,9 @@ This section provides additional information for parameters. It does not list al

- `synteny_plotsr_assembly_order`: The order in which Minimap2 alignments are performed and, then, plotted by Plotsr. For assembly A, B and C; if the order is specified as 'B C A', then, two alignments are performed. First, C is aligned against B as reference. Second, A is aligned against C as reference. The order of these assemblies on the Plotsr figure is also 'B C A' so that B appears on top, C in the middle and A at the bottom. If this parameter is `null`, the assemblies are ordered alphabetically. All assemblies from `input` and `synteny_xref_assemblies` are included by default. If an assembly is missing from this list, that assembly is excluded from the analysis.

> [!NOTE]
> PLOTSR performs a sequence-wise (preferably chromosome-wise) synteny analysis. The order of the sequences for each assembly is inferred from its `synteny_labels` file.
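
The pairing rule described for `synteny_plotsr_assembly_order` can be sketched in a few lines of Groovy. This is only an illustration of the documented behaviour, not code from the pipeline; the helper name `alignmentPairs` is hypothetical.

```groovy
// Hypothetical helper (not part of the pipeline): lists the alignments
// implied by an assembly order string such as 'B C A'.
def alignmentPairs(String order) {
    order.tokenize(' ')
        // Sliding window of size 2, step 1; the incomplete last window is dropped
        .collate(2, 1, false)
        // Each assembly is aligned against the one listed just before it
        .collect { reference, target -> [ target: target, reference: reference ] }
}

alignmentPairs('B C A').each {
    println "${it.target} is aligned against ${it.reference} as reference"
}
// prints:
// C is aligned against B as reference
// A is aligned against C as reference
```

An order with a single tag yields no pairs under this sketch, since there is nothing to align against.
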
### Merqury K-mer analysis

Additional assemblysheet columns:
30 changes: 21 additions & 9 deletions subworkflows/local/fasta_synteny.nf
@@ -272,11 +272,14 @@ workflow FASTA_SYNTENY {
| map { [ it ] }
| collect
| map { list ->
if ( plotsr_assembly_order == null) {
if ( plotsr_assembly_order == null ) {
return list.sort(false) { it[0].id.toUpperCase() }
}

def order = plotsr_assembly_order.tokenize(' ')

if ( order.size() != order.unique().size() ) error "Tags listed by synteny_plotsr_assembly_order should all be unique: $order"

def tags = list.collect { it[0].id }

def ordered_list = []
@@ -333,23 +336,32 @@ workflow FASTA_SYNTENY {
[
[ id: 'plotsr' ],
syri,
[ rfa, tfa ] // Order matters, see https://github.com/schneebergerlab/plotsr/issues/70
[ tfa, rfa]
]
}
| groupTuple
| map { meta, syri, fastas ->
def fasta_list = fastas.flatten()
def syri_tags = syri.collect { it.name.replace('syri.out', '').tokenize('.on.') }.flatten().unique()
def proposed_order = plotsr_assembly_order ? plotsr_assembly_order.tokenize(' ') : syri_tags.sort(false)

def unique_fa = []
def available_tags = []
proposed_order.each { tag -> if ( tag in syri_tags ) available_tags << tag }

fastas.flatten().each { fasta ->
if ( ! ( fasta in unique_fa ) ) { unique_fa << fasta }
}
def ordered_fa = []
available_tags.each { tag -> ordered_fa << ( fasta_list.find { it.baseName == "${tag}.plotsr" } ) }

def ordered_syri_tags = []
available_tags.eachWithIndex { tag, index -> if ( index > 0 ) { ordered_syri_tags << "${tag}.on.${available_tags[index-1]}" } }

def ordered_syri = []
ordered_syri_tags.each { tag -> ordered_syri << ( syri.find { it.baseName == "${tag}syri" } ) }

[
meta,
syri,
unique_fa,
"#file\tname\n" + unique_fa.collect { it.baseName.replace('.plotsr', '') }.join('\n')
ordered_syri,
ordered_fa,
"#file\tname\n" + ordered_fa.collect { it.baseName.replace('.plotsr', '') }.join('\n')
]
}
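
The reordering added in this hunk can be tried outside Nextflow with a standalone Groovy sketch. It mirrors the logic of the diff under some assumptions: plain strings stand in for the staged files, the `.fasta` extension and the names `orderPlotsrInputs` and `baseName` are hypothetical, and the SyRI file naming (`<target>.on.<reference>syri.out`) is inferred from the `baseName` comparisons in the diff.

```groovy
// Stand-in for Path.baseName: drops the last extension of a file name
String baseName(String name) {
    name.contains('.') ? name.substring(0, name.lastIndexOf('.')) : name
}

// Illustrative only: orders FASTA and SyRI inputs by a proposed tag order
def orderPlotsrInputs(List<String> proposedOrder, List<String> syriFiles, List<String> fastaFiles) {
    // Tags that took part in at least one alignment,
    // e.g. 'C.on.Bsyri.out' -> ['C', 'B']
    def syriTags = syriFiles
        .collect { it.replace('syri.out', '').split(/\.on\./).toList() }
        .flatten()
        .unique()

    // Keep the requested order, dropping tags that have no alignment
    def availableTags = proposedOrder.findAll { it in syriTags }

    // One FASTA per tag, in plotting order
    def orderedFa = availableTags.collect { tag ->
        fastaFiles.find { baseName(it) == "${tag}.plotsr" }
    }

    // One SyRI output per consecutive pair: each tag against its predecessor
    def orderedSyri = availableTags.collate(2, 1, false).collect { ref, tgt ->
        syriFiles.find { baseName(it) == "${tgt}.on.${ref}syri" }
    }

    [ syri: orderedSyri, fasta: orderedFa ]
}

def result = orderPlotsrInputs(
    ['B', 'C', 'A'],
    ['A.on.Csyri.out', 'C.on.Bsyri.out'],
    ['A.plotsr.fasta', 'B.plotsr.fasta', 'C.plotsr.fasta']
)
println result.fasta
println result.syri
```

The sketch splits on the literal `.on.` separator instead of calling `tokenize('.on.')`. Groovy's `tokenize` treats its argument as a set of delimiter characters, so the two approaches differ for tags that contain `.`, `o` or `n`.
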
