Transeq bug #156

anilthanki · 2022-08-31T14:35:18Z

EMBOSS transeq has a bug (or feature) that removes everything up to and including a colon (:) from sequence IDs (i.e. the first "word" after > in each FASTA header line).

This results in empty cluster_cds files and a failed workflow.

So if the input sequences contain : in their sequence identifiers, the colons need to be removed, or another copy of the CDS dataset needs to be prepared, to avoid this error.

Thanks

Anil

nsoranzo · 2022-08-31T15:56:21Z

Example:
Input:

> Before colon: after colon
accaGTTACCCTCATCATCTTAGCTGATAGCCAGCCAGCCACCACAGGCAtgagtca
> Before_colon:after_colon
gtttgccatcttttgctgctctagggaatccagcagctgtcaccatgtaaacaagcccagg

Output of transeq -sequence 'test.fasta' -outseq 'protein.fasta' -frame 1 -table 0 -trim no -osformat2 fasta -auto:

>Before_1 colon: after colon
...
>after_colon_1
...

anilthanki · 2022-09-01T11:31:52Z

Extended test

>test:A
ACAGTCATCGAATCCGACTAC
>test : B
ACGACTAGCATCAGCACTA
>test :C
ACGACTAGCATCAGCACTA
>test: D
ACGACTAGCATCAGCACTA

Output

>A_1
TVIESDY
>test_1 : B
TTSISTX
>test_1 :C
TTSISTX
>test_1 D
TTSISTX
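
As a rough illustration of the workaround described above (removing colons from sequence identifiers before running transeq), here is a minimal Python sketch. It is not part of the workflow or of EMBOSS; the function names `sanitize_header` and `sanitize_fasta` and the choice of `_` as the replacement character are assumptions made for this example. It only touches the first "word" after > and leaves the rest of the header and all sequence lines unchanged.

```python
import re

# Matches a FASTA header line: '>', optional whitespace, the sequence id
# (first run of non-whitespace characters), then the rest of the line.
HEADER_RE = re.compile(r"^>(\s*)(\S*)(.*)$", re.DOTALL)


def sanitize_header(line, replacement="_"):
    """Replace ':' inside the sequence id of a FASTA header line.

    Non-header lines are returned unchanged. The replacement character
    is an assumption for this sketch; pick one that cannot collide with
    existing ids in your dataset.
    """
    match = HEADER_RE.match(line)
    if match is None:
        return line
    lead, seq_id, rest = match.groups()
    return ">" + lead + seq_id.replace(":", replacement) + rest


def sanitize_fasta(lines, replacement="_"):
    """Yield the lines of a FASTA file with colons removed from the ids."""
    for line in lines:
        yield sanitize_header(line, replacement)
```

Applied to the extended test above, this would turn >test:A into >test_A and >test: D into >test_ D, while >test : B and >test :C stay as they are because the colon is not part of the first word. Note that replacing characters can make two previously distinct ids identical, so it is worth checking for duplicates afterwards.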

anilthanki added the bug label Aug 31, 2022
