Transeq bug #156

Open
anilthanki opened this issue Aug 31, 2022 · 2 comments
@anilthanki
Member

anilthanki commented Aug 31, 2022

EMBOSS TranSeq has a bug/feature which removes anything before a colon (:) from sequence IDs (i.e. the first "word" after > in each FASTA header line).

This results in empty cluster_cds files and a failed workflow.

So if the input sequences contain : in their sequence identifiers, the colons need to be removed, or another copy of the CDS dataset needs to be prepared, to avoid this error.

Thanks

Anil
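
The workaround described above (removing colons from the sequence identifiers before running transeq) could be sketched roughly as follows. This is only a minimal illustration, not part of the workflow: the function names `sanitize_header` and `sanitize_fasta` and the choice of `_` as the replacement character are assumptions, and any other dataset that refers to the same IDs would need the same renaming.

```python
def sanitize_header(header: str, replacement: str = "_") -> str:
    """Replace ':' in the ID (first word) of a FASTA header line.

    Hypothetical helper: the description after the ID is left untouched.
    """
    body = header[1:].lstrip()
    if not body:
        return header
    # Split into the ID (first word) and the optional description.
    parts = body.split(None, 1)
    seq_id = parts[0].replace(":", replacement)
    return ">" + " ".join([seq_id] + parts[1:])


def sanitize_fasta(lines, replacement: str = "_"):
    """Yield FASTA lines with colons removed from sequence IDs only."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield sanitize_header(line, replacement)
        else:
            yield line
```

Only the first word of each header is changed, so a header such as `>test : B`, whose ID is just `test`, passes through unmodified.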

@anilthanki anilthanki added the bug label Aug 31, 2022
@nsoranzo
Member

Example:
Input:

> Before colon: after colon
accaGTTACCCTCATCATCTTAGCTGATAGCCAGCCAGCCACCACAGGCAtgagtca
> Before_colon:after_colon
gtttgccatcttttgctgctctagggaatccagcagctgtcaccatgtaaacaagcccagg

Output of transeq -sequence 'test.fasta' -outseq 'protein.fasta' -frame 1 -table 0 -trim no -osformat2 fasta -auto:

>Before_1 colon: after colon
...
>after_colon_1
...

@anilthanki
Member Author

Extended test

>test:A
ACAGTCATCGAATCCGACTAC
>test : B
ACGACTAGCATCAGCACTA
>test :C
ACGACTAGCATCAGCACTA
>test: D
ACGACTAGCATCAGCACTA

Output

>A_1
TVIESDY
>test_1 : B
TTSISTX
>test_1 :C
TTSISTX
>test_1 D
TTSISTX
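
Going by the outputs above, the IDs that get altered are the ones where the colon is part of the first word of the header (`test:A` and `test:`), while a colon that appears after whitespace is left alone. A small check along those lines could flag affected IDs before transeq is run. This is a sketch based only on the examples in this thread, and the function name `ids_with_colon` is made up for illustration.

```python
def ids_with_colon(lines):
    """Return the sequence IDs (first word of each header) that contain ':'."""
    hits = []
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            # Only the first word counts as the ID; colons in the
            # description were left alone in the examples in this thread.
            if words and ":" in words[0]:
                hits.append(words[0])
    return hits


extended_test = [
    ">test:A", "ACAGTCATCGAATCCGACTAC",
    ">test : B", "ACGACTAGCATCAGCACTA",
    ">test :C", "ACGACTAGCATCAGCACTA",
    ">test: D", "ACGACTAGCATCAGCACTA",
]
flagged = ids_with_colon(extended_test)
```

For the extended test input, `flagged` contains `test:A` and `test:`, which are the two records whose IDs differ from `test_1` handling in the output above.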


2 participants