This repository has been archived by the owner on Jul 19, 2021. It is now read-only.

Can this software create an abundance table? #20

Open
slvrshot opened this issue Jan 27, 2019 · 0 comments

slvrshot commented Jan 27, 2019

Hi Brian, I have used your software for many years and was wondering whether it has this capability.

I have a directory of 48 files (4 groups, each containing 12 samples) holding the output from diamond. I used diamond to perform a blastx search of reads against a protein database. All I want to do is count how many times each gene is identified in each sample and create an abundance table containing these counts for all samples.

3TFCDRXX:2:1101:19180:1172	BAC0211|mdtB/yegN|sp|P76398|MDTB_ECOLI	52.0	50	24	0	1	150	741	790	1.2e-08	50.1
A00484:57:H3TFCDRXX:2:1101:9598:1204	BAC0316|pstB|sp|P0AAH0|PSTB_ECOLI	72.9	48	13	0	150	7	155	202	9.2e-17	77.0
A00484:57:H3TFCDRXX:2:1101:29939:1611	BAC0619|copA|tr|Q7WYH1|Q7WYH1_PSEPU	65.2	46	16	0	6	143	132	177	3.3e-16	75.1
A00484:57:H3TFCDRXX:2:1101:29595:1642	BAC0211|mdtB/yegN|sp|P76398|MDTB_ECOLI	85.2	27	4	0	3	83	889	915	4.5e-08	48.1
A00484:57:H3TFCDRXX:2:1101:16242:1689	BAC0467|zraR/hydH|sp|P14375|ZRAR_ECOLI	58.3	48	20	0	146	3	284	331	1.7e-10	56.2
A00484:57:H3TFCDRXX:2:1101:9516:1752	BAC0646|mdtB|tr|D0ZND9|D0ZND9_SALT1	64.5	31	11	0	58	150	42	72	5.1e-07	44.7
Above is what the file looks like. It basically has the sequence ID extracted from a .fasta file and its corresponding blastx hit.

A buddy created a simple script to do this:

cut -f 2 diamond_output.tab > diamond_output_ids
Then:
sort diamond_output_ids | uniq -c | sort -n > ids_counts

The output of ids_counts looks like this (the numbers represent the number of times the gene was observed):

  86 BAC0269|nia|tr|Q92Z60|Q92Z60_RHIME
  87 BAC0504|farB|tr|Q9RQ29|Q9RQ29_NEIGO
  88 BAC0078|copA|sp|O32220|COPA_BACSU
  89 BAC0487|pmrA|sp|Q70FH0|PMRA_PECSS

But I feel like BBMap could do this far more efficiently than doing each sample serially. Can BBMap help me?
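
To make the goal concrete, the abundance table described above can be sketched as follows. This is a minimal Python sketch rather than a shell pipeline, and it is not a BBMap feature. It assumes each input is a tab-separated diamond output file with the subject (gene) ID in column 2, as in the example rows, and that the file's base name can stand in for the sample name. The function names `count_hits`, `build_table`, and `write_table` are hypothetical and chosen only for illustration.

```python
import sys
from collections import Counter
from pathlib import Path


def count_hits(path):
    """Count how often each subject ID (column 2) appears in one tabular file."""
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:  # skip blank or malformed lines
                counts[fields[1]] += 1
    return counts


def build_table(paths):
    """Return (sample_names, rows); each row is [gene, count_sample_1, ...]."""
    # Assumption: the file name without its extension identifies the sample.
    samples = [Path(p).stem for p in paths]
    per_sample = [count_hits(p) for p in paths]
    # Every gene seen in any sample gets a row; missing genes count as 0.
    genes = sorted(set().union(*per_sample))
    rows = [[gene] + [counts[gene] for counts in per_sample] for gene in genes]
    return samples, rows


def write_table(paths, out=sys.stdout):
    """Write the table as tab-separated text with a header line."""
    samples, rows = build_table(paths)
    out.write("\t".join(["gene"] + samples) + "\n")
    for row in rows:
        out.write("\t".join(str(value) for value in row) + "\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    write_table(sys.argv[1:])
```

Saved under a hypothetical name such as `abundance_table.py`, it could be run as `python abundance_table.py *.tab > abundance.tsv`, giving one row per gene and one column per sample. Note that every hit line is counted, so a read with several reported hits contributes to several genes, exactly as the `cut | sort | uniq -c` approach does.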
