Test CENTRIFUGE with classification filterings, and compare centrifuge to Kraken2 #28

Open
LilyAnderssonLee opened this issue Aug 8, 2023 · 5 comments
Assignees
Labels
enhancement New feature or request

Comments

LilyAnderssonLee commented Aug 8, 2023

Centrifuge uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index.

Run Centrifuge and Kraken2 on the samples from clinical case #374764. In the DNA sample, we have confirmed that 9 reads were assigned to HHV7 and that these reads are true positives.

TO ANSWER:
1: Can we also detect HHV7 using Centrifuge, and how many reads were assigned to it?

2: Which classifier identifies more organisms?

3: Which classifier has more false positives among the assigned reads when validated with BLAST?
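A minimal sketch of how questions 1 and 2 could be checked from the per-read outputs of both tools. It assumes Kraken2 was run without `--use-names` (so the third column is a bare tax ID) and that the Centrifuge classification file has its usual tab-separated header. The function names, the demo reads, and the tax IDs `111` and `222` are placeholders for illustration, not values from the clinical case.

```python
import csv
import io
from collections import defaultdict


def kraken2_reads_by_taxid(handle):
    """Group classified read IDs by tax ID from Kraken2 per-read output.

    Assumed columns: C/U flag, read ID, tax ID, read length, LCA k-mer map.
    """
    reads = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 3 and row[0] == "C":
            reads[row[2]].add(row[1])
    return reads


def centrifuge_reads_by_taxid(handle):
    """Group read IDs by tax ID from a Centrifuge classification file.

    Assumes the header contains readID and taxID, and that unclassified
    reads carry tax ID 0. A read with several assignments is counted
    once per tax ID because the values are sets.
    """
    reads = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["taxID"] != "0":
            reads[row["taxID"]].add(row["readID"])
    return reads


def compare_taxon(kraken2, centrifuge, taxid):
    """Return the reads shared by, and unique to, each classifier for one taxon."""
    k2 = kraken2.get(taxid, set())
    cf = centrifuge.get(taxid, set())
    return {"shared": k2 & cf, "kraken2_only": k2 - cf, "centrifuge_only": cf - k2}


# Placeholder demo data, not real results.
KRAKEN2_DEMO = (
    "C\tread1\t111\t150\t111:116\n"
    "C\tread2\t111\t150\t111:116\n"
    "U\tread3\t0\t150\t0:116\n"
)
CENTRIFUGE_DEMO = (
    "readID\tseqID\ttaxID\tscore\t2ndBestScore\thitLength\tqueryLength\tnumMatches\n"
    "read1\tseqA\t111\t4225\t0\t80\t150\t1\n"
    "read2\tseqA\t111\t4225\t0\t80\t150\t1\n"
    "read3\tseqB\t222\t64\t0\t23\t150\t1\n"
)

k2 = kraken2_reads_by_taxid(io.StringIO(KRAKEN2_DEMO))
cf = centrifuge_reads_by_taxid(io.StringIO(CENTRIFUGE_DEMO))
result = compare_taxon(k2, cf, "111")
print(len(result["shared"]), "shared reads for the taxon of interest")
print(len(k2), "taxa reported by Kraken2,", len(cf), "by Centrifuge")
```

Comparing read IDs rather than read counts shows whether both classifiers picked the same reads, which is what matters before sending them to BLAST.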

@LilyAnderssonLee LilyAnderssonLee self-assigned this Aug 8, 2023
@LilyAnderssonLee LilyAnderssonLee changed the title Compare Centrifuge and Kraken classification Test CENTRIFUGE with classification filterings and compare centrifuge to Krakne2 Aug 8, 2023

LilyAnderssonLee commented Aug 8, 2023

Conclusion:

1: Kraken2 and Centrifuge assigned the same 9 reads to HHV7 and one read to HHV4.

2: Centrifuge assigned 98 reads to the Viruses category, while Kraken2 assigned 33 reads to the same category.

3: Centrifuge predicted more virus species (56) than Kraken2 (24). However, Centrifuge produced considerably more false positives, as indicated by BLAST.

4: Centrifuge's false positives can be reduced by taking hitLength and numMatches into account during classification. For instance, Alcelaphine herpesvirus 1 was falsely reported by Centrifuge, but its hitLength was only 23 bp, which is too short to be a reliable match. I would suggest a minimum hitLength somewhere between 50 and 100 bp to reduce false positives. As for numMatches, we could skip it for now, since I am not sure how to handle species that share the same genomic regions.

5: Centrifuge assigned 9 reads to Human endogenous retrovirus K113, which were not reported by Kraken2. These 9 reads were confirmed by BLAST.
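The hitLength filter suggested in point 4 could be applied as a post-processing step, roughly as sketched below. This assumes the Centrifuge classification file is tab-separated with a `hitLength` column in its header. The function name, the default threshold of 50, and the demo rows are illustrative only and are not the reads from this case.

```python
import csv
import io


def filter_by_hit_length(handle, min_hit_length=50):
    """Split Centrifuge assignments into kept and dropped rows by hitLength."""
    kept, dropped = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        if int(row["hitLength"]) >= min_hit_length:
            kept.append(row)
        else:
            dropped.append(row)
    return kept, dropped


# Placeholder demo data: one long hit and one 23 bp hit.
CENTRIFUGE_DEMO = (
    "readID\tseqID\ttaxID\tscore\t2ndBestScore\thitLength\tqueryLength\tnumMatches\n"
    "read1\tseqA\t111\t4225\t0\t80\t150\t1\n"
    "read2\tseqB\t222\t64\t0\t23\t150\t1\n"
)

kept, dropped = filter_by_hit_length(io.StringIO(CENTRIFUGE_DEMO), min_hit_length=50)
print("kept:", [r["readID"] for r in kept])
print("dropped:", [r["readID"] for r in dropped])
```

Returning the dropped rows as well makes it easy to check with BLAST whether the threshold removes any true positives.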

@LilyAnderssonLee LilyAnderssonLee changed the title Test CENTRIFUGE with classification filterings and compare centrifuge to Krakne2 Test CENTRIFUGE with classification filterings, and compare centrifuge to Krakne2 Aug 8, 2023
@LilyAnderssonLee LilyAnderssonLee added the enhancement New feature or request label Aug 8, 2023
@LilyAnderssonLee LilyAnderssonLee changed the title Test CENTRIFUGE with classification filterings, and compare centrifuge to Krakne2 Test CENTRIFUGE with classification filterings, and compare centrifuge to Kraken2 Aug 21, 2023
@sofstam sofstam closed this as completed Sep 25, 2023

sofstam commented Jun 10, 2024

During the ENNGS workshop, it was mentioned that one lab uses 5000 as a quality filter.

@sofstam sofstam reopened this Jun 10, 2024
@LilyAnderssonLee
Author

Good to know.
I think we need to test every part of taxprofiler that we use with simulated data. I am wondering whether we should add this to the taxprofiler validation report.


sofstam commented Jun 10, 2024

Since we have not tested this metric, I would suggest waiting until the next version of the validation.

@LilyAnderssonLee
Author

Yes, that makes sense. We need to do this validation on simulated data, both to improve the performance of taxprofiler and to meet the requirements of IVDR, perhaps sometime in late autumn or winter, after the major release of taxprofiler for long reads.
