RNA-seq contamination check #163

Open
Jakob37 opened this issue Oct 1, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@Jakob37
Contributor

Jakob37 commented Oct 1, 2024

Description of feature

Hi from Lund!

We discussed Tomte today, and what would be needed for us to get it into production.

One thing that came up is that we would want a contamination check. In a separate RNA-seq pipeline here in Lund, this is done by selecting a set of ~200 sites in "housekeeping genes", calling these and checking the patterns of heterozygosity, i.e. whether the patterns match what would be expected from a pure sample or from a contaminated one.
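To make the idea concrete, here is a minimal sketch of one way such a heterozygosity pattern check could look. It is not the script used in Lund: the `SiteCounts` type, the function names and all thresholds are hypothetical, and it assumes per-site reference and alternate read counts are already available. The reasoning is that in a pure sample most sites should look homozygous (alt fraction near 0 or 1) or heterozygous (near 0.5), so many sites in between may hint at a mixture. Allele-specific expression in RNA-seq can also shift these fractions, so the result would only be a risk indication.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Read counts at one target site (hypothetical container)."""
    ref_reads: int
    alt_reads: int


def classify_site(site, min_depth=20, hom_margin=0.05, het_window=(0.35, 0.65)):
    """Label a site as 'low_depth', 'hom', 'het' or 'ambiguous'.

    All thresholds are illustrative assumptions, not validated values.
    """
    total = site.ref_reads + site.alt_reads
    if total < min_depth:
        return "low_depth"
    frac = site.alt_reads / total
    if frac <= hom_margin or frac >= 1 - hom_margin:
        return "hom"
    if het_window[0] <= frac <= het_window[1]:
        return "het"
    return "ambiguous"


def ambiguous_fraction(sites, **thresholds):
    """Fraction of sufficiently covered sites with an in-between alt fraction.

    Returns None when no site has enough depth to be informative.
    """
    labels = [classify_site(site, **thresholds) for site in sites]
    informative = [label for label in labels if label != "low_depth"]
    if not informative:
        return None
    return informative.count("ambiguous") / len(informative)
```

A QC report could then show `ambiguous_fraction(...)` next to the number of informative sites, with the cut-off for a warning left to be calibrated on known pure and mixed samples.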

We would include these results in a QC report to give an indication of risk for contamination.

What do you guys say about having something similar added to Tomte?

Jakob37 added the enhancement (New feature or request) label Oct 1, 2024
@jemten
Contributor

jemten commented Oct 1, 2024

Sounds like a cool idea to me. There is a variant calling part of Tomte. Do you want to take the VCF generated there, extract the calls in housekeeping genes and run your heterozygosity check, or would you like to do a separate variant calling step for this?

@Jakob37
Contributor Author

Jakob37 commented Oct 1, 2024

> Sounds like a cool idea to me. There is a variant calling part of tomte. Do you want to take the vcf generated there, extract the calls in housekeeping genes and run your heterozygosity check or would you like to do a separate VC for this?

Yes, I was wondering the same, actually. As I understood it, they currently run DNAScope on the targeted sites in addition to the regular calls. But I don't know whether there is a reason not to just reuse a subset of the existing calls and feed these into the contamination check script. I'll ask around.
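For illustration, reusing a subset of existing calls could be sketched roughly as below. This assumes an uncompressed VCF read as text lines and a hypothetical tab-separated target list with chromosome and position in the first two columns; the function names are made up for the example. In practice an existing tool for region-based VCF filtering would probably be the better choice.

```python
def load_targets(lines):
    """Parse 'chrom<TAB>pos' lines into a set of (chrom, pos) tuples."""
    targets = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, pos = line.split("\t")[:2]
        targets.add((chrom, int(pos)))
    return targets


def subset_vcf(vcf_lines, targets):
    """Yield header lines plus the records located at a target site."""
    for line in vcf_lines:
        if not line.strip():
            continue
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if (chrom, int(pos)) in targets:
            yield line


def missing_targets(vcf_lines, targets):
    """Return the target sites that have no record at all in the VCF."""
    seen = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split("\t", 2)[:2]
        seen.add((chrom, int(pos)))
    return targets - seen
```

`missing_targets` is included because a subset of a regular call set only contains the sites where something was called, so it is worth knowing how many of the target sites have no record.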

@Jakob37
Contributor Author

Jakob37 commented Oct 2, 2024

It sounds like we run a dedicated calling step for this set of sites so that 0/0 (homozygous reference) calls appear in the output as well. The rationale is to be sure that a site was successfully called, and is not just absent due to lack of coverage.

I think we are using DNAScope for this at the moment, but I would guess GATK's HaplotypeCaller might work as well.
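The distinction between "called as reference" and "not called" can be sketched as follows. This is an assumption-laden illustration rather than our actual post-processing: it takes a hypothetical dict mapping `(chrom, pos)` to the sample's VCF `GT` string, and the status labels are invented for the example.

```python
def genotype_status(gt):
    """Map a VCF GT string such as '0/0', '0|1' or './.' to a status label."""
    alleles = gt.replace("|", "/").split("/")
    if any(allele == "." for allele in alleles):
        return "no_call"
    if all(allele == "0" for allele in alleles):
        return "hom_ref"
    if len(set(alleles)) == 1:
        return "hom_alt"
    return "het"


def site_statuses(genotypes, targets):
    """Report a status for every target site, including sites with no record.

    genotypes: dict mapping (chrom, pos) to a GT string (hypothetical input).
    targets: iterable of (chrom, pos) tuples for the selected sites.
    """
    statuses = {}
    for site in sorted(targets):
        if site in genotypes:
            statuses[site] = genotype_status(genotypes[site])
        else:
            # No record at all: cannot tell reference genotype from no coverage.
            statuses[site] = "absent"
    return statuses
```

With calls forced at every target site, `hom_ref` sites can be counted as confidently called, while `no_call` and `absent` sites would be excluded from the contamination estimate.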

After that we do a post-processing step using these calls to estimate the contamination.


2 participants