Improving speed when running kb count
#55
Comments
It should automatically parallelize (rather than read sequentially) if you enable many threads -- that's one reason splitting FASTQ files into multiple chunks enables faster processing. kallisto should be pretty fast unless you're doing single-nucleus RNA-seq or RNA velocity -- with enough threads, it takes only 1-3 seconds to process a million reads. Also, make sure you're using the current version of kb-python (version 0.27.3), since speed improvements have been made. Finally, please post issues on the kallisto or kb-python GitHub page -- I'm usually more responsive there.
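As a rough illustration of the advice above, here is a minimal Python sketch that assembles a kb count invocation with a thread count and several FASTQ chunks. It only builds and prints the argument list; it does not run anything. The file names, the technology string, and the thread count are placeholders, not values from this thread, and any extra options an RNA velocity run needs are deliberately left out.

```python
import shlex


def build_kb_count_command(index, t2g, technology, out_dir, fastqs, threads=8):
    """Assemble a `kb count` argument list. Nothing is executed here."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if not fastqs:
        raise ValueError("at least one FASTQ file is required")
    cmd = [
        "kb", "count",
        "-i", index,         # kallisto index
        "-g", t2g,           # transcript-to-gene mapping
        "-x", technology,    # single-cell technology string
        "-o", out_dir,       # output directory
        "-t", str(threads),  # number of threads to use
    ]
    # All FASTQ chunks go into a single invocation, one after another.
    cmd.extend(fastqs)
    return cmd


# Hypothetical inputs, for illustration only.
cmd = build_kb_count_command(
    index="index.idx",
    t2g="t2g.txt",
    technology="10xv3",
    out_dir="kb_out",
    fastqs=[
        "chunk1_R1.fastq.gz", "chunk1_R2.fastq.gz",
        "chunk2_R1.fastq.gz", "chunk2_R2.fastq.gz",
    ],
    threads=16,
)
print(shlex.join(cmd))
```

The list could be handed to subprocess.run(cmd, check=True) once the paths point at real files. Keeping it as a list rather than a single string avoids shell-quoting problems with unusual file names.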
Hi, thank you so much for your quick response! This is the command I'm running for RNA velocity analysis. Currently it takes 30-40 minutes; each FASTQ has 1000 reads, and the index file is ~40GB. Additionally, each of the files here is 119MB. Is this expected?
Additionally, just to clarify once again: if I specify the following command, should it already be parallelizing? Or do I need to do anything additional to split the FASTQ files into multiple chunks? And would the output folder ( Thanks so much for your help!
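For a sense of scale, the numbers quoted in this thread can be put side by side with a short back-of-the-envelope calculation. This is only a sketch: the 1-3 seconds per million reads figure is the maintainer's estimate for runs that are not single-nucleus or RNA velocity, so it is a lower bound here, not a prediction for this workflow.

```python
def mapping_seconds(n_reads, seconds_per_million):
    """Mapping time implied by a throughput given in seconds per million reads."""
    return n_reads / 1_000_000 * seconds_per_million


n_reads = 1000              # reads per FASTQ, as reported above
low = mapping_seconds(n_reads, 1.0)
high = mapping_seconds(n_reads, 3.0)
observed_seconds = 30 * 60  # lower end of the reported 30-40 minutes

print(f"implied mapping time: {low:.3f} to {high:.3f} s")
print(f"observed run time:    {observed_seconds} s")
print(f"observed / implied:   at least {observed_seconds / high:,.0f}x")
```

At that rate, per-read mapping of 1000 reads would be negligible, so the observed run time must be going elsewhere. Fixed costs such as loading the ~40GB index are one plausible candidate, but that is an assumption and not something measured in this thread.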
OK, yes, RNA velocity is just slow with kallisto. This will change in our forthcoming release of kb-python (version 0.28; currently on the devel branch), which will be released in the next week or so. I don't think there's much you can do in terms of speed with the current version of kb-python. And yes, it will parallelize automatically with the command you supplied (and the output will be no different from combining the subsamples into a single FASTQ file).
When you input multiple FASTQ files into the kb count function, does it process them sequentially, or is there a way to parallelize it? I ask especially because, for me, the first step, "kallisto bus", takes the longest (when loading the index and mapping). Is there a way to parallelize this process, or are there any other tips to improve speed? Thank you!