Kallisto total UMI difference between velocity and novelocity modes. (single cell) #419

mortunco · 2023-12-11T20:19:21Z

Hello,

Thank you for keeping kallisto alive. These days, more and more I realise how its actually important and hard maintain and update tools across many years.

Background:

I am processing same 10x 5' v2 chemistry sample fastqs with kallisto. I found great number of total UMI difference between velocity and nonvelocity runs. I shared the commands i used to run the scripts

Velocity mode

conda activate ~/mambaforge/envs/kallisto
KALLISTO_INDEX_DIR="foo/bar/kallisto-velocity-index" ### downloaded with kb ref -d human --workflow nac
kb count --h5ad -i $KALLISTO_INDEX_DIR/index.idx -g $KALLISTO_INDEX_DIR/t2g.txt -x 10xv2 --workflow nac --h5ad -o kallisto \
        -c1 $KALLISTO_INDEX_DIR/spliced_t2c.txt -c2 $KALLISTO_INDEX_DIR/unspliced_t2c.txt --filter bustools \
            SRR13806026_S1_R1_001.fastq \
                SRR13806026_S1_R2_001.fastq

velocity ref

$ ls kallisto-velocity-index
index.idx  spliced_t2c.txt  t2g.txt  unspliced_t2c.txt

Nonvelocity mode

KALLISTO_INDEX_DIR="foo/bar/kallisto-prebuilt-index" ### downloaded with kb ref -d human --workflow standard
kb count --h5ad -i $KALLISTO_INDEX_DIR/index.idx -g $KALLISTO_INDEX_DIR/t2g.txt -x 10xv2 --workflow standard --filter bustools -o kallisto_novelo_prebuilt SRR13806026_S1_R1_001.fastq SRR13806026_S1_R2_001.fastq

novelocity ref

$ ls kallisto-prebuilt-index
human_index_standard.tar.xz  index.idx  t2g.txt

Attempts

I calculated total UMI counts using the following lines. For this question please ignore the other kallisto runs. Although I used them trying to figure out whats going on, its not related to this question. Here you can see the total UMI difference between velocity and nonvelocity_prebuilt runs. One important note here is that I added ambigous + spliced and unspliced to velocity.

kallisto_novelocity_prebuilt=ad.read_h5ad("foo/bar/sample/kallisto_novelo_prebuilt/counts_filtered/adata.h5ad")
kallisto_novelocity_prebuilt.var_names_make_unique()

kallisto_velocity=ad.read_h5ad("foo/bar/sample/kallisto_velocity/counts_filtered/adata.h5ad")
kallisto_velocity.var_names_make_unique()

total_umi["kallisto_velocity"]= kallisto_velocity.layers["spliced"].sum().sum() + kallisto_velocity.layers["ambiguous"].sum().sum() + kallisto_velocity.layers["unspliced"].sum().sum()
total_umi["kallisto_novelocity_prebuilt"]= kallisto_novelocity_prebuilt.X.sum().sum()

In other comparison, you may see that even we add spliced+ambigous it still does not add up to velocity.

print(f' kalisto velocty spliced: {kallisto_velocity[list(common_barcodes),:].layers["spliced"].sum().sum()}')
print(f' kalisto velocty ambigous + spliced: {kallisto_velocity[list(common_barcodes),:].layers["spliced"].sum().sum() + kallisto_velocity[list(common_barcodes),:].layers["ambiguous"].sum().sum()}')
print(f' kalisto noveloctiy: {kallisto_novelocity_prebuilt[list(common_barcodes),:].X.sum().sum()}')

kalisto velocty spliced: 216012.0
kalisto velocty ambigous + spliced: 781301.0
kalisto noveloctiy: 905870.0

Questions

It is clearly the impact of index + workflow. But in terms of good practice could you please show me guidance. I would like to run kallisto only once and use spliced+unspliced counts for standard sc analysis calculation and their ratio for the velocity/trajectory analysis.

Thank you very much for your time and patience,

Best regards,

Tunc.

The text was updated successfully, but these errors were encountered:

Yenaled · 2023-12-12T03:25:51Z

Hello! I just looked at the dataset -- for some reason, there seems to be strand-bias [causing mapping rate < 30%] under the default 10x settings (forward-stranded) so set --strand=unstranded.

As for your actual question: Why do mature+ambiguous gives fewer UMIs than just standardly mapping to the transcriptome (while discarding out-of-transcriptome reads)? This is because of UMI deduplication. In standard index, the same UMI sequences may simply map to a single gene, but in nac index, some of the same UMI sequences may map to multiple places (because there are more things in the index) -- and hence get filtered out. tl;dr it's because of UMI counting.

And yes, you want to sum ambiguous+mature.

mortunco · 2023-12-12T17:29:20Z

Dear Yanaled, I am gonna give --strand=unstranded a try. Thank you very much for the hasty reply. I really appreciate it.

NikTuzov · 2023-12-12T22:42:09Z

Hello Tunc:

I am using a similar velocity workflow and I was wondering if you could please answer my question here:
pachterlab/kb_python#228 (comment)

Yenaled · 2024-01-04T11:44:40Z

@mortunco I just released a new version of kb-python (version 0.28.1) -- it should cause less UMI differences between the standard index and the nac index.

Previously, if there was an exon/intron overlap of two different genes, the intron was being prioritized (which caused issues since a UMI could map to many introns). Now, the exon is being prioritized and, from what I tested, there is very high agreement between the standard index and the nac index.

Yenaled mentioned this issue Dec 12, 2023

Issues with running RNA velocity (La Manno) analysis in kb_python 0.28.0 pachterlab/kb_python#228

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Kallisto total UMI difference between velocity and novelocity modes. (single cell) #419

Kallisto total UMI difference between velocity and novelocity modes. (single cell) #419

mortunco commented Dec 11, 2023

Yenaled commented Dec 12, 2023

mortunco commented Dec 12, 2023

NikTuzov commented Dec 12, 2023

Yenaled commented Jan 4, 2024

Kallisto total UMI difference between velocity and novelocity modes. (single cell) #419

Kallisto total UMI difference between velocity and novelocity modes. (single cell) #419

Comments

mortunco commented Dec 11, 2023

Background:

Velocity mode

velocity ref

Nonvelocity mode

novelocity ref

Attempts

Questions

Yenaled commented Dec 12, 2023

mortunco commented Dec 12, 2023

NikTuzov commented Dec 12, 2023

Yenaled commented Jan 4, 2024