library integration #16

jdmontenegro · 2023-12-19T13:11:05Z

Hi, I have recently started to play with single-cell data analysis, and your bioRxiv paper and approach sound really interesting. I understand why PCA-based dimensionality reduction probably makes the wrong assumption about the topology of the underlying data.
In that sense, I was curious how you handle the integration of multiple datasets. Traditionally, this integration is based on shared variable PCs (eigenvectors), but those are selected assuming the same underlying topology. In your package, I can see that if we have biological replicates, we could select the same topology for two libraries and then select the most variable eigenvectors for integration. But what happens if the biological replicates use different library preps that introduce some kind of batch effect? That batch effect would influence the selection of the best-fitting topology for the data, and could make it difficult, if not impossible, to integrate datasets that should share the same underlying biology and composition.

My question is: how can we handle these scenarios with TopOMetry? How do we best select eigenvectors for integrating multiple datasets, and how do we prevent sampling methods from introducing batch artifacts into the model?
These may be naive questions, but I would like to hear your take on them.
Best regards,
