Replies: 5 comments
-
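The key question is: why do you think the integration is necessary? There are sub-questions to that:
- How many biological samples are there in each condition?
- Were the samples sequenced and prepared on the same day?
- Do the cells from different batches cluster by batch or by cell type?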
-
Hi,
Thanks very much for getting back to me.
In response:
- Reason for integration: to decide which of the three clusters in condition 2 contains the healthy cells. I assumed the healthy cells in condition 2 would cluster with the healthy cells in the control (condition 1), and that the cancer cells in condition 2 would form one or more separate clusters. Is there another way, apart from integration, of telling which of the 3 clusters in condition 2 is the healthy one? I used a Venn diagram approach to check how many genes each cluster in condition 2 has in common with the single cluster in condition 1. However, that seems a little unsophisticated, perhaps?
- Each condition has only 1 sample, with approximately a couple of thousand cells each.
- All the samples were prepped and sequenced on the same day in the same 10X run.
- All samples were processed as part of the same, single batch as explained above. In a separate integration with another sample (3-sample integration) cells clustered by cell type.
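A slightly more quantitative version of the Venn-style comparison would be to score each condition 2 cluster by its Jaccard overlap with the condition 1 cluster. The sketch below is only illustrative: the gene sets are made-up placeholders (apart from NEUROD1, none of these names come from the real data), and the helper name `jaccard` is hypothetical.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Placeholder marker sets; in practice these would be the marker genes
# found for each cluster (assumption: one gene set per cluster).
control_markers = {"NEUROD1", "GENE_A", "GENE_B", "GENE_C"}
condition2_markers = {
    "cluster_0": {"NEUROD1", "GENE_A", "GENE_B", "GENE_D"},
    "cluster_1": {"NEUROD1", "GENE_E", "GENE_F"},
    "cluster_2": {"GENE_G", "GENE_H"},
}

# Overlap of every condition 2 cluster with the single control cluster.
overlap = {
    name: jaccard(genes, control_markers)
    for name, genes in condition2_markers.items()
}

# The cluster with the highest overlap is the candidate "healthy" cluster.
closest = max(overlap, key=overlap.get)

for name, score in sorted(overlap.items()):
    print(f"{name}: Jaccard = {score:.2f}")
print("Most similar to control:", closest)
```

A high overlap would only suggest similarity, not prove that a cluster is healthy, so this is a sanity check rather than a classifier.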
Best,
John
Dr John Jacob
On 17 Oct 2022, at 10:19, f6v ***@***.***> wrote:
The key question is: why do you think the integration is necessary? There are sub-questions to that:
How many biological samples are there in each condition?
Were the samples sequenced and prepared on the same day?
Do the cells from different batches cluster by batch or by cell type?
-
It's a bit hard to say without experimenting with the data, so take everything with a grain of salt. I think there's a risk of erasing biological variation when using data integration. And a 1 vs 1 sample comparison is tricky: I've seen datasets where two samples from the same group end up very different from each other. My point is that this is like an underdetermined system.
This depends on the cancer type, but some people use other software to infer CNVs or other mutations, so you could distinguish cancer cells from healthy cells based on that. See http://www.bioconductor.org/packages/devel/bioc/vignettes/infercnv/inst/doc/inferCNV.html as an example, though I haven't used it myself.
-
Hi @f6v,
-
Hi @jjacob12
-
Hello,
I wonder if I can get your insights into a problem when clustering cells grown in vitro in an un-integrated vs integrated workflow.
If I run either the conventional pipeline (e.g. Seurat - Guided Clustering Tutorial) or the SCTransform variation, I can see that a neuronal subtype of interest forms just one cluster in a control/healthy condition (condition 1) and three clusters in another condition which contains both cancer cells and healthy cells (condition 2). To explain the difference, I've hypothesised that some of the cancer cells differentiate in condition 2 and start to superficially resemble the healthy cell subtype. Here are a couple of images - the top one is the control (condition 1) and the bottom is the test (condition 2). NEUROD1 is the gene used to mark the subtype of interest:
For both condition 1 and condition 2, the `res` was the same (res=0.5) in the `FindClusters()` command.
To see if these different cell identities could be resolved in an integrated workflow I subsetted the Seurat object for the clusters of interest from analysis of each condition individually and attempted to integrate these clusters by running the following (with and without regressing out the cell cycle difference between G2M and S which were apparent in an exploratory analysis - made little difference to the final DimPlot result):
On the `neurod1.combined` object I ran the standard commands for clustering and visualisation:
then got this result (red dots = condition 1; blue dots = condition 2):
Based on the clustering of the individual samples for NeuroD1 shown above, I was expecting to see all red dots and some blue dots intermingled forming a cluster (condition 1 and 2 both contain healthy cells) and a separate cluster of only blue dots (representing cancer cells from condition 2 that differentiated). Maybe that's a naive expectation!
I'm not sure if this workflow is appropriate (pulling out clusters from different conditions and trying to integrate them, with repeat normalisation, scaling, etc of the integrated object). I also ran this with CCA instead of RPCA, and as expected there was again no difference.
Incidentally, I also tried integrating the entirety of the cell populations from both condition 1 and condition 2, not just specific clusters of interest, and then sub-clustered the NeuroD1 expressing cluster in the integrated object, but this too did not reveal the partitioning of cells expressing the marker according to their condition of origin.
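One way to put a number on "partitioning by condition of origin" is to cross-tabulate cluster labels against condition labels and look at what fraction of each cluster comes from each condition. The following is a minimal sketch with made-up labels; the function name `condition_fractions` and the label values are hypothetical, and in a real analysis the two input vectors would come from the cluster identities and a condition column in the object's metadata.

```python
from collections import Counter, defaultdict


def condition_fractions(clusters, conditions):
    """Return, for each cluster, the fraction of its cells from each condition."""
    counts = defaultdict(Counter)
    for cluster, condition in zip(clusters, conditions):
        counts[cluster][condition] += 1
    return {
        cluster: {cond: n / sum(per_cond.values()) for cond, n in per_cond.items()}
        for cluster, per_cond in counts.items()
    }


# Placeholder per-cell labels (not real data): one entry per cell.
clusters = ["0", "0", "0", "0", "1", "1", "1", "1"]
conditions = ["cond1", "cond2", "cond1", "cond2",
              "cond2", "cond2", "cond2", "cond2"]

fractions = condition_fractions(clusters, conditions)

for cluster in sorted(fractions):
    summary = ", ".join(
        f"{cond}: {frac:.0%}" for cond, frac in sorted(fractions[cluster].items())
    )
    print(f"cluster {cluster} -> {summary}")
```

In this toy input, cluster 0 is an even mix of both conditions while cluster 1 is made up entirely of condition 2 cells. A cluster dominated by condition 2 would be a candidate cancer-derived population, although with one sample per condition such a split could also reflect sample-specific effects.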
Hope I have explained this sufficiently well!
Thanks in advance.
John