nClus parameter not working #14
Hi Emma,
I think this might be because the underlying number of clusters of the
FlowSOM tree is not adapted, so the mapping of the cells will be similar.
The metaclustering (decided by nClus) will only happen afterwards
(clustering the clusters), but this outlier detection is done on the
individual cluster level (default 100 clusters if you did not adapt these
parameters). If the seed is fixed, I would indeed expect this to be the
same every time.
All the best,
Sofie
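
A minimal base R sketch of the point above, not CytoNorm's or FlowSOM's actual code. It assumes k-means as a stand-in for the SOM grid, hierarchical clustering of the centres as a stand-in for the metaclustering, and a "median + 4 MAD" distance rule as a stand-in for the outlier check. The function `two_level` and all of its arguments are hypothetical names used only for illustration.

```r
# Simulated 2-D "cells": three blobs of 200 points each.
set.seed(1)
cells <- rbind(
  matrix(rnorm(400, mean = 0), ncol = 2),
  matrix(rnorm(400, mean = 4), ncol = 2),
  matrix(rnorm(400, mean = 8), ncol = 2)
)

# Hypothetical two-level clustering: the outlier flags are computed on the
# first-level clusters, before n_clus is used at all.
two_level <- function(cells, n_clus, seed = 1) {
  set.seed(seed)
  # First level: 25 clusters, independent of n_clus.
  first <- kmeans(cells, centers = 25, nstart = 5, iter.max = 50)

  # Distance of every cell to the centre of its first-level cluster.
  centre_of_cell <- first$centers[first$cluster, , drop = FALSE]
  d <- sqrt(rowSums((cells - centre_of_cell)^2))

  # Per-cluster threshold: median + 4 * MAD of the distances.
  key <- as.character(first$cluster)
  limit <- tapply(d, key, median) + 4 * tapply(d, key, mad)
  far <- d > limit[key]

  # Second level: group the 25 centres into n_clus metaclusters.
  meta <- cutree(hclust(dist(first$centers)), k = n_clus)

  list(n_far = sum(far), n_meta = length(unique(meta[first$cluster])))
}

r5  <- two_level(cells, n_clus = 5)
r20 <- two_level(cells, n_clus = 20)
print(c(far_nClus5 = r5$n_far, far_nClus20 = r20$n_far))
```

With the seed fixed, the number of flagged cells is the same for both settings, because `n_clus` only enters after the flags have been computed.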
On Tue, 11 Aug 2020 at 08:12, Emma ***@***.***> wrote:
Hello,
I have found what the issue was so I thought I'd update here too. CytoNorm
writes a tmp folder with the FlowSOM clustering of the training from
prepareFlowSOM. Because I was running it in the same directory with
different parameters (nClus), even though I was running prepareFlowSOM
every time with the different nClus, when it came to the training with
CytoNorm.train, it was finding the tmp directory already there and it was
overwriting the fsom object that I had created further above:
    if (!file.exists(file.path(outputDir, "CytoNorm_FlowSOM.RDS"))) {
        ...
    } else {
        fsom <- readRDS(file.path(outputDir, "CytoNorm_FlowSOM.RDS"))
        warning("Reusing previously saved FlowSOM result.")
    }
Easy fix: I now switch to a separate subdirectory, Norm_nClus#, every time
I run the CytoNorm.train step.
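
A small base R sketch of this trap and of the subdirectory workaround, under stated assumptions: `cached_fit` is a hypothetical helper that mimics the "reuse the saved RDS file if it exists" pattern quoted above, and a plain list stands in for the real FlowSOM object. It does not call CytoNorm.

```r
# Hypothetical helper, not part of CytoNorm: reuse a cached result if the
# file exists, otherwise compute it and save it.
cached_fit <- function(output_dir, n_clus) {
  cache_file <- file.path(output_dir, "CytoNorm_FlowSOM.RDS")
  if (!file.exists(cache_file)) {
    dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
    fit <- list(nClus = n_clus)  # stand-in for a real FlowSOM object
    saveRDS(fit, cache_file)
  } else {
    fit <- readRDS(cache_file)   # the requested n_clus is silently ignored
  }
  fit
}

base_dir <- file.path(tempdir(), "cytonorm_cache_demo")
unlink(base_dir, recursive = TRUE)  # start from a clean state

# Shared directory: the second call gets the first call's result back.
shared_dir <- file.path(base_dir, "tmp")
shared_5   <- cached_fit(shared_dir, n_clus = 5)
shared_20  <- cached_fit(shared_dir, n_clus = 20)

# One directory per setting: each call computes its own result.
own_5  <- cached_fit(file.path(base_dir, "Norm_nClus5"),  n_clus = 5)
own_20 <- cached_fit(file.path(base_dir, "Norm_nClus20"), n_clus = 20)

print(c(shared = shared_20$nClus, separate = own_20$nClus))
```

In the shared directory the second call returns the object saved with nClus = 5, even though 20 was requested. With one directory per setting, each call gets what it asked for.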
There is still one thing that I don't fully understand, and it looks a bit
suspicious. Even though I'm training and fitting with different numbers of
clusters, I get exactly the same warnings, with exactly the same
proportions of cells that are far away from their cluster centers.
For example with nClus=5 I get:
There were 50 or more warnings (use warnings() to see the first 50)
Warning messages:
1: In FlowSOM::NewData(fsom$FlowSOM, ff) :
887 cells (2.65%) seem far from their cluster centers.
2: In FlowSOM::NewData(fsom$FlowSOM, ff) :
2382 cells (2.73%) seem far from their cluster centers.
3: In FlowSOM::NewData(fsom$FlowSOM, ff) :
1021 cells (6.28%) seem far from their cluster centers.
4: In FlowSOM::NewData(fsom$FlowSOM, ff) :
4241 cells (4.58%) seem far from their cluster centers.
5: In FlowSOM::NewData(fsom$FlowSOM, ff) :
3813 cells (9.64%) seem far from their cluster centers.
6: In FlowSOM::NewData(fsom$FlowSOM, ff) :
3816 cells (24.13%) seem far from their cluster centers.
7: In FlowSOM::NewData(fsom$FlowSOM, ff) :
671 cells (2.97%) seem far from their cluster centers.
8: In FlowSOM::NewData(fsom$FlowSOM, ff) :
2111 cells (7.73%) seem far from their cluster centers.
9: In FlowSOM::NewData(fsom$FlowSOM, ff) :
857 cells (2.19%) seem far from their cluster centers.
10: In FlowSOM::NewData(fsom$FlowSOM, ff) :
1370 cells (6.58%) seem far from their cluster centers.
... And exactly the same with nClus=20:
There were 50 or more warnings (use warnings() to see the first 50)
Warning messages:
1: In FlowSOM::NewData(fsom$FlowSOM, ff) :
887 cells (2.65%) seem far from their cluster centers.
2: In FlowSOM::NewData(fsom$FlowSOM, ff) :
2382 cells (2.73%) seem far from their cluster centers.
3: In FlowSOM::NewData(fsom$FlowSOM, ff) :
1021 cells (6.28%) seem far from their cluster centers.
4: In FlowSOM::NewData(fsom$FlowSOM, ff) :
4241 cells (4.58%) seem far from their cluster centers.
5: In FlowSOM::NewData(fsom$FlowSOM, ff) :
3813 cells (9.64%) seem far from their cluster centers.
6: In FlowSOM::NewData(fsom$FlowSOM, ff) :
3816 cells (24.13%) seem far from their cluster centers.
7: In FlowSOM::NewData(fsom$FlowSOM, ff) :
671 cells (2.97%) seem far from their cluster centers.
8: In FlowSOM::NewData(fsom$FlowSOM, ff) :
2111 cells (7.73%) seem far from their cluster centers.
9: In FlowSOM::NewData(fsom$FlowSOM, ff) :
857 cells (2.19%) seem far from their cluster centers.
10: In FlowSOM::NewData(fsom$FlowSOM, ff) :
1370 cells (6.58%) seem far from their cluster centers.
I admit that this might be a coincidence with just the first cluster being
the same but I was wondering if you have any ideas on how to explore
further.
Thanks,
Emma
@emmanuelaaaaa that tmp folder thing is a subtle trap, so well done for noticing it! Always worth checking to see if it's still there, which might happen if CytoNorm gets interrupted. In terms of the later error you mention:
It would be the same each time because, as @SofieVG said, the first level of clustering will generate the same number of clusters (~100) and then the metaclustering will group these into 5 or 20 metaclusters etc. One reason the warnings might appear is if your data is very variable between batches, so the clusters are capturing cells that are actually quite spread out. You could try increasing the number of first level clusters (by increasing the 'grid size': xdim = 10 and ydim = 10 results in 10 x 10 = 100 clusters) to capture this. If your data has small batch effects, then it is more likely that you are capturing cells from different populations in each first level cluster, and the solution would again be to retry with an increased grid size.
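
To illustrate the grid size suggestion, here is a rough base R sketch. It again assumes k-means as a stand-in for the SOM, with `xdim * ydim` centres, and runs on simulated data; `mean_dist_to_centre` is a hypothetical helper, not a FlowSOM or CytoNorm function. It only shows the general tendency that more first level clusters put cells closer to their assigned centre.

```r
# Simulated, fairly spread-out 2-D data: 2000 "cells".
set.seed(42)
cells <- matrix(rnorm(4000, sd = 3), ncol = 2)

# Hypothetical helper: average distance of a cell to its first-level centre
# when the "grid" has xdim * ydim clusters (k-means stands in for the SOM).
mean_dist_to_centre <- function(cells, xdim, ydim, seed = 1) {
  set.seed(seed)
  fit <- kmeans(cells, centers = xdim * ydim, nstart = 3, iter.max = 100)
  centre_of_cell <- fit$centers[fit$cluster, , drop = FALSE]
  mean(sqrt(rowSums((cells - centre_of_cell)^2)))
}

coarse <- mean_dist_to_centre(cells, xdim = 5,  ydim = 5)   # 25 clusters
fine   <- mean_dist_to_centre(cells, xdim = 10, ydim = 10)  # 100 clusters
print(c(coarse = coarse, fine = fine))
```

On real data the effect on the "far from their cluster centers" warnings will depend on the populations and batch effects involved, so this is only a way to reason about the direction of the change.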
Hello again :),
I have been running into a weird issue where I specify the number of clusters as e.g. 20, FlowSOM runs with nClus=20 and the CV plot looks OK, but the training only uses 10 clusters: it prints "Processing cluster 1..." up to 10. The same happens with the actual normalisation, which also seems to use only 10 clusters.
Any idea what's happening there?
Many thanks and best wishes,
Emma
sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.10 (Final)
Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.3.so
locale:
[1] LC_CTYPE=en_GB.ISO-8859-1 LC_NUMERIC=C LC_TIME=en_GB.ISO-8859-1 LC_COLLATE=en_GB.ISO-8859-1
[5] LC_MONETARY=en_GB.ISO-8859-1 LC_MESSAGES=en_GB.ISO-8859-1 LC_PAPER=en_GB.ISO-8859-1 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.ISO-8859-1 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] flowCore_1.52.1 FlowSOM_1.18.0 igraph_1.2.5 dplyr_1.0.0 CytoNorm_0.0.5 optparse_1.6.6
loaded via a namespace (and not attached):
[1] Biobase_2.46.0 splines_3.6.3 jsonlite_1.6.1 ConsensusClusterPlus_1.50.0 R.utils_2.9.2
[6] ellipse_0.4.2 gtools_3.8.2 RcppParallel_5.0.1 stats4_3.6.3 latticeExtra_0.6-29
[11] RBGL_1.62.1 flowWorkspace_3.34.1 yaml_2.2.1 robustbase_0.93-6 pillar_1.4.4
[16] lattice_0.20-41 glue_1.3.2 digest_0.6.25 RColorBrewer_1.1-2 colorspace_1.4-1
[21] ggcyto_1.14.1 Matrix_1.2-18 R.oo_1.23.0 plyr_1.8.6 pcaPP_1.9-73
[26] XML_3.99-0.3 pkgconfig_2.0.3 pheatmap_1.0.12 tsne_0.1-3 fda_5.1.4
[31] zlibbioc_1.32.0 purrr_0.3.4 corpcor_1.6.9 mvtnorm_1.1-1 scales_1.1.1
[36] jpeg_0.1-8.1 getopt_1.20.3 openCyto_1.24.0 flowStats_3.44.0 tibble_3.0.1
[41] generics_0.0.2 ggplot2_3.3.1 ellipsis_0.3.1 flowViz_1.50.0 BiocGenerics_0.32.0
[46] hexbin_1.28.1 mnormt_1.5-6 magrittr_1.5 crayon_1.3.4 IDPmisc_1.1.20
[51] mclust_5.4.6 ks_1.11.7 R.methodsS3_1.8.0 MASS_7.3-51.6 graph_1.64.0
[56] tools_3.6.3 data.table_1.12.8 ncdfFlow_2.32.0 flowClust_3.24.0 lifecycle_0.2.0
[61] matrixStats_0.56.0 stringr_1.4.0 munsell_0.5.0 cluster_2.1.0 compiler_3.6.3
[66] rlang_0.4.6 grid_3.6.3 base64enc_0.1-3 gtable_0.3.0 rrcov_1.5-2
[71] R6_2.4.1 gridExtra_2.3 clue_0.3-57 CytoML_1.12.1 KernSmooth_2.23-17
[76] Rgraphviz_2.30.0 stringi_1.4.6 parallel_3.6.3 Rcpp_1.0.4.6 vctrs_0.3.0
[81] png_0.1-7 DEoptimR_1.0-8 tidyselect_1.1.0