Color scale for UMAP/tSNE plots #4

Open
rpolicastro opened this issue May 9, 2019 · 4 comments
Labels
enhancement New feature or request

Comments

@rpolicastro

I was hoping we could get more control over the color scales in the UMAP/tSNE plots.

First, the option to display the values after log transformation. Second, the option to set minimum and maximum values for the color scale, so that a few cells at the very top of the range don't drown out the color for most of the other cells.
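A minimal base-R sketch of both requests, assuming the expression values of one gene are available as a numeric vector with one value per cell. `squish_expression()` is a hypothetical helper, not part of Cerebro: it optionally applies `log1p()` and then clips the values into `[min, max]`, so a handful of extreme cells no longer set the top of the color scale. The default limits (1st and 99th percentiles) are an arbitrary choice for illustration.

```r
# Hypothetical helper (not part of Cerebro): prepare values for a color scale.
# values:        numeric vector of expression values, one per cell
# limits:        c(min, max) for the color scale; NULL uses the 1st/99th percentiles
# log_transform: if TRUE, apply log1p() before clipping
squish_expression <- function(values, limits = NULL, log_transform = FALSE) {
  if (log_transform) {
    values <- log1p(values)
  }
  if (is.null(limits)) {
    limits <- stats::quantile(values, probs = c(0.01, 0.99), names = FALSE)
  }
  # Values below limits[1] or above limits[2] are set to the boundary
  pmin(pmax(values, limits[1]), limits[2])
}

# Usage: a single outlier cell (500) no longer stretches the scale
expr <- c(0, 1, 2, 3, 500)
squish_expression(expr, limits = c(0, 5))  # 0 1 2 3 5
```

Clipping (rather than dropping) the outliers keeps every cell on the plot; they simply share the top color.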

@romanhaa romanhaa added the enhancement New feature or request label May 9, 2019
@romanhaa
Owner

romanhaa commented May 9, 2019

What do you mean by displaying the scales after log transformation? The "size factor" of each cell? Adjusting the range of values for the color scale is an interesting idea; I might implement it in a future release.

@rpolicastro
Author

rpolicastro commented May 9, 2019

If I'm reading the code correctly, for Seurat v3 you are taking the 'log normalized' expression values from @assays$RNA@data. I think these are actually the raw expression values, so the color scales on the gene expression tab presumably span a huge range, which makes differences hard to discern.

With Seurat v3 there are a few assays that could contain normalized, transformed, and scaled expression values:

  1. The data and scale.data slots from the old (log-normalization) method.
  2. The data slot from the new normalization method (sctransform).
  3. The data and scale.data slots from the integrated analysis.

It might be best to pull from the scale.data slot of the active assay (as defined by @active.assay) first, and fall back to the data slot if scale.data is empty.
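A sketch of that fallback order in base R. `pick_expression_matrix()` is a hypothetical helper, not part of Cerebro or Seurat; it returns the first non-empty matrix from a named list of candidates. An unused slot shows up as a 0 x 0 matrix (e.g. `scale.data : num[0 , 0 ]`), so "empty" is tested with `length()`. The Seurat calls in the comments assume the Seurat v3 `GetAssayData(object, slot, assay)` interface.

```r
# Hypothetical helper (not part of Cerebro or Seurat): return the first
# non-empty matrix from a named list, in order of preference.
pick_expression_matrix <- function(candidates) {
  for (slot_name in names(candidates)) {
    m <- candidates[[slot_name]]
    # An unused slot is a 0 x 0 matrix, so its length is 0
    if (!is.null(m) && length(m) > 0) {
      message("Using slot: ", slot_name)
      return(m)
    }
  }
  stop("All candidate slots are empty")
}

# With Seurat v3 installed (assumption), the candidates could come from the active assay:
# assay <- Seurat::DefaultAssay(seurat)
# expr <- pick_expression_matrix(list(
#   scale.data = Seurat::GetAssayData(seurat, slot = "scale.data", assay = assay),
#   data       = Seurat::GetAssayData(seurat, slot = "data", assay = assay)
# ))
```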

@romanhaa
Owner

Hmm, in my case it looks like the raw counts are in @counts and the log-normalized values in @data. Compare the @x slot of each.

Formal class 'Assay' [package "Seurat"] with 7 slots
  ..@ counts       :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. ..@ i       : int [1:24806824] 0 68 75 87 88 346 535 698 782 803 ...
  .. .. ..@ p       : int [1:11770] 0 1087 5287 7123 9339 10954 12753 14717 16674 18367 ...
  .. .. ..@ Dim     : int [1:2] 17516 11769
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : chr [1:17516] "A1BG" "A1BG-AS1" "A2M" "A2M-AS1" ...
  .. .. .. ..$ : chr [1:11769] "AAACCCAAGCGCCCAT-1" "AAACCCAAGGTTCCGC-1" "AAACCCACAGAGTTGG-1" "AAACCCACAGGTATGG-1" ...
  .. .. ..@ x       : num [1:24806824] 1 2 1 1 2 1 1 1 1 1 ...
  .. .. ..@ factors : list()
  ..@ data         :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. ..@ i       : int [1:24806824] 0 68 75 87 88 346 535 698 782 803 ...
  .. .. ..@ p       : int [1:11770] 0 1087 5287 7123 9339 10954 12753 14717 16674 18367 ...
  .. .. ..@ Dim     : int [1:2] 17516 11769
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : chr [1:17516] "A1BG" "A1BG-AS1" "A2M" "A2M-AS1" ...
  .. .. .. ..$ : chr [1:11769] "AAACCCAAGCGCCCAT-1" "AAACCCAAGGTTCCGC-1" "AAACCCACAGAGTTGG-1" "AAACCCACAGGTATGG-1" ...
  .. .. ..@ x       : num [1:24806824] 1.71 2.31 1.71 1.71 2.31 ...
  .. .. ..@ factors : list()
  ..@ scale.data   : num [1:2000, 1:11769] -0.478 -0.133 -0.134 -0.133 3.72 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:2000] "A2M-AS1" "ABCA1" "ABCC3" "ABCC4" ...
  .. .. ..$ : chr [1:11769] "AAACCCAAGCGCCCAT-1" "AAACCCAAGGTTCCGC-1" "AAACCCACAGAGTTGG-1" "AAACCCACAGGTATGG-1" ...
  ..@ key          : chr "rna_"
  ..@ var.features : chr [1:2000] "PPBP" "PF4" "JCHAIN" "PTGDS" ...
  ..@ meta.features:'data.frame':       17516 obs. of  4 variables:
  .. ..$ mean                 : num [1:17516] 0.24089 0.02889 0.00994 0.17249 0.00187 ...
  .. ..$ variance             : num [1:17516] 0.31731 0.03316 0.0112 0.28143 0.00238 ...
  .. ..$ variance.expected    : num [1:17516] 0.31383 0.03471 0.01181 0.21325 0.00203 ...
  .. ..$ variance.standardized: num [1:17516] 1.011 0.955 0.948 1.32 1.168 ...
  ..@ misc         : NULL
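For reference, Seurat's default `LogNormalize` method divides each count by the cell's total count, multiplies by `scale.factor` (10000 by default) and applies `log1p()`. A small base-R check using only the values printed above: according to `@p`, the first 1087 entries of `@x` all belong to the first cell, so counts of 1 and 2 in that cell should map to 1.71 and 2.31 through the same per-cell factor.

```r
# Values from the str() output above: first cell, counts 1 and 2
counts <- c(1, 2)
observed_data <- c(1.71, 2.31)

# LogNormalize: data = log1p(count / library_size * scale_factor).
# Back out the per-count factor (scale_factor / library_size) from the first value...
per_count <- expm1(observed_data[1])
# ...and predict both values from it
predicted <- log1p(counts * per_count)
round(predicted, 2)  # 1.71 2.31

# Implied library size of this cell, assuming scale.factor = 10000
# (approximate, since 1.71 is a rounded value)
10000 / per_count
```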

However, it could be an option to allow specifying the assay (and maybe even the slot) to pull the data from. If I remember correctly though, sctransform won't return values for all genes, just the top X that you specify in the command, right?

For now, an easy option for you would be to manually overwrite the expression table stored inside the .crb file. The number of columns in the table must match the number of cells in the meta data.

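# Load the Cerebro file, swap in your own expression table, and save a copy.
# your_custom_expression_table: genes as rows, one column per cell in the meta data
cerebro_object <- readRDS("<crb_file>")
cerebro_object$expression <- your_custom_expression_table
saveRDS(cerebro_object, "<new_crb_file>")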
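Before saving, it may help to check the column requirement explicitly. `align_expression_table()` is a hypothetical helper, not part of Cerebro; where the cell names live inside the .crb object is not shown here, so `cell_names` is simply assumed to be the cell barcodes from the meta data, in meta-data order. If the table has column names, it is also reordered to match.

```r
# Hypothetical check (not part of Cerebro): make sure a custom expression table
# lines up with the cells in the meta data before writing it into the .crb file.
# expression: genes x cells matrix; cell_names: barcodes in meta-data order
align_expression_table <- function(expression, cell_names) {
  if (ncol(expression) != length(cell_names)) {
    stop(sprintf("Expression table has %d columns but meta data has %d cells",
                 ncol(expression), length(cell_names)))
  }
  # Without column names there is nothing to align on
  if (is.null(colnames(expression))) {
    return(expression)
  }
  if (!setequal(colnames(expression), cell_names)) {
    stop("Column names of the expression table do not match the cell names")
  }
  # Reorder columns to follow the meta data
  expression[, cell_names, drop = FALSE]
}
```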

Obviously, the larger and the less sparse the data set, the heavier the workload for Cerebro.
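A back-of-the-envelope sketch of why sparsity matters, using the dimensions of the RNA assay shown above (17,516 genes x 11,769 cells, 24,806,824 non-zero entries). It assumes 8-byte doubles and 4-byte integer indices, as in a `dgCMatrix` (`@x`, `@i`, `@p`), and ignores object overhead and dimnames.

```r
# Approximate storage in bytes (object overhead and dimnames ignored)
dense_bytes <- function(n_genes, n_cells) n_genes * n_cells * 8
# dgCMatrix: 8-byte value (@x) + 4-byte row index (@i) per non-zero,
# plus one 4-byte column pointer (@p) per cell + 1
sparse_bytes <- function(n_nonzero, n_cells) n_nonzero * (8 + 4) + (n_cells + 1) * 4

# Dimensions from the RNA assay above
n_genes <- 17516; n_cells <- 11769; n_nonzero <- 24806824
dense_bytes(n_genes, n_cells) / 1e9     # about 1.65 GB if stored densely
sparse_bytes(n_nonzero, n_cells) / 1e9  # about 0.30 GB as a sparse matrix
```

Note that `scale.data` is stored as a dense matrix, so it is kept small only because it covers 2,000 genes.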

@rpolicastro
Author

rpolicastro commented May 10, 2019

Thanks for the temporary fix, that should help for now.

In case you were curious, here is the structure of my Seurat object after following their sctransform and data integration workflows.

> str(seurat@assays)
List of 3
 $ RNA       :Formal class 'Assay' [package "Seurat"] with 7 slots
  .. ..@ counts       :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. ..@ i       : int [1:150980128] 0 6 11 12 13 19 22 23 24 26 ...
  .. .. .. ..@ p       : int [1:35928] 0 4639 8525 13174 17503 20436 24840 30329 34454 38626 ...
  .. .. .. ..@ Dim     : int [1:2] 18274 35927
  .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. ..$ : chr [1:18274] "AL627309.1" "AL669831.5" "LINC00115" "FAM41C" ...
  .. .. .. .. ..$ : chr [1:35927] "AAACCCAAGAAATGGG_1" "AAACCCAAGCATCGAG_1" "AAACCCACAAGAAACT_1" "AAACCCACAGCTCGGT_1" ...
  .. .. .. ..@ x       : num [1:150980128] 1 2 1 45 1 1 1 1 1 3 ...
  .. .. .. ..@ factors : list()
  .. ..@ data         :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. ..@ i       : int [1:150980128] 0 6 11 12 13 19 22 23 24 26 ...
  .. .. .. ..@ p       : int [1:35928] 0 4639 8525 13174 17503 20436 24840 30329 34454 38626 ...
  .. .. .. ..@ Dim     : int [1:2] 18274 35927
  .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. ..$ : chr [1:18274] "AL627309.1" "AL669831.5" "LINC00115" "FAM41C" ...
  .. .. .. .. ..$ : chr [1:35927] "AAACCCAAGAAATGGG_1" "AAACCCAAGCATCGAG_1" "AAACCCACAAGAAACT_1" "AAACCCACAGCTCGGT_1" ...
  .. .. .. ..@ x       : num [1:150980128] 1 2 1 45 1 1 1 1 1 3 ...
  .. .. .. ..@ factors : list()
  .. ..@ scale.data   : num[0 , 0 ] 
  .. ..@ key          : chr "rna_"
  .. ..@ var.features : logi(0) 
  .. ..@ meta.features:'data.frame':	18274 obs. of  0 variables
  .. ..@ misc         : NULL
 $ SCT       :Formal class 'Assay' [package "Seurat"] with 7 slots
  .. ..@ counts       :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. ..@ i       : int [1:148404440] 0 6 11 12 13 19 22 23 24 26 ...
  .. .. .. ..@ p       : int [1:35928] 0 4639 8525 13174 17503 20915 25319 30163 34288 38460 ...
  .. .. .. ..@ Dim     : int [1:2] 18272 35927
  .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. ..$ : chr [1:18272] "AL627309.1" "AL669831.5" "LINC00115" "FAM41C" ...
  .. .. .. .. ..$ : chr [1:35927] "AAACCCAAGAAATGGG_1" "AAACCCAAGCATCGAG_1" "AAACCCACAAGAAACT_1" "AAACCCACAGCTCGGT_1" ...
  .. .. .. ..@ x       : num [1:148404440] 1 2 1 43 1 1 1 1 1 3 ...
  .. .. .. ..@ factors : list()
  .. ..@ data         :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. ..@ i       : int [1:148404440] 0 6 11 12 13 19 22 23 24 26 ...
  .. .. .. ..@ p       : int [1:35928] 0 4639 8525 13174 17503 20915 25319 30163 34288 38460 ...
  .. .. .. ..@ Dim     : int [1:2] 18272 35927
  .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. ..$ : chr [1:18272] "AL627309.1" "AL669831.5" "LINC00115" "FAM41C" ...
  .. .. .. .. ..$ : chr [1:35927] "AAACCCAAGAAATGGG_1" "AAACCCAAGCATCGAG_1" "AAACCCACAAGAAACT_1" "AAACCCACAGCTCGGT_1" ...
  .. .. .. ..@ x       : num [1:148404440] 0.693 1.099 0.693 3.784 0.693 ...
  .. .. .. ..@ factors : list()
  .. ..@ scale.data   : num[0 , 0 ] 
  .. ..@ key          : chr "sct_"
  .. ..@ var.features : logi(0) 
  .. ..@ meta.features:'data.frame':	18272 obs. of  0 variables
  .. ..@ misc         : NULL
 $ integrated:Formal class 'Assay' [package "Seurat"] with 7 slots
  .. ..@ counts       :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. ..@ i       : int(0) 
  .. .. .. ..@ p       : int 0
  .. .. .. ..@ Dim     : int [1:2] 0 0
  .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. ..$ : NULL
  .. .. .. .. ..$ : NULL
  .. .. .. ..@ x       : num(0) 
  .. .. .. ..@ factors : list()
  .. ..@ data         :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. ..@ i       : int [1:56589377] 0 1 2 3 4 5 6 7 8 9 ...
  .. .. .. ..@ p       : int [1:35928] 0 1995 3991 5988 7985 9985 11977 13967 15965 17961 ...
  .. .. .. ..@ Dim     : int [1:2] 2000 35927
  .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. ..$ : chr [1:2000] "TFF3" "HES6" "REG4" "KRT17" ...
  .. .. .. .. ..$ : chr [1:35927] "AAACCCAAGAAATGGG_1" "AAACCCAAGCATCGAG_1" "AAACCCACAAGAAACT_1" "AAACCCACAGCTCGGT_1" ...
  .. .. .. ..@ x       : num [1:56589377] -0.3378 0.474 -0.0536 0.7949 1.1046 ...
  .. .. .. ..@ factors : list()
  .. ..@ scale.data   : num [1:2000, 1:35927] -0.864 0.436 -0.433 0.568 -1.266 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:2000] "TFF3" "HES6" "REG4" "KRT17" ...
  .. .. .. ..$ : chr [1:35927] "AAACCCAAGAAATGGG_1" "AAACCCAAGCATCGAG_1" "AAACCCACAAGAAACT_1" "AAACCCACAGCTCGGT_1" ...
  .. ..@ key          : chr "integrated_"
  .. ..@ var.features : chr [1:2000] "TFF3" "HES6" "REG4" "KRT17" ...
  .. ..@ meta.features:'data.frame':	2000 obs. of  0 variables
  .. ..@ misc         : NULL

For the 'RNA' assay, both @counts and @data appear to be the raw counts. In the sctransform assay 'SCT', @counts is the raw data, and @data are the normalized and log transformed values. After integrating the data sets into the 'integrated' assay, the @counts slot is now empty, but you have the further normalized data in @data, and the scaled data in @scale.data.
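A quick base-R check of the SCT assay, using the first entries printed above: the `@data` values match `log1p()` of the `@counts` values to the printed precision. The SCT `@counts` also differ slightly from the RNA counts (43 vs 45 in the fourth entry), which is consistent with sctransform storing corrected counts rather than the untouched raw counts.

```r
# First entries of the SCT assay from the str() output above
sct_counts <- c(1, 2, 1, 43, 1)
sct_data   <- c(0.693, 1.099, 0.693, 3.784, 0.693)

# If @data is log1p(@counts), rounding to 3 decimals reproduces the printed values
round(log1p(sct_counts), 3)
all.equal(round(log1p(sct_counts), 3), sct_data)  # TRUE
```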

I believe sctransform returns normalized, log-transformed values for all genes; at least, the number of genes in the SCT assay (18,272, versus 18,274 in RNA) suggests so. Only 2-3k genes seem to be kept from the integration step onward (the integrated assay holds 2,000). You can have it consider all genes, but it takes a long time to run.
