Sprint10 CN,DualLab and Heatmap #59
base: dev
Conversation
```r
if (reduce.so == T) {
  smpls = names(object)
  object2 = sapply(smpls, function(x) {
    so = object[[x]]
    DefaultAssay(so) = 'RNA'
    so = DietSeurat(object = so, assays = c('RNA', 'Protein'))
    return(so)
  })
}
```
This is the suggestion that ChatGPT made to simplify this block of code:
```r
if (reduce.so) {
  object <- lapply(object, function(x) {
    DefaultAssay(x) <- 'RNA'
    DietSeurat(x, assays = c('RNA', 'Protein'))
  })
}
```
Explanation provided:

- Removal of sapply and names:
  Original: `sapply(smpls, ...)` uses the names of the `object` list to loop through each item by name.
  Modified: `lapply(object, ...)` loops directly over the elements of the list without extracting their names, which simplifies the iteration.
- No need to access by name:
  Original: each Seurat object is retrieved with `object[[x]]`, where `x` is the sample name from `names(object)`.
  Modified: `lapply` passes each Seurat object directly as `x`, so there is no need to access objects by name.
- Direct modification of the object:
  Original: inside `sapply`, the `so` object is modified by changing the default assay and then applying `DietSeurat`.
  Modified: the same modification is applied directly to `x` in the `lapply` call. There is no need to store the result in a separate variable (`so`), since it is returned directly from the function.
- Use of lapply instead of sapply:
  `sapply` returns a simplified vector or matrix when possible, but for complex objects such as Seurat objects it is better to use `lapply`, which always returns a list. Since `object` is a list, `lapply` preserves its structure without trying to simplify it.
- Assignment of the modified object:
  Original: the modified Seurat objects are stored in a new variable, `object2`.
  Modified: the changes are assigned back to `object`, so the original list is replaced by the modified Seurat objects.

Equivalent functionality:
Both versions do the same thing: iterate over the list of Seurat objects, set the `DefaultAssay` to "RNA", apply `DietSeurat` to retain only the "RNA" and "Protein" assays, and return the modified list of objects. The `lapply` version is a more concise and efficient way of achieving the same result.
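The `lapply` vs. `sapply` point can be illustrated with a minimal plain-R sketch (toy data, not Seurat-specific): `lapply` always returns a named list, while `sapply` simplifies to a vector or matrix when it can.

```r
# Toy list standing in for a list of Seurat objects.
x <- list(a = 1:3, b = 4:6)

res_lapply <- lapply(x, sum)   # always a named list
res_sapply <- sapply(x, sum)   # simplified to a named numeric vector
```

Note that for Seurat objects `sapply` cannot simplify and also returns a list, so the original code works; `lapply` just makes that intent explicit.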
```r
## Auto detect number of cells and turn on Conserve memory ====

## Calculate total number of cells in input SO.
cell.count <- sum(unlist(lapply(object, function(x) dim(x)[2])))
```
This is the ChatGPT suggestion here:
```r
cell.count <- sum(unlist(lapply(object, function(x) dim(x)[2])))
conserve.memory <- ifelse(cell.count > cell.count.limit || only.var.genes, TRUE, FALSE)
```
```
@@ -333,11 +359,14 @@ combineNormalize <- function(object,
}

## Normalize Data ====

if (SCT.level=="Merged") {
```
ChatGPT says (disclaimer: haven't checked it out):
```r
if (SCT.level == "Merged") {
  object.merge <- mergeSamples(object, project.name)
  object.merge <- SCTransform(object.merge, vars.to.regress = vars.to.regress,
                              conserve.memory = conserve.memory,
                              return.only.var.genes = only.var.genes)
} else if (SCT.level == "Sample") {
  object <- lapply(object, SCTransform, vars.to.regress = vars.to.regress,
                   conserve.memory = conserve.memory,
                   return.only.var.genes = only.var.genes)
  object.merge <- mergeSamples(object, project.name)
  integ.features <- SelectIntegrationFeatures(object, nfeatures = nfeatures,
                                              mean.cutoff = c(low.cut, high.cut),
                                              dispersion.cutoff = c(low.cut.disp, high.cut.disp),
                                              normalization.method = "SCT")
  VariableFeatures(object.merge) <- integ.features
} else {
  stop("SCT method should be either 'Merged' or 'Sample'")
}
```
This is the explanation:

The two blocks of code achieve the same goal but differ in structure and conciseness. Here's how the shorter version is equivalent to the longer one:

Key similarities:
- SCT normalization: both approaches normalize the data based on `SCT.level`, applying `SCTransform` to either the merged object or the individual objects. Both regress out `vars.to.regress`, manage memory with `conserve.memory`, and set `return.only.var.genes`.
- Sample merging: the original version merges samples manually by iterating over the objects and calling `merge`. The shorter version uses a helper function, `mergeSamples`, which likely handles the merging inside a concise, reusable function.
- Handling `SCT.level`: both versions check whether `SCT.level` is "Merged" or "Sample" and apply the corresponding workflow. "Merged": samples are merged first and then normalized with `SCTransform`. "Sample": each sample is normalized individually and then merged.
- Feature selection: both versions select variable features. The original does this via `FindVariableFeatures` for merged data and `SelectIntegrationFeatures` for individual samples; the simplified version achieves the same by calling `SelectIntegrationFeatures` after merging the individually normalized samples.

Differences (and how the simplified version remains equivalent):
- Function abstraction: the long version manually handles sample merging (a `dat` vector, looping through samples, etc.) and contains redundant code for both the "Merged" and "Sample" paths. The short version uses `mergeSamples` to handle merging in a single step, whether samples are merged before or after normalization, which reduces redundancy.
- Redundant blocks: the longer version repeats code for the Merged and Sample cases, especially for merging samples, applying `PrepSCTFindMarkers`, and selecting variable features. For example, after normalizing with `SCTransform` there is a repeated block for rescaling and finding variable features; the shorter version centralizes these steps into helper functions.
- Handling `pca.reg.plot` and `vars.to.regress`: the longer version includes extra logic for a non-regression path when `pca.reg.plot` and `vars.to.regress` are set. The shorter version omits this, assuming that behavior is abstracted elsewhere or deemed unnecessary for the simplification.

Why the shorter version is equivalent:
- The core workflow (merging samples, applying `SCTransform`, finding variable features) is unchanged in both versions.
- The shorter version uses helper functions (`mergeSamples`, `SelectIntegrationFeatures`, etc.) to avoid repeating code and make the logic clearer and more maintainable.
- Merging samples before or after normalization is still handled correctly via `SCT.level == "Merged"` or `SCT.level == "Sample"`.

In summary, the shorter version abstracts the repetitive parts into functions and reduces redundancy while preserving the key logic and functionality of the original, longer version.
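One mechanism the suggestion relies on is that `lapply` forwards extra named arguments to the applied function, as in `lapply(object, SCTransform, vars.to.regress = ...)`. A toy illustration (`scale_by` is made up for demonstration only):

```r
# Extra named arguments to lapply() are passed through to the function,
# so each list element becomes the function's first argument.
scale_by <- function(x, factor) x * factor
res <- lapply(list(a = 1:2, b = 3:4), scale_by, factor = 10)
```

This is why the suggested `lapply` call can set `vars.to.regress`, `conserve.memory`, and `return.only.var.genes` once for every sample.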
Added unit tests for figure and SO reproducibility, plus bug fixes for CombNorm.R, Dual Labeling, and Heatmap, related to sprint 10 tasks.