Sprint10 CN,DualLab and Heatmap #59
base: dev
Conversation
```r
if (reduce.so == T) {
  smpls = names(object)
  object2 = sapply(smpls, function(x) {
    so = object[[x]]
    DefaultAssay(so) = 'RNA'
    so = DietSeurat(object = so, assays = c('RNA', 'Protein'))
    return(so)
  })
}
```
This is the suggestion that ChatGPT made to simplify this block of code:
```r
if (reduce.so) {
  object <- lapply(object, function(x) {
    DefaultAssay(x) <- 'RNA'
    DietSeurat(x, assays = c('RNA', 'Protein'))
  })
}
```
Explanation provided:

- Removal of sapply and names:
  Original: `sapply(smpls, ...)` uses the names of the `object` list to loop through each item by name.
  Modified: `lapply(object, ...)` loops directly over the elements of the list without extracting their names, which simplifies the iteration.
- No need to access by name:
  Original: each Seurat object is retrieved with `object[[x]]`, where `x` is the sample name from `names(object)`.
  Modified: `lapply` passes each Seurat object directly as `x`, so there is no need to access objects by name.
- Direct modification of the object:
  Original: inside `sapply`, the `so` object is modified by changing the default assay and then applying `DietSeurat`.
  Modified: the same modification is applied directly to `x` in the `lapply` call. There is no need to store the result in a separate variable (`so`), since it is returned directly from the function.
- Use of lapply instead of sapply:
  `sapply` returns a simplified vector or matrix when possible, but for complex objects such as Seurat objects it is better to use `lapply`, which always returns a list. Since `object` is a list, `lapply` preserves its structure without trying to simplify it.
- Assignment of the modified object:
  Original: the modified Seurat objects are stored in a new variable, `object2`.
  Modified: the changes are assigned back to `object`, so the original list is replaced by the modified Seurat objects.

Equivalent functionality:
Both versions do the same thing: iterate over the list of Seurat objects, set the `DefaultAssay` to "RNA", apply `DietSeurat` to retain only the "RNA" and "Protein" assays, and return the modified list of objects. The `lapply` version is a more concise and efficient way of achieving the same result.
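The `lapply` vs. `sapply` point can be illustrated with a minimal plain-R sketch (toy data, not Seurat-specific): `lapply` always returns a named list, while `sapply` simplifies to a vector or matrix when it can.

```r
# Toy list standing in for a list of Seurat objects.
x <- list(a = 1:3, b = 4:6)

res_lapply <- lapply(x, sum)   # always a named list
res_sapply <- sapply(x, sum)   # simplified to a named numeric vector
```

Note that for Seurat objects `sapply` cannot simplify and also returns a list, so the original code works; `lapply` just makes that intent explicit.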
```r
## Auto detect number of cells and turn on Conserve memory ====

## Calculate total number of cells in input SO.
cell.count <- sum(unlist(lapply(object, function(x) dim(x)[2])))
```
This is the ChatGPT suggestion here:
```r
cell.count <- sum(unlist(lapply(object, function(x) dim(x)[2])))
conserve.memory <- ifelse(cell.count > cell.count.limit || only.var.genes, TRUE, FALSE)
```
```
@@ -333,11 +359,14 @@ combineNormalize <- function(object,
}

## Normalize Data ====

if (SCT.level=="Merged") {
```
ChatGPT says (disclaimer: haven't checked it out):
```r
if (SCT.level == "Merged") {
  object.merge <- mergeSamples(object, project.name)
  object.merge <- SCTransform(object.merge, vars.to.regress = vars.to.regress,
                              conserve.memory = conserve.memory,
                              return.only.var.genes = only.var.genes)
} else if (SCT.level == "Sample") {
  object <- lapply(object, SCTransform, vars.to.regress = vars.to.regress,
                   conserve.memory = conserve.memory,
                   return.only.var.genes = only.var.genes)
  object.merge <- mergeSamples(object, project.name)
  integ.features <- SelectIntegrationFeatures(object, nfeatures = nfeatures,
                                              mean.cutoff = c(low.cut, high.cut),
                                              dispersion.cutoff = c(low.cut.disp, high.cut.disp),
                                              normalization.method = "SCT")
  VariableFeatures(object.merge) <- integ.features
} else {
  stop("SCT method should be either 'Merged' or 'Sample'")
}
```
This is the explanation:

The two blocks of code achieve the same goal but differ in structure and conciseness. Here's how the shorter version is equivalent to the longer one:

Key similarities:
- SCT normalization: both approaches normalize the data based on `SCT.level`, applying `SCTransform` to either the merged object or the individual objects. Both regress out `vars.to.regress`, manage memory with `conserve.memory`, and set `return.only.var.genes`.
- Sample merging: the original version merges samples manually by iterating over the objects and calling `merge`. The shorter version uses a helper function, `mergeSamples`, which likely handles the merging inside a concise, reusable function.
- Handling `SCT.level`: both versions check whether `SCT.level` is "Merged" or "Sample" and apply the corresponding workflow. "Merged": samples are merged first and then normalized with `SCTransform`. "Sample": each sample is normalized individually and then merged.
- Feature selection: both versions select variable features. The original does this via `FindVariableFeatures` for merged data and `SelectIntegrationFeatures` for individual samples; the simplified version achieves the same by calling `SelectIntegrationFeatures` after merging the individually normalized samples.

Differences (and how the simplified version remains equivalent):
- Function abstraction: the long version manually handles sample merging (a `dat` vector, looping through samples, etc.) and contains redundant code for both the "Merged" and "Sample" paths. The short version uses `mergeSamples` to handle merging in a single step, whether samples are merged before or after normalization, which reduces redundancy.
- Redundant blocks: the longer version repeats code for the Merged and Sample cases, especially for merging samples, applying `PrepSCTFindMarkers`, and selecting variable features. For example, after normalizing with `SCTransform` there is a repeated block for rescaling and finding variable features; the shorter version centralizes these steps into helper functions.
- Handling `pca.reg.plot` and `vars.to.regress`: the longer version includes extra logic for a non-regression path when `pca.reg.plot` and `vars.to.regress` are set. The shorter version omits this, assuming that behavior is abstracted elsewhere or deemed unnecessary for the simplification.

Why the shorter version is equivalent:
- The core workflow (merging samples, applying `SCTransform`, finding variable features) is unchanged in both versions.
- The shorter version uses helper functions (`mergeSamples`, `SelectIntegrationFeatures`, etc.) to avoid repeating code and make the logic clearer and more maintainable.
- Merging samples before or after normalization is still handled correctly via `SCT.level == "Merged"` or `SCT.level == "Sample"`.

In summary, the shorter version abstracts the repetitive parts into functions and reduces redundancy while preserving the key logic and functionality of the original, longer version.
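One mechanism the suggestion relies on is that `lapply` forwards extra named arguments to the applied function, as in `lapply(object, SCTransform, vars.to.regress = ...)`. A toy illustration (`scale_by` is made up for demonstration only):

```r
# Extra named arguments to lapply() are passed through to the function,
# so each list element becomes the function's first argument.
scale_by <- function(x, factor) x * factor
res <- lapply(list(a = 1:2, b = 3:4), scale_by, factor = 10)
```

This is why the suggested `lapply` call can set `vars.to.regress`, `conserve.memory`, and `return.only.var.genes` once for every sample.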
Added unit tests for figure and SO reproducibility, plus bug fixes for CombNorm.R, Dual Labeling, and Heatmap, related to sprint 10 tasks.