Skip to content

Commit

Permalink
Release 2024-05-29-1000
Browse files Browse the repository at this point in the history
  • Loading branch information
TDCasasent committed May 31, 2024
1 parent 76c02ef commit 93cddd2
Show file tree
Hide file tree
Showing 28 changed files with 342 additions and 119 deletions.
6 changes: 5 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,8 @@ See documentation at https://github.com/MD-Anderson-Bioinformatics/BatchEffectsP

# MBatch and MBatchUtils R Packages

# **Install documentation is in the process of being updated.**

The documentation directort contains several kinds of documentation for MBatch:

* Files that start MBatch_01 are install documentations.
Expand Down Expand Up @@ -93,9 +95,11 @@ Then you can install the MBatch Python Package:

```
conda activate gendev
pip install git+https://github.com/MD-Anderson-Bioinformatics/BatchEffectsPackage.git#egg=mbatch&subdirectory=apps/PyMBatch
pip install "git+https://github.com/MD-Anderson-Bioinformatics/BatchEffectsPackage.git#egg=mbatch&subdirectory=apps/PyMBatch"
```

If needed, you must override the default Python environment of "/BEA/gendev" by setting the environmental variable MBATCH_PYTHON_ENV prior to install. On Linux or OS X, you can create a link from your gendev to /BEA/gendev.

## MBatch R Package

If you are familiar with your OS prerequisites and R package installation, the following quickstart instructions may allow quick installation.
Expand Down
2 changes: 1 addition & 1 deletion apps/MBatch/MBatch.Rproj
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ Version: 1.0

RestoreWorkspace: No
SaveWorkspace: No
AlwaysSaveHistory: No
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
Expand Down
2 changes: 1 addition & 1 deletion apps/MBatch/R/BEA_AssessmentsUmap.R
Original file line number Diff line number Diff line change
Expand Up @@ -82,7 +82,7 @@ createBatchEffectsOutput_umap<-function(theMatrix, theDataframeBatchData, theTit
saveCompListDscData(cleanFilePath(theUmapOutputDir, "ALL__CompListDSC.RData"), c("PC1", "PC2", "PC3", "PC4"))
}
logInfo("createBatchEffectsOutput_umap neighbors=", neighbors)
umap_data <- uwot::umap(pca_data$x, n_neighbors = neighbors)
umap_data <- uwot::umap(pca_data$x, n_neighbors = neighbors, seed=theSeed)
# do not recast to dataframe, it changes the column names in a way that breaks UTF-8 characters
#myDF <- data.frame(theDataframeBatchData, stringsAsFactors=FALSE)
samplesIds <- as.vector(unlist(theDataframeBatchData[1]))
Expand Down
6 changes: 2 additions & 4 deletions apps/MBatch/R/BatchEffAssess.R
Original file line number Diff line number Diff line change
Expand Up @@ -1088,9 +1088,7 @@ getTestDapiURL <- function()
value <- Sys.getenv("MBATCH_TEST_DAPI_URL")
if (!isTRUE(file.exists(value)))
{
value <- ""
# uncomment this for next release - currently test would fail since endpoint does not exist
# value <- "https://bioinformatics.mdanderson.org/MQA"
value <- "https://bioinformatics.mdanderson.org/MQA"
}
value
}
Expand Down Expand Up @@ -1228,7 +1226,7 @@ compareTwoDataframes <- function(theCorrected, theCompare)
{
# ignore
}
else if (!(all.equal(theCorrected[myRow, myCol], theCompare[myRow, myCol])))
else if (!(all.equal(theCorrected[myRow, myCol], theCompare[myRow, myCol], tolerance=0.0001)))
{
message("myRow=", myRow)
message("myCol=", myCol)
Expand Down
5 changes: 5 additions & 0 deletions apps/MBatch/R/zzz.R
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,11 @@
{
# call set to trigger loading of Python environment
packageStartupMessage(paste("Loading Conda Environment.", getMBatchVersion(), sep=" "))
condaEnvOverride <- Sys.getenv("MBATCH_PYTHON_ENV")
if ("" != condaEnvOverride)
{
setGlobalMBatchEnv(condaEnvOverride)
}
setGlobalMBatchEnv(getGlobalMBatchEnv())
setGlobalMBatchErrorTest(FALSE)
packageStartupMessage(paste("All sorting in this package requires using a Sys.setlocale(\"LC_COLLATE\",\"C\").", getMBatchVersion(), sep=" "))
Expand Down
2 changes: 2 additions & 0 deletions apps/MBatch/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,8 @@ The documentation directort contains several kinds of documentation for MBatch:

Downloads and details on Standardized Data are available at http://bioinformatics.mdanderson.org/TCGA/databrowser/

If needed, you must override the default Python environment of "/BEA/gendev" by setting the environmental variable MBATCH_PYTHON_ENV prior to install.

See main README.MD on install instructions.

# Seurat RDS files into Standardized Data Format
Expand Down
19 changes: 18 additions & 1 deletion apps/MBatch/tests/DapiQuery_download.R
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,13 @@
# MD Anderson Cancer Center Bioinformatics at MDA <https://www.mdanderson.org/research/departments-labs-institutes/departments-divisions/bioinformatics-and-computational-biology.html>

require(MBatch)
require(reticulate)

py_config()

testUrl <- getTestDapiURL()
message("SKIPPING TEST UNTIL DEBUGGED")
testUrl <- ""
if (""!=testUrl)
{
outputDir <- getTestOutputDir()
Expand All @@ -36,6 +41,7 @@ if (""!=testUrl)
pyObj$selected_projects <- append(pyObj$selected_projects, "TCGA-LUSC")
pyObj$selected_jobtype <- append(pyObj$selected_jobtype, "Original")
pyObj$selected_data <- append(pyObj$selected_data, "STAR - Counts")
py_config()
updateDapiQuery(pyObj)
# should be 3
length(pyObj$available_datasets)
Expand Down Expand Up @@ -65,7 +71,18 @@ if (""!=testUrl)
zipFilePath <- "/analysis/NGCHM/DATA_2022-12-12/TEST_2022_12_28_1300/All_ngchm.ngchm.html"
downloadNgchmNgchm(pyObj, downloadFile, datasetId, zipFilePath)
downloadFile <- file.path(theOutputDir, "batch_id_ngchm.html")
py_config()
downloadNgchmHtml(pyObj, downloadFile, datasetId, zipFilePath)
return(TRUE)
size <- file.size(downloadFile)
message(size)
return(65237879 == size)
} else
{
message("Nothing done")
TRUE
}
} else
{
message("Nothing done")
TRUE
}
25 changes: 8 additions & 17 deletions apps/MBatch/tests/EBNPlus_TrainAndValidate.R
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,8 @@ compareDir <- getTestCompareDir()
theDataFile1=cleanFilePath(inputDir, "brca_rnaseq2_matrix_data.tsv")
theDataFile2=cleanFilePath(inputDir, "brca_agi4502_matrix_data.tsv")
theOutputDir=cleanFilePath(outputDir, "EBNPlus_TrainAndValidateReplicates_Structures3")
theCompareFile=cleanFilePath(compareDir, "EBNPlus_TrainAndValidate.tsv")

theBatchId1="RNASeqV2"
theBatchId2="Agilent4502"
theRandomSeed=314
Expand Down Expand Up @@ -120,24 +122,13 @@ if ((!dir.exists(theDataFile1))&&(!dir.exists(theDataFile2)))
theDataVersion="DATA_2022-09-09-1600",
theTestVersion="TEST_2022-10-10-1300",
thePriorPlotFile="priorplots.PNG")
print("TestSet1")
printMatrix(resultsList$TestSet1)
print("TestSet2")
printMatrix(resultsList$TestSet2)
print("TrainingSet1")
printMatrix(resultsList$TrainingSet1)
print("TrainingSet2")
printMatrix(resultsList$TrainingSet2)
print("TrainingResults")
printMatrix(resultsList$TrainingResults)
print("ValidationSet1")
printMatrix(resultsList$ValidationSet1)
print("ValidationSet2")
printMatrix(resultsList$ValidationSet2)
print("ValidationResults")
printMatrix(resultsList$ValidationResults)
print("CorrectedResults")
printMatrix(resultsList$CorrectedResults)
testMe <- resultsList$ValidationResults
compareMe <- readAsGenericMatrix(theCompareFile)
compared <- compareTwoMatrices(testMe, compareMe)
print("compared")
print(compared)
compared
}

TRUE
4 changes: 2 additions & 2 deletions apps/MBatch/tests/UMAP_Structures.R
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ compareDir <- getTestCompareDir()
theGeneFile=cleanFilePath(inputDir, "matrix_data-Tumor.tsv")
theBatchFile=cleanFilePath(inputDir, "batches-Tumor.tsv")
theOutputDir=cleanFilePath(outputDir, "UMAP")
theCompareFile=cleanFilePath(compareDir, "UMAP_Data-batc.tsv")
theCompareFile=cleanFilePath(compareDir, "UMAP_Data-umap.tsv")
print(theCompareFile)
theRandomSeed=314
#myRandomSeed <- 314
Expand Down Expand Up @@ -53,7 +53,7 @@ if (!is.null(inputDir))
theDoDscPermsFileFlag=TRUE,
theSeed=314)
umap_tsv <- retval[1]
umap_tsv <- file.path(dirname(umap_tsv), "UMAP_Data-batc.tsv")
umap_tsv <- file.path(dirname(umap_tsv), "UMAP_Data-umap.tsv")
print(umap_tsv)
# compare batch information, since umap file contents is
# not same from run to run, and UMAP package code does not
Expand Down
20 changes: 18 additions & 2 deletions apps/MBatch/tests/Volcano_Structures.R
Original file line number Diff line number Diff line change
Expand Up @@ -13,10 +13,13 @@ require(MBatch)

inputDir <- getTestInputDir()
outputDir <- getTestOutputDir()
compareDir <- getTestCompareDir()

theGeneFile=cleanFilePath(inputDir, "matrix_data-Tumor.tsv")
theBatchFile=cleanFilePath(inputDir, "batches-Tumor.tsv")
theOutputDir=cleanFilePath(outputDir, "Volcano_Structures")
theCompareFile=cleanFilePath(compareDir, "Volcano-Data-OR_-_University_of_Michigan.json")
theDynamicFile=cleanFilePath(cleanFilePath(cleanFilePath(cleanFilePath(theOutputDir, "TSS"), "DATA_2022-09-09-1600"), "TEST_2022-10-10-1300"), "Volcano-Data-OR_-_University_of_Michigan.json")
theRandomSeed=314
#myRandomSeed <- 314
#myTestSeed <- 42
Expand All @@ -42,13 +45,26 @@ if (!is.null(inputDir))
Volcano_Structures(theData=myData,
theTitle="Test",
theOutputDir=theOutputDir,
theLogFrameFlag=False,
theLogFrameFlag=FALSE,
theBatchTypeAndValuePairsToRemove=NULL,
theBatchTypeAndValuePairsToKeep=NULL,
theDataVersion="DATA_2022-09-09-1600",
theTestVersion="TEST_2022-10-10-1300",
theMaxFeatureCount=50000)
TRUE
# read Volcano-Data-OR_-_University_of_Michigan.json
message("read files to compare")
print(theCompareFile)
staticLines <- readLines(theCompareFile, warn=FALSE)
print(theDynamicFile)
dynamicLines <- readLines(theDynamicFile, warn=FALSE)
same <- FALSE
message("compare")
if (all.equal(staticLines, dynamicLines))
{
message("matched")
same <- TRUE
}
same
} else {
message("No test data. Skip test.")
TRUE
Expand Down
10 changes: 9 additions & 1 deletion apps/MBatchUtils/R/buildArchive.R
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,15 @@ buildSingleArchive <- function(theResultDir, theDataDir, theZipDir)
# :return: full pathname for ZIP file
message("buildSingleArchive - import(mbatch.index.index_api)")
calc <- import("mbatch.index.index_api")
zipFile <- calc$create_index_archive(theResultDir, theDataDir, theZipDir, file.path(theResultDir, "info"), NULL, NULL)
# the_results_dir: str,
# the_data_dir: str,
# the_zip_dir: str,
# the_info_dir: str,
# the_update_only_flag: bool,
# the_new_data: typing.Optional[StandardizedData],
# the_std_list: typing.Optional[List[StandardizedData]])
zipFile <- calc$create_index_archive(theResultDir, theDataDir, theZipDir,
file.path(theResultDir, "info"), FALSE, NULL, NULL)
message("buildSingleArchive - after Python")
zipFile
}
Expand Down
14 changes: 13 additions & 1 deletion apps/PyMBatch/mbatch.egg-info/PKG-INFO
Original file line number Diff line number Diff line change
@@ -1,3 +1,15 @@
Metadata-Version: 2.1
Name: mbatch
Version: 1.0
Version: 2.1
Requires-Dist: matplotlib
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: scanpy
Requires-Dist: pillow
Requires-Dist: jsonpickle
Requires-Dist: requests
Requires-Dist: xmltodict
Requires-Dist: cryptography
Requires-Dist: urllib3
Requires-Dist: scipy
Requires-Dist: scikit-learn
3 changes: 3 additions & 0 deletions apps/PyMBatch/mbatch.egg-info/SOURCES.txt
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,8 @@ mbatch.egg-info/SOURCES.txt
mbatch.egg-info/dependency_links.txt
mbatch.egg-info/requires.txt
mbatch.egg-info/top_level.txt
mbatch/batchstats/__init__.py
mbatch/batchstats/batch_stats.py
mbatch/correct/__init__.py
mbatch/correct/correct.py
mbatch/dapi/__init__.py
Expand Down Expand Up @@ -61,6 +63,7 @@ mbatch/test/test_index.py
mbatch/test/test_job.py
mbatch/test/test_ldapjwt.py
mbatch/test/test_legend.py
mbatch/test/test_pls.py
mbatch/test/test_volcano.py
mbatch/visualindex/__init__.py
mbatch/visualindex/visual_index_base.py
Expand Down
1 change: 1 addition & 0 deletions apps/PyMBatch/mbatch.egg-info/requires.txt
Original file line number Diff line number Diff line change
Expand Up @@ -9,3 +9,4 @@ xmltodict
cryptography
urllib3
scipy
scikit-learn
Empty file.
31 changes: 31 additions & 0 deletions apps/PyMBatch/mbatch/batchstats/batch_stats.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Copyright (c) 2011-2024 University of Texas MD Anderson Cancer Center
This program is free software: you can redistribute it and/or modify it under the terms of the
GNU General Public License as published by the Free Software Foundation, either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program.
If not, see <https://www.gnu.org/licenses/>.
MD Anderson Cancer Center Bioinformatics on GitHub <https://github.com/MD-Anderson-Bioinformatics>
MD Anderson Cancer Center Bioinformatics at MDA <https://www.mdanderson.org/research/departments-labs-institutes/departments-divisions/bioinformatics-and-computational-biology.html>
@author: Tod Casasent
"""


from scipy.stats import f_oneway
from scipy.stats import alexandergovern


# def calculate_stats() -> [float, float, float, float]:
# """
# calculate F-Value and S Statistic results
# :return: F-Value, F-Value p-value, S-Statistic, S-Statistic p-value
# """
2 changes: 1 addition & 1 deletion apps/PyMBatch/mbatch/gdcapi/converter_mutationmaf.py
Original file line number Diff line number Diff line change
Expand Up @@ -198,7 +198,7 @@ def read_and_process_file_dataframe(the_matrix: pandas.DataFrame, the_file_zip:
# "Normal_Depth"
value_dict["Normal_Depth"] = tsv_dict['n_depth']
# "Normal_Reference_Count"
value_dict["Normal_Variant_Count"] = tsv_dict['n_ref_count']
value_dict["Normal_Reference_Count"] = tsv_dict['n_ref_count']
# "Normal_Variant_Count"
value_dict["Normal_Variant_Count"] = tsv_dict['n_alt_count']
# "HGVSp_Short"
Expand Down
2 changes: 2 additions & 0 deletions apps/PyMBatch/mbatch/index/index_repos.py
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,8 @@ def extract_results(the_zip_path: str, the_out_dir: str) -> None:
for file_name in zip_file.namelist():
if file_name.startswith('analysis/'):
zip_file.extract(file_name, the_out_dir)
if file_name.startswith('correction/'):
zip_file.extract(file_name, the_out_dir)
if file_name.startswith('info/'):
zip_file.extract(file_name, the_out_dir)

Expand Down
Loading

0 comments on commit 93cddd2

Please sign in to comment.