Commit
adds MultiAssayExperiment data generation
1 parent
f08ed30
commit 8756ec1
Showing 12 changed files with 237 additions and 466 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,4 +6,4 @@ vignettes/MANIFEST\.txt | |
vignettes/Human_genes__GRCh38_p10_\.rda | ||
backup/ | ||
GDCdata | ||
^.*\.csv | ||
^.*\.csv |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
Package: skcm.data | ||
Title: Gene expression and clinical data from Melanoma from TCGA. | ||
Version: 2017.11.20 | ||
Version: 2018.06.20 | ||
Authors@R: person("André", "Veríssimo", email = "[email protected]", role = c("aut", "cre")) | ||
Description: Contains the datasets for SKCM (Melanoma) with gene expression | ||
and clinical data. All was extracted from TCGA. | ||
|
@@ -13,6 +13,9 @@ Suggests: | |
futile.logger, | ||
TCGAbiolinks, | ||
knitr, | ||
rmarkdown | ||
rmarkdown, | ||
Biobase, | ||
SingleCellExperiment, | ||
MultiAssayExperiment | ||
RoxygenNote: 6.0.1 | ||
VignetteBuilder: knitr |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
#' Create MultiAssayExperiment object from data | ||
#' | ||
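#' @param clinical.custom custom clinical data (can be pre-processed); defaults to clinical$all from the package | ||
#' @param gdc.custom custom biospecimen data; defaults to the packaged gdc dataset | ||
#' @param mutation.custom custom mutation counts; defaults to mutation$count from the package | ||
#' @param rnaseq.custom custom RNASeq matrix; defaults to the result of joinRNASeqData() | ||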
#' | ||
#' @return a MultiAssayExperiment object | ||
#' @export | ||
#' | ||
#' @examples | ||
#' assay <- build.assay() | ||
#' assay[['RNASeq']] | ||
#' assay$vital_status | ||
build.assay <- function(clinical.custom = NULL, | ||
gdc.custom = NULL, | ||
mutation.custom = NULL, | ||
rnaseq.custom = NULL) { | ||
# get clinical data | ||
if (is.null(clinical.custom)) { | ||
data(clinical) | ||
clin <- clinical$all | ||
} else { | ||
clin <- clinical.custom | ||
} | ||
|
||
futile.logger::flog.info('Loading \'Biospecimen\' data...') | ||
if (is.null(gdc.custom)) { | ||
data(gdc) | ||
gdc.custom <- gdc | ||
} | ||
|
||
# get all RNASeq data | ||
futile.logger::flog.info('Joining \'RNASeq\' data...') | ||
if (is.null(rnaseq.custom)) { | ||
rnaseq.custom <- joinRNASeqData() | ||
} | ||
|
||
futile.logger::flog.info('Loading \'Mutation\' data...') | ||
if (is.null(mutation.custom)) { | ||
data(mutation) | ||
mutation.custom <- mutation$count | ||
} | ||
|
||
# | ||
# Expression data | ||
|
||
# map expression data with clinical | ||
es.map <- data.frame(master = strtrim(colnames(rnaseq.custom), 12), | ||
assay = colnames(rnaseq.custom), | ||
stringsAsFactors = FALSE) | ||
|
||
# keep only valid data, i.e. expression samples that have clinical data | ||
valid.ix <- es.map$master %in% clin$bcr_patient_barcode | ||
valid.dat <- rnaseq.custom[, valid.ix] | ||
|
||
sample.barcode <- strtrim(colnames(valid.dat), 16) | ||
valid.codes <- sample.barcode[sample.barcode %in% gdc.custom$bio.sample$bcr_sample_barcode] | ||
|
||
temp.df <- Biobase::AnnotatedDataFrame(gdc.custom$bio.sample[valid.codes,]) | ||
rownames(temp.df) <- colnames(valid.dat) | ||
|
||
# build expression set | ||
es <- Biobase::ExpressionSet(assayData = valid.dat, phenoData = temp.df) | ||
|
||
# | ||
# Mutation data | ||
mutation.colnames <- colnames(mutation.custom) | ||
valid.ix <- colnames(mutation.custom) %in% clin$bcr_patient_barcode | ||
|
||
mut.map <- data.frame(master = mutation.colnames[valid.ix], | ||
assay = mutation.colnames[valid.ix], | ||
stringsAsFactors = FALSE) | ||
|
||
mut <- SingleCellExperiment::SingleCellExperiment(assays = list(counts = mutation.custom)) | ||
|
||
# | ||
# Setup to create MultiAssayExperiment object | ||
|
||
futile.logger::flog.info('Building Assay...') | ||
listmap <- list(es.map, mut.map) | ||
names(listmap) <- c("RNASeq", "Mutation") | ||
|
||
dfmap <- MultiAssayExperiment::listToMap(listmap) | ||
objlist <- list("RNASeq" = es, "Mutation" = mut) | ||
my.assay <- MultiAssayExperiment::MultiAssayExperiment(objlist, clin, dfmap) | ||
|
||
return(my.assay) | ||
} |
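The barcode mapping used in `build.assay` can be tried in isolation. The following is a minimal sketch in base R, assuming TCGA-style barcodes in which the first 12 characters identify the patient and the first 16 the sample. The barcodes and the small clinical table are made up for illustration; the column names `master` and `assay` simply mirror the ones used in the function above.

```r
# Toy TCGA-style aliquot barcodes (made up for illustration).
aliquots <- c("TCGA-AA-0001-01A-11R-A000-07",
              "TCGA-AA-0002-06A-11R-A000-07",
              "TCGA-ZZ-9999-01A-11R-A000-07")

# Hypothetical clinical table that covers only two of the three patients.
clin <- data.frame(bcr_patient_barcode = c("TCGA-AA-0001", "TCGA-AA-0002"),
                   stringsAsFactors = FALSE)

# Truncate each barcode: 12 characters for the patient, 16 for the sample.
sample.map <- data.frame(master = strtrim(aliquots, 12),
                         assay = aliquots,
                         stringsAsFactors = FALSE)
sample.barcode <- strtrim(aliquots, 16)

# Keep only the columns whose patient has clinical data.
valid.ix <- sample.map$master %in% clin$bcr_patient_barcode
valid.map <- sample.map[valid.ix, ]
```

Here `valid.map` keeps the first two aliquots and drops the third, whose patient is absent from the clinical table.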
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
TCGA.DATA R Package | ||
================ | ||
|
||
- [Package information](#package-information) | ||
- [How to use the dataset](#how-to-use-the-dataset) | ||
- [How to build own data package](#how-to-build-own-data-package) | ||
- [Acknowledgements](#acknowledgements) | ||
|
||
This R package retrieves gene expression, mutation and clinical data from the [TCGA database](http://gdc-portal.nci.nih.gov/) (The Cancer Genome Atlas). It retrieves a single type of cancer at a time. | ||
|
||
We publish a separate package for each cancer type on the [releases page](https://github.com/averissimo/tcga.data/releases), so the datasets can be used right away. | ||
|
||
The gene expression datasets are already in matrix format, ready to be used. The data is in FPKM (Fragments Per Kilobase of transcript per Million mapped reads). Any additional normalization required by a model must be performed by the user. | ||
|
||
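As an illustration of such an additional normalization step, here is a minimal base R sketch that applies a `log2(x + 1)` transform to a toy FPKM matrix. The matrix values, the gene and sample names, and the genes-in-rows orientation are assumptions made for the example; they are not taken from the packaged datasets, and the transform is only one common choice.

```r
# Toy FPKM matrix: genes in rows, samples in columns (values are made up).
fpkm <- matrix(c(0, 1, 3, 7, 15, 1023),
               nrow = 3,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               c("sample1", "sample2")))

# log2(x + 1) compresses the dynamic range and keeps zeros at zero.
log.fpkm <- log2(fpkm + 1)

# Optionally drop genes that do not vary across samples before modelling.
keep <- apply(log.fpkm, 1, stats::sd) > 0
log.fpkm <- log.fpkm[keep, , drop = FALSE]
```

Whether to transform, scale or filter at all depends on the model that will consume the data.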
Package information | ||
------------------- | ||
|
||
### How to use the dataset | ||
|
||
1. Install one of the data packages (`brca.data`, `prad.data` or `skcm.data`) using the `devtools` package. | ||
|
||
2. Load the library | ||
|
||
3. Load the required datasets (one or more of the following) | ||
- `clinical` | ||
- `fpkm.per.tissue` | ||
- `fpkm.per.tissue.barcode` | ||
- `mutation` | ||
- `gdc` | ||
|
||
#### Example for BRCA package | ||
|
||
``` r | ||
# install the devtools library | ||
install.packages('devtools') | ||
# The library can also be loaded, so that install_url can be called without the 'devtools::' prefix | ||
devtools::install_url('https://github.com/averissimo/tcga.data/releases/download/2016.12.15-brca/brca.data_1.0.tar.gz') | ||
# | ||
# Load the brca.data package | ||
library(brca.data) | ||
# start using the data, for example the tissue data | ||
data(fpkm.per.tissue) | ||
# fpkm.per.tissue is now in the environment and will be loaded the first | ||
# time it is used. For example: | ||
names(fpkm.per.tissue) | ||
``` | ||
|
||
How to build own data package | ||
----------------------------- | ||
|
||
1. Open vignettes/build\_data.Rmd | ||
2. In the header of the Rmd *(beginning of the document)*, change the project param to the target TCGA project | ||
3. Open DESCRIPTION and change the name of the package to the desired name | ||
    - we use a convention of \#\#\#\#.data where \#\#\#\# is the TCGA project name in lowercase | ||
4. Run the vignettes/build\_data.Rmd to build the cache of the data | ||
5. Run `devtools::document()` to create documentation | ||
6. Run `devtools::build()` to build the actual package | ||
|
||
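The scripted part of the build could be wrapped as in the sketch below. The helper name `build.data.package` and its arguments are hypothetical and not part of this package. It assumes the vignette declares `project` under `params:` in its YAML header, so that `rmarkdown::render()` can override it; renaming the package in DESCRIPTION remains a manual step.

```r
# Hypothetical helper wrapping the build steps; nothing runs until it is called.
build.data.package <- function(project, pkg.dir = ".") {
  # Render the vignette to build the data cache, overriding the `project` param
  # (assumes the Rmd declares `project` under `params:` in its YAML header).
  rmarkdown::render(file.path(pkg.dir, "vignettes", "build_data.Rmd"),
                    params = list(project = project))
  # Regenerate the documentation from the roxygen comments.
  devtools::document(pkg.dir)
  # Build the package tarball; devtools::build() returns its path.
  devtools::build(pkg.dir)
}
```

The value to pass as `project` depends on what the vignette expects, so check the Rmd header before calling the helper.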
Acknowledgements | ||
---------------- | ||
|
||
This package was developed primarily by *[André Veríssimo](http://web.tecnico.ulisboa.pt/andre.verissimo/)* with support from *Marta Lopes* and *[Susana Vinga](http://web.tecnico.ulisboa.pt/susanavinga/)* | ||
|
||
This work was supported by: | ||
|
||
- [FCT](www.fct.pt), through IDMEC, under LAETA, projects *(UID/EMS/50022/2013)*; | ||
- Susana Vinga acknowledges support by program Investigador FCT *(IF/00653/2012)* from [FCT](www.fct.pt), co-funded by the European Social Fund *(ESF)* through the Operational Program Human Potential *(POPH)*; | ||
- André Veríssimo acknowledges support from [FCT](www.fct.pt) *(SFRH/BD/97415/2013)*. |