From 183afd9ed28d4e563ae0a663e892c1f16aae9991 Mon Sep 17 00:00:00 2001
From: YunaBlum <31280664+YunaBlum@users.noreply.github.com>
Date: Sat, 30 Nov 2024 22:17:19 +0100
Subject: [PATCH] Update how-do-i-start.md typo

---
 phase-1_2_3/bundle/how-do-i-start.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/phase-1_2_3/bundle/how-do-i-start.md b/phase-1_2_3/bundle/how-do-i-start.md
index 3a7e5ce..5b8739b 100755
--- a/phase-1_2_3/bundle/how-do-i-start.md
+++ b/phase-1_2_3/bundle/how-do-i-start.md
@@ -24,8 +24,10 @@ Basal-like and classical cells methylation profiles were retrieved from [this pa
 All references are contained in the file `data/reference_pdac.rds`. This object is a list with the following elements:
 
 \- **ref_bulkRNA**: bulk RNAseq pure cell populations
+
 \- **ref_met**: bulk methylation profiles of pure cell populations
- \- **ref_scRNA**: single-cell RNAseq reference datasets from 3 differents studies; Peng, Baron and Raghavan, each including the data (counts) and associated metadata, i.e. the sample and the cell type.
+
+ \- **ref_scRNA**: single-cell RNAseq reference datasets from 3 differents studies; Peng, Baron and Raghavan, each including the data (counts) and associated metadata, i.e. the sample and the cell type.
 
 Example to load and inspect the reference data:
 
@@ -128,12 +130,12 @@ For the `VIVO` dataset, we do not know the ground truth. However, we have a prox
 
 The principle of the `SBN5` pseudo-bulk simulation is based on how bulk samples are sequenced in real life. The global gene expression or DNA methylation in a bulk sample is measured from a multitude of heterogeneous cells. Single-cell technology measures gene expression or DNA methylation in one cell, so an *in silico* mixture of single-cell data of different cell types in known proportions produces a pseudo-bulk sample.
 
-For all further *in silico* simulated datasets, the ground truth is generated based on a Dirichlet distribution with different sets of parameters, chosen to generate proportions close to the *in vitro* ones. The first dataset is a basic simulation with no explicit dependence or correlation introduced between genes and CpG probes. Based on this simple simulation, we produced two other datasets: one with only 4 cell types, and one with 6 cell types, while there are 5 cell types in the reference used for the reference-based deconvolution.
+For all further *in silico* simulated datasets, the ground truth is generated based on a Dirichlet distribution with different sets of parameters, chosen to generate proportions close to the *in vitro* ones. The first dataset `SDN5` is a basic simulation with no explicit dependence or correlation introduced between genes and CpG probes. Based on this simple simulation, we produced two other datasets: one with only 4 cell types `SDN4`, and one with 6 cell types `SDN6`, while there are 5 cell types in the reference used for the reference-based deconvolution.
 
 The last 3 *in silico* simulations introduce a dependence structure between genes and CpG probes. These dependences are estimated from the *in vitro* dataset by two different approaches:
 
-- EMFA: We estimate a factor model of the conditionnal variance-covariance matrix in the *in vitro* data. The factor model is estimated by an Expectation-Maximisation algorithm (https://doi.org/10.1198/jasa.2009.tm08332)
-- Copula: Copulas characterise the type of dependence, such as non-linear or tail dependence, between multiple variables. We estimate the empirical copula of the residuals between *in vitro* bulk samples and *in vitro* references (https://doi.org/10.18637/jss.v021.i04).
+- EMFA `SDE5` : We estimate a factor model of the conditionnal variance-covariance matrix in the *in vitro* data. The factor model is estimated by an Expectation-Maximisation algorithm (https://doi.org/10.1198/jasa.2009.tm08332)
+- Copula `SDC5`: Copulas characterise the type of dependence, such as non-linear or tail dependence, between multiple variables. We estimate the empirical copula of the residuals between *in vitro* bulk samples and *in vitro* references (https://doi.org/10.18637/jss.v021.i04).
 
 The dataset `SDEL` is derived from the EMFA simulation procedure, but with very low proportion for one cell type.
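
Reviewer note: the patched text describes generating ground-truth proportions from a Dirichlet distribution and mixing per-cell-type profiles in those proportions. The sketch below illustrates only that basic idea, with no dependence structure between features. It is not the pipeline used to produce the actual datasets: the reference matrix, the `alpha` parameters, and the function names `sample_dirichlet` and `make_pseudo_bulk` are all hypothetical, and Python is used purely for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one proportion vector from Dirichlet(alpha) by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def make_pseudo_bulk(reference, proportions):
    """Mix per-cell-type profiles (rows = features, columns = cell types)
    into one pseudo-bulk profile, weighting each cell type by its proportion."""
    return [
        sum(value * p for value, p in zip(row, proportions))
        for row in reference
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical reference: 3 features x 5 cell types (values are made up).
reference = [
    [10.0, 0.0, 2.0, 1.0, 5.0],
    [0.0, 8.0, 1.0, 3.0, 0.5],
    [4.0, 4.0, 4.0, 4.0, 4.0],
]

# Hypothetical Dirichlet parameters, one per cell type.
alpha = [2.0, 1.0, 1.0, 0.5, 3.0]

proportions = sample_dirichlet(alpha, rng)  # simulated ground truth
pseudo_bulk = make_pseudo_bulk(reference, proportions)
```

Here `proportions` plays the role of the known ground truth and `pseudo_bulk` the simulated sample that a deconvolution method would receive. A feature with the same value in every cell type (the third row) keeps that value in the mixture, because the proportions sum to 1.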