cREscENDO

cREscENDO is a deep learning-based computational tool for determining cRE regions and their activities at the single cell level from single sample 10X Single Cell Multiome ATAC + Gene Expression data.

Installation should be performed in the following manner

## We recommend that you run it in a virtual environment. For example, use the following command.
## conda create --name crescendo_env python=3.8
## conda activate crescendo_env
git clone https://github.com/okadalabipr/cREscENDO
cd cREscENDO
pip install -r requirements.txt

to install the necessary files. In addition, please install bedtools (https://bedtools.readthedocs.io/en/latest/content/installation.html) , Meme suite (https://meme-suite.org/meme/doc/install.html?man_type=web) , ucsc-liftover (https://genome.ucsc.edu/cgi-bin/hgLiftOver) ,sync_batchnorm (https://github.com/vacancy/Synchronized-BatchNorm-PyTorch), please refer to their respective websites for installation.
("sync_batchnorm" should be placed inside the “script” directory.)

conda install bioconda::bedtools
conda install bioconda::meme
conda install -c bioconda -y ucsc-liftover

cd ..
git clone https://github.com/vacancy/Synchronized-BatchNorm-PyTorch
cp -r Synchronized-BatchNorm-PyTorch/sync_batchnorm cREscENDO/script/

How to run

In order to run this tool, you must provide a TSS list in bed file format. hg38's TSS list is included in this repository.
(TSS list should be placed inside the working directory.)

To execute, run the following command

bash cREscENDO_process_1.sh [path/to/working/directory] [path/to/10X_multiome_h5.file] [path/to/ref/genome/fasta]

Perform the learning. As with other deep learning tools, cREscENDO should be trained using a GPU.
If cross-validation is required, execute the following commands

bash cREscENDO_process_2.sh [path/to/working/directory]
bash cREscENDO_process_3.sh [path/to/working/directory]

If cross-validation is not required, execute the following commands

bash cREscENDO_process_2_single.sh [path/to/working/directory] 0
bash cREscENDO_process_3_single.sh [path/to/working/directory] 0

Cross-validation requires a lot of calculation time. If you want to divide the job or run the calculation in parallel due to restrictions on the execution environment, you can do it as follows.

##Please execute the following commands in separate jobs.
bash cREscENDO_process_2_10foldsplit.sh [path/to/working/directory] 0
bash cREscENDO_process_2_10foldsplit.sh [path/to/working/directory] 1
bash cREscENDO_process_2_10foldsplit.sh [path/to/working/directory] 2
bash cREscENDO_process_2_10foldsplit.sh [path/to/working/directory] 3
bash cREscENDO_process_2_10foldsplit.sh [path/to/working/directory] 4
bash cREscENDO_process_2_10foldsplit.sh [path/to/working/directory] 5
bash cREscENDO_process_2_10foldsplit.sh [path/to/working/directory] 6
bash cREscENDO_process_2_10foldsplit.sh [path/to/working/directory] 7
bash cREscENDO_process_2_10foldsplit.sh [path/to/working/directory] 8
bash cREscENDO_process_2_10foldsplit.sh [path/to/working/directory] 9

##Finally, run the following command to aggregate the results of cross-validation.
bash cREscENDO_process_3.sh [path/to/working/directory]

Output

The single-cell level cRE activity is provided in a file named "Deeplift_full_ver2_all.npy"
This is a 3-dimensional matrix consisting of [genes, cells, peaks]

The matrix showing the maximum activity values for each cRE candidate region is saved in “allgrad_ssep_max.npy”.
This matrix is a two-dimensional matrix of [genes,peaks]

The region with the top 60% activity of the matrix stored in “allgrad_ssep_max.npy” is defined as cRE.
Which region is used for downstream analysis as cRE is stored in “promenhatag.npy”.
This matrix is a two-dimensional matrix of [genes,peaks]

This matrix is annotated by the correlation between the activity value and the ATAC-seq count and cRE activity.
Activity value is in the top 15% & the correlation between the ATAC-seq count and cRE activity is positive: 4
Activity value is in the top 60% (from 15% to 60%) & the correlation between the ATAC-seq count and cRE activity is positive: 2
Promoter of the target gene: 3
(For peaks with a negative correlation between ATAC-seq counts and cRE activity, we do not recommend using them for downstream analysis.)

As for the peak dimension, the target promoter is always entered at index 0.
Then, starting from index 1, the regions with the smallest genome coordinates are stored in order.

For the correspondence between the gene index and the gene name, please refer to the file named "pair_300000.csv" (Gene indexes start with 0)
The file “pair_300000.csv” also contains an index of which ATAC-seq peak is associated with each gene.
The file “pair_promoter.csv” contains the index of the target promoter of each gene.
The index of the ATAC-seq peak corresponds to the index (starting from 0) of the bed file ("peaks.bed").

In addition, the code used in the paper is stored in this github repository.

for_motif

Code for motif analysis

pbmc

Code for Fig.2 and Fig.3A

glioma

Code for Fig.3B-J

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

cREscENDO

How to run

Output

for_motif

pbmc

glioma

About

Releases

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 88 Commits
for_motif		for_motif
glioma		glioma
pbmc		pbmc
script		script
LICENSE.txt		LICENSE.txt
README.md		README.md
TSS_list.txt		TSS_list.txt
cREscENDO_process_1.sh		cREscENDO_process_1.sh
cREscENDO_process_1_glioma.sh		cREscENDO_process_1_glioma.sh
cREscENDO_process_2.sh		cREscENDO_process_2.sh
cREscENDO_process_2_10foldsplit.sh		cREscENDO_process_2_10foldsplit.sh
cREscENDO_process_2_single.sh		cREscENDO_process_2_single.sh
cREscENDO_process_3.sh		cREscENDO_process_3.sh
cREscENDO_process_3_single.sh		cREscENDO_process_3_single.sh
cREscENDO_process_celltypemotif.sh		cREscENDO_process_celltypemotif.sh
cREscENDO_process_celltypemotif_integratedgradient.sh		cREscENDO_process_celltypemotif_integratedgradient.sh
requirements.txt		requirements.txt

License

okadalabipr/cREscENDO

Folders and files

Latest commit

History

Repository files navigation

cREscENDO

How to run

Output

for_motif

pbmc

glioma

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages