This repository contains the analysis code for the whole mouse brain (wmb) snATAC-seq data from the Center for Epigenomics of the Mouse Brain Atlas (CEMBA). The study was accepted by Nature (2023).
- 2024-09-25: added an explanation in Discussion for the 4D and 4E dissections.
- 2024-10-19: added a Google Drive link for the metadata of the 2.3 million cells.
All analyses and the generated h5ad files were produced with SnapATAC2 <= 2.4.0. SnapATAC2 >= 2.5.0 introduced some breaking changes.
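Before reusing the h5ad files, it can help to check which SnapATAC2 version is installed. The snippet below is a minimal sketch, not part of this repository: the helper names are hypothetical, and it assumes that any release below 2.5.0 (including 2.4.x patch releases, which the note above does not cover) is free of the breaking changes.

```python
from importlib import metadata

# Assumption: breaking changes start at 2.5.0, as stated above.
FIRST_BREAKING = (2, 5, 0)


def parse_version(text):
    """Turn '2.4.0' into (2, 4, 0); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_before_breaking_changes(version_text):
    """True if the version is older than the first release with breaking changes."""
    return parse_version(version_text) < FIRST_BREAKING


def check_installed(package="snapatac2"):
    """Return True/False for the installed package, or None if it is not installed."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return is_before_breaking_changes(installed)


if __name__ == "__main__":
    print(check_installed())
```

`check_installed()` returns `None` when SnapATAC2 is not installed in the current environment, so the check can run safely on a machine without it.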
- The dissection annotated as 4D in the metadata should be 4E, and the one annotated as 4E should be 4D.
- This may be a labeling error made when the experiment was recorded, but we are not certain.
- See the discussion for details: #20.
- Everything in this repository is kept unchanged for now, so if you need the 4D region, look at 4E, and vice versa.
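If you prefer to correct the labels on your side after loading the metadata, the swap can be sketched as below. This is an illustration only: it assumes the dissection labels are stored as the exact strings "4D" and "4E", which may not match the actual metadata columns, and the function name is hypothetical.

```python
# Assumption: dissection labels appear as the exact strings "4D" and "4E".
SWAPPED_LABELS = {"4D": "4E", "4E": "4D"}


def correct_dissection(label):
    """Return the corrected dissection label; other labels pass through unchanged."""
    return SWAPPED_LABELS.get(label, label)


def correct_dissections(labels):
    """Apply the 4D/4E swap to a sequence of labels without modifying the input."""
    return [correct_dissection(label) for label in labels]


if __name__ == "__main__":
    print(correct_dissections(["4D", "4E", "3C"]))
```

Applying the swap twice restores the original labels, so it is easy to undo if the labeling turns out to be correct after all.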
- Demultiplexed data can be accessed via the NEMO archive (NEMO, RRID:SCR_016152) at https://assets.nemoarchive.org/dat-bej4ymm (the raw directory under Source Data URL in this archive).
- We also uploaded our demultiplexed fastq files and processed files under the GEO accession number GSE246791:
- Processed data is also available on our web portal and can be explored here: http://www.catlas.org.
- A Google Drive link to the bigwig files and SnapATAC2 files, kept as a backup:
- All the cellmeta information:
- We now have 234 samples and 2.3 million cells in total, so most analyses rely on Snakefiles to organize the pipeline and submit jobs to a high-performance computing (HPC) cluster, using hundreds of CPUs at the same time.
- The code is mainly written in R (>= 4.2), Shell, and Python (>= 3.10), with R used most heavily.
- Common functions are collected under the package directory.
- We mainly use SnapATAC2 to analyze the single-nucleus ATAC-seq data.
- Comparison between Scrublet and AMULET: https://github.com/yuelaiwang/CEMBA_AMULET_Scrublet
- The deep-learning code is now in a separate repo: https://github.com/yal054/mba_dl_model
- sa2 is short for SnapATAC2 in this repo.
In total, we performed four rounds of iterative clustering. See 01.clustering for details.
We annotate our data using the Allen Institute's scRNA-seq data and annotations. See 02.integration for details.
We call peaks with MACS2 and apply multi-stage filtering; in particular, peaks are filtered with SPM >= 5. See 03.peakcalling for details.
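To illustrate the SPM cutoff, here is a minimal sketch. It assumes SPM means "score per million", that is, each peak's score divided by the sum of all peak scores in the same peak set and multiplied by one million. The function names and the (name, score) input layout are hypothetical; the actual pipeline is in 03.peakcalling.

```python
def score_per_million(scores):
    """Rescale peak scores so that they sum to one million (assumed SPM definition)."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("the sum of peak scores must be positive")
    return [score * 1_000_000 / total for score in scores]


def filter_peaks_by_spm(peaks, min_spm=5.0):
    """Keep (name, spm) pairs for peaks whose SPM is at least min_spm.

    `peaks` is a list of (name, score) tuples from a single peak set.
    """
    spm_values = score_per_million([score for _, score in peaks])
    return [
        (name, spm)
        for (name, _), spm in zip(peaks, spm_values)
        if spm >= min_spm
    ]


if __name__ == "__main__":
    example = [("peak_a", 999_990.0), ("peak_b", 6.0), ("peak_c", 4.0)]
    for name, spm in filter_peaks_by_spm(example):
        print(name, spm)
```

Because the scores are normalized within each peak set, the same cutoff can be applied to peak sets with very different numbers of peaks or sequencing depths.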
- cembav2env.R: R environments that store the metadata used during analysis.

| Environment | Description |
| --- | --- |
| cembav2env | Metadata of SnapATAC and SnapATAC2 |
| cluSumBySa2 | Clustering metadata, such as resolution, barcode to L4 IDs, L4 major regions, and so on |
| Sa2Integration | Integration metadata, such as descriptions of Allen’s data |
| Sa2PeakCalling | Peak calling metadata |