NOTE: Please update the package from version 1.0.3 to 1.0.5 to fix a possible object-merging error.
ArchRtoSignac is an R package to convert an ArchRProject (ArchR) to a Signac SeuratObject (Signac).
ArchR and Signac are both commonly used scATAC-seq analysis packages with comparable sets of features and are currently under development, which means they are likely to change over time. You can choose to use only one of these packages; however, you may want to use both packages for your analysis. For example, we use ArchR to generate a fixed-width peak matrix due to its computational advantage, and we use Signac for reference mapping to assist in cell-type annotation. Here we provide an option to help with the data formatting from an ArchRProject to a Signac SeuratObject: ArchRtoSignac, a wrapper function that allows easier implementation of both pipelines. In addition, conversion to a SeuratObject allows the use of other packages available through SeuratWrappers.
Shi, Zechuan; Das, Sudeshna; Morabito, Samuel; Miyoshi, Emily; Swarup, Vivek. (2022). Protocol for single-nucleus ATAC sequencing and bioinformatic analysis in frozen human brain tissue, STAR Protocols, Volume 3, Issue 3, DOI: https://doi.org/10.1016/j.xpro.2022.101491.
We recommend creating an R conda environment specifically for scATAC-seq analysis to install the required packages. This ensures that software versions required here do not conflict with software required for other projects, and several dependencies for ArchRtoSignac will be automatically installed.
# create new conda environment for R
conda create -n scATAC -c conda-forge r-base r-essentials
# activate conda environment
conda activate scATAC
Next, open R and install ArchRtoSignac using devtools.
# install devtools if not already installed
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
# install ArchRtoSignac
devtools::install_github("swaruplabUCI/ArchRtoSignac")
# load ArchRtoSignac
library(ArchRtoSignac)
When installing ArchRtoSignac, the following required dependencies should be automatically installed.
- ArchR, a general-purpose toolkit for single-cell ATAC sequencing analysis.
- Seurat, a general-purpose toolkit for single-cell RNA sequencing analysis.
- Signac, a general-purpose toolkit for single-cell ATAC sequencing analysis.
- devtools, a package for package development in R.
- biovizBase, basic graphic utilities for visualization of genomic data in R.
- stringr, a package for data cleaning and preparation in R.
However, if there are issues with installation, please try the following:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
# install Bioconductor core packages
BiocManager::install()
# install additional packages, including ArchR, Signac, Seurat, and others:
if (!requireNamespace("biovizBase", quietly = TRUE)) BiocManager::install("biovizBase")
if (!requireNamespace("ArchR", quietly = TRUE)) devtools::install_github("GreenleafLab/ArchR", ref="master", repos = BiocManager::repositories())
if (!requireNamespace("Signac", quietly = TRUE)) install.packages("Signac")
if (!requireNamespace("Seurat", quietly = TRUE)) install.packages("Seurat")
if (!requireNamespace("stringr", quietly = TRUE)) install.packages("stringr")
- STEP 0 - Check that all required dependencies have been installed and load them automatically.
packages <- c("ArchR","Seurat", "Signac","stringr") # required packages
loadinglibrary(packages)
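If STEP 0 fails, it can help to see exactly which packages are missing before retrying the installation. The following is a minimal base-R sketch of such a check. The helper `missing_packages()` is a hypothetical name introduced here for illustration; it is not part of ArchRtoSignac and does not reproduce how `loadinglibrary()` works internally.

```r
# Hypothetical helper (not part of ArchRtoSignac): return the names of
# packages that cannot be found, without attaching any of them.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

required <- c("ArchR", "Seurat", "Signac", "stringr")
not_installed <- missing_packages(required)

if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Any package reported as missing can then be installed with the commands in the troubleshooting block above.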
- STEP 1 - Obtain ArchRProject peak matrix for object conversion.
pkm <- getPeakMatrix(proj) # proj is an ArchRProject
- STEP 2 - Extract appropriate Ensembl gene annotation and convert to UCSC style.
library(EnsDb.Hsapiens.v86) # Ensembl database to convert to human hg38. Install what is appropriate for your analysis
annotations <- getAnnotation(reference = EnsDb.Hsapiens.v86, refversion = "hg38") # "UCSC" is the default style to change to but can be changed with argument seqStyle
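To illustrate what converting to UCSC style means in STEP 2, here is a small base-R sketch that renames Ensembl-style sequence names. It is only an illustration of the naming difference, not the code that `getAnnotation()` runs; the function `ensembl_to_ucsc()` is a hypothetical name, and it covers only the primary chromosomes and the mitochondrial genome. In real analyses, a dedicated tool such as `seqlevelsStyle()` from GenomeInfoDb is the more robust choice because it also handles scaffolds.

```r
# Illustrative only: map Ensembl-style sequence names ("1", "X", "MT")
# to UCSC-style names ("chr1", "chrX", "chrM").
ensembl_to_ucsc <- function(seqnames) {
  ucsc <- paste0("chr", seqnames)
  ucsc[seqnames == "MT"] <- "chrM"  # the mitochondrial genome is named differently
  ucsc
}

print(ensembl_to_ucsc(c("1", "2", "X", "Y", "MT")))
# [1] "chr1" "chr2" "chrX" "chrY" "chrM"
```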
- STEP 3 - Convert ArchRProject to Signac SeuratObject
- Choose one of the following options based on the format of the fragment files you generated earlier.
STEP 3 Option 1: Fragment files from 10X Genomics Cellranger ATAC output (fragments_fromcellranger = "Yes")
Option 1 expects fragment files saved exactly as the 10X Genomics Cellranger ATAC pipeline writes them.
If you have 10X output but the file format or the path differs from the pipeline's standard output, please use Option 2 below.
Set fragments_fromcellranger to Yes, for example fragments_fromcellranger = "Yes".
# Option 1a: Set one directory containing the cellranger output for each sample
fragments_dir <- "path_to_cellranger_atac_output" # the directory before "/outs/" for all samples
seurat_atac <- ArchR2Signac(
ArchRProject = proj,
refversion = "hg38",
#samples = samplelist, # list of samples in the ArchRProject (default will use ArchRProject@cellColData$Sample but another list can be provided)
fragments_dir = fragments_dir,
pm = pkm, # peak matrix from getPeakMatrix()
fragments_fromcellranger = "Yes", # Yes/No selection; accepted values are "NO" | "N" | "No" or "YES" | "Y" | "Yes"
fragments_file_extension = NULL, # Default - NULL: File_Extension for fragments files (typically they should be '.tsv.gz' or '.fragments.tsv.gz')
annotation = annotations # annotation from getAnnotation()
)
# Option 1b: Set a list of directories containing the cellranger output for each sample
# (passing a list of fragment paths works both for fragments from cellranger and for fragments not from cellranger;
# when fragments are not from cellranger, please also provide fragments_file_extension)
#
# Also PLEASE MAKE SURE the order of fragments_dirs matches the order of samplelist
# or the order of the samples in ArchRProject@cellColData$Sample
fragments_dirs <- list(
"/path/to/sample1/cellranger/output",
"/path/to/sample2/cellranger/output",
"/path/to/sample3/cellranger/output"
)
# # Optional: when fragments_fromcellranger = "NO", please set the file extension for the fragments file
# fragments_file_extension <- ".fragments.tsv.gz"
# Call the ArchR2Signac function with the provided arguments
SeuratObject <- ArchR2Signac(
ArchRProject = proj,
refversion = "hg38",
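samples = samples, # `samples` must be defined beforehand, in the same order as fragments_dirs (omit to use ArchRProject@cellColData$Sample)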
fragments_dir = fragments_dirs,
pm = pkm,
fragments_fromcellranger = "YES",
annotation = annotations
)
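Before running Option 1, it may be worth confirming that every sample has a fragment file where the conversion will look for it. The sketch below assumes the layout `<fragments_dir>/<sample>/outs/fragments.tsv.gz`, which is inferred from the "directory before /outs/" comment above and from the standard `cellranger-atac count` output; it also assumes each sample directory is named after the sample. Both helper functions are hypothetical and not part of ArchRtoSignac.

```r
# Assumption: each sample lives at <fragments_dir>/<sample>/outs/fragments.tsv.gz
# and the sample directory carries the sample name.
cellranger_fragment_paths <- function(fragments_dir, samples) {
  paths <- file.path(fragments_dir, samples, "outs", "fragments.tsv.gz")
  names(paths) <- samples
  paths
}

# Stop with an informative error if any expected fragment file is absent.
check_fragment_paths <- function(paths) {
  not_found <- paths[!file.exists(paths)]
  if (length(not_found) > 0) {
    stop("Fragment files not found:\n", paste(not_found, collapse = "\n"))
  }
  invisible(paths)
}

paths <- cellranger_fragment_paths(
  "path_to_cellranger_atac_output",
  c("sample1", "sample2")
)
print(paths)
# check_fragment_paths(paths)  # raises an error if any file is missing
```

Because the returned vector is named by sample, printing it also makes it easy to verify that the directory order matches the sample order.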
STEP 3 Option 2: Fragment files NOT from Cellranger ATAC output (fragments_fromcellranger = "NO")
Use this option for fragment files produced by other tools (e.g., SnapATAC), or for Cellranger ATAC output that does not match the standard output path of cellranger-atac count.
Set fragments_fromcellranger to No, for example fragments_fromcellranger = "NO". Also remember to provide fragments_file_extension, for example fragments_file_extension = '.tsv.gz' or fragments_file_extension = '.fragments.tsv.gz'.
Option 2a: Provide a single main fragments_dir
# For example, the fragment files are in the folder HemeFragments, which we can check in the terminal
### in Linux ###
tree /ArchR/HemeFragments/
/ArchR/HemeFragments/
├── scATAC_BMMC_R1.fragments.tsv.gz
├── scATAC_BMMC_R1.fragments.tsv.gz.tbi
├── scATAC_CD34_BMMC_R1.fragments.tsv.gz
├── scATAC_CD34_BMMC_R1.fragments.tsv.gz.tbi
├── scATAC_PBMC_R1.fragments.tsv.gz
└── scATAC_PBMC_R1.fragments.tsv.gz.tbi
##################
** Possible issue caused by the fragment file format if the fragment files are not from cellranger-atac output (reported in #Issue3). Please check: Signac snATAC-seq fragment file format
** Solution: stuart-lab/signac#748
fragments_dir <- "/ArchR/HemeFragments/" # please see the fragments format provided by ArchR examples
# Above is the directory containing the fragment files.
## NOTE: steps to run before converting the ArchRProject to a Signac SeuratObject.
#BiocManager::install("EnsDb.Hsapiens.v75")
#library(EnsDb.Hsapiens.v75)
#annotations <- getAnnotation(seqStyle = 'UCSC', refversion = 'hg19', reference = EnsDb.Hsapiens.v75)
#pm <- getPeakMatrix(ArchRProject= proj)
# Conversion function
seurat_atac <- ArchR2Signac(
ArchRProject = proj,
# samples = samples, # Provide a list of unique samples
fragments_dir = fragments_dir, # the folder that contains the fragment files for all samples, ending in '.fragments.tsv.gz' or '.tsv.gz'
pm = pm, # peak matrix from getPeakMatrix()
fragments_fromcellranger = "NO",
fragments_file_extension = '.fragments.tsv.gz',
refversion = 'hg19', # genome build matching the EnsDb annotation (EnsDb.Hsapiens.v75 corresponds to hg19)
annotation = annotations
)
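For Option 2a, a quick look at the directory can catch two common problems before conversion: files that do not end in the chosen extension, and fragment files without a tabix index (`.tbi`) next to them, which Signac needs to read fragments. The sketch below is a hypothetical pre-flight check in base R, not part of ArchRtoSignac. It assumes the sample name is the file name with the extension removed, as in the HemeFragments listing above, and it uses a temporary directory with empty placeholder files purely for demonstration.

```r
# Hypothetical pre-flight check: list fragment files with a given extension
# and report whether each one has a tabix index (.tbi) beside it.
list_fragment_files <- function(fragments_dir, extension = ".fragments.tsv.gz") {
  files <- list.files(fragments_dir, full.names = TRUE)
  files <- files[endsWith(files, extension)]
  file_names <- basename(files)
  data.frame(
    sample = substr(file_names, 1, nchar(file_names) - nchar(extension)),
    fragments = files,
    has_index = file.exists(paste0(files, ".tbi")),
    stringsAsFactors = FALSE
  )
}

# Demonstration on empty placeholder files in a temporary directory.
demo_dir <- tempfile("HemeFragments")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c(
  "scATAC_BMMC_R1.fragments.tsv.gz",
  "scATAC_BMMC_R1.fragments.tsv.gz.tbi",
  "scATAC_PBMC_R1.fragments.tsv.gz"  # deliberately left without an index
))))

print(list_fragment_files(demo_dir)[, c("sample", "has_index")])
```

In this demonstration the second sample is reported with `has_index = FALSE`, which is the situation to fix (for example by indexing the file with tabix) before calling `ArchR2Signac()`.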
Option 2b: Provide a list of fragments_dirs
## Alternatively, provide a list of fragment paths without the file extension (supply the extension via 'fragments_file_extension')
fragments_dirs <- list(
"/ArchR/HemeFragments/scATAC_BMMC_R1",
"/ArchR/HemeFragments/scATAC_CD34_BMMC_R1",
"/ArchR/HemeFragments/scATAC_PBMC_R1"
)
# Call the ArchR2Signac function with the provided arguments
SeuratObject <- ArchR2Signac(
ArchRProject = proj,
refversion = "hg19",
# samples = samples,
fragments_dir = fragments_dirs,
pm = pm,
fragments_fromcellranger = "NO",
fragments_file_extension = '.fragments.tsv.gz',
annotation = annotations
)
Option 2c: Provide a list of fragments_dirs and supply the whole file name as 'fragments_file_extension'
This option is for cellranger output named fragments.tsv.gz that does not match the standard output path style,
i.e., your fragment files look like this:
ls
/ArchR/HemeFragments/scATAC_BMMC_R1/fragments.tsv.gz
/ArchR/HemeFragments/scATAC_CD34_BMMC_R1/fragments.tsv.gz
/ArchR/HemeFragments/scATAC_PBMC_R1/fragments.tsv.gz
Provide a list of fragment directories and pass the whole file name 'fragments.tsv.gz' as 'fragments_file_extension':
fragments_dirs <- list(
"/ArchR/HemeFragments/scATAC_BMMC_R1/", # Note: the trailing "/" is required
"/ArchR/HemeFragments/scATAC_CD34_BMMC_R1/",
"/ArchR/HemeFragments/scATAC_PBMC_R1/"
)
# Call the ArchR2Signac function with the provided arguments
SeuratObject <- ArchR2Signac(
ArchRProject = proj,
refversion = "hg19",
# samples = samples,
fragments_dir = fragments_dirs,
pm = pm,
fragments_fromcellranger = "NO",
fragments_file_extension = 'fragments.tsv.gz', # here pass the whole file name fragments.tsv.gz rather than an extension such as .tsv.gz or .fragments.tsv.gz
annotation = annotations
)
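The difference between Options 2b and 2c, and the reason the trailing "/" matters in 2c, is easiest to see by writing out the resulting paths. The sketch below assumes that each entry of `fragments_dirs` and `fragments_file_extension` are simply pasted together; this is inferred from the examples above, not taken from the package source, and `resolve_fragment_paths()` is a hypothetical name.

```r
# Assumption (inferred from the examples, not from the package source):
# the fragment path is the directory entry and the extension pasted together.
resolve_fragment_paths <- function(fragments_dirs, fragments_file_extension) {
  paste0(unlist(fragments_dirs), fragments_file_extension)
}

# Option 2b layout: path prefix without the extension
print(resolve_fragment_paths(list("/ArchR/HemeFragments/scATAC_BMMC_R1"), ".fragments.tsv.gz"))
# [1] "/ArchR/HemeFragments/scATAC_BMMC_R1.fragments.tsv.gz"

# Option 2c layout: directory with a trailing "/" plus the whole file name
print(resolve_fragment_paths(list("/ArchR/HemeFragments/scATAC_BMMC_R1/"), "fragments.tsv.gz"))
# [1] "/ArchR/HemeFragments/scATAC_BMMC_R1/fragments.tsv.gz"

# Option 2c without the trailing "/" yields a path that does not exist
print(resolve_fragment_paths(list("/ArchR/HemeFragments/scATAC_BMMC_R1"), "fragments.tsv.gz"))
# [1] "/ArchR/HemeFragments/scATAC_BMMC_R1fragments.tsv.gz"
```

Passing the resolved paths to `file.exists()` is a cheap way to confirm the chosen combination before starting the conversion.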
- STEP 4 - Transfer ArchRProject gene score matrix to Signac SeuratObject.
gsm <- getGeneScoreMatrix(ArchRProject = proj, SeuratObject = seurat_atac)
seurat_atac[['RNA']] <- CreateAssayObject(counts = gsm)
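When a matrix is added to an existing object as a new assay, its columns need to refer to the same cells as the object. `getGeneScoreMatrix()` takes the SeuratObject as an argument, so it presumably handles this already; the base-R sketch below is only a sanity check one could run on any features-by-cells matrix. `align_to_cells()` is a hypothetical helper and the toy matrix is made up for illustration.

```r
# Hypothetical sanity check: reorder a features-by-cells matrix so that its
# columns follow a given vector of cell names, failing loudly if any is absent.
align_to_cells <- function(mat, cells) {
  absent <- setdiff(cells, colnames(mat))
  if (length(absent) > 0) {
    stop(length(absent), " cell(s) missing from the matrix, e.g. ", absent[1])
  }
  mat[, cells, drop = FALSE]
}

# Toy example: two genes, three cells stored in a different order.
toy <- matrix(
  1:6, nrow = 2,
  dimnames = list(c("GeneA", "GeneB"), c("cell3", "cell1", "cell2"))
)
aligned <- align_to_cells(toy, c("cell1", "cell2", "cell3"))
print(colnames(aligned))
# [1] "cell1" "cell2" "cell3"
```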
- STEP 5 - Transfer ArchRProject dimension reduction ("IterativeLSI", "IterativeLSI2" or "Harmony") and UMAP to Signac SeuratObject.
seurat_atac <- addDimRed(
ArchRProject = proj,
SeuratObject = seurat_atac,
addUMAPs = "UMAP",
reducedDims = "IterativeLSI"
) # default is "IterativeLSI"
# OR
# add both 'Harmony' and 'IterativeLSI':
seurat_atac <- addTwoDimRed(
ArchRProject = proj,
SeuratObject = seurat_atac,
addUMAPs = "UMAP",
reducedDims1 = "IterativeLSI",
# Please limit your reducedDims to one of the following: IterativeLSI, IterativeLSI2 or Harmony
reducedDims2 = "Harmony" # IterativeLSI2 or Harmony
)
# OR
# add a custom-named dimension reduction, set via reducedDims and reducedDimsType ('Harmony' or 'IterativeLSI'):
seurat_atac <- addCustomizeDimRed(
ArchRProject = proj,
SeuratObject = seurat_atac,
addUMAPs = "UMAP",
reducedDims = 'IterativeLSI',
reducedDimsType = 'IterativeLSI'
)
#[1] "In Progress:"
#[1] "add UMAP From ArchRProject to SeuratObject"
#[1] "In Progress:"
#[1] "add reduction From ArchRProject to SeuratObject"
#[1] "Return SeuratObject"
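The comments in STEP 5 ask that `reducedDims` for `addDimRed()` and `addTwoDimRed()` be limited to IterativeLSI, IterativeLSI2, or Harmony. A small guard like the following can catch a typo before the conversion starts. This is a hypothetical base-R sketch, not a function exported by ArchRtoSignac, and it does not apply to `addCustomizeDimRed()`, which accepts custom names.

```r
# Hypothetical guard: accept only the reduction names listed in STEP 5.
validate_reduced_dims <- function(reducedDims) {
  allowed <- c("IterativeLSI", "IterativeLSI2", "Harmony")
  if (!is.character(reducedDims) || length(reducedDims) != 1 ||
      !(reducedDims %in% allowed)) {
    stop("reducedDims must be one of: ", paste(allowed, collapse = ", "))
  }
  reducedDims
}

print(validate_reduced_dims("Harmony"))
# [1] "Harmony"
# validate_reduced_dims("harmony")  # raises an error: names are case-sensitive
```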