NDID

Introduction

Chromatin Interaction Analysis with Paired-End Tag (ChIA-PET) sequencing is a technology to study genome-wide long-range chromatin interactions bound by protein factors. NDID is a statistical technique for the joint normalization and differential chromatin interactions detection from ChIA-PET experiments.

The NDID requires the following dependencies:

R (≥3.4.0)

devtools (≥ 2.3.1)

fANCOVA (≥ 0.5.1)

qvalue ( ≥ 2.15.0)

Data Pre-processing

The data pre-processing has two steps. In the first step, we need to process the two ChIA-PET raw datasets using ChIA-PET Tool V3 (Li et al., 2019). In the second step, new anchors will be defined from the processed results; that is, the two processed data's anchors will be merged and considered the unique anchors. Using the newly defined anchors, we need to re-processed the raw data using an option --INPUT_ANCHOR_FILE in ChIA-PET Tool.

Example: for processing the GM12878 versus MCF7 datasets, we will use the following ChIA-PET Tool V3 command lines:

1. For the first-step analysis: we will use the following command lines for GM12878 and MCF7 datasets, respectively.

	java -jar ChIA-PET.jar --mode 1 --fastq1 GM12878_1.fastq --fastq2 GM12878_2.fastq --linker ChIA-PET_Tool_V3/linker/linker_long.txt --minimum_linker_alignment_score 14 --GENOME_INDEX hg19.fa --GENOME_LENGTH 3E9 --CHROM_SIZE_INFO ChIA-PET_Tool_V3/chromInfo/hg19.chromSize.txt --CYTOBAND_DATA ChIA-PET_Tool_V3/chromInfo/hg19_cytoBandIdeo.txt --SPECIES 1 --output Output_GM12878 --prefix GM12878 
	java -jar ChIA-PET.jar --mode 1 --fastq1 MCF7_1.fastq --fastq2 MCF7_2.fastq --linker ChIA-PET_Tool_V3/linker/linker_long.txt --minimum_linker_alignment_score 14 --GENOME_INDEX hg19.fa --GENOME_LENGTH 3E9 --CHROM_SIZE_INFO ChIA-PET_Tool_V3/chromInfo/hg19.chromSize.txt --CYTOBAND_DATA ChIA-PET_Tool_V3/chromInfo/hg19_cytoBandIdeo.txt --SPECIES 1 --output Output_MCF7 --prefix MCF7

2. For the second-step analysis: we will use the defined anchor from the first step, namely Anchor.bed.

	java -jar ChIA-PET.jar --mode 1 --fastq1 GM12878_1.fastq --fastq2 GM12878_2.fastq --linker ChIA-PET_Tool_V3/linker/linker_long.txt --minimum_linker_alignment_score 14 --GENOME_INDEX hg19.fa --GENOME_LENGTH 3E9 --CHROM_SIZE_INFO ChIA-PET_Tool_V3/chromInfo/hg19.chromSize.txt --CYTOBAND_DATA ChIA-PET_Tool_V3/chromInfo/hg19_cytoBandIdeo.txt --SPECIES 1  --INPUT_ANCHOR_FILE Anchor.bed --output Output_GM12878 --prefix GM12878_MCF7 
	java -jar ChIA-PET.jar --mode 1 --fastq1 MCF7_1.fastq --fastq2 MCF7_2.fastq --linker ChIA-PET_Tool_V3/linker/linker_long.txt --minimum_linker_alignment_score 14 --GENOME_INDEX hg19.fa --GENOME_LENGTH 3E9 --CHROM_SIZE_INFO ChIA-PET_Tool_V3/chromInfo/hg19.chromSize.txt --CYTOBAND_DATA ChIA-PET_Tool_V3/chromInfo/hg19_cytoBandIdeo.txt --SPECIES 1 INPUT_ANCHOR_FILE Anchor.bed --output Output_MCF7 --prefix MCF7_GM12878

Remark: for detailed information on ChIA-PET Tool V3 data analysis, please visit the ChIA-PET Tool V3.

From the ChIA-PET Tool output files, the out.cluster.FDRfiltered.txt will be used for downstream analysis in NDID. Finally, we will find the overlap results between the two processed results using bedtools pairToPair. We considered the anchors' location, interaction frequency, and self-ligation PETs in each anchor (it used to measure the anchor enrichment) from the overlapped results. When we overlapped the two processed dataset results, we have a chance to find unique interactions only in one dataset. Therefore, we substituted a small interaction frequency (IF=1) for the corresponding dataset. The anchor enrichment is computed from the out.spet file in the ChIA-PET Tool V3 output. It computed for anchor1 (chrom1, start1, end1) and anchor2 (chrom2, start2, end2) separately and took the average values using the following commands.

awk '{if($2<$5){print $1"\t"$2"\t"$5}else{print $1"\t"$5"\t"$2}}' out.spet > out.spet.bed3

bedtools coverage -a Anchor1.bed -b out.spet.bed3|cut -f4 > self1.bed

bedtools coverage -a Anchor2.bed -b out.spet.bed3|cut -f4 > self2.bed

then compute the average of it and call the variable name “selfAvg”

Input files

The input data should have such kind of variables arrangement. The suffix 1 and 2 indicate the values belogs to sample-1 and sample-2 respectively.

chrom1	start1	end1	chrom2	start2	end2	ipet_1	selfAvg_1	ipet_2	selfAvg_2
chr1	27882430	27885761	chr1	27925635	27937736	46	704	20	135.5

Test data sets

RAD21 ChIA-PET data from human MCF7:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE127022

RAD21 ChIA-PET data from human K562:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE127027

RAD21 ChIA-PET data from human H1:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE127037

RAD21 ChIA-PET data from human H9:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE127034

RAD21 ChIA-PET data from human GM12878:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE127053

Usage

library(devtools)
devtools::install_github("Yab29/NDID")
library(fANCOVA)
library(qvalue)
library(NDID)

setwd()   # Set your working directory 
NDID("input_file","output_prefix")

Example

Let us run the NDID on a given dataset, GM12878_MCF7.txt

NDID("GM12878_MCF7.txt","test")

Result file

We will get the result file named test.txt.

chrom1	start1	end1	chrom2	start2	end2	Normalized_ipet_1	Normalized_ipet_2	P-value	p.adjust	intensity type
chr1	39648742	39652714	chr1	39654893	39662163	3.83360974	15.20362117	0.000798074	0.022997704	1

Meaning of the columns:

chrom1: The name of the chromosome on which the cluster anchor 1 exists
start1: The start coordinates of cluster anchor 1
end1: The end coordinate of cluster anchor 1
Chrom2: The name of the chromosome on which the cluster anchor 2 exists
Start2: The start coordinates of cluster anchor 2
End2: The end coordinate of cluster anchor 2
Normalized_ipet_1: normalized number of PETs in sample-1
Normalized_ipet_2: normalized number of PETs in sample-2
P-value: This value represents the statistical significance of the interaction
p.adjust: p-value adjustment with Benjamini-Hockberg method (1995)
intensity type: 0 and 1 represent decreasing and increasing interactions intensity, respectively

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

NDID

Introduction

The NDID requires the following dependencies:

Data Pre-processing

Input files

Test data sets

Usage

Example

Result file

Meaning of the columns:

Remark: the test data is available here

CopyRight © 2021 Guoliang's Lab, All Rights Reserved

Files

README.md

Latest commit

History

README.md

File metadata and controls

NDID

Introduction

The NDID requires the following dependencies:

Data Pre-processing

Input files

Test data sets

Usage

Example

Result file

Meaning of the columns:

Remark: the test data is available here

CopyRight © 2021 Guoliang's Lab, All Rights Reserved