NDID

Introduction

Chromatin Interaction Analysis with Paired-End Tag (ChIA-PET) sequencing is a technology for studying genome-wide long-range chromatin interactions bound by protein factors. NDID is a statistical method for the joint normalization and detection of differential chromatin interactions from ChIA-PET experiments.

NDID requires the following dependencies:

  1.   R (≥ 3.4.0)
  2.   devtools (≥ 2.3.1)
  3.   fANCOVA (≥ 0.5.1)
  4.   qvalue (≥ 2.15.0)

Data Pre-processing

Data pre-processing has two steps. In the first step, the two raw ChIA-PET datasets are processed with ChIA-PET Tool V3 (Li et al., 2019). In the second step, new anchors are defined from the processed results: the anchors of the two processed datasets are merged into a single set of unique anchors. Using these newly defined anchors, the raw data are then reprocessed with the --INPUT_ANCHOR_FILE option of ChIA-PET Tool.
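The merging of the two anchor sets is not tied to a particular tool. As a minimal sketch, assuming the first-step anchors of each dataset have been exported as three-column BED files (the input file names in the usage line are hypothetical) and that overlapping or book-ended anchors should be collapsed into their union, the merged set could be built with sort and awk. The function name merge_anchors is ours and is not part of ChIA-PET Tool or NDID.

```shell
# merge_anchors FILE...: print the union of overlapping anchors as BED3.
# Hypothetical helper; assumes whitespace-separated chrom, start, end columns.
merge_anchors() {
    sort -k1,1 -k2,2n "$@" | awk '
        BEGIN { OFS = "\t" }
        NR == 1 { chrom = $1; lo = $2 + 0; hi = $3 + 0; next }
        # Same chromosome and starting at or before the current end: extend.
        $1 == chrom && $2 + 0 <= hi { if ($3 + 0 > hi) hi = $3 + 0; next }
        # Otherwise emit the finished anchor and start a new one.
        { print chrom, lo, hi; chrom = $1; lo = $2 + 0; hi = $3 + 0 }
        END { if (NR > 0) print chrom, lo, hi }'
}
```

For example, `merge_anchors GM12878_anchors.bed MCF7_anchors.bed > Anchor.bed` would write the merged anchors to the Anchor.bed file used in the second step.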

Example: for processing the GM12878 versus MCF7 datasets, we will use the following ChIA-PET Tool V3 command lines:

1.  For the first-step analysis: we will use the following command lines for GM12878 and MCF7 datasets, respectively.

	java -jar ChIA-PET.jar --mode 1 --fastq1 GM12878_1.fastq --fastq2 GM12878_2.fastq --linker ChIA-PET_Tool_V3/linker/linker_long.txt --minimum_linker_alignment_score 14 --GENOME_INDEX hg19.fa --GENOME_LENGTH 3E9 --CHROM_SIZE_INFO ChIA-PET_Tool_V3/chromInfo/hg19.chromSize.txt --CYTOBAND_DATA ChIA-PET_Tool_V3/chromInfo/hg19_cytoBandIdeo.txt --SPECIES 1 --output Output_GM12878 --prefix GM12878 
	java -jar ChIA-PET.jar --mode 1 --fastq1 MCF7_1.fastq --fastq2 MCF7_2.fastq --linker ChIA-PET_Tool_V3/linker/linker_long.txt --minimum_linker_alignment_score 14 --GENOME_INDEX hg19.fa --GENOME_LENGTH 3E9 --CHROM_SIZE_INFO ChIA-PET_Tool_V3/chromInfo/hg19.chromSize.txt --CYTOBAND_DATA ChIA-PET_Tool_V3/chromInfo/hg19_cytoBandIdeo.txt --SPECIES 1 --output Output_MCF7 --prefix MCF7 

2.  For the second-step analysis: we will use the anchors defined from the first step, namely Anchor.bed.

	java -jar ChIA-PET.jar --mode 1 --fastq1 GM12878_1.fastq --fastq2 GM12878_2.fastq --linker ChIA-PET_Tool_V3/linker/linker_long.txt --minimum_linker_alignment_score 14 --GENOME_INDEX hg19.fa --GENOME_LENGTH 3E9 --CHROM_SIZE_INFO ChIA-PET_Tool_V3/chromInfo/hg19.chromSize.txt --CYTOBAND_DATA ChIA-PET_Tool_V3/chromInfo/hg19_cytoBandIdeo.txt --SPECIES 1  --INPUT_ANCHOR_FILE Anchor.bed --output Output_GM12878 --prefix GM12878_MCF7 
	java -jar ChIA-PET.jar --mode 1 --fastq1 MCF7_1.fastq --fastq2 MCF7_2.fastq --linker ChIA-PET_Tool_V3/linker/linker_long.txt --minimum_linker_alignment_score 14 --GENOME_INDEX hg19.fa --GENOME_LENGTH 3E9 --CHROM_SIZE_INFO ChIA-PET_Tool_V3/chromInfo/hg19.chromSize.txt --CYTOBAND_DATA ChIA-PET_Tool_V3/chromInfo/hg19_cytoBandIdeo.txt --SPECIES 1 --INPUT_ANCHOR_FILE Anchor.bed --output Output_MCF7 --prefix MCF7_GM12878

Remark: for detailed information on data analysis with ChIA-PET Tool V3, please see the ChIA-PET Tool V3 documentation.

From the ChIA-PET Tool output files, out.cluster.FDRfiltered.txt is used for downstream analysis in NDID. We then find the overlap between the two processed results using bedtools pairToPair. From the overlapped results, we take the anchor locations, the interaction frequency, and the number of self-ligation PETs in each anchor (used to measure anchor enrichment). When the two processed results are overlapped, some interactions may be found in only one dataset. For these, we substitute a small interaction frequency (IF=1) in the dataset where the interaction is missing.

The anchor enrichment is computed from the out.spet file in the ChIA-PET Tool V3 output. It is computed separately for anchor1 (chrom1, start1, end1) and anchor2 (chrom2, start2, end2), and the two values are averaged, using the following commands.

  1.   awk '{if($2<$5){print $1"\t"$2"\t"$5}else{print $1"\t"$5"\t"$2}}' out.spet > out.spet.bed3
  2.   bedtools coverage -a Anchor1.bed -b out.spet.bed3|cut -f4 > self1.bed
  3.   bedtools coverage -a Anchor2.bed -b out.spet.bed3|cut -f4 > self2.bed
  4.   Compute the average of the two values for each interaction and name the resulting variable “selfAvg”.
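Step 4 can be sketched as follows, assuming self1.bed and self2.bed each hold one count per line and list the interactions in the same order. The function name self_avg and the output file name in the usage line are ours.

```shell
# self_avg FILE1 FILE2: print the row-wise mean of two single-column files.
# Hypothetical helper; both files must have the same number of lines.
self_avg() {
    paste "$1" "$2" | awk '{ print ($1 + $2) / 2 }'
}
```

Running `self_avg self1.bed self2.bed > selfAvg.txt` then gives one selfAvg value per interaction.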

Input files

The input file should contain the following columns, in this order. The suffixes 1 and 2 indicate values belonging to sample-1 and sample-2, respectively.

chrom1 start1 end1 chrom2 start2 end2 ipet_1 selfAvg_1 ipet_2 selfAvg_2
chr1 27882430 27885761 chr1 27925635 27937736 46 704 20 135.5
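Before running NDID, it may help to confirm that the input file follows this layout. The check below is a minimal sketch, assuming a whitespace-delimited file with a single header line. The function name check_ndid_input is ours, and the checks (ten columns per row, each anchor start below its end) are illustrative rather than requirements stated by NDID.

```shell
# check_ndid_input FILE: report rows that do not match the 10-column layout.
# Hypothetical helper; returns non-zero if any problem is found.
check_ndid_input() {
    awk '
        NR == 1 { next }    # skip the header line
        NF != 10 {
            printf "line %d: expected 10 columns, found %d\n", NR, NF
            bad = 1
            next
        }
        $2 + 0 >= $3 + 0 || $5 + 0 >= $6 + 0 {
            printf "line %d: anchor start is not below its end\n", NR
            bad = 1
        }
        END { exit bad + 0 }' "$1"
}
```

For example, `check_ndid_input GM12878_MCF7.txt && echo "layout looks fine"` prints the message only when no row is flagged.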

Test data sets

  1.   RAD21 ChIA-PET data from human MCF7:
    https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE127022
  2.  RAD21 ChIA-PET data from human K562:
    https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE127027
  3.  RAD21 ChIA-PET data from human H1:
    https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE127037
  4.  RAD21 ChIA-PET data from human H9:
    https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE127034
  5.  RAD21 ChIA-PET data from human GM12878:
    https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE127053

Usage

library(devtools)
devtools::install_github("Yab29/NDID")
library(fANCOVA)
library(qvalue)
library(NDID)

setwd("path/to/working_directory")   # Set your working directory (placeholder path)
NDID("input_file","output_prefix")

Example

Let us run NDID on a given dataset, GM12878_MCF7.txt:

  • NDID("GM12878_MCF7.txt","test")

Result file

We will get the result file named test.txt.

chrom1 start1 end1 chrom2 start2 end2 Normalized_ipet_1 Normalized_ipet_2 P-value p.adjust intensity type
chr1 39648742 39652714 chr1 39654893 39662163 3.83360974 15.20362117 0.000798074 0.022997704 1

Meaning of the columns:

  • chrom1: The name of the chromosome on which cluster anchor 1 lies
  • start1: The start coordinate of cluster anchor 1
  • end1: The end coordinate of cluster anchor 1
  • chrom2: The name of the chromosome on which cluster anchor 2 lies
  • start2: The start coordinate of cluster anchor 2
  • end2: The end coordinate of cluster anchor 2
  • Normalized_ipet_1: normalized number of PETs in sample-1
  • Normalized_ipet_2: normalized number of PETs in sample-2
  • P-value: The statistical significance of the difference in this interaction between the two samples
  • p.adjust: P-value adjusted with the Benjamini-Hochberg method (1995)
  • intensity type: 0 and 1 indicate decreasing and increasing interaction intensity, respectively
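To pull the differential interactions out of the result file, here is a minimal sketch, assuming the file is whitespace-delimited with a single header line and that p.adjust is the tenth column, as in the example row above. The function name significant_interactions and the 0.05 default cutoff are illustrative choices, not NDID defaults.

```shell
# significant_interactions FILE [CUTOFF]: keep the header and every row whose
# p.adjust (assumed to be column 10) is below CUTOFF (default 0.05).
significant_interactions() {
    awk -v cutoff="${2:-0.05}" '
        NR == 1 { print; next }     # pass the header through unchanged
        $10 + 0 < cutoff + 0        # default action: print the matching row
    ' "$1"
}
```

For example, `significant_interactions test.txt 0.01 > test.significant.txt` would keep only the interactions with p.adjust below 0.01.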

 Remark: the test data is available here


 Copyright © 2021 Guoliang's Lab, All Rights Reserved