APAlyzer


APAlyzer utilizes the PAS (polyadenylation sites) collection in the PolyA_DB database to examine APA (alternative polyadenylation) events in all genomic regions, including 3′UTRs and introns.

Sources:

APAlyzer method workflow run instructions

Input file

The input files are specified in the sample sheet config/samples.csv. Each row in the sample sheet has three columns:

  • condition: name of the condition (e.g. control)
  • sample: name of the sample (e.g. control_replicate1)
  • bam: path to the BAM input file for the sample, either relative to the APAlyzer working directory or absolute

Samples are grouped per condition before APAlyzer processes them, so all samples of the same condition must carry exactly the same name in the condition column.
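As an illustration of the grouping described above, here is a minimal Python sketch. It assumes the sample sheet has a header row naming the three columns in the order listed; the sample names and BAM paths are placeholders, and the function is hypothetical, not part of the workflow.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample sheet; sample names and BAM paths are placeholders.
# A header row with these column names is an assumption.
SAMPLE_SHEET = """\
condition,sample,bam
control,control_replicate1,bams/control_replicate1.bam
control,control_replicate2,bams/control_replicate2.bam
treatment,treatment_replicate1,bams/treatment_replicate1.bam
"""


def group_samples_by_condition(handle):
    """Map each condition name to the list of sample names that carry it."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        # Conditions are compared as exact strings, so "Control" and
        # "control" would end up in two different groups.
        groups[row["condition"]].append(row["sample"])
    return dict(groups)


if __name__ == "__main__":
    grouped = group_samples_by_condition(io.StringIO(SAMPLE_SHEET))
    for condition, samples in grouped.items():
        print(condition, samples)
```

Running a check like this before starting the workflow can reveal a misspelled condition name, which would otherwise show up as an unexpected extra group.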

Setting parameters in the config file

Parameters used to run APAlyzer are specified in config/config.APAlyzer.yaml. In the config file, users are able to specify the output directory and output file name: out_dir, differential_output_file.

In addition, the parameter sample_file must give the relative path from the working directory to the sample sheet described in the previous step.

Each run also needs the path to the GTF annotation file together with its organism, genome version, and Ensembl version: gtf, gtf_organism, gtf_genome_version, gtf_ensemble_version.
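The parameters named above can be checked for completeness before a run. The following is a sketch only: it works on a plain dictionary (loading the YAML file itself would need a YAML parser, which is not shown), the helper name is hypothetical, and every value in the example is a placeholder.

```python
# Parameter names as listed in the text above.
REQUIRED_PARAMETERS = (
    "out_dir",
    "differential_output_file",
    "sample_file",
    "gtf",
    "gtf_organism",
    "gtf_genome_version",
    "gtf_ensemble_version",
)


def missing_parameters(config):
    """Return the required parameter names that are absent or empty."""
    return [
        name for name in REQUIRED_PARAMETERS
        if config.get(name) in (None, "")
    ]


# Placeholder values for illustration only; real values depend on your data.
example_config = {
    "out_dir": "results",
    "differential_output_file": "differential_output.tsv",
    "sample_file": "config/samples.csv",
    "gtf": "path/to/annotation.gtf",
    "gtf_organism": "organism_placeholder",
    "gtf_genome_version": "genome_version_placeholder",
    "gtf_ensemble_version": "ensembl_version_placeholder",
}

if __name__ == "__main__":
    print(missing_parameters(example_config))
```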

Setting up the environment

To run the method workflow, first activate the apaeval conda environment by following the instructions in the APAeval README.

Running the workflow

Before running, you can perform a 'dry run' to check which steps will be run and where output files will be generated given the provided parameters and input sample file:

bash dryrun.sh

To run the workflow locally, use the provided wrapper script run_local.sh, which executes the workflow with Singularity.

bash run_local.sh

Note: The run_local.sh script is currently set up to run with the APAeval test data. If you have specified absolute paths in your sample sheet (e.g. config/samples.csv) or the config file (config/config.APAlyzer.yaml), or have input data that is not in the current directory, you will need to modify the Singularity bind arguments so that the input files are available to the container.

For example, if the path to the input GTF file is /share/annotation/annotation.gtf and the current working directory is /home/sam/DaPars2_snakemake/, modify the --singularity-args line in run_local.sh as follows to ensure the file is available to the container:

--singularity-args="--bind /share/" \

Once you are satisfied with the bind arguments, run the workflow locally with bash run_local.sh
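To work out which directories need binding, a small helper along the following lines may be useful. This is a hypothetical sketch, not part of run_local.sh: it assumes the working directory is already visible inside the container, and it binds the immediate parent directory of each outside file rather than a top-level directory such as /share/.

```python
from pathlib import PurePosixPath


def singularity_bind_args(input_paths, working_dir):
    """Build a --bind string for absolute inputs outside `working_dir`."""
    work = PurePosixPath(working_dir)
    bind_dirs = []
    for raw in input_paths:
        path = PurePosixPath(raw)
        if not path.is_absolute():
            continue  # relative paths resolve inside the working directory
        if path == work or work in path.parents:
            continue  # already located under the working directory
        parent = str(path.parent)
        if parent not in bind_dirs:
            bind_dirs.append(parent)
    # Singularity accepts several bind paths as a comma-separated list.
    return "--bind " + ",".join(bind_dirs) if bind_dirs else ""


if __name__ == "__main__":
    print(singularity_bind_args(
        ["/share/annotation/annotation.gtf", "bams/control_replicate1.bam"],
        "/home/sam/DaPars2_snakemake/",
    ))
```

The resulting string is what would go inside the quotes of the --singularity-args line.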

Output & post-processing

The output of APAlyzer qualifies for the differential challenge. The output is post-processed into a TSV file with one column of gene IDs and one column of p-values. The file is written to the out_dir specified in the config file config/config.APAlyzer.yaml.
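The post-processed file can be read with a few lines of Python. The sketch below assumes gene IDs are in the first column and p-values in the second; whether the file has a header row is not stated here, so lines whose second field is not numeric are skipped. The gene IDs in the example are placeholders.

```python
import csv
import io


def read_gene_pvalues(handle):
    """Return {gene_id: p_value} from a two-column, tab-separated stream."""
    pvalues = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # blank or malformed line
        try:
            pvalues[row[0]] = float(row[1])
        except ValueError:
            continue  # header line or non-numeric entry such as "NA"
    return pvalues


if __name__ == "__main__":
    # Placeholder content; real gene IDs and p-values come from the workflow.
    example = "geneA\t0.01\ngeneB\tNA\ngeneC\t0.5\n"
    print(read_gene_pvalues(io.StringIO(example)))
```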

Rulegraph

The rulegraph gives an overview of the steps of the workflow. To obtain it, adapt and run the rulegraph.sh script. The current rulegraph is:

[Image: rulegraph]

Author contact

If you have any questions or comments about APAlyzer, please contact Dr. Ruijia Wang ([email protected]).