DIMPL (Discovery of Intergenic Motifs PipeLine)

==============================

Summary

The DIMPL discovery pipeline enables rapid extraction and selection of bacterial intergenic regions (IGRs) that are enriched for structured noncoding RNAs (ncRNAs). DIMPL also automates the subsequent computational steps needed to identify their function.

Requirements

For Local Computer

Docker Desktop/Engine

For Compute Cluster

Quick-start

Cluster Configuration

Download the IGR search database (filename: s50.igr.fasta) from this link to your cluster using Globus FTP.
Ensure the availability of the BLAST nr database on your computational cluster. Follow these instructions for updating/downloading the latest version.
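
After the download finishes, it can be worth confirming that the IGR database arrived intact before submitting any jobs. The sketch below counts the sequence records in a FASTA file. The `count_fasta_records` helper and the example path are hypothetical and not part of DIMPL.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of DIMPL: count the sequence records in a
# FASTA file. Each record starts with a header line beginning with '>'.
count_fasta_records() {
    local fasta="$1"
    if [ ! -r "$fasta" ]; then
        echo "cannot read $fasta" >&2
        return 1
    fi
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, which conveniently flags an empty or truncated file.
    grep -c '^>' "$fasta"
}

# Example (the path is an assumption; use wherever you saved the database):
# count_fasta_records /path/to/s50.igr.fasta
```

For the BLAST nr database, running `blastdbcmd -db nr -info` should print a summary if the BLAST+ tools are loaded and the database is on the search path. How those tools are loaded depends on your cluster.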

Local Configuration

Download the source code into any folder on your drive:

wget https://github.com/BreakerLab/dimpl/archive/dimpl_1.0.2.tar.gz
tar xzvf dimpl_1.0.2.tar.gz

Download the Docker image:

docker pull breakerlab/dimpl

Configure Docker to grant containers access to the folder where the DIMPL repository is located.
Modify the configuration file at dimpl/src/shell/cluster.conf with the database locations and the appropriate commands for importing utilities on your cluster.
Run ./start.sh in the main repository directory and follow the first-time configuration instructions (it asks for an email address and an NCBI API key).
Follow the link generated by the start.sh script to access the DIMPL Jupyter notebooks.
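
A quick pre-flight check can catch a misplaced or incomplete checkout before the first run of start.sh. This is a minimal sketch: the `check_dimpl_checkout` function is hypothetical and not shipped with DIMPL, and the paths it looks for are taken from this README.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of DIMPL: confirm that a directory
# looks like a DIMPL checkout before running ./start.sh.
check_dimpl_checkout() {
    local repo="${1:-.}"
    local missing=0
    local path
    # Paths taken from this README.
    for path in start.sh src/shell/cluster.conf notebooks; do
        if [ ! -e "$repo/$path" ]; then
            echo "missing: $repo/$path" >&2
            missing=1
        fi
    done
    # Docker is required on the local machine; warn rather than fail here.
    if ! command -v docker >/dev/null 2>&1; then
        echo "warning: docker not found on PATH" >&2
    fi
    return "$missing"
}

# Example: check_dimpl_checkout /path/to/dimpl && echo "checkout looks complete"
```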

Data Transfer between Local Machine and Cluster

The DIMPL notebooks generate compressed .tar.gz files containing all the scripts and data needed to run the more computationally demanding steps on a cluster. These .tar.gz files are placed in the directory data/export. After transferring the files to a cluster, unpack them with the command tar xzvf data-dir.tar.gz. When the tasks on the cluster complete, recompress the directory with the command tar czvf data-dir.tar.gz data-dir, then transfer the recompressed file back to the local machine and place it in data/import.
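
The unpack and recompress steps can be wrapped in two small functions so that the archive always ends up next to the directory it came from. This is a sketch under stated assumptions: `unpack_export` and `repack_results` are hypothetical names, not DIMPL commands, and the verbose flag (`v`) is dropped to keep job logs short.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of DIMPL.

# Unpack an exported archive in the directory where it sits (run on the cluster).
unpack_export() {
    local archive="$1"
    tar xzf "$archive" -C "$(dirname "$archive")"
}

# Recompress a finished data directory next to itself (run on the cluster).
# Produces <dir>.tar.gz containing the top-level directory, overwriting any
# archive of the same name.
repack_results() {
    local dir="${1%/}"   # strip a trailing slash, if any
    tar czf "$dir.tar.gz" -C "$(dirname "$dir")" "$(basename "$dir")"
}

# Example, using the placeholder name from the text above:
# unpack_export data-dir.tar.gz
# ... run the cluster jobs inside data-dir ...
# repack_results data-dir
```

The transfer itself is left to whatever tool your cluster supports.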

File Organization

├── .env                    <- File generated during configuration step of start.sh
├── LICENSE
├── README.md               <- This document
├── start.sh                <- Script to perform initial configuration and start the docker container
├── data
│   ├── export              <- Where DIMPL places data and bash script tar.gz files  
│   ├── import              <- Where to place re-compressed tar.gz files that have been run on a compute cluster
│   ├── interim             <- Where processed genomic data is stored during analysis
│   └── raw                 <- The original genomic data.
│
├── docs                    <- Sphinx documentation for DIMPL
│
├── notebooks               <- Jupyter notebooks for the various steps of DIMPL
│   ├── 1-Genome-IGR-Selection.ipynb    <- 
│   ├── 2-BLAST-Processing.ipynb        <- 
│   ├── 3-IGR-Report.ipynb              <- 
│   └── 4-Motif-Refinement.ipynb        <- 
│
├── requirements.txt        <- The requirements file for reproducing the analysis environment, e.g.
│                              generated with `pip freeze > requirements.txt`
│
├── setup.py                <- makes project pip installable (pip install -e .) so src can be imported
└── src                     <- Source code for use in this project.
    └── shell               <- Shell scripts and configuration for the compute cluster.
        └── cluster.conf    <- Configuration file for the compute environment.
