RNAcentral data import pipeline

About

This is the main pipeline used internally for loading data into the RNAcentral database. The pipeline is Nextflow-based, and the main entry point is main.nf.

The pipeline is typically run as:

nextflow run -profile env -with-singularity pipeline.sif main.nf

The pipeline is meant to run inside a Docker or Singularity container.

Configuring the pipeline

The pipeline requires a local.config file to exist and to contain some settings. Notably, a PGDATABASE environment variable must be defined so that data can be imported or fetched. In addition, to import specific databases, a params.import_data.databases dict must be defined. Its keys must be known database names, and each value should be truthy to indicate that the database should be imported.
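As a minimal sketch of the settings described above, the following writes a local.config using Nextflow's standard env and params configuration scopes. The connection string is a placeholder, and the ensembl key is only an illustrative assumption; check the pipeline itself for the database names it actually recognises and for the PGDATABASE format it expects.

```shell
# Write a minimal local.config in the current directory.
# All values below are placeholders or assumptions, not the project's real settings.
cat > local.config <<'EOF'
// Database connection used when importing or fetching data (placeholder value).
env {
    PGDATABASE = 'postgres://user:password@host:5432/dbname'
}

// Databases to import: keys are database names, truthy values enable them.
// 'ensembl' is an illustrative key, assumed here for the example.
params {
    import_data {
        databases {
            ensembl = true
        }
    }
}
EOF
```

Using a quoted heredoc delimiter ('EOF') keeps the shell from expanding anything inside the config text, so the file is written exactly as shown.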

There are also more advanced configuration options, such as turning specific parts of the pipeline (genome mapping, QA, etc.) on or off.

Using with Docker

The pipeline is meant to run in Docker or Singularity, so you should build or fetch a suitable container. Some example commands are below.

Build the container:

docker build -t rnacentral-import-pipeline .

Open an interactive shell inside a new container:

docker run -v `pwd`:/rnacentral/rnacentral-import-pipeline -v /path/to/data:/rnacentral/data/ -it rnacentral-import-pipeline bash

Testing

Several tests require fetching some data files prior to testing. The files can be fetched with:

./scripts/fetch-test-data.sh

The tests can then be run using py.test. For example, running Ensembl importing tests can be done with:

py.test tests/databases/ensembl/

Other environment variables

The pipeline requires the NXF_OPTS environment variable to be set to -Dnxf.pool.type=sync -Dnxf.pool.maxThreads=10000; a module for doing this is in modules/cluster. In addition, configuration settings for efficient usage on EBI's LSF cluster are in config/cluster.config.
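If the module in modules/cluster is not available, the variable can also be set by hand. The following is a minimal sketch for a POSIX-style shell; it only exports the value given above and prints it back for confirmation.

```shell
# Export the JVM options that the pipeline expects Nextflow to run with.
export NXF_OPTS='-Dnxf.pool.type=sync -Dnxf.pool.maxThreads=10000'

# Print the value so it can be checked before launching the pipeline.
echo "NXF_OPTS=${NXF_OPTS}"
```

The export must happen in the same shell session (or job script) that later launches nextflow, because the variable is only inherited by child processes.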

License

See LICENSE for more information.
