The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Updated pipeline template to nf-core/tools 2.2
[1.5] - 2021-12-01
- Finish porting the pipeline to the updated Nextflow DSL2 syntax adopted on nf-core/modules
  - Bump minimum Nextflow version from `21.04.0` -> `21.10.3`
  - Removed `--publish_dir_mode` as it is no longer required for the new syntax
[1.4] - 2021-11-09
- Convert pipeline to updated Nextflow DSL2 syntax for future adoption across nf-core
- Added a workflow to download FastQ files and to create samplesheets for ids from the Synapse platform hosted by Sage Bionetworks.
- SRA identifiers not available for direct download via the ENA FTP will now be downloaded via `sra-tools`.
- Added `--force_sratools_download` parameter to preferentially download all FastQ files via `sra-tools` instead of ENA FTP.
- Correctly handle errors from SRA identifiers that do not return metadata, for example, due to being private.
- Retry errors from `prefetch` via a bash script so that interrupted downloads can be resumed.
- Name output FastQ files by `{EXP_ACC}_{RUN_ACC}*fastq.gz` instead of `{EXP_ACC}_{T*}*fastq.gz` for run id provenance.
- [#46] - Bug in `sra_ids_to_runinfo.py`
- Added support for DDBJ ids. See examples below:
DDBJ |
---|
PRJDB4176 |
SAMD00114846 |
DRA008156 |
DRP004793 |
DRR171822 |
DRS090921 |
DRX162434 |
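The 1.4 file-naming change can be illustrated with a small Python sketch. The helpers `fastq_names` and `keeps_run_provenance` are hypothetical and not part of the pipeline, and the `_1`/`_2` suffixes for paired-end reads are an assumption; the point is only that every output name now starts with `{EXP_ACC}_{RUN_ACC}`, so the run accession survives in the file name.

```python
from fnmatch import fnmatch
from typing import List


def fastq_names(exp_acc: str, run_acc: str, paired: bool) -> List[str]:
    """Hypothetical helper: build output names following {EXP_ACC}_{RUN_ACC}*fastq.gz.

    The _1/_2 read suffixes for paired-end runs are an assumption for illustration.
    """
    prefix = f"{exp_acc}_{run_acc}"
    if paired:
        return [f"{prefix}_1.fastq.gz", f"{prefix}_2.fastq.gz"]
    return [f"{prefix}.fastq.gz"]


def keeps_run_provenance(name: str, exp_acc: str, run_acc: str) -> bool:
    """Check that a file name matches the {EXP_ACC}_{RUN_ACC}*fastq.gz glob."""
    return fnmatch(name, f"{exp_acc}_{run_acc}*fastq.gz")
```

For example, `fastq_names("DRX162434", "DRR171822", paired=True)` returns `DRX162434_DRR171822_1.fastq.gz` and `DRX162434_DRR171822_2.fastq.gz`; pairing those two accessions from the table is purely illustrative.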
[1.3] - 2021-09-15
- Replaced Python `requests` with `urllib` to fetch ENA metadata
Dependency | Old version | New version |
---|---|---|
python | 3.8.3 | 3.9.5 |
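As a rough sketch of fetching ENA metadata with the standard library's `urllib` instead of `requests`, the code below builds a query against the ENA Portal API `filereport` endpoint for a `read_run` report and parses the tab-separated response. The endpoint URL and the field names (such as `run_accession` and `fastq_ftp`) reflect the public ENA Portal API as we understand it and are assumptions here; this is not the pipeline's own script, and `fetch_metadata` needs network access.

```python
import csv
import io
from typing import Dict, List, Sequence
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed ENA Portal API endpoint for file reports.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def build_filereport_url(accession: str, fields: Sequence[str]) -> str:
    """Build a read_run file-report URL for one accession (field names are assumptions)."""
    query = urlencode(
        {"accession": accession, "result": "read_run", "fields": ",".join(fields)}
    )
    return f"{ENA_FILEREPORT}?{query}"


def parse_tsv(text: str) -> List[Dict[str, str]]:
    """Parse a tab-separated report with a header row into one dict per data row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def fetch_metadata(
    accession: str, fields: Sequence[str], timeout: float = 30
) -> List[Dict[str, str]]:
    """Download and parse the report for one accession; requires network access."""
    with urlopen(build_filereport_url(accession, fields), timeout=timeout) as response:
        return parse_tsv(response.read().decode("utf-8"))
```

Error handling and retries are omitted to keep the sketch short.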
[1.2] - 2021-07-28
- Updated pipeline template to nf-core/tools 2.1
- [#26] - Update broken EBI API URL
[1.1] - 2021-06-22
- [#12] - Error when using Singularity: `/etc/resolv.conf` doesn't exist in the container
- Added `--sample_mapping_fields` parameter to create a separate `id_mappings.csv` and `multiqc_config.yml` with selected fields that can be used to rename samples in general and in MultiQC
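The idea behind `--sample_mapping_fields` can be sketched as selecting a subset of metadata columns and writing them to a CSV. The helper `id_mappings_csv` and the example field names (`run_accession`, `sample_title`) are hypothetical; the parameter's real defaults and the accompanying `multiqc_config.yml` are not reproduced here.

```python
import csv
import io
from typing import Dict, Iterable, Sequence


def id_mappings_csv(rows: Iterable[Dict[str, str]], fields: Sequence[str]) -> str:
    """Return CSV text containing only the selected metadata fields.

    Hypothetical helper: keys not listed in `fields` are dropped, and
    missing keys are written as empty strings.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(fields), extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```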
[1.0] - 2021-06-08
Initial release of nf-core/fetchngs, created with the nf-core template.
Given a single file of ids, provided one per line, the pipeline performs the following steps:
- Resolve database ids back to the appropriate experiment-level ids that are compatible with the ENA API
- Fetch extensive id metadata including direct download links to FastQ files via ENA API
- Download FastQ files in parallel via `curl` and perform an `md5sum` check
- Collate id metadata and paths to FastQ files in a single samplesheet
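The `md5sum` check in the download step amounts to comparing each downloaded file's MD5 digest with the checksum reported for it. A minimal Python equivalent using `hashlib` is sketched below; it stands in for the `md5sum` command-line tool rather than reproducing the pipeline's implementation, and where the expected checksum comes from is left to the caller.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected: str) -> bool:
    """Compare a file's MD5 with an expected checksum, ignoring case and surrounding whitespace."""
    return md5sum(path) == expected.strip().lower()
```

Reading in fixed-size chunks keeps memory use flat even for large FastQ files.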
Currently, the following types of example identifiers are supported:
SRA | ENA | GEO |
---|---|---|
SRR11605097 | ERR4007730 | GSM4432381 |
SRX8171613 | ERX4009132 | GSE147507 |
SRS6531847 | ERS4399630 | |
SAMN14689442 | SAMEA6638373 | |
SRP256957 | ERP120836 | |
SRA1068758 | ERA2420837 | |
PRJNA625551 | PRJEB37513 | |
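To make the supported identifier types concrete, the sketch below assigns an id to a source database by its prefix. The prefix rules are hypothetical and inferred only from the example identifiers listed in this changelog (including the DDBJ ids added in 1.4); the pipeline's own id resolution logic is not shown.

```python
import re
from typing import Optional

# Hypothetical prefix rules inferred from the example identifiers in this changelog.
PREFIX_RULES = [
    ("SRA", re.compile(r"^(SR[RXSPA]|SAMN|PRJNA)\d+$")),
    ("ENA", re.compile(r"^(ER[RXSPA]|SAMEA|PRJEB)\d+$")),
    ("DDBJ", re.compile(r"^(DR[RXSPA]|SAMD|PRJDB)\d+$")),
    ("GEO", re.compile(r"^GS[EM]\d+$")),
]


def classify_id(identifier: str) -> Optional[str]:
    """Return the database an identifier appears to belong to, or None if unrecognized."""
    token = identifier.strip().upper()
    for database, pattern in PREFIX_RULES:
        if pattern.match(token):
            return database
    return None
```

Applied to a file of ids provided one per line, each non-blank line would be classified independently.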