Download Fastqs Program
The Download Fastqs program downloads the fastq files associated with an ENA study project, based on the information in the provided Metadata Table, using the parfive package (https://pypi.org/project/parfive/). This program belongs to the Collection Programs group.
There are two execution modes available:
- ENA Metadata Mode. This mode reads the indicated ENA Download Column (-c parameter) from the previously generated Metadata Table file and downloads the corresponding fastq files. If download errors are detected, a report with the failing URLs and error codes is generated.
- LINKS Mode. This mode downloads the URLs provided in the Links TXT file (in which each line corresponds to a unique URL). It is especially useful for retrying URLs from error reports that could not be downloaded in previous attempts, or for downloading fastqs or other files from different sources. For more details, see the PRJEB10949_error_urls_example.txt example file.
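The Links TXT layout described above (one URL per line) can be checked before launching a download. The following is a minimal sketch, not the program's own code: the `read_links` helper is hypothetical, and dropping blank lines and repeated URLs is an assumption about what a clean links file should look like.

```python
from pathlib import Path


def read_links(path):
    """Return the unique, non-empty URLs from a links file, in file order."""
    seen = set()
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        # Assumption: blank lines and repeated URLs are not wanted.
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Calling `read_links("PRJEB10949_error_urls_example.txt")` would return the list of URLs that LINKS Mode is expected to receive, one entry per distinct line.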
Input Elements:
Input | Type | Description |
---|---|---|
PROJECT_metadata.tsv or links.txt | File | Input File. For ENA Metadata Mode, it will be one of the Metadata Tables generated in the different steps of the workflow by the Download Metadata ENA program (PROJECT_ENA_metadata.tsv), the Merge Metadata program (PROJECT_merged_metadata.tsv) or the Filter Metadata program (PROJECT_filtered_metadata.tsv). For LINKS Mode, it will be a TXT file with URLs to download. |
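To preview which URLs a Metadata Table would yield in ENA Metadata Mode, the download column can be read with the standard library. This is a rough sketch under stated assumptions, not the program's implementation: the `urls_from_metadata` helper is hypothetical, and it assumes that a cell may list several files separated by `;` and that entries in the `*_ftp` columns may lack a protocol prefix. The `*_aspera` and `*_galaxy` columns may need different handling.

```python
import csv


def urls_from_metadata(tsv_path, column="fastq_ftp", scheme="ftp://"):
    """Collect download URLs from one column of a tab-separated metadata table."""
    urls = []
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            cell = (row.get(column) or "").strip()
            # Assumption: several files for one run are separated by ";".
            for entry in cell.split(";"):
                entry = entry.strip()
                if not entry:
                    continue
                # Assumption: entries may come without a protocol prefix.
                urls.append(entry if "://" in entry else scheme + entry)
    return urls
```

Writing the returned list to a text file, one URL per line, gives an input that fits the LINKS Mode description.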
Output Elements:
Output | Type | Description |
---|---|---|
run_accession.fastq.gz | Files | Various Fastq files from the study project of interest |
errors_report.tsv | File | Errors Report. Only produced if download errors occurred. |
The resulting fastq files are used in the next workflow step, the Check Fastqs program. Depending on whether your fastq files are ready to use, you will either have finished your curation process or need to further treat the downloaded fastq files and associated metadata (using the Make Treatment Template, Treat Metadata, and Treat Fastqs programs). To get an idea of what the next step would be in your particular case, check the workflow's diagram.
Usage:
download_fastqs [-h] -i INPUT_FILE [-m {ENA,LINKS}]
[-c {fastq_ftp,fastq_aspera,fastq_galaxy,submitted_ftp,submitted_aspera,submitted_galaxy}]
[-n MAX_CONN] [-p] [-o OUTPUT_DIRECTORY] [-x] [-v]
Options:
Parameter | Description |
---|---|
-h, --help | Show help message and exit. |
-i, --input_file | Input File. Indicate the path to the Input File with the information to download Fastq files. |
-m, --mode | Execution Mode (Optional) [Default:ENA]. Options: 1) ENA Metadata Table File [Expected sep=TABS] or 2) Links TXT File for generic file download. Permitted options are {ENA,LINKS}. |
-c, --ena_download_column | ENA Download Column (Optional) [Default:fastq_ftp]. Indicate the ENA Metadata Table column with the links to download. This parameter will be skipped if LINKS mode is used. Permitted options are {fastq_ftp, fastq_aspera, fastq_galaxy, submitted_ftp, submitted_aspera, submitted_galaxy}. |
-n, --max_conn | Max Number of Files in Parallel (Optional:Parfive Parameter) [Default:5]. Indicate the max number of files to be downloaded simultaneously in parallel. |
-p, --parfive_verbose | Parfive Verbose (Optional:Parfive Parameter). If indicated it will enable parfive verbose. |
-o, --output_directory | Output Directory (Optional). Indicate the path to the Output Directory. Output files will be created in the current directory if not indicated. |
-x, --plain_text | Plain Text Mode (Optional). If indicated, it will enable Plain Text mode, and text will appear without colors. |
-v, --version | Show program's version number and exit. |
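The defaults and permitted values in the table above can be made concrete with a small `argparse` model. This is only an illustrative sketch of the documented interface, not the program's actual source: `build_parser` is a hypothetical name, the `"."` default for the output directory is an assumption standing in for "current directory", and `-v` is left out because the version string is not given here (`-h` is added by `argparse` automatically).

```python
import argparse

# Permitted values for -c, as listed in the Options table.
ENA_COLUMNS = [
    "fastq_ftp", "fastq_aspera", "fastq_galaxy",
    "submitted_ftp", "submitted_aspera", "submitted_galaxy",
]


def build_parser():
    """Model the documented options and defaults (illustrative only)."""
    parser = argparse.ArgumentParser(prog="download_fastqs")
    parser.add_argument("-i", "--input_file", required=True)
    parser.add_argument("-m", "--mode", choices=["ENA", "LINKS"], default="ENA")
    parser.add_argument("-c", "--ena_download_column",
                        choices=ENA_COLUMNS, default="fastq_ftp")
    parser.add_argument("-n", "--max_conn", type=int, default=5)
    parser.add_argument("-p", "--parfive_verbose", action="store_true")
    # Assumption: "." models "current directory if not indicated".
    parser.add_argument("-o", "--output_directory", default=".")
    parser.add_argument("-x", "--plain_text", action="store_true")
    return parser


args = build_parser().parse_args(["-i", "PRJEB10949_ENA_metadata.tsv", "-n", "10"])
print(args.mode, args.ena_download_column, args.max_conn)  # ENA fastq_ftp 10
```

In this model, only `-i` is required; every other option falls back to the default stated in the table.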
Commands:
- Download fastqs from Metadata Table with colored text stdout:
download_fastqs -i PRJEB10949_ENA_metadata.tsv
- Download fastqs from Metadata Table with plain text stdout:
download_fastqs -i PRJEB10949_ENA_metadata.tsv --plain_text
- Download fastqs from URLs TXT file:
download_fastqs -i PRJEB10949_error_urls_example.txt -m LINKS
- Download fastqs from Metadata Table using "submitted_ftp" instead of the default "fastq_ftp" as ENA Download Column:
download_fastqs -i PRJEB10949_ENA_metadata.tsv -c submitted_ftp
- Download 10 fastqs simultaneously instead of 5 from Metadata Table:
download_fastqs -i PRJEB10949_ENA_metadata.tsv -n 10
- Download fastqs from Metadata Table activating Parfive verbose:
download_fastqs -i PRJEB10949_ENA_metadata.tsv --parfive_verbose
- Download fastqs from Metadata Table in the specified directory (downloads):
download_fastqs -i PRJEB10949_ENA_metadata.tsv -o downloads
To see a full and detailed example of dataset curation, see the Tutorial Full Example page.