
Download Fastqs Program

sarpiens edited this page Mar 14, 2024 · 3 revisions

Description

The Download Fastqs program downloads the fastq files associated with an ENA study project, based on the information in the provided Metadata Table, using the parfive package (https://pypi.org/project/parfive/). This program belongs to the Collection Programs group.

There are two execution modes available:

  • ENA Metadata Mode. This mode reads the links in the indicated ENA Download Column (-c parameter) of a previously generated Metadata Table and downloads the corresponding fastq files. If download errors are detected, a report listing the failed URLs and their error codes is generated.

  • LINKS Mode. This mode downloads the URLs listed in a Links TXT file, with one URL per line. It is especially useful for retrying URLs from error reports that could not be downloaded in previous attempts, or for downloading fastqs or other files from other sources. For more details, see the PRJEB10949_error_urls_example.txt example file.
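How each mode turns its Input File into a list of URLs can be sketched in Python, the language of parfive. This is a minimal sketch under stated assumptions, not the program's actual code: the helper name `collect_urls` is hypothetical, and it assumes that the Metadata Table has a header row with the ENA column names, that one cell may hold several links separated by `;` (as ENA does for paired-end runs), and that links without a scheme get `ftp://` prepended.

```python
import csv
from pathlib import Path
from typing import List


def collect_urls(input_file: str, mode: str = "ENA",
                 column: str = "fastq_ftp", scheme: str = "ftp://") -> List[str]:
    """Return the URLs to download for the given execution mode.

    Hypothetical helper: the real program's parsing rules are not documented here.
    """
    path = Path(input_file)
    if mode == "LINKS":
        # LINKS mode: one URL per line; blank lines are skipped (assumption).
        return [line.strip() for line in path.read_text().splitlines() if line.strip()]
    if mode != "ENA":
        raise ValueError(f"unknown mode: {mode!r}")

    urls: List[str] = []
    with path.open(newline="") as handle:
        # ENA mode: tab-separated Metadata Table with a header row (assumption).
        reader = csv.DictReader(handle, delimiter="\t")
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in {input_file}")
        for row in reader:
            cell = (row[column] or "").strip()
            # One cell may hold several links separated by ';' (e.g. paired-end runs).
            for link in filter(None, (part.strip() for part in cell.split(";"))):
                # ENA links usually lack a scheme, so one is prepended (assumption).
                urls.append(link if "://" in link else scheme + link)
    return urls
```

Under these assumptions, `collect_urls("PRJEB10949_ENA_metadata.tsv", column="submitted_ftp")` corresponds to the `-c submitted_ftp` example, and `collect_urls("PRJEB10949_error_urls_example.txt", mode="LINKS")` to the LINKS example.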

Input Elements:

Input | Type | Description
--- | --- | ---
PROJECT_metadata.tsv or links.txt | File | Input File. For ENA Metadata Mode, it is one of the Metadata Tables generated at different steps of the workflow: by the Download Metadata ENA program (PROJECT_ENA_metadata.tsv), the Merge Metadata program (PROJECT_merged_metadata.tsv) or the Filter Metadata program (PROJECT_filtered_metadata.tsv). For LINKS Mode, it is a TXT file with the URLs to download, one per line.

Output Elements:

Output | Type | Description
--- | --- | ---
run_accession.fastq.gz | Files | Fastq files from the study project of interest.
errors_report.tsv | File | Errors Report. Only produced if download errors occur.
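Because the Errors Report is meant to feed a later LINKS run, the round trip can be sketched as follows. The functions `write_errors_report` and `report_to_links` are hypothetical, and the two-column layout with `url` and `error_code` headers is an assumption; the real errors_report.tsv columns may differ.

```python
import csv
from typing import Iterable, Tuple


def write_errors_report(errors: Iterable[Tuple[str, str]], report_path: str) -> int:
    """Write (url, error_code) pairs to a TSV report and return the row count.

    The "url" and "error_code" column names are assumptions, not the real format.
    """
    rows = list(errors)
    if not rows:
        return 0  # no file is written when there were no download errors
    with open(report_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["url", "error_code"])
        writer.writerows(rows)
    return len(rows)


def report_to_links(report_path: str, links_path: str) -> int:
    """Turn an errors report into a Links TXT file (one URL per line) for LINKS mode."""
    with open(report_path, newline="") as handle:
        urls = [row["url"] for row in csv.DictReader(handle, delimiter="\t")]
    # dict.fromkeys drops duplicate URLs while keeping their original order.
    unique = list(dict.fromkeys(urls))
    with open(links_path, "w") as handle:
        handle.write("".join(url + "\n" for url in unique))
    return len(unique)
```

The resulting file could then be retried with `download_fastqs -i links.txt -m LINKS`.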

The downloaded fastq files are the input for the next workflow step, the Check Fastqs program. If your fastq files are ready to use, your curation process is finished; otherwise, the downloaded fastq files and their associated metadata need further treatment with the Make Treatment Template, Treat Metadata and Treat Fastqs programs. To see which step comes next in your particular case, check the workflow diagram.

Arguments

Usage:

download_fastqs  [-h] -i INPUT_FILE [-m {ENA,LINKS}] 
                 [-c {fastq_ftp,fastq_aspera,fastq_galaxy,submitted_ftp,submitted_aspera,submitted_galaxy}] 
                 [-n MAX_CONN] [-p] [-o OUTPUT_DIRECTORY] [-x] [-v]
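For readers scripting around the tool, the synopsis above can be mirrored with Python's standard argparse module. This is a hypothetical reconstruction from this page, not the program's source: the defaults follow the Options table (ENA mode, fastq_ftp column, 5 parallel files, current directory for output), and the version string is a placeholder.

```python
import argparse

ENA_COLUMNS = ["fastq_ftp", "fastq_aspera", "fastq_galaxy",
               "submitted_ftp", "submitted_aspera", "submitted_galaxy"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse mirror of the download_fastqs synopsis (not the real source)."""
    parser = argparse.ArgumentParser(prog="download_fastqs")  # -h/--help is added automatically
    parser.add_argument("-i", "--input_file", required=True,
                        help="Metadata Table (ENA mode) or Links TXT file (LINKS mode)")
    parser.add_argument("-m", "--mode", choices=["ENA", "LINKS"], default="ENA")
    parser.add_argument("-c", "--ena_download_column", choices=ENA_COLUMNS,
                        default="fastq_ftp", help="ignored in LINKS mode")
    parser.add_argument("-n", "--max_conn", type=int, default=5)
    parser.add_argument("-p", "--parfive_verbose", action="store_true")
    parser.add_argument("-o", "--output_directory", default=".")
    parser.add_argument("-x", "--plain_text", action="store_true")
    # The real version string is not given on this page; "<version>" is a placeholder.
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s <version>")
    return parser
```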

Options:

Parameter | Description
--- | ---
-h, --help | Show help message and exit.
-i, --input_file | Input File. Path to the Input File with the information needed to download the fastq files.
-m, --mode | Execution Mode (Optional) [Default: ENA]. Options: 1) ENA Metadata Table File [expected separator: tabs] or 2) Links TXT File for generic file download. Permitted options are {ENA,LINKS}.
-c, --ena_download_column | ENA Download Column (Optional) [Default: fastq_ftp]. ENA Metadata Table column containing the links to download. Ignored in LINKS mode. Permitted options are {fastq_ftp, fastq_aspera, fastq_galaxy, submitted_ftp, submitted_aspera, submitted_galaxy}.
-n, --max_conn | Max Number of Files in Parallel (Optional: Parfive Parameter) [Default: 5]. Maximum number of files to download simultaneously.
-p, --parfive_verbose | Parfive Verbose (Optional: Parfive Parameter). If indicated, enables parfive verbose output.
-o, --output_directory | Output Directory (Optional). Path to the Output Directory. If not indicated, output files are created in the current directory.
-x, --plain_text | Plain Text Mode (Optional). If indicated, enables Plain Text mode, and text is printed without colors.
-v, --version | Show the program's version number and exit.
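The effect of -n/--max_conn (at most N files in flight at once) can be illustrated with an asyncio semaphore. parfive is itself asyncio-based, but this sketch swaps in a plain `asyncio.Semaphore` and a caller-supplied `fetch` coroutine; `download_all` and `fetch` are illustrative names, not parfive's internals or API. Failures are collected, as an errors report would need, instead of aborting the batch.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Tuple


async def download_all(urls: Sequence[str],
                       fetch: Callable[[str], Awaitable[None]],
                       max_conn: int = 5) -> List[Tuple[str, str]]:
    """Run fetch(url) for every URL with at most max_conn running at once.

    Returns (url, error) pairs for failed URLs instead of stopping at the first one.
    """
    semaphore = asyncio.Semaphore(max_conn)
    errors: List[Tuple[str, str]] = []

    async def worker(url: str) -> None:
        async with semaphore:  # waits while max_conn fetches are already running
            try:
                await fetch(url)
            except Exception as exc:  # record the failure for a later report
                errors.append((url, repr(exc)))

    await asyncio.gather(*(worker(url) for url in urls))
    return errors
```

Raising the limit, as in the `-n 10` example, lets more transfers overlap at the cost of more simultaneous connections to the server.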

Examples

Commands:

  • Download fastqs from Metadata Table with colored text stdout:
download_fastqs -i PRJEB10949_ENA_metadata.tsv
  • Download fastqs from Metadata Table with plain text stdout:
download_fastqs -i PRJEB10949_ENA_metadata.tsv --plain_text
  • Download fastqs from URLs TXT file:
download_fastqs -i PRJEB10949_error_urls_example.txt -m LINKS
  • Download fastqs from Metadata Table using "submitted_ftp" instead of the default "fastq_ftp" as ENA Download Column:
download_fastqs -i PRJEB10949_ENA_metadata.tsv -c submitted_ftp
  • Download 10 fastqs simultaneously from Metadata Table instead of the default 5:
download_fastqs -i PRJEB10949_ENA_metadata.tsv -n 10
  • Download fastqs from Metadata Table activating Parfive verbose:
download_fastqs -i PRJEB10949_ENA_metadata.tsv --parfive_verbose
  • Download fastqs from Metadata Table in the specified directory (downloads):
download_fastqs -i PRJEB10949_ENA_metadata.tsv -o downloads

For a full, detailed example of dataset curation, see the Tutorial Full Example page.