Treat Fastqs Program
The Treat Fastqs program performs different treatment operations on the downloaded fastq files, based on the treatment information provided in a Treatment Template file. This program belongs to the Optional Programs group, so this step can be skipped if your fastq files need no further treatment.
The provided fastq files can be treated in three different modes:
- Merge Mode. This treatment merges the files that share the same sample name, generating a new merged file in the Output Directory, provided that the combination of fastq types is a permitted configuration. It must be indicated with “merge” in the treatment column. Permitted configurations for this mode are:
  - An equal number of PAIRED and SINGLE Fastq files with more than 1 file per Fastq type [Number of pair1(s) > 1; Number of pair2(s) > 1; Number of single(s) > 1; Number of pair1(s) = Number of pair2(s) = Number of single(s)].
  - An equal number of PAIRED Fastq files with more than 1 file per Fastq type [Number of pair1(s) > 1; Number of pair2(s) > 1; Number of single(s) = 0; Number of pair1(s) = Number of pair2(s)].
  - More than one SINGLE Fastq file [Number of pair1(s) = 0; Number of pair2(s) = 0; Number of single(s) > 1].
- Rename Mode. This treatment changes the file name to the value indicated in the sample name column, generating a new file in the Output Directory, provided that the combination of fastq types is a permitted configuration. It must be indicated with “rename” in the treatment column. Permitted configurations for this mode are:
  - A pair of PAIRED Fastq files with a unique SINGLE Fastq file [Number of pair1(s) = 1; Number of pair2(s) = 1; Number of single(s) = 1].
  - A pair of PAIRED Fastq files [Number of pair1(s) = 1; Number of pair2(s) = 1; Number of single(s) = 0].
  - A unique SINGLE Fastq file [Number of pair1(s) = 0; Number of pair2(s) = 0; Number of single(s) = 1].
- Copy Mode. This treatment will copy the specified fastq file to the Output Directory, ignoring the value in the sample name column. It must be indicated with “copy” in the treatment column.
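The permitted configurations listed above can be expressed as a small check on the number of files of each fastq type for one sample. This is only an illustrative Python sketch, not the program's actual code: the function name `is_permitted_configuration` and its parameters are hypothetical, and treating every copy request as permitted is an assumption (no count restriction is listed for Copy Mode).

```python
def is_permitted_configuration(treatment: str, n_pair1: int, n_pair2: int, n_single: int) -> bool:
    """Check the per-sample file counts against the configurations listed above.

    Hypothetical helper for illustration; not part of treat_fastqs itself.
    """
    if treatment == "merge":
        # Equal numbers of pair1, pair2 and single files, more than 1 of each.
        paired_and_single = n_pair1 > 1 and n_pair1 == n_pair2 == n_single
        # Equal numbers of pair1 and pair2 files, more than 1 of each, no single.
        paired_only = n_pair1 > 1 and n_pair1 == n_pair2 and n_single == 0
        # More than one single file, no paired files.
        single_only = n_pair1 == 0 and n_pair2 == 0 and n_single > 1
        return paired_and_single or paired_only or single_only
    if treatment == "rename":
        # Exactly one file per fastq type that is present.
        return (n_pair1, n_pair2, n_single) in {(1, 1, 1), (1, 1, 0), (0, 0, 1)}
    if treatment == "copy":
        # Assumption: no count restriction is documented for copy mode.
        return True
    raise ValueError(f"Unknown treatment: {treatment!r}")
```

For example, two SINGLE files for one sample can be merged but not renamed, while one pair1/pair2 pair can be renamed but not merged.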
For instance, if we had the following PROJECT_treatment_template.tsv:
sample_name | fastq_file_name | fastq_type | treatment |
---|---|---|---|
Sample0 | ERR12233.fastq.gz | single | copy |
Sample1 | ERR12234.fastq.gz | single | rename |
Sample2 | ERR12235.fastq.gz | single | merge |
Sample2 | ERR12236.fastq.gz | single | merge |
The program would perform the following treatment on the fastq files:
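Reading the example template against the mode definitions, the expected outcome can be sketched in Python. This is a hedged illustration rather than the program's implementation: the helper `plan_treatments` is hypothetical, it only handles SINGLE files, and the output naming `<sample_name>.fastq.gz` for renamed and merged files is an assumption based on the `sample.fastq.gz` entry in the Output Elements table.

```python
import csv
import io
from collections import defaultdict

# The example template from above, written as tab-separated text.
TEMPLATE = (
    "sample_name\tfastq_file_name\tfastq_type\ttreatment\n"
    "Sample0\tERR12233.fastq.gz\tsingle\tcopy\n"
    "Sample1\tERR12234.fastq.gz\tsingle\trename\n"
    "Sample2\tERR12235.fastq.gz\tsingle\tmerge\n"
    "Sample2\tERR12236.fastq.gz\tsingle\tmerge\n"
)


def plan_treatments(template_text: str, fastq_pattern: str = ".fastq.gz"):
    """Return (treatment, input_files, output_file) tuples for SINGLE files.

    Hypothetical helper; the output names are an assumption, not documented behavior.
    """
    plan = []
    to_merge = defaultdict(list)  # sample_name -> files to merge, in template order
    for row in csv.DictReader(io.StringIO(template_text), delimiter="\t"):
        if row["fastq_type"] != "single":
            raise NotImplementedError("this sketch only covers SINGLE files")
        name, sample = row["fastq_file_name"], row["sample_name"]
        if row["treatment"] == "copy":
            # Copy keeps the original file name and ignores the sample name.
            plan.append(("copy", (name,), name))
        elif row["treatment"] == "rename":
            plan.append(("rename", (name,), sample + fastq_pattern))
        elif row["treatment"] == "merge":
            to_merge[sample].append(name)
        else:
            raise ValueError(f"Unknown treatment: {row['treatment']!r}")
    for sample, files in to_merge.items():
        plan.append(("merge", tuple(files), sample + fastq_pattern))
    return plan


for treatment, inputs, output in plan_treatments(TEMPLATE):
    print(f"{treatment}: {' + '.join(inputs)} -> {output}")
```

Under these assumptions, ERR12233.fastq.gz is copied unchanged, ERR12234.fastq.gz becomes Sample1.fastq.gz, and ERR12235.fastq.gz and ERR12236.fastq.gz are merged into Sample2.fastq.gz.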
Input Elements:
Input | Type | Description |
---|---|---|
PROJECT_treatment_template.tsv | File | Final Curated Treatment Template |
/directory/path/input | Directory | Downloaded Fastqs Directory |
/directory/path/output | Directory | Treated Fastqs Directory |
Output Elements:
Output | Type | Description |
---|---|---|
sample.fastq.gz | Files | Various Treated Fastq Files |
The resulting files are the final treated fastq files. To get a general idea of the optional treatment steps of the workflow, check the workflow's diagram.
Usage:
treat_fastqs [-h] -t TREATMENT_TEMPLATE -i INPUT_DIRECTORY -o OUTPUT_DIRECTORY
[-p FASTQ_PATTERN] [-r1 R1_PATTERN] [-r2 R2_PATTERN] [-x] [-v]
Options:
Parameter | Description |
---|---|
-h, --help | Show help message and exit. |
-t, --treatment_template | Treatment Template [Expected sep=TABS]. Indicate the path to the Treatment Template file. |
-i, --input_directory | Input Directory. Indicate the path to the Input Directory with the Fastq files to treat. |
-o, --output_directory | Output Directory. Indicate the path to the Output Directory to save the resulting treated Fastq files. |
-p, --fastq_pattern | Fastq File Pattern (Optional) [Default:".fastq.gz"]. Indicate the pattern to identify Fastq files. |
-r1, --r1_pattern | R1 File Pattern (Optional) [Default:"_1.fastq.gz"]. Indicate the pattern to identify R1 PAIRED Fastq files. |
-r2, --r2_pattern | R2 File Pattern (Optional) [Default:"_2.fastq.gz"]. Indicate the pattern to identify R2 PAIRED Fastq files. |
-x, --plain_text | Plain Text Mode (Optional). If indicated, it will enable Plain Text mode, and text will appear without colors. |
-v, --version | Show program's version number and exit. |
Commands:
- Treat Fastqs with colored text stdout:
treat_fastqs -t treatment_template_filtered_PRJEB10949_merged_metadata_example.tsv -i downloads -o treated_files
- Treat Fastqs with plain text stdout:
treat_fastqs -t treatment_template_filtered_PRJEB10949_merged_metadata_example.tsv -i downloads -o treated_files --plain_text
- Treat Fastqs using "fq.gz" instead of the default "fastq.gz" Fastq Pattern:
treat_fastqs -t treatment_template_PROJECT_metadata_files_other_fastq_extension.tsv -i downloads -o treated_files -p ".fq.gz" -r1 "_1.fq.gz" -r2 "_2.fq.gz"
For a full and detailed example of dataset curation, see the Tutorial Full Example page, which is particularly recommended for this program.