Merge Metadata Program
The Merge Metadata program provides several options for merging your Main Metadata Table with any Extra Metadata Table, as a way to combine metadata from different sources. In the typical ENA workflow, the Main Metadata Table is the ENA metadata previously obtained with the Download Metadata ENA program, and the Extra Metadata Table is the publication's metadata. Nevertheless, the program is general-purpose and can also be used to merge metadata from external projects.
By default, a left join is carried out, taking the Main Metadata Table file as the reference. After merging, the program also compares the values of the two provided merge columns by computing their intersection, in order to report unique values that are not shared by both tables (except when using the cross merge mode). For a more detailed explanation of the pandas merge modes, see the official pandas documentation. This program belongs to the Optional Programs group, which means that this step can be skipped if no extra metadata is available.
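The default behaviour described above can be illustrated with a minimal pandas sketch. It is not the program's actual source code: the toy tables, the accession values and the variable names are assumptions made for illustration, and only the merge column names (`run_accession`, `run_accessions`) are taken from the example commands on this page.

```python
import pandas as pd

# Hypothetical toy tables standing in for MAIN_metadata.tsv and EXTRA_metadata.tsv.
main = pd.DataFrame({
    "run_accession": ["ERR001", "ERR002", "ERR003"],
    "country": ["Spain", "Italy", "France"],
})
extra = pd.DataFrame({
    "run_accessions": ["ERR002", "ERR003", "ERR004"],
    "age": [34, 51, 27],
})

# Left join: every row of the Main table is kept; rows without a match
# in the Extra table get NaN in the Extra columns.
merged = main.merge(
    extra, how="left", left_on="run_accession", right_on="run_accessions"
)

# Intersection check on the unique values of the two merge columns.
main_values = set(main["run_accession"].dropna().unique())
extra_values = set(extra["run_accessions"].dropna().unique())
common = main_values & extra_values
only_main = sorted(main_values - extra_values)
only_extra = sorted(extra_values - main_values)

print(merged)
print("Common values:", sorted(common))
print("Only in Main:", only_main)
print("Only in Extra:", only_extra)
```

In this sketch, `ERR001` appears only in the Main table and `ERR004` only in the Extra table, which is the kind of mismatch the intersection check is meant to surface.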
Input Elements:
Input | Type | Description |
---|---|---|
MAIN_metadata.tsv | File | Main Metadata Table |
EXTRA_metadata.tsv | File | Extra Metadata Table |
Output Elements:
Output | Type | Description |
---|---|---|
MERGED_metadata.tsv | File | Merged Metadata Table |
Merge Columns Intersection Checks | stdout | Results of the Merge Columns Intersection Analysis (except when using the cross merge mode) |
The resulting MERGED_metadata.tsv file is the one used in the next workflow steps, namely the Check Metadata ENA program if you are following the typical ENA workflow. Depending on your particular case, it can also be used in other workflow steps, including the Filter Metadata, Download Fastqs, Check Fastqs and Make Treatment Template programs. To find out which step comes next in your case, check the workflow's diagram.
Usage:
merge_metadata [-h] -m MAIN_METADATA_TABLE -e EXTRA_METADATA_TABLE [-mc MAIN_MERGE_COLUMN]
[-ec EXTRA_MERGE_COLUMN] [-p {left,right,outer,inner,cross}]
[-ms MAIN_MERGE_SUFFIX] [-es EXTRA_MERGE_SUFFIX] [-o OUTPUT_DIRECTORY] [-x] [-v]
Options:
Parameter | Description |
---|---|
-h, --help | Show help message and exit. |
-m, --main_metadata_table | Main Metadata Table [Expected sep=TABS]. Indicate the path to the Main Metadata Table file. |
-e, --extra_metadata_table | Extra Metadata Table [Expected sep=TABS]. Indicate the path to the Extra Metadata Table file. |
-mc, --main_merge_column | Main Metadata Merge Column. Main Metadata Table column to be used for merging. This parameter will be skipped if pandas_merge_mode = cross. |
-ec, --extra_merge_column | Extra Metadata Merge Column. Extra Metadata Table column to be used for merging. This parameter will be skipped if pandas_merge_mode = cross. |
-p, --pandas_merge_mode | Pandas Merge Mode (Optional) [Default:left]. Indicate the pandas merge mode to be used for merging. Permitted options are {left,right,outer,inner,cross}. |
-ms, --main_merge_suffix | Main Metadata Pandas Merge Suffix (Optional) [Default:"_x"]. Suffix to add to overlapping column names for the Main Metadata columns. |
-es, --extra_merge_suffix | Extra Metadata Pandas Merge Suffix (Optional) [Default:"_y"]. Suffix to add to overlapping column names for the Extra Metadata columns. |
-o, --output_directory | Output Directory (Optional). Indicate the path to the Output Directory. Output files will be created in the current directory if not indicated. |
-x, --plain_text | Plain Text Mode (Optional). If indicated, it will enable Plain Text mode, and text will appear without colors. |
-v, --version | Show program's version number and exit. |
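As a rough illustration of what the two suffix options control, the sketch below calls pandas directly. It assumes that the suffixes are passed through to the `suffixes` argument of `DataFrame.merge`, which the option descriptions suggest but this page does not state explicitly. The toy tables and the shared `sample` column are hypothetical.

```python
import pandas as pd

# Hypothetical tables that share a non-key column name ("sample").
main = pd.DataFrame({
    "run_accession": ["ERR001", "ERR002"],
    "sample": ["A", "B"],
})
extra = pd.DataFrame({
    "run_accessions": ["ERR001", "ERR002"],
    "sample": ["a1", "b1"],
})

# Overlapping column names receive the given suffixes instead of the
# pandas defaults "_x" and "_y".
merged = main.merge(
    extra,
    how="left",
    left_on="run_accession",
    right_on="run_accessions",
    suffixes=("_ENA", "_publication"),
)

print(list(merged.columns))
```

Only the overlapping `sample` column is renamed (to `sample_ENA` and `sample_publication`); the merge columns keep their original names because they differ between the two tables.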
Commands:
- Merge metadata with colored text stdout:
merge_metadata -m PRJEB10949_ENA_metadata.tsv -mc run_accession -e PRJEB10949_publication_example.tsv -ec run_accessions
- Merge metadata with plain text stdout:
merge_metadata -m PRJEB10949_ENA_metadata.tsv -mc run_accession -e PRJEB10949_publication_example.tsv -ec run_accessions --plain_text
- Merge metadata and save results in the specified directory (Example):
merge_metadata -m PRJEB10949_ENA_metadata.tsv -mc run_accession -e PRJEB10949_publication_example.tsv -ec run_accessions -o Example
- Merge metadata with different suffixes:
merge_metadata -m PRJEB10949_ENA_metadata.tsv -mc run_accession -ms _ENA -e PRJEB10949_publication_example.tsv -ec run_accessions -es _publication
- Merge metadata in right mode:
merge_metadata -m PRJEB10949_ENA_metadata.tsv -mc run_accession -e PRJEB10949_publication_example.tsv -ec run_accessions -p right
- Merge metadata in outer mode:
merge_metadata -m PRJEB10949_ENA_metadata.tsv -mc run_accession -e PRJEB10949_publication_example.tsv -ec run_accessions -p outer
- Merge metadata in inner mode:
merge_metadata -m PRJEB10949_ENA_metadata.tsv -mc run_accession -e PRJEB10949_publication_example.tsv -ec run_accessions -p inner
- Merge metadata in cross mode:
merge_metadata -m PRJEB10949_ENA_metadata.tsv -e PRJEB10949_publication_example.tsv -p cross
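To give a feel for how the five merge modes used in the commands above differ, here is a small pandas sketch that counts the rows produced by each mode. The toy tables are hypothetical, and the sketch assumes pandas 1.2 or later, where `how="cross"` is available. Note that cross mode takes no merge columns, which matches the `-p cross` command omitting `-mc` and `-ec`.

```python
import pandas as pd

# Hypothetical toy tables with partially overlapping accessions.
main = pd.DataFrame({"run_accession": ["ERR001", "ERR002", "ERR003"]})
extra = pd.DataFrame({
    "run_accessions": ["ERR002", "ERR003", "ERR004"],
    "age": [34, 51, 27],
})

row_counts = {}

# Key-based modes need a merge column from each table.
for mode in ["left", "right", "outer", "inner"]:
    result = main.merge(
        extra, how=mode, left_on="run_accession", right_on="run_accessions"
    )
    row_counts[mode] = len(result)

# Cross mode builds the Cartesian product and must not receive merge columns.
row_counts["cross"] = len(main.merge(extra, how="cross"))

print(row_counts)
```

With these tables, left and right each keep 3 rows, outer keeps 4, inner keeps 2, and cross yields 3 x 3 = 9 rows.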
To see a full and detailed example of dataset curation, see the Tutorial Full Example page. If you only want to run this particular program, you can use the following PRJEB10949_publication_example.tsv test file example and the PRJEB10949_ENA_metadata.tsv file generated by the Download Metadata ENA program.