Merge Metadata Program

Description

The Merge Metadata program offers several options for merging your Main Metadata Table with any Extra Metadata Table, so that metadata from different sources can be combined. In the typical ENA workflow, the Main Metadata Table is the ENA metadata previously obtained with the Download Metadata ENA program, and the Extra Metadata Table is the publication's metadata. The program is general-purpose, however, and can also be used to merge metadata from external projects.

By default, a left join is carried out, taking the Main Metadata Table file as the reference. After merging, the program also compares the values of the two provided merge columns by intersecting them, to check for unique values that are not shared between the two tables (except when using the cross merge mode). For a more detailed explanation of the pandas merge modes, see the official pandas documentation. This program belongs to the Optional Programs group, which means that this step can be skipped if no extra metadata is available.
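The default behaviour described above can be approximated directly in pandas. The following is a minimal sketch, not the program's actual implementation: the toy tables, their values and the variable names are hypothetical and only illustrate a left join followed by a comparison of the unique values in the two merge columns.

```python
import pandas as pd

# Hypothetical toy tables standing in for the Main and Extra Metadata Tables.
main = pd.DataFrame({
    "run_accession": ["ERR1", "ERR2", "ERR3"],
    "country": ["ES", "FR", "IT"],
})
extra = pd.DataFrame({
    "run_accessions": ["ERR2", "ERR3", "ERR4"],
    "age": [30, 41, 27],
})

# Left join: every row of the main table is kept, matched or not.
merged = main.merge(
    extra,
    how="left",
    left_on="run_accession",
    right_on="run_accessions",
)

# Compare the unique values of the two merge columns.
main_values = set(main["run_accession"].dropna().unique())
extra_values = set(extra["run_accessions"].dropna().unique())

common = main_values & extra_values      # present in both tables
only_main = main_values - extra_values   # rows that get no extra metadata
only_extra = extra_values - main_values  # rows dropped by a left join

print(merged)
print("Common:", sorted(common))
print("Only in main:", sorted(only_main))
print("Only in extra:", sorted(only_extra))
```

In this sketch, values found only in the main table end up with empty extra columns, while values found only in the extra table do not appear in the merged result at all, which is why a check of this kind is useful after a left join.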

Input Elements:

Input	Type	Description
`MAIN_metadata.tsv`	`File`	Main Metadata Table
`EXTRA_metadata.tsv`	`File`	Extra Metadata Table

Output Elements:

Output	Type	Description
`MERGED_metadata.tsv`	`File`	Merged Metadata Table
`Merge Columns Intersection Checks`	`stdout`	Results of the Merge Columns Intersection Analysis (except when using the cross merge mode)

The resulting MERGED_metadata.tsv file is the one that will be used in the next workflow steps, namely the Check Metadata ENA program if you are using the typical ENA workflow. Nevertheless, depending on your particular case it could also be used in other workflow steps, including the Filter Metadata, Download Fastqs, Check Fastqs and Make Treatment Template programs. To get an idea of what the next step would be in your particular case, check the workflow's diagram.

Arguments

Usage:

merge_metadata [-h] -m MAIN_METADATA_TABLE -e EXTRA_METADATA_TABLE [-mc MAIN_MERGE_COLUMN] 
               [-ec EXTRA_MERGE_COLUMN] [-p {left,right,outer,inner,cross}] 
               [-ms MAIN_MERGE_SUFFIX] [-es EXTRA_MERGE_SUFFIX] [-o OUTPUT_DIRECTORY] [-x] [-v]

Options:

Parameter	Description
`-h, --help`	Show help message and exit.
`-m, --main_metadata_table`	Main Metadata Table [Expected sep=TABS]. Indicate the path to the Main Metadata Table file.
`-e, --extra_metadata_table`	Extra Metadata Table [Expected sep=TABS]. Indicate the path to the Extra Metadata Table file.
`-mc, --main_merge_column`	Main Metadata Merge Column. Main Metadata Table column to be used for merging. This parameter is ignored if pandas_merge_mode = cross.
`-ec, --extra_merge_column`	Extra Metadata Merge Column. Extra Metadata Table column to be used for merging. This parameter is ignored if pandas_merge_mode = cross.
`-p, --pandas_merge_mode`	Pandas Merge Mode (Optional) [Default:left]. Indicate the pandas merge mode to be used for merging. Permitted options are {left,right,outer,inner,cross}.
`-ms, --main_merge_suffix`	Main Metadata Pandas Merge Suffix (Optional) [Default:"_x"]. Suffix to add to overlapping column names for the Main Metadata columns.
`-es, --extra_merge_suffix`	Extra Metadata Pandas Merge Suffix (Optional) [Default:"_y"]. Suffix to add to overlapping column names for the Extra Metadata columns.
`-o, --output_directory`	Output Directory (Optional). Indicate the path to the Output Directory. Output files will be created in the current directory if not indicated.
`-x, --plain_text`	Plain Text Mode (Optional). If indicated, it will enable Plain Text mode, and text will appear without colors.
`-v, --version`	Show program's version number and exit.
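To get a feel for what the merge mode and suffix options do, the sketch below applies the equivalent pandas arguments (`how` and `suffixes` of `DataFrame.merge`) to two small tables. The tables and the `merge_tables` helper are hypothetical and written only for illustration; how the program passes its options to pandas internally is an assumption. The cross mode requires pandas 1.2 or later.

```python
import pandas as pd

# Hypothetical toy tables that share one overlapping column name ("sex").
main = pd.DataFrame({"run_accession": ["ERR1", "ERR2"], "sex": ["F", "M"]})
extra = pd.DataFrame({"run_accessions": ["ERR2", "ERR3"], "sex": ["male", "female"]})


def merge_tables(mode, main_suffix="_x", extra_suffix="_y"):
    """Illustrative helper: merge the toy tables with a given pandas mode."""
    suffixes = (main_suffix, extra_suffix)
    if mode == "cross":
        # A cross merge takes no merge columns: it pairs every main row
        # with every extra row.
        return main.merge(extra, how="cross", suffixes=suffixes)
    return main.merge(
        extra,
        how=mode,
        left_on="run_accession",
        right_on="run_accessions",
        suffixes=suffixes,
    )


# Number of rows produced by each merge mode on the toy tables.
row_counts = {
    mode: len(merge_tables(mode))
    for mode in ["left", "right", "outer", "inner", "cross"]
}
print(row_counts)

# Custom suffixes applied to the overlapping "sex" column.
renamed = merge_tables("left", "_ENA", "_publication")
print(list(renamed.columns))
```

Only column names that appear in both tables receive a suffix; columns that are unique to one table keep their original names.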

Examples

Commands:

Merge metadata with colored text stdout:

merge_metadata -m PRJEB10949_ENA_metadata.tsv -mc run_accession -e PRJEB10949_publication_example.tsv -ec run_accessions

Merge metadata with plain text stdout:

merge_metadata -m PRJEB10949_ENA_metadata.tsv -mc run_accession -e PRJEB10949_publication_example.tsv -ec run_accessions --plain_text

Merge metadata and save results in the specified directory (Example):

merge_metadata -m PRJEB10949_ENA_metadata.tsv -mc run_accession -e PRJEB10949_publication_example.tsv -ec run_accessions -o Example

Merge metadata with different suffixes:

merge_metadata -m PRJEB10949_ENA_metadata.tsv -mc run_accession -ms _ENA -e PRJEB10949_publication_example.tsv -ec run_accessions -es _publication

Merge metadata in right mode:

merge_metadata -m PRJEB10949_ENA_metadata.tsv -mc run_accession -e PRJEB10949_publication_example.tsv -ec run_accessions -p right

Merge metadata in outer mode:

merge_metadata -m PRJEB10949_ENA_metadata.tsv -mc run_accession -e PRJEB10949_publication_example.tsv -ec run_accessions -p outer

Merge metadata in inner mode:

merge_metadata -m PRJEB10949_ENA_metadata.tsv -mc run_accession -e PRJEB10949_publication_example.tsv -ec run_accessions -p inner

Merge metadata in cross mode:

merge_metadata -m PRJEB10949_ENA_metadata.tsv -e PRJEB10949_publication_example.tsv -p cross

For a full and detailed example of dataset curation, see the Tutorial Full Example page. If you only want to run this particular program, you can use the PRJEB10949_publication_example.tsv example test file together with the PRJEB10949_ENA_metadata.tsv file generated by the Download Metadata ENA program.
