
Concat Datasets Program

sarpiens edited this page Mar 13, 2024 · 6 revisions

Description

The Concat Datasets program combines the curated final metadata tables of the different datasets of interest. It uses the information in the provided Variables Dictionary file to concatenate the rows of all the metadata tables found, which is especially useful when conducting a meta-study with multiple datasets. This program belongs to the Optional Programs group, so this step can be skipped if there is no need to concatenate multiple final metadata tables.
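The row concatenation described above can be illustrated with the Python sketch below, which stacks the rows of several tables into one. It is not the tool's source; Python is used only because the usage text below looks like argparse output. The sketch makes two assumptions that this page does not confirm. First, the output columns are the dictionary variables that appear in at least one table, kept in dictionary order. Second, variables missing from a dataset are filled with "NA". The names concat_tables and to_tsv are hypothetical.

```python
import csv
import io

def concat_tables(tables, variables, na="NA"):
    """Stack the rows of several metadata tables (each a list of dicts).

    Assumption: output columns are the dictionary variables that appear in
    at least one table, in dictionary order; gaps are filled with `na`.
    """
    present = set()
    for table in tables:
        for row in table:
            present.update(row)
    columns = [v for v in variables if v in present]
    rows = [{c: row.get(c, na) for c in columns} for table in tables for row in table]
    return columns, rows

def to_tsv(columns, rows):
    """Serialize the concatenated rows as tab-separated text with a header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical dictionary variables and two tiny datasets
variables = ["sample_id", "dataset", "age", "sex"]
ds_a = [{"sample_id": "S1", "dataset": "A", "age": "34"}]
ds_b = [{"sample_id": "S2", "dataset": "B"}]  # lacks the optional 'age' column
columns, rows = concat_tables([ds_a, ds_b], variables)
print(to_tsv(columns, rows))
```

In this toy run, "sex" is dropped because no dataset provides it. The second row receives "NA" for "age".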

By default, the program searches the provided input directory for all files whose names end in '_metadata_final.tsv' (this pattern can be changed with -p/--metadata_pattern). Two search modes are available:

  • Simple. The program searches for the final metadata files only at the first level of the provided input directory.

  • Project. The program expects the input directory to contain one folder per dataset (child directories of the provided input directory), each holding its own files. If more than one final metadata table is detected, a menu lists the files found and asks whether all of them should be included.
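The two search modes can be sketched in Python as follows. find_metadata_files is a hypothetical helper, not the tool's code. It assumes the pattern is matched as a file-name suffix and that project mode looks only at the first level of each dataset folder. It also omits the interactive selection menu.

```python
import tempfile
from pathlib import Path

def find_metadata_files(input_dir, mode="simple", pattern="_metadata_final.tsv"):
    """Return metadata files matching `pattern` according to the search mode."""
    root = Path(input_dir)

    def matches(folder):
        # Assumption: the pattern is matched as a file-name suffix.
        return sorted(p for p in folder.iterdir() if p.is_file() and p.name.endswith(pattern))

    if mode == "simple":
        # Only the first level of the input directory is searched.
        return matches(root)
    if mode == "project":
        # One child folder per dataset; each folder is searched at its first level.
        found = []
        for child in sorted(p for p in root.iterdir() if p.is_dir()):
            found.extend(matches(child))
        return found
    raise ValueError(f"unknown search mode: {mode!r}")

# Build a throwaway directory tree to compare both modes
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "A_metadata_final.tsv").write_text("")
    (root / "notes.txt").write_text("")
    (root / "B").mkdir()
    (root / "B" / "B_metadata_final.tsv").write_text("")
    simple = [p.name for p in find_metadata_files(root)]
    project = [p.name for p in find_metadata_files(root, mode="project")]

print(simple)   # ['A_metadata_final.tsv']
print(project)  # ['B_metadata_final.tsv']
```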

The Variables Dictionary file must contain at least the following two columns, which this program relies on:

  • Variable. Lists the final column names of the curated metadata tables, which form the reference set of all possible variables. This column must be labeled "variable" in the table header. The program verifies that every variable (column name) in the curated metadata tables is present in the Variables Dictionary.

  • Requiredness. Indicates whether each variable is required. This column must be labeled "requiredness" in the table header. Valid values are "required" (a variable that must always be present and cannot contain NAs) or "optional" (a variable that may be absent). Before performing any concatenation, the program verifies that all required variables are present in the curated metadata tables.
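The two dictionary checks described above can be sketched in Python as follows. load_dictionary and validate are hypothetical names. Two behaviors are assumptions rather than documented facts: "NA" and empty cells are treated as missing values, and requiredness is compared case-insensitively. The toy dictionary includes an extra description column to show that columns beyond the two required ones are simply ignored.

```python
import csv
import io

def load_dictionary(stream):
    """Read a tab-separated Variables Dictionary into {variable: requiredness}."""
    reader = csv.DictReader(stream, delimiter="\t")
    return {row["variable"]: row["requiredness"].strip().lower() for row in reader}

def validate(header, rows, dictionary, na_values=("NA", "")):
    """Return a list of problems; an empty list means the table passes."""
    errors = []
    unknown = [c for c in header if c not in dictionary]
    if unknown:
        errors.append(f"variables not in dictionary: {unknown}")
    required = [v for v, req in dictionary.items() if req == "required"]
    missing = [v for v in required if v not in header]
    if missing:
        errors.append(f"missing required variables: {missing}")
    for v in required:
        # Assumption: "NA" and empty cells count as missing values.
        if v in header and any(row.get(v, "") in na_values for row in rows):
            errors.append(f"required variable {v!r} contains NAs")
    return errors

# Hypothetical dictionary; the extra 'description' column is ignored
dict_tsv = (
    "variable\trequiredness\tdescription\n"
    "sample_id\trequired\tSample identifier\n"
    "age\toptional\tAge in years\n"
)
dictionary = load_dictionary(io.StringIO(dict_tsv))
ok = validate(["sample_id", "age"], [{"sample_id": "S1", "age": "NA"}], dictionary)
bad = validate(["age", "batch"], [{"age": "40", "batch": "1"}], dictionary)
print(ok)   # []
print(bad)
```

An NA in the optional "age" column passes. An unknown column ("batch") or a missing required variable ("sample_id") is reported.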

For an example, see the test file variables_dictionary_example.tsv.

Input Elements:

Input                            Type        Description
/directory/path/input            Directory   Input Directory with the Datasets Metadata files
variables_dictionary_file.tsv    File        Variables Dictionary

Output Elements:

Output                             Type   Description
concatenated_final_metadata.tsv    File   Concatenated Final Metadata Table

The resulting concatenated_final_metadata.tsv file contains the concatenated metadata of all the datasets, ready for your meta-analysis.

Arguments

Usage:

concat_datasets [-h] -i INPUT_DIRECTORY -d VARIABLES_DICTIONARY [-s {simple,project}] 
                [-p METADATA_PATTERN] [-op OUTPUT_NAME_PREFIX] [-o OUTPUT_DIRECTORY] [-x] [-v]

Options:

Parameter Description
-h, --help Show help message and exit.
-i, --input_directory Input Directory. Indicate the path to the Input Directory with the Datasets Metadata files.
-d, --variables_dictionary Variables Dictionary [Expected sep=TABS]. Indicate the path to the Variables Dictionary file.
-s, --search_mode Search Mode (Optional) [Default:simple]. Indicate the mode used to search for the metadata files of each dataset. Options: 1) Simple (all files are directly in the provided Input Directory) or 2) Project (the provided Input Directory has one folder per dataset, each containing its own files). Permitted options are {simple,project}.
-p, --metadata_pattern Dataset Metadata File Pattern (Optional) [Default:"_metadata_final.tsv"]. Indicate the pattern used to identify Dataset Metadata files.
-op, --output_name_prefix Output Name Prefix (Optional). Indicate a prefix to add to the output file names.
-o, --output_directory Output Directory (Optional). Indicate the path to the Output Directory. If not indicated, output files are created in the current directory.
-x, --plain_text Plain Text Mode (Optional). If indicated, enables Plain Text mode, and text is printed without colors.
-v, --version Show program's version number and exit.
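The documented usage maps onto Python's standard argparse module roughly as shown below. This is a reconstruction from the usage text on this page, not the program's actual source. The version string is a placeholder. The hypothetical output_path helper assumes the prefix is joined to the default file name with an underscore, which this page does not specify.

```python
import argparse
from pathlib import Path

def build_parser():
    """Declare the options listed in the usage text above."""
    parser = argparse.ArgumentParser(prog="concat_datasets")
    parser.add_argument("-i", "--input_directory", required=True)
    parser.add_argument("-d", "--variables_dictionary", required=True)
    parser.add_argument("-s", "--search_mode", choices=["simple", "project"], default="simple")
    parser.add_argument("-p", "--metadata_pattern", default="_metadata_final.tsv")
    parser.add_argument("-op", "--output_name_prefix")
    parser.add_argument("-o", "--output_directory")
    parser.add_argument("-x", "--plain_text", action="store_true")
    # Placeholder version string; the real version is not documented here.
    parser.add_argument("-v", "--version", action="version", version="%(prog)s VERSION")
    return parser

def output_path(output_directory=None, prefix=None, base="concatenated_final_metadata.tsv"):
    """Hypothetical: current directory by default; '_' separator is an assumption."""
    name = f"{prefix}_{base}" if prefix else base
    return Path(output_directory or ".") / name

args = build_parser().parse_args(
    ["-i", "datasets_directory", "-d", "variables_dictionary_example.tsv",
     "-s", "project", "-op", "meta-example"]
)
print(args.search_mode, output_path(args.output_directory, args.output_name_prefix))
```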

Examples

Commands:

  • Concatenate final metadata files with colored text stdout:
concat_datasets -i datasets_directory -d variables_dictionary_example.tsv
  • Concatenate final metadata files with plain text stdout:
concat_datasets -i datasets_directory -d variables_dictionary_example.tsv --plain_text
  • Concatenate final metadata files using project search mode:
concat_datasets -i datasets_directory -d variables_dictionary_example.tsv -s project 
  • Concatenate final metadata files and save results in the specified directory (Example):
concat_datasets -i datasets_directory -d variables_dictionary_example.tsv -o Example
  • Concatenate final metadata files adding a prefix name (meta-example) to the output file:
concat_datasets -i datasets_directory -d variables_dictionary_example.tsv -op meta-example 

For a full, detailed example of dataset curation, see the Tutorial Full Example page.