Concat Datasets Program
The Concat Datasets program combines the curated final metadata tables of the different datasets of interest. It uses the information in the provided Variables Dictionary file to concatenate the rows of all the metadata tables found. This is especially useful when conducting a meta-study with multiple datasets. The program belongs to the Optional Programs group, so this step can be skipped if there is no need to concatenate multiple final metadata table files.
By default, the program searches the provided input directory for all files matching the '_metadata_final.tsv' pattern. Two search modes are available:
- Simple. The program searches for the final metadata files directly at the first level of the provided input directory.
- Project. The program expects the input directory to contain one folder per dataset, each with its own files (child directories of the provided input directory). If more than one final metadata table is detected, a menu displays the files found and asks whether all of them should be included.
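The two search modes can be sketched in Python as follows. This is a minimal illustration, not the program's code: the helper name `find_metadata_files` is hypothetical, the pattern is assumed to be matched as a file-name suffix, only immediate child folders are assumed to count as datasets in project mode, and instead of showing the interactive menu the sketch simply keeps every match.

```python
from pathlib import Path


def find_metadata_files(input_dir, search_mode="simple", pattern="_metadata_final.tsv"):
    """Hypothetical helper: list metadata files for the given search mode.

    simple:  files directly inside input_dir
    project: files directly inside each immediate child folder of input_dir
    Assumption: a file matches when its name ends with `pattern`.
    """
    root = Path(input_dir)
    if search_mode == "simple":
        folders = [root]
    elif search_mode == "project":
        # Assumption: each child directory is one dataset.
        folders = sorted(p for p in root.iterdir() if p.is_dir())
    else:
        raise ValueError(f"unknown search mode: {search_mode!r}")

    found = []
    for folder in folders:
        # Only the first level of each folder is scanned (no recursion).
        matches = sorted(
            p for p in folder.iterdir() if p.is_file() and p.name.endswith(pattern)
        )
        # The real program may prompt when several files match; this keeps all.
        found.extend(matches)
    return found
```

Restricting each scan to a single level keeps the two modes distinct: in simple mode, files inside dataset sub-folders are not picked up, and in project mode, loose files in the top-level directory are ignored.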
The Variables Dictionary file must contain at least the following two columns for this program:
- Variable. The names of the final columns of the metadata tables, used as the reference universe of possible variables. The header of this column must be “variable”.
- Requiredness. Whether each variable is required or optional. The header of this column must be “requiredness”. Valid values are “required” and “optional”.
For instance, see the variables_dictionary_example.tsv test file, which contains the following columns of interest:
variable | requiredness |
---|---|
? | ? |
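A minimal sketch of reading the two required columns of a tab-separated Variables Dictionary with Python's `csv` module. The helper name `load_variables_dictionary` is hypothetical; the sketch assumes that extra columns may be present and are ignored, and that requiredness values are compared exactly after trimming surrounding whitespace.

```python
import csv


def load_variables_dictionary(handle):
    """Hypothetical helper: parse a tab-separated Variables Dictionary.

    Returns (variables, required): all variable names in file order and
    the subset whose requiredness is "required".
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = {"variable", "requiredness"} - set(header)
    if missing:
        raise ValueError(f"missing column(s): {sorted(missing)}")

    variables, required = [], []
    for row in reader:
        # Short rows yield None for absent fields, so fall back to "".
        name = (row["variable"] or "").strip()
        kind = (row["requiredness"] or "").strip()
        if kind not in ("required", "optional"):
            raise ValueError(f"invalid requiredness {kind!r} for {name!r}")
        variables.append(name)
        if kind == "required":
            required.append(name)
    return variables, required
```

For example: `with open("variables_dictionary_example.tsv", newline="") as fh: variables, required = load_variables_dictionary(fh)`.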
Input Elements:
Input | Type | Description |
---|---|---|
/directory/path/input | Directory | Input Directory with the Datasets Metadata files |
variables_dictionary_file.tsv | File | Variables Dictionary |
Output Elements:
Output | Type | Description |
---|---|---|
concatenated_final_metadata.tsv | File | Concatenated Final Metadata Table |
The resulting concatenated_final_metadata.tsv file contains the concatenated final metadata of all the datasets, to be used in your meta-analysis.
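One plausible way to concatenate the rows is sketched below in Python, under stated assumptions: the output columns are taken to be the dictionary variables in their listed order, columns that are not in the dictionary are dropped, and variables missing from a given table are left empty. The helper `concatenate_metadata` is hypothetical, and this page does not state how the program uses the requiredness column during concatenation, so the sketch ignores it.

```python
import csv


def concatenate_metadata(tables, variables, out_handle):
    """Hypothetical helper: stack rows of several TSV tables into one TSV.

    tables:     iterable of open text handles, each a tab-separated table
    variables:  ordered variable names from the Variables Dictionary
    out_handle: writable text handle for the concatenated table
    Returns the number of data rows written.
    """
    writer = csv.DictWriter(
        out_handle,
        fieldnames=variables,   # output columns follow the dictionary order
        delimiter="\t",
        restval="",             # variables absent from a table become empty cells
        extrasaction="ignore",  # columns outside the dictionary are dropped
        lineterminator="\n",
    )
    writer.writeheader()
    count = 0
    for handle in tables:
        for row in csv.DictReader(handle, delimiter="\t"):
            writer.writerow(row)
            count += 1
    return count
```

Aligning every table to the same column list is what makes row-wise concatenation safe: datasets that use different subsets of the variables still produce rows with identical columns.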
Usage:
concat_datasets [-h] -i INPUT_DIRECTORY -d VARIABLES_DICTIONARY [-s {simple,project}]
[-p METADATA_PATTERN] [-op OUTPUT_NAME_PREFIX] [-o OUTPUT_DIRECTORY] [-x] [-v]
Options:
Parameter | Description |
---|---|
-h, --help | Show the help message and exit. |
-i, --input_directory | Input Directory. Indicate the path to the Input Directory with the Datasets Metadata files. |
-d, --variables_dictionary | Variables Dictionary [Expected sep=TABS]. Indicate the path to the Variables Dictionary file. |
-s, --search_mode | Search Mode (Optional) [Default:simple]. Indicate the mode used to search the metadata files of each dataset. Options: 1) Simple (all files are in the provided Input Directory) or 2) Project (the provided Input Directory has one folder for each dataset with its own files). Permitted options are {simple,project}. |
-p, --metadata_pattern | Dataset Metadata File Pattern (Optional) [Default:"_metadata_final.tsv"]. Indicate the pattern used to identify Dataset Metadata files. |
-op, --output_name_prefix | Output Name Prefix (Optional). Indicate a prefix for the output file names. |
-o, --output_directory | Output Directory (Optional). Indicate the path to the Output Directory. If not indicated, output files are created in the current directory. |
-x, --plain_text | Plain Text Mode (Optional). If indicated, enables Plain Text mode so that text appears without colors. |
-v, --version | Show the program's version number and exit. |
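The documented interface can be mirrored with Python's `argparse`, for example to prototype a wrapper script or to check how a set of arguments would be parsed. This is a hypothetical sketch of the options in the table, not the program's actual parser; the function name `build_parser` is illustrative and the version string is a placeholder.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented concat_datasets options."""
    parser = argparse.ArgumentParser(prog="concat_datasets")
    parser.add_argument("-i", "--input_directory", required=True,
                        help="Input Directory with the Datasets Metadata files")
    parser.add_argument("-d", "--variables_dictionary", required=True,
                        help="Variables Dictionary [Expected sep=TABS]")
    parser.add_argument("-s", "--search_mode", choices=["simple", "project"],
                        default="simple", help="Search Mode")
    parser.add_argument("-p", "--metadata_pattern", default="_metadata_final.tsv",
                        help="Dataset Metadata File Pattern")
    parser.add_argument("-op", "--output_name_prefix",
                        help="Output Name Prefix")
    # The documentation says output goes to the current directory by default.
    parser.add_argument("-o", "--output_directory", default=".",
                        help="Output Directory")
    parser.add_argument("-x", "--plain_text", action="store_true",
                        help="Plain Text Mode (no colors)")
    # Placeholder: the real version number is not given on this page.
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s <version>")
    return parser
```

Because argparse checks for an exact option match first, `-op` and `-o` can coexist: `-op meta-example` sets the prefix, while `-o Example` sets the output directory.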
Commands:
- Concatenate final metadata files with colored text stdout:
concat_datasets -i datasets_directory -d variables_dictionary_example.tsv
- Concatenate final metadata files with plain text stdout:
concat_datasets -i datasets_directory -d variables_dictionary_example.tsv --plain_text
- Concatenate final metadata files using project search mode:
concat_datasets -i datasets_directory -d variables_dictionary_example.tsv -s project
- Concatenate final metadata files and save results in the specified directory (Example):
concat_datasets -i datasets_directory -d variables_dictionary_example.tsv -o Example
- Concatenate final metadata files adding a prefix name (meta-example) to the output file:
concat_datasets -i datasets_directory -d variables_dictionary_example.tsv -op meta-example
For a full, detailed example of dataset curation, see the Tutorial Full Example page.