
Introduction

This page describes how to properly build and use the HIPC Dashboard admin tool.

Environment Variables

The following environment variables are referenced in this document and should be defined for the proper functioning of the admin tool:

  • HIPC_HOME: points to the directory into which the HIPC Signature source code repository has been downloaded.
  • HIPC_DATA_HOME: points to the directory which contains signature data to be imported.

To make environment variables available to your shell, run the export command:

export HIPC_HOME=/path/to/hipc-signature
export HIPC_DATA_HOME=/path/to/hipc-signature-data

Admin Properties

In order for the admin tool to properly load data into the HIPC Signature database, it needs to know the location of the data. This is the function of the $HIPC_HOME/admin/src/main/resources/META-INF-spring/admin.properties file. This file needs to exist and contain proper values before the admin tool is compiled. In the source distribution you will find $HIPC_HOME/admin/src/main/resources/META-INF-spring/admin.properties.example to use as a basis for your admin.properties file. More information about these properties can be found in the Subject Data and Submission Data sections of this document.
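
A simple way to start, assuming you keep the example file's layout, is to copy it into place and then edit each *.location value to match where your data actually lives:

cp $HIPC_HOME/admin/src/main/resources/META-INF-spring/admin.properties.example \
  $HIPC_HOME/admin/src/main/resources/META-INF-spring/admin.properties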

Taxonomy List

All desired organisms should be listed in $HIPC_HOME/admin/src/main/resources/simple-taxonomy-list.txt. This file follows a simple name, taxonomy_id format:

name	taxonomy_id
Homo sapiens	 9606
Mus musculus	 10090
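
To make an additional organism available, append a row in the same tab-delimited format. A minimal sketch (Rattus norvegicus and its NCBI taxonomy id 10116 are used only as an illustration):

printf 'Rattus norvegicus\t10116\n' >> $HIPC_HOME/admin/src/main/resources/simple-taxonomy-list.txt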

Subject Data

Subjects in the HIPC Signature are those entities that play various roles in the experiments conducted by HIPC network centers, which produce the submission data you will find in the HIPC Signature. Subject data includes gene, protein, and compound data. Subject data needs to be imported into the Dashboard database before HIPC network center data can be imported. With the exception of gene and protein data, all the required subject data can be found in a ZIP file distribution, which will henceforth be referred to as the zipped dashboard data. This file should be downloaded and unzipped into $HIPC_DATA_HOME.
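
For example (the archive name below is only a placeholder; substitute the actual name of the downloaded file):

unzip hipc-dashboard-data.zip -d $HIPC_DATA_HOME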

The following subject data and sources are supported for import by the admin tool:

  • Gene: Gene data as provided by Entrez. This data can be downloaded via ftp at the following URL: ftp://ftp.ncbi.nih.gov//gene/DATA/GENE_INFO/ (see the download sketch after this list). The gene info file should be downloaded into $HIPC_DATA_HOME/subject_data/gene. If this file is placed in any other directory, the following entry in admin.properties needs to be updated:
gene.data.location=file:${HIPC_DATA_HOME}/subject_data/gene/*.gene_info
  • Animal Model: Animal Model data as provided by the Clemons Group at the Broad Institute. After downloading and unzipping zipped dashboard data, this data can be found in $HIPC_DATA_HOME/subject_data/animal_model. The following entry in admin.properties specifies the location of animal model data:
animal.model.location=file:${HIPC_DATA_HOME}/subject_data/animal_model/animal_model.txt
  • Cell Line: Cell Line data as provided by the Clemons Group at the Broad Institute. After downloading and unzipping zipped dashboard data, this data can be found in $HIPC_DATA_HOME/subject_data/cell_sample. The following entries in admin.properties specify the location of cell line data:
cell.line.name.type.location=file:${HIPC_DATA_HOME}/subject_data/cell_sample/cell_name_type.txt
cell.line.annotation.type.location=file:${HIPC_DATA_HOME}/subject_data/cell_sample/cell_anno_type.txt
cell.line.annotation.name.location=file:${HIPC_DATA_HOME}/subject_data/cell_sample/cell_anno_name.txt
cell.line.annotation.source.location=file:${HIPC_DATA_HOME}/subject_data/cell_sample/cell_anno_source.txt
cell.line.annotation.sample.location=file:${HIPC_DATA_HOME}/subject_data/cell_sample/cell_anno.txt
cell.line.id.location=file:${HIPC_DATA_HOME}/subject_data/cell_sample/cell_sample.txt
cell.line.name.location=file:${HIPC_DATA_HOME}/subject_data/cell_sample/cell_sample_name.txt
  • Compounds: Compound data as provided by the Clemons Group at the Broad Institute. After downloading and unzipping zipped dashboard data, this data can be found in $HIPC_DATA_HOME/subject_data/compound. The following entries in admin.properties specify the location of compound data:
compounds.location=file:${HIPC_DATA_HOME}/subject_data/compound/Compounds.txt
compound.synonyms.location=file:${HIPC_DATA_HOME}/subject_data/compound/CompoundSynonyms.txt
  • Proteins: Protein data as provided by UniProt. This data can be downloaded via ftp at the following URL: ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/ (see the download sketch after this list). The UniProt data files should be downloaded into $HIPC_DATA_HOME/subject_data/protein. If multiple UniProt files are downloaded, the following entry in admin.properties needs to be set:
protein.data.location=file:${HIPC_DATA_HOME}/subject_data/protein/uniprot_sprot_*.dat

otherwise the specific file can be referenced:

protein.data.location=file:${HIPC_DATA_HOME}/subject_data/protein/uniprot_sprot_human.dat
  • shRNA: shRNA data as provided by the RNAi Consortium at the Broad Institute. After downloading and unzipping zipped dashboard data, a subset of this data can be found in $HIPC_DATA_HOME/subject_data/shrna. The following entries in admin.properties specify the location of shRNA data:
trc.shrna.data.location=file:${HIPC_DATA_HOME}/subject_data/shrna/trc_public.05Apr11.txt
trc.shrna.filter.location=file:${HIPC_DATA_HOME}/subject_data/shrna/trc-shrnas-filter.txt
  • siRNA: After downloading and unzipping zipped dashboard data, a subset of this data can be found in $HIPC_DATA_HOME/subject_data/sirna. The following entry in admin.properties specifies the location of siRNA data:
sirna.reagents.location=file:${HIPC_DATA_HOME}/subject_data/sirna/siRNA_reagents.txt
  • Tissue Sample: Tissue Sample data as provided by the Clemons Group at the Broad Institute. After downloading and unzipping zipped dashboard data, this data can be found in $HIPC_DATA_HOME/subject_data/tissue_sample. The following entry in admin.properties specifies the location of tissue-sample data:
tissue.sample.data.location=file:${HIPC_DATA_HOME}/subject_data/tissue_sample/tissue_sample_name.txt
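
The gene and protein downloads referenced above can be scripted. A minimal sketch, assuming wget is available and that only the human files are needed; the file names on the NCBI and UniProt FTP servers may change, so verify them before downloading:

mkdir -p $HIPC_DATA_HOME/subject_data/gene $HIPC_DATA_HOME/subject_data/protein
# Entrez gene info for Homo sapiens (per-species files live under GENE_INFO/Mammalia)
wget -P $HIPC_DATA_HOME/subject_data/gene \
  ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz
gunzip $HIPC_DATA_HOME/subject_data/gene/Homo_sapiens.gene_info.gz
# UniProt/Swiss-Prot human taxonomic division
wget -P $HIPC_DATA_HOME/subject_data/protein \
  ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/uniprot_sprot_human.dat.gz
gunzip $HIPC_DATA_HOME/subject_data/protein/uniprot_sprot_human.dat.gz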

Submission Data

As previously noted, the data resulting from the experiments performed by HIPC network centers that makes its way into the HIPC Signature database is called submission data. After downloading and unzipping zipped dashboard data, submission data can be found in $HIPC_DATA_HOME/submissions. For each center-submission pair there is a property within admin.properties that specifies the location of the data:

broad.cmp.sens.lineage.enrich.data.location=file:${HIPC_DATA_HOME}/submissions/20130328-broad_cpd_sens_lineage_enrich-MST-312/20130328-broad_cpd_sens_lineage_enrich-MST-312.txt
broad.cmp.sens.mutation.enrich.data.location=file:${HIPC_DATA_HOME}/submissions/20130328-broad_cpd_sens_mutation_enrich-navitoclax/20130328-broad_cpd_sens_mutation_enrich-navitoclax.txt
broad.tier3.navitoclax.story.data.location=file:${HIPC_DATA_HOME}/submissions/20130402-broad_tier3_navitoclax_story/20130402-broad_tier3_navitoclax_story.txt
columbia.marina.analysis.data.location=file:${HIPC_DATA_HOME}/submissions/20130402-columbia_marina_analysis-T-ALL/20130402-columbia_marina_analysis-T-ALL.txt
columbia.joint.mr.shrna.diff.analysis.data.location=file:${HIPC_DATA_HOME}/submissions/20130401-columbia_joint_mr_shrna_diff-T-ALL/20130401-columbia_joint_mr_shrna_diff-T-ALL.txt
columbia.tier4.glioma.story.data.location=file:${HIPC_DATA_HOME}/submissions/20130401-columbia_tier4_glioma_story/20130401-columbia_tier4_glioma_story.txt
cshl.tier4.fgf19.story.data.location=file:${HIPC_DATA_HOME}/submissions/20130403-cshl_tier4_fgf19_story/20130403-cshl_tier4_fgf19_story.txt
dfci.tier4.beta.catenin.story.data.location=file:${HIPC_DATA_HOME}/submissions/20130401-dfci_tier4_beta-catenin_story/20130401-dfci_tier4_beta-catenin_story.txt
dfci.reporter.analysis.data.location=file:${HIPC_DATA_HOME}/submissions/20130426-dfci_reporter_analysis-bcat/20130426-dfci_reporter_analysis-bcat.txt
dfci.ataris.analysis.data.location=file:${HIPC_DATA_HOME}/submissions/20130429-dfci_ataris_analysis/20130429-dfci_ataris_analysis.txt
dfci.ovarian.analysis.data.location=file:${HIPC_DATA_HOME}/submissions/20130426-dfci_ovarian_analysis/20130426-dfci_ovarian_analysis.txt
dfci.pax8.tier3.data.location=file:${HIPC_DATA_HOME}/submissions/20130429-dfci_pax8_tier3/20130429-dfci_pax8_tier3.txt
emory.ppi-raf1.data.location=file:${HIPC_DATA_HOME}/submissions/20131220-emory_PPI_analysis-RAF1/20131220-emory_PPI_analysis-RAF1.txt
fhcrc.tier1.cst.profiling.data.location=file:${HIPC_DATA_HOME}/submissions/20131117-fhcrc-m_tier1_cst_profiling-SOC/20131117-fhcrc-m_tier1_cst_profiling-SOC.txt
utsw.tier2.discoipyrroles.story.data.location=file:${HIPC_DATA_HOME}/submissions/20130921-utsw_discoipyrrole_tier2_story/20130921-utsw_discoipyrrole_tier2_story.txt
utsw.tier4.discoipyrroles.story.data.location=file:${HIPC_DATA_HOME}/submissions/20130503-utsw_tier4_discoipyrroles_story/20130503-utsw_tier4_discoipyrroles_story.txt
ucsf.differential-expression.data.location=file:${HIPC_DATA_HOME}/submissions/20140124-ucsf_differential_expression/20140124-ucsf_differential_expression.txt
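
Before running the import, it can be useful to confirm that every submission file referenced in admin.properties actually exists. A small sanity-check sketch (not part of the admin tool itself):

grep 'submissions/' $HIPC_HOME/admin/src/main/resources/META-INF-spring/admin.properties \
  | cut -d= -f2 | sed -e 's|^file:||' -e "s|\${HIPC_DATA_HOME}|$HIPC_DATA_HOME|" \
  | while read -r f; do [ -f "$f" ] || echo "missing: $f"; done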

Submission Metadata

The dashboard imports metadata for each submission. After downloading and unzipping zipped dashboard data, this metadata can be found in the following two files:

  1. $HIPC_DATA_HOME/dashboard-CV-per-template.txt: Every dashboard submission is derived from an underlying template. dashboard-CV-per-template.txt contains the metadata for all submission templates known to the dashboard, with one entry per template.

  2. $HIPC_DATA_HOME/dashboard-CV-per-column.txt: This file describes the experimental data and the relationships between the experimental data that each submission data template was designed to capture.

Spring Batch

The HIPC Signature pipeline has been developed using the Spring Batch framework. For each new submission the following Spring Batch configuration files need to be modified:

  1. $HIPC_HOME/admin/src/main/resources/META-INF/spring/observationDataApplicationContext.xml: This file configures a Spring Batch reader to read the new submission data. For example, here is a snippet from this file which defines the Emory University PPI analysis submission reader:
  <bean name="emoryPPIRAF1Reader" class="org.springframework.batch.item.file.MultiResourceItemReader">
	<property name="resources" value="${emory.ppi-raf1.data.location}" />
	<property name="delegate">
	  <bean class="org.springframework.batch.item.file.FlatFileItemReader">
		<property name="lineMapper" ref="emoryPPIRAF1LineMapper" />
		<property name="linesToSkip" value="7" />
	  </bean>
	</property>
  </bean>

The important thing to note is the resource location, emory.ppi-raf1.data.location. This should correspond to an entry in admin.properties as described above. In addition, there is a reference to emoryPPIRAF1LineMapper, the Spring bean responsible for parsing each line of the Emory University submission. This mapper is defined in the second configuration file, described next.

  2. $HIPC_HOME/admin/src/main/resources/META-INF/spring/observationDataSharedApplicationContext.xml: This file configures the overall Spring Batch job in addition to the individual submission line mappers and tokenizers. It also configures the mappings between the Spring Batch submission processors and the DashboardDao data access class. Continuing with the Emory University example, within the "observationDataImportJob" job definition you will find the following snippet:
    <batch:step id="emoryPPIRAF1Step" parent="observationDataStep" next="mskccForetinibStep">
	  <batch:tasklet>
		<batch:chunk reader="emoryPPIRAF1Reader" processor="observationDataProcessor" writer="observationDataWriter"/>
	  </batch:tasklet>
    </batch:step>

Here we are defining the Emory submission processing step within the overall Dashboard submission processing Spring Batch job (observationDataImportJob). Following the job description are the definitions for each submission line mapper and tokenizer. The following snippet defines the Emory University submission line mapper and tokenizer:

  <bean name="emoryPPIRAF1LineMapper" class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
	<property name="fieldSetMapper" ref="observationDataMapper" />
	<property name="lineTokenizer" ref="emoryPPIRAF1LineTokenizer" />
  </bean>

  <bean name="emoryPPIRAF1LineTokenizer" class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer" >
	<property name="delimiter" value="\u0009"/>
	<property name="names" value="dummy,submission_name,submission_date,template_name,cell_line,gene_symbol_1,gene_symbol_2,assay_type,number_of_measurements,average_fold_over_control_value,p_value,nci_portal"/>
  </bean>

The important thing to note here is the "names" property. It lists all of the column headers found in the Emory University submission template. The first entry is always a dummy placeholder to account for the metadata labels found in column one of each submission template.

Finally, the mapping between the Spring Batch submission processors and the DashboardDao is defined within the "observationTemplateMap" bean, which appears after all the line mappers and tokenizers. Here is a snippet which defines these mappings for the Emory University submission:

		<entry key="emory_PPI_analysis:cell_line" value="subject:findSubjectsBySynonym" />
		<entry key="emory_PPI_analysis:gene_symbol_1" value="subject:findGenesBySymbol" />
		<entry key="emory_PPI_analysis:gene_symbol_2" value="subject:findGenesBySymbol" />
		<entry key="emory_PPI_analysis:assay_type" value="evidence:readString:createObservedLabelEvidence" />
		<entry key="emory_PPI_analysis:number_of_measurements" value="evidence:readInt:createObservedNumericEvidence" />
		<entry key="emory_PPI_analysis:average_fold_over_control_value" value="evidence:readDouble:createObservedNumericEvidence" />
		<entry key="emory_PPI_analysis:p_value" value="evidence:readDouble:createObservedNumericEvidence" />
		<entry key="emory_PPI_analysis:nci_portal" value="evidence:readString:createObservedUrlEvidence" />

For each column header in the Emory University submission template, there is an entry in the observationTemplateMap. The key is a combination of the submission template name and the column header. The value is a combination of the following attributes:

  • Submission Attribute Type: The type of the submission attribute, either 'evidence' or 'subject'.

If the submission attribute type is 'subject':

  • DashboardDao Method: The DashboardDao method used to find the subject in the database. Typically one of the following:
      • findCompoundsByName
      • findTissueSampleByName
      • findGenesByEntrezId
      • findGenesBySymbol
      • findSubjectsBySynonym (used to find cell lines)
      • findAnimalModelByName

If the submission attribute type is 'evidence':

  • Evidence Read Method: One of readString, readDouble, or readInt.
  • Evidence Constructor: The name of the method used to create the observed evidence entry in the database. The following methods are supported: createObservedLabelEvidence, createObservedNumericEvidence, createObservedFileEvidence, and createObservedUrlEvidence.

Admin Tool Usage

The admin tool is a command-line Java application. After building the admin tool from the source distribution, dashboard-admin.jar can be found within $HIPC_HOME/admin/target. A list of the commands recognized by the admin tool can be printed by running the following command:

$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -h
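
(If dashboard-admin.jar has not been built yet, it is produced by building the admin module. A minimal sketch, assuming the standard Maven layout implied by the target/ directory; the exact build invocation for your checkout may differ.)

cd $HIPC_HOME/admin
mvn clean package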

The following commands are recognized by the admin tool:

Import Animal Model Data (am)

This command is used to import animal model subject data.

Example usage:

$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -am

Import Cell Line Data (cl)

This command is used to import cell line subject data.

Example usage:

$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -cl

Import Compound Data (cp)

This command is used to import compound subject data.

Example usage:

$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -cp

Import Submission Metadata (cv)

This command is used to import submission metadata.

Example usage:

$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -cv

Import Gene Data (g)

This command is used to import gene subject data.

Example usage:

$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -g

Index (i)

This command is used to create a Lucene index for free-text searching.

Example usage:

$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -i

Rank (r)

This command is used to rank subjects based on their observations (pre-processing for the web site).

Example usage:

$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -r

Import Submission Data (o)

This command is used to import submission data.

Example usage:

$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -o

Import Protein Data (p)

This command is used to import protein subject data.

Example usage:

$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -p

Import shRNA Data (sh)

This command is used to import shRNA subject data.

Example usage:

$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -sh

Import siRNA Data (si)

This command is used to import siRNA subject data.

Example usage:

$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -si

Import Taxonomy Data (t)

This command is used to import taxonomy data.

Example usage:

$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -t

Import Tissue Sample Data (ts)

This command is used to import tissue sample subject data.

Example usage:

$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -ts

Use Case

In a typical dashboard database build, the commands are run in the following order: taxonomy and subject data first, then submission metadata and submission data, and finally indexing and ranking:

$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -t
$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -am
$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -cl
$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -ts
$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -cp
$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -g
$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -p
$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -sh
$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -si
$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -cv
$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -o
$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -i
$JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -r
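
The same sequence can be written as a loop; a small sketch that stops at the first failing step, assuming the admin tool returns a nonzero exit code on failure:

for flag in t am cl ts cp g p sh si cv o i r; do
  $JAVA_HOME/bin/java -jar $HIPC_HOME/admin/target/dashboard-admin.jar -$flag || break
done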