4.3. Definitions

4.3.1. Identification Definitions

  • Method[string]: The staining method, either:
    • Manual: manually stained
    • Automatic: stained by an automatic stainer (e.g. Leica BOND RX)
  • Panel[string]: An identifying name given to each unique set of antibodies.
    • E.g. PD1\PDL1 Axis.
  • Tissue[string]: The tissue the panel was stained on.
  • Machine[string]: The user-defined name of the microscope the slides were scanned on.
    • This specifier is used to find the files needed to process data from a specific microscope, particularly the image warping parameters.
    • Traditionally, microscopes are named with a three-letter location specifier, followed by the machine type (Polaris, Vectra3), then an underscore and a numeric value.
    • E.g. JHUVectra3_1
  • StainConfig[int]: The stain configuration, a unique numeric value for each combination of Machine, Method, and Panel.
  • Cohort[int]: A unique numeric identifier for a set of patient samples.
    • A Cohort can belong to multiple Projects if it is stained with different StainConfigs.
  • Project[int]: A set of slides (a Cohort) stained with a single Panel and Method, then scanned on a single Machine.
    • These values are unique; they are defined at the start of processing and then used throughout the code to identify that project.
    • All data for a specified project should be placed in a single folder.
  • Batch[int]: The unique numeric value for a batch of slides stained together.
    • BatchIDs are numbered separately for each Project and usually start over at 1.
    • Additional details on BatchIDs can be found in 4.4.6.
  • Specimen #[string]: The names written on the physical slide labels.
    • Usually this name is a medical identifier that could be considered protected health information (PHI). To maintain HIPAA compliance, the slides are renamed as soon as they are scanned.
    • See 4.4.2. and 4.4.3. for more details.
  • SampleName[string]: The names assigned to the specimens during the scanning process.
    • These names are replaced and standardized as part of the pipeline.
    • The code reads them from the SpecimenTable.xlsx file contained in each cohort's scanning folder.
    • A description of the SpecimenTable.xlsx file is in 4.4.2.
  • SlideID[string]: The names used for the specimens in the astropath processing pipeline.
    • These names replace the SampleNames in all corresponding file names and inside the scan plan files (annotations.xml) generated during the scanning process.
    • Using these names protects the pipeline from naming-convention changes made outside the organization.
    • The IDs have the format APpppXXXX, where:
      • ppp is the three-digit numeric ProjectID
      • XXXX is a four-digit slide number that is unique within a project
    • The IDs are generated by comparing AstropathAPIDdef_PP.csv to the cohort-specific SpecimenTable.xlsx.
      • Each new specimen is assigned the next value in sequential order (AP0010001, AP0010002, AP0020001, …).
  • ScanNN[string]: The scan folder from the Vectra microscopes. The folder is usually labeled ScanNN, where NN is the scan attempt number. This number identifies the scan that was successful and can be 1 or 2 digits. Further description can be found here.
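
The SlideID and Machine naming conventions above can be illustrated with a small Python sketch. It assumes a SlideID is exactly "AP" + a three-digit ProjectID + a four-digit slide number, and that a new specimen receives the next number after the highest one already used in its project. The function names (format_slide_id, parse_slide_id, assign_slide_ids, parse_machine_name) are hypothetical and not part of the astropath code; the real IDs are generated from AstropathAPIDdef_PP.csv and SpecimenTable.xlsx, which the sketch replaces with plain Python lists.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# APpppXXXX: "AP", a three-digit ProjectID, then a four-digit slide number.
SLIDE_ID_RE = re.compile(r"^AP(\d{3})(\d{4})$")
# Traditional machine names: three-letter location, machine type, "_", number.
MACHINE_RE = re.compile(r"^([A-Za-z]{3})([A-Za-z0-9]+?)_(\d+)$")


def format_slide_id(project: int, slide_number: int) -> str:
    """Build an APpppXXXX SlideID from a ProjectID and a per-project slide number."""
    if not (0 <= project <= 999 and 1 <= slide_number <= 9999):
        raise ValueError("project must be 0-999 and slide_number 1-9999")
    return f"AP{project:03d}{slide_number:04d}"


def parse_slide_id(slide_id: str) -> Tuple[int, int]:
    """Split an APpppXXXX SlideID into (project, slide_number)."""
    match = SLIDE_ID_RE.match(slide_id)
    if match is None:
        raise ValueError(f"not an APpppXXXX SlideID: {slide_id!r}")
    return int(match.group(1)), int(match.group(2))


def assign_slide_ids(project: int, existing_ids: Iterable[str],
                     new_sample_names: Iterable[str]) -> Dict[str, str]:
    """Map each new SampleName to the next sequential SlideID in `project`.

    Hypothetical stand-ins: `existing_ids` plays the role of the IDs already
    recorded (AstropathAPIDdef_PP.csv) and `new_sample_names` the SampleNames
    from SpecimenTable.xlsx that do not have an ID yet.
    """
    used = []
    for slide_id in existing_ids:
        id_project, number = parse_slide_id(slide_id)
        if id_project == project:  # numbering is independent per project
            used.append(number)
    next_number = max(used, default=0) + 1
    assignments = {}
    for name in new_sample_names:
        assignments[name] = format_slide_id(project, next_number)
        next_number += 1
    return assignments


def parse_machine_name(machine: str) -> Optional[Tuple[str, str, int]]:
    """Split e.g. 'JHUVectra3_1' into (location, machine type, number).

    Returns None for names that do not follow the traditional convention.
    """
    match = MACHINE_RE.match(machine)
    if match is None:
        return None
    return match.group(1), match.group(2), int(match.group(3))
```

For example, given the existing IDs AP0010001, AP0010002 and AP0020001, calling assign_slide_ids(1, ...) with two new (hypothetical) SampleNames yields AP0010003 and AP0010004, because the AP002 IDs belong to a different project and do not affect Project 1's numbering. parse_machine_name("JHUVectra3_1") returns ("JHU", "Vectra3", 1).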

4.3.2. Path Definitions

The file paths have been standardized and are described below. Additional examples and directions for initializing these paths can be found in 4.5.2. A full file tree layout can be found in 4.6.2.

  • <Mpath>: The main path for all the astropath processing .csv configuration files.
    • The current location of this path is \\bki04\astropath_processing.
    • A description of each file is located in 4.5.1.
  • <Dname>: The data folder name or the name of the clinical specimen folder.
    • E.g. Clinical_Specimen_7
  • <Dpath>: The data or destination path up to but not including the <Dname>
    • such that the full path to the data is \\<Dpath>\<Dname>
  • <Spath>: The source path of the data up to but not including the <Dname>
    • such that the full source path of the data is \\<Spath>\<Dname>
    • Usually a subdirectory on the server where the microscope data are backed up after scanning is complete.
    • E.g. \\tme1\VectraPolaris\Vectra Polaris 1 Scanning
  • <Cpath>: The compression path of the data up to but not including the <Dname>
    • such that the full compression path of the data is \\<Cpath>\<Dname>
    • The compressed backups of the im3 image files and the final Tables\ component data TIFFs are stored here after the pipeline has finished processing.
    • E.g. \\bki03\Compressed_Clinical_Specimens
    • E.g. \\bki03\Compressed_Clinical_Specimens_2
  • <FWpath>: The full path for the single-column flat-field and warping corrected images (.fw), as well as the exposure time data for each image (.SpectralBasisInfo.xml).
    • This path should preferably be located on a different drive from the main path to improve pipeline performance.
    • E.g. \\bki03\flatw_7
    • Usually the specifier used in the <Dname> folder name is also appended to this folder name (e.g. the _7 in flatw_7 and Clinical_Specimen_7).
    • Additional details on these files can be found in the flatw workflow description, 5.7.3.2.
  • <Rpath>: The same as <FWpath> above, except that the image files in this directory are single-column raw image files (.Data.dat) instead of flat-field and warping corrected images (.fw).
  • <DZpath>: The path to the deepzoomed images
  • base[string]: Sometimes used throughout the documentation to refer to the combination <Dpath>\<Dname> (both defined above).
  • <im3_path>: The path between the <SlideID> folder and the folder where the original im3 files are kept.
    • E.g. im3\<ScanN>\MSI or im3\<ScanNN>\MSI
  • <flatw_im3_path>: The path after the <SlideID> folder where the image-corrected im3 files are kept.
    • E.g. im3\flatw

NOTE: the <path> variables do not contain the <Dname>
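
A minimal Python sketch of how these definitions compose into full folder paths. It assumes the <SlideID> folder sits directly inside <Dpath>\<Dname> (4.6.2 has the authoritative file tree) and that, when several ScanNN folders are present, the highest-numbered one is the successful scan. The helper names and the server \\server\data are hypothetical; plain string joining is used instead of pathlib so the output does not depend on how the Python version parses partial UNC roots.

```python
import re
from typing import Iterable, Optional

# ScanNN folder names: "Scan" followed by a 1- or 2-digit scan attempt number.
SCAN_RE = re.compile(r"^Scan(\d{1,2})$")


def unc_join(root: str, *parts: str) -> str:
    """Join a UNC root such as <Dpath> (with or without its leading
    backslashes) and further path components using Windows separators."""
    pieces = [root.strip("\\")] + [p.strip("\\") for p in parts if p.strip("\\")]
    return "\\\\" + "\\".join(pieces)


def base_path(dpath: str, dname: str) -> str:
    # base = <Dpath>\<Dname>; the <path> variables never include <Dname> themselves.
    return unc_join(dpath, dname)


def pick_scan_folder(folder_names: Iterable[str]) -> Optional[str]:
    """Return the ScanNN folder with the highest NN, or None if none match.

    Assumption: if several scan attempts are present, the latest is the
    successful one.
    """
    scans = [(int(m.group(1)), name)
             for name in folder_names
             if (m := SCAN_RE.match(name)) is not None]
    return max(scans)[1] if scans else None


def im3_folder(dpath: str, dname: str, slide_id: str, scan_folder: str) -> str:
    # <base>\<SlideID>\<im3_path>, where <im3_path> = im3\<ScanNN>\MSI
    return unc_join(dpath, dname, slide_id, "im3", scan_folder, "MSI")


def flatw_im3_folder(dpath: str, dname: str, slide_id: str) -> str:
    # <base>\<SlideID>\<flatw_im3_path>, where <flatw_im3_path> = im3\flatw
    return unc_join(dpath, dname, slide_id, "im3", "flatw")


if __name__ == "__main__":
    scan = pick_scan_folder(["Scan1", "Scan2"])
    print(im3_folder(r"\\server\data", "Clinical_Specimen_7", "AP0010001", scan))
    # prints \\server\data\Clinical_Specimen_7\AP0010001\im3\Scan2\MSI
```

The same unc_join helper applies to <Cpath> and <Spath>, since each is also combined with <Dname> to form the full path.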