
sampleInformation_wiki

tbrunetti edited this page Jul 14, 2020 · 16 revisions

Running method: sampleInformation

All possible command line arguments for method sampleInformation


Argument | Default | Description
--- | --- | ---
--bpm | required | Full path to the bead pool manifest file (.bpm); must be the same one used to generate the gtc files
--gtcDir | required | Full path to the directory containing the gtc files to process (files must end in .gtc); sub-directories are not searched unless --recursive is specified
--outDir | optional, default=current working directory | Full path to the directory for output results. If the path does not exist, the program will attempt to create it
--logName | optional, default=gtcFuncs.log | Name of the log file to output; created in the --outDir directory
--modDir | optional, default=current working directory | Full path to the module files (.py) from github; the default is the current working directory with the modules folder appended
--prefix | optional, default=None | String prefix for naming the text files and image files, which are created in the --outDir directory. If no prefix is provided, the following png files are created, overwriting any existing copies: callRatePlots.png, gc10Plots.png, logrDevPlots.png
--fileOutName | optional, default=allSampleInfo.txt | Name of the final text file to output; created in the --outDir directory
--recursive | optional, flag | If specified, gtc files are found recursively starting from the base --gtcDir; otherwise, only gtc files directly inside --gtcDir are used
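The difference between the default and --recursive file discovery can be illustrated with a short sketch. This is not the code used by gtcFuncs.py; the helper name find_gtc_files is hypothetical, and the sketch only assumes that files are matched by the .gtc extension as described for --gtcDir.

```python
from pathlib import Path


def find_gtc_files(gtc_dir, recursive=False):
    """Hypothetical helper: list .gtc files under gtc_dir.

    With recursive=False only files directly inside gtc_dir are returned;
    with recursive=True sub-directories are searched as well.
    """
    base = Path(gtc_dir)
    pattern = "*.gtc"
    # glob() matches within the directory only; rglob() also descends into sub-directories
    matches = base.rglob(pattern) if recursive else base.glob(pattern)
    return sorted(p for p in matches if p.is_file())
```

Under these assumptions, a directory holding a.gtc and sub/b.gtc would yield only a.gtc by default, and both files when recursive=True.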

The minimum command required to get sample information from files is the following:

python3 gtcFuncs.py sampleInformation --bpm /path/to/manifest.bpm --gtcDir /path/to/gtcLocations/

This creates two tab-delimited text files, allSampleInfo.txt and summaryStatsTable.txt, in the current working directory, using all gtc files found in the directory passed to --gtcDir. A log file called gtcFuncs.log is also written to the current working directory.
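When the optional arguments are needed, the same call can be assembled from a script. The sketch below only builds the argument list from flags listed in the table above; the function build_command, the example paths, and the prefix value are hypothetical, and it assumes gtcFuncs.py sits in the current directory.

```python
import shlex


def build_command(bpm, gtc_dir, out_dir=None, prefix=None, recursive=False,
                  script="gtcFuncs.py"):
    """Hypothetical helper: build the sampleInformation command as a list."""
    cmd = ["python3", script, "sampleInformation",
           "--bpm", bpm, "--gtcDir", gtc_dir]
    if out_dir is not None:
        cmd += ["--outDir", out_dir]    # where text, image and log files go
    if prefix is not None:
        cmd += ["--prefix", prefix]     # prefix for text and image file names
    if recursive:
        cmd.append("--recursive")       # also search sub-directories for gtc files
    return cmd


cmd = build_command("/path/to/manifest.bpm", "/path/to/gtcLocations/",
                    out_dir="/path/to/results", prefix="batch1", recursive=True)
# Show the command as it would be typed in a shell
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to launch the run.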

Example Output

Below is an example of the output allSampleInfo.txt generated:
For the actual file, allSampleInfo.txt generated by sampleInformation, please download the text file here

Below is an example of the output summaryStatsTable.txt generated:

For the actual file, summaryStatsTable.txt generated by sampleInformation, please download the text file here

Additionally, sampleInformation generates a series of image files (.png) that summarize the extracted gtc metrics across the full set of samples requested. Below is an example of the summary figure generated for the call rate metric:

  • The first plot (left) shows the call rate across all samples; dots are color-coded by the sex listed in the gtc file
  • The second plot (middle) is of the same data in the first plot, except now it is also annotated with mean and standard deviation lines. The green line represents the mean, the blue line represents +/- 3 standard deviations from the mean, and the orange line represents +/- 6 standard deviations from the mean.
  • The third plot (right) removes samples that are more than 3 standard deviations from the mean and recalculates the box plot with those outliers removed.
  • The actual file can be viewed and downloaded here
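The thresholds described in the plots above can be reproduced with a minimal sketch. This is an illustration, not the code used by sampleInformation: the function names are hypothetical, the call rate values are made up for the example, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def sd_bands(values, n_sd):
    """Return (lower, upper) bounds at n_sd standard deviations from the mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (assumption)
    return m - n_sd * s, m + n_sd * s


def drop_outliers(values, n_sd=3):
    """Keep only the values within n_sd standard deviations of the mean."""
    low, high = sd_bands(values, n_sd)
    return [v for v in values if low <= v <= high]


# Made-up call rates: twenty typical samples and one poor sample
call_rates = [0.99] * 20 + [0.50]

low3, high3 = sd_bands(call_rates, 3)  # bounds for the +/- 3 SD lines
low6, high6 = sd_bands(call_rates, 6)  # bounds for the +/- 6 SD lines
kept = drop_outliers(call_rates)       # data behind the third (right) plot
```

In this example the 0.50 sample falls below the lower 3 SD bound and is dropped, leaving the twenty remaining samples for the recalculated box plot.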

Here is an example for the gc10 plot generation (same methods applied as above except now using gc10 instead of call rate):

Here is an example for the logrDev plot generation (same methods applied as above except now using logrDev instead of call rate):
