Update/feature subsetting #504

Open: wants to merge 20 commits into base: master
33 changes: 33 additions & 0 deletions clustering/feature-subsetting-tool/.bumpversion.cfg
@@ -0,0 +1,33 @@
[bumpversion]
current_version = 0.2.1-dev0
commit = True
tag = False
parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(\-(?P<release>[a-z]+)(?P<dev>\d+))?
serialize =
    {major}.{minor}.{patch}-{release}{dev}
    {major}.{minor}.{patch}

[bumpversion:part:release]
optional_value = _
first_value = dev
values =
    dev
    _

[bumpversion:part:dev]

[bumpversion:file:pyproject.toml]
search = version = "{current_version}"
replace = version = "{new_version}"

[bumpversion:file:plugin.json]

[bumpversion:file:README.md]

[bumpversion:file:ict.yaml]

[bumpversion:file:FeatureSubsetting.cwl]

[bumpversion:file:VERSION]

[bumpversion:file:src/polus/images/clustering/feature_subsetting/__init__.py]
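
As a quick check of the `parse` pattern in this config, the following Python sketch applies the same regular expression to version strings with and without the `-dev` suffix. The helper name `parse_version` is hypothetical and exists only for illustration; bump2version does its own parsing internally and may differ in detail.

```python
import re

# Same pattern as the `parse` entry in .bumpversion.cfg.
VERSION_RE = re.compile(
    r"(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
    r"(\-(?P<release>[a-z]+)(?P<dev>\d+))?"
)


def parse_version(version: str) -> dict:
    """Split a version string into its named parts (hypothetical helper)."""
    match = VERSION_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return match.groupdict()


if __name__ == "__main__":
    print(parse_version("0.2.1-dev0"))  # release and dev parts are populated
    print(parse_version("0.2.1"))       # release and dev parts are None
```

The optional trailing group is what lets both `serialize` formats round-trip through the same pattern.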
21 changes: 21 additions & 0 deletions clustering/feature-subsetting-tool/Dockerfile
@@ -0,0 +1,21 @@
FROM polusai/bfio:2.3.6

# environment variables defined in polusai/bfio
ENV EXEC_DIR="/opt/executables"
ENV POLUS_IMG_EXT=".ome.tif"
ENV POLUS_TAB_EXT=".csv"
ENV POLUS_LOG="INFO"

# Work directory defined in the base container
WORKDIR ${EXEC_DIR}

COPY pyproject.toml ${EXEC_DIR}
COPY VERSION ${EXEC_DIR}
COPY README.md ${EXEC_DIR}
COPY src ${EXEC_DIR}/src

RUN pip3 install ${EXEC_DIR} --no-cache-dir


ENTRYPOINT ["python3", "-m", "polus.images.clustering.feature_subsetting"]
CMD ["--help"]
68 changes: 68 additions & 0 deletions clustering/feature-subsetting-tool/FeatureSubsetting.cwl
@@ -0,0 +1,68 @@
class: CommandLineTool
cwlVersion: v1.2
inputs:
  filePattern:
    inputBinding:
      prefix: --filePattern
    type: string
  groupVar:
    inputBinding:
      prefix: --groupVar
    type: string
  imageFeature:
    inputBinding:
      prefix: --imageFeature
    type: string
  inpDir:
    inputBinding:
      prefix: --inpDir
    type: Directory
  outDir:
    inputBinding:
      prefix: --outDir
    type: Directory
  padding:
    inputBinding:
      prefix: --padding
    type: string?
  percentile:
    inputBinding:
      prefix: --percentile
    type: double
  preview:
    inputBinding:
      prefix: --preview
    type: boolean?
  removeDirection:
    inputBinding:
      prefix: --removeDirection
    type: string?
  sectionVar:
    inputBinding:
      prefix: --sectionVar
    type: string?
  tabularDir:
    inputBinding:
      prefix: --tabularDir
    type: Directory
  tabularFeature:
    inputBinding:
      prefix: --tabularFeature
    type: string
  writeOutput:
    inputBinding:
      prefix: --writeOutput
    type: boolean?
outputs:
  outDir:
    outputBinding:
      glob: $(inputs.outDir.basename)
    type: Directory
requirements:
  DockerRequirement:
    dockerPull: polusai/feature-subsetting-tool:0.2.1-dev0
  InitialWorkDirRequirement:
    listing:
    - entry: $(inputs.outDir)
      writable: true
  InlineJavascriptRequirement: {}
58 changes: 58 additions & 0 deletions clustering/feature-subsetting-tool/README.md
@@ -0,0 +1,58 @@
# Feature Data Subset (0.2.1-dev0)

This WIPP plugin subsets data based on a given feature. It works in conjunction with the `polus-feature-extraction-plugin`, which can be used to extract features such as the mean intensity of every image in the input image collection.

# Usage
The plugin inputs are described in the section below. In addition to the subsetted data, the output directory contains a `summary.txt` file that lists which images were kept and, if they were renamed, their new filenames.

### Explanation of inputs
Some of the inputs are straightforward and common to most WIPP plugins. This section gives details and examples for the inputs that are less obvious. An image collection with the following pattern is used as the example: `r{r+}_t{t+}_p{p+}_z{z+}_c{c+}.ome.tif`, where r, t, p, z, and c stand for replicate, timepoint, position, z-position, and channel respectively. Suppose there are 5 replicates, 3 timepoints, 50 positions, 10 z-planes, and 4 channels.

1. `inpDir` - Path to the input image collection to subset data from.
2. `tabularDir` - Path to the tabular files (`.csv`, `.arrow`, or `.parquet`) containing the feature values for each image. This can be the output of the feature extraction or nyxus plugin.
3. `filePattern` - Filepattern of the input images.
4. `imageFeature` - Feature (column) in the tabular data that contains the image filenames.
5. `tabularFeature` - Tabular feature that will be used to filter images.
6. `groupVar` - This mandatory input names the variable(s) across which to subset data. It accepts either 1 or 2 variables; if 2 are provided, the second is treated as the minor grouping variable. In our example, if `z` is provided, then within a sub-collection the mean of the feature value is taken over all images with the same z, and z-positions are then filtered out according to the `percentile` and `removeDirection` inputs. If `z,c` is provided, then `c` is the minor grouping variable, which means the mean is taken over all images with the same z for each channel. The plugin also ensures that the same z-positions are filtered out across c.
7. `percentile` and `removeDirection` - These two variables define the criteria by which images are filtered. For example, if `percentile` is `0.1` and `removeDirection` is set to `Below`, then images with a feature value below the 10th percentile are removed. Conversely, if `removeDirection` is set to `Above`, then all images with a feature value greater than the 10th percentile are removed. This enables data subsetting from both `brightfield` and `darkfield` microscopy images.
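
The grouping and percentile rule can be illustrated with a minimal Python sketch. It is not the plugin's implementation: the names `percentile_cutoff` and `filter_groups`, the in-memory `(group_value, feature_value)` layout, the linear-interpolated percentile, and the choice to keep values exactly equal to the cutoff are all assumptions made for this example. It also handles only a single grouping variable.

```python
from collections import defaultdict
from statistics import mean


def percentile_cutoff(values, fraction):
    """Linear-interpolated percentile for a fraction in [0, 1] (assumed method)."""
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def filter_groups(records, percentile, remove_direction):
    """Return the group values kept after percentile filtering.

    records: iterable of (group_value, feature_value) pairs,
    for example (z, mean intensity of one image).
    """
    by_group = defaultdict(list)
    for group_value, feature_value in records:
        by_group[group_value].append(feature_value)

    # One mean per group value, e.g. one mean per z-position.
    group_means = {group: mean(vals) for group, vals in by_group.items()}
    cutoff = percentile_cutoff(group_means.values(), percentile)

    if remove_direction == "Below":
        return sorted(g for g, m in group_means.items() if m >= cutoff)
    if remove_direction == "Above":
        return sorted(g for g, m in group_means.items() if m <= cutoff)
    raise ValueError("remove_direction must be 'Below' or 'Above'")


if __name__ == "__main__":
    # Five z-positions, two images each, with feature value 10 * z.
    records = [(z, 10 * z) for z in range(1, 6) for _channel in range(2)]
    print(filter_groups(records, 0.5, "Below"))
    print(filter_groups(records, 0.5, "Above"))
```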

**Optional Arguments**

8. `sectionVar` - This optional input segregates the input image collection into sub-collections. The analysis is done separately for each sub-collection. In our example, if the user enters `r,t` as the sectionVar, there will be 15 sub-collections (5*3), 1 for each combination of timepoint and replicate. If the user enters `r`, there will be 5 sub-collections, 1 for each replicate. To treat the whole image collection as a single section, leave this input empty. NOTE: As a post-processing step, the same number of images is subsetted across the different sections.
9. `padding` - This optional variable has a default value of 0. A padding of 3 means that 3 additional planes will be captured on either side of the subsetted data. This can be used as a sanity check to ensure that the subsetted data captures the images we want. For example, if z values 5, 6, and 7 were initially selected, then a padding of 3 means that the output dataset will have z-positions 2, 3, 4, 5, 6, 7, 8, 9, and 10, provided all of them exist.
10. `writeOutput` - This optional argument has a default value of `True`. If it is set to true, both the output image collection and the `summary.txt` file are created. If it is set to false, the output directory contains only `summary.txt`. This lets the user tune hyperparameters such as `percentile`, `removeDirection`, and the feature without actually creating the output image collection.
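
The arithmetic behind `sectionVar` and `padding` can be checked with a short Python sketch using the example collection (5 replicates, 3 timepoints, 50 positions, 10 z-planes, 4 channels). The function names `count_sections` and `pad_selection` are hypothetical, and the z-planes are assumed to be numbered 1 through 10; the plugin's own code may work differently.

```python
from math import prod


def count_sections(variable_sizes, section_vars):
    """Number of sub-collections: product of the sizes of the section variables.

    An empty list of section variables yields 1 (the whole collection).
    """
    return prod(variable_sizes[var] for var in section_vars)


def pad_selection(selected, available, padding):
    """Add up to `padding` neighbours on either side of the selected positions.

    Only positions that actually exist in `available` are returned.
    """
    return sorted(
        pos for pos in available
        if any(abs(pos - chosen) <= padding for chosen in selected)
    )


if __name__ == "__main__":
    sizes = {"r": 5, "t": 3, "p": 50, "z": 10, "c": 4}
    print(count_sections(sizes, ["r", "t"]))  # one section per (r, t) pair
    print(count_sections(sizes, ["r"]))       # one section per replicate
    print(count_sections(sizes, []))          # whole collection as one section

    z_planes = range(1, 11)  # assumed numbering of the 10 z-planes
    print(pad_selection([5, 6, 7], z_planes, padding=3))
```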



Contact [Gauhar Bains](mailto:[email protected]) for more information.

For more information on WIPP, visit the [official WIPP page](https://isg.nist.gov/deepzoomweb/software/wipp).

## Building

To build the Docker image for this plugin, run
`./build-docker.sh`.

## Install WIPP Plugin

If WIPP is running, navigate to the plugins page and add a new plugin. Paste the contents of `plugin.json` into the pop-up window and submit.

## Options

This plugin takes eleven input arguments and one output argument, plus an optional `--preview` flag:

| Name | Description | I/O | Type |
| ------------------- | ----------------------------------------------------- | ------ | ------------- |
| `--inpDir` | Input image collection to be processed by this plugin | Input | collection |
| `--tabularDir` | Path to tabular data | Input | genericData |
| `--filePattern` | Filename pattern used to separate data | Input | string |
| `--imageFeature` | Feature in tabular data with image filenames | Input | string |
| `--tabularFeature` | Tabular feature to filter image files | Input | string |
| `--padding` | Number of images to capture outside the cutoff | Input | integer |
| `--groupVar` | Variables to group by in a section | Input | string |
| `--percentile` | Percentile to remove | Input | float |
| `--removeDirection` | Remove direction above or below percentile | Input | string |
| `--sectionVar` | Variables to divide larger sections | Input | string |
| `--writeOutput` | Write output image collection or not | Input | boolean |
| `--outDir` | Output collection | Output | genericData |
| `--preview` | Generate a JSON file with outputs | Output | JSON |
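
To show how the options in the table map onto a command line, here is a small Python sketch that assembles an argument list for the module entry point named in the Dockerfile. The helper `build_command`, the directory paths, and the column names `image_name` and `mean_intensity` are placeholders invented for this example. Boolean options are left out because this README does not say how they are passed.

```python
import shlex


def build_command(options):
    """Build an argv list for the tool from a mapping of option names to values."""
    command = ["python3", "-m", "polus.images.clustering.feature_subsetting"]
    for name, value in options.items():
        if value is None:
            continue  # skip optional arguments that were not set
        command.extend([f"--{name}", str(value)])
    return command


if __name__ == "__main__":
    options = {
        "inpDir": "/data/images",          # placeholder path
        "tabularDir": "/data/features",    # placeholder path
        "filePattern": "r{r+}_t{t+}_p{p+}_z{z+}_c{c+}.ome.tif",
        "imageFeature": "image_name",      # placeholder column name
        "tabularFeature": "mean_intensity",  # placeholder column name
        "groupVar": "z,c",
        "percentile": 0.1,
        "removeDirection": "Below",
        "sectionVar": "r,t",
        "padding": None,                   # unset, so it is omitted
        "outDir": "/data/output",          # placeholder path
    }
    # shlex.join quotes the braces in the file pattern for safe shell use.
    print(shlex.join(build_command(options)))
```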
1 change: 1 addition & 0 deletions clustering/feature-subsetting-tool/VERSION
@@ -0,0 +1 @@
0.2.1-dev0
4 changes: 4 additions & 0 deletions clustering/feature-subsetting-tool/build-docker.sh
@@ -0,0 +1,4 @@
#!/bin/bash

version=$(<VERSION)
docker build . -t polusai/feature-subsetting-tool:${version}
14 changes: 14 additions & 0 deletions clustering/feature-subsetting-tool/example/summary.txt
@@ -0,0 +1,14 @@
------------------------------------------------

Files :

x00_y01_p03_c1.ome.tif -----> x00_y01_p01_c1.ome.tif
x00_y01_p03_c2.ome.tif -----> x00_y01_p01_c2.ome.tif
x00_y01_p03_c3.ome.tif -----> x00_y01_p01_c3.ome.tif
x00_y01_p03_c4.ome.tif -----> x00_y01_p01_c4.ome.tif
x00_y01_p03_c5.ome.tif -----> x00_y01_p01_c5.ome.tif
x00_y01_p04_c1.ome.tif -----> x00_y01_p02_c1.ome.tif
x00_y01_p04_c2.ome.tif -----> x00_y01_p02_c2.ome.tif
x00_y01_p04_c3.ome.tif -----> x00_y01_p02_c3.ome.tif
x00_y01_p04_c4.ome.tif -----> x00_y01_p02_c4.ome.tif
x00_y01_p04_c5.ome.tif -----> x00_y01_p02_c5.ome.tif
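
A summary file in this layout can be read back into a rename mapping with a few lines of Python. This is a sketch that assumes every rename line has the form `old -----> new`, as in the example; `parse_summary` is a hypothetical helper and not part of the plugin.

```python
def parse_summary(text):
    """Map original filenames to new filenames from summary.txt-style text."""
    renames = {}
    for line in text.splitlines():
        if "----->" not in line:
            continue  # skip separators, headings, and blank lines
        original, renamed = (part.strip() for part in line.split("----->", 1))
        renames[original] = renamed
    return renames


if __name__ == "__main__":
    sample = """\
------------------------------------------------

Files :

x00_y01_p03_c1.ome.tif -----> x00_y01_p01_c1.ome.tif
x00_y01_p04_c1.ome.tif -----> x00_y01_p02_c1.ome.tif
"""
    for old_name, new_name in parse_summary(sample).items():
        print(old_name, "->", new_name)
```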
141 changes: 141 additions & 0 deletions clustering/feature-subsetting-tool/ict.yaml
@@ -0,0 +1,141 @@
author:
- Gauhar Bains
contact: [email protected]
container: polusai/feature-subsetting-tool:0.2.1-dev0
description: Subset data using a given feature.
entrypoint: python3 -m polus.images.clustering.feature_subsetting
inputs:
- description: Input image directory
  format:
  - collection
  name: inpDir
  required: true
  type: path
- description: Path to directory containing tabular data
  format:
  - genericData
  name: tabularDir
  required: true
  type: path
- description: Filename pattern used to separate data.
  format:
  - string
  name: filePattern
  required: true
  type: string
- description: Feature in tabular data containing image filenames.
  format:
  - string
  name: imageFeature
  required: true
  type: string
- description: Feature in tabular data to subset image data
  format:
  - string
  name: tabularFeature
  required: true
  type: string
- description: Number of images to capture outside the cutoff.
  format:
  - integer
  name: padding
  required: false
  type: number
- description: variables to group by in a section.
  format:
  - string
  name: groupVar
  required: true
  type: string
- description: Percentile to remove.
  format:
  - number
  name: percentile
  required: true
  type: number
- description: Remove direction above or below percentile
  format:
  - string
  name: removeDirection
  required: false
  type: string
- description: Variables to divide larger sections.
  format:
  - string
  name: sectionVar
  required: false
  type: string
- description: Write output image collection or not.
  format:
  - boolean
  name: writeOutput
  required: false
  type: boolean
- description: Generate an output preview
  format:
  - boolean
  name: preview
  required: false
  type: boolean
name: polusai/FeatureSubsetting
outputs:
- description: Output collection
  format:
  - genericData
  name: outDir
  required: true
  type: path
repository: https://github.com/PolusAI/image-tools
specVersion: 1.0.0
title: Feature Subsetting
ui:
- description: Path to Input image directory
  key: inputs.inpDir
  title: inpDir
  type: path
- description: Input tabular directory
  key: inputs.tabularDir
  title: tabularDir
  type: path
- description: A filepattern, used to select data for conversion
  key: inputs.filePattern
  title: filepattern
  type: text
- description: Feature in tabular data containing image filenames
  key: inputs.imageFeature
  title: imageFeature
  type: text
- description: Feature in tabular data to subset image data.
  key: inputs.tabularFeature
  title: tabularFeature
  type: text
- description: Number of images to capture outside the cutoff.
  key: inputs.padding
  title: padding
  type: number
- description: Variables to group by in a section.
  key: inputs.groupVar
  title: groupVar
  type: text
- description: Percentile to remove.
  key: inputs.percentile
  title: percentile
  type: number
- description: Remove direction above or below percentile.
  key: inputs.removeDirection
  title: removeDirection
  type: text
- description: Variables to divide larger sections.
  key: inputs.sectionVar
  title: sectionVar
  type: text
- description: Write output image collection or not.
  key: inputs.writeOutput
  title: writeOutput
  type: checkbox
- default: false
  description: Generate an output preview.
  key: inputs.preview
  title: preview
  type: checkbox
version: 0.2.1-dev0
16 changes: 16 additions & 0 deletions clustering/feature-subsetting-tool/package-release.sh
@@ -0,0 +1,16 @@
# This script is designed to help package a new version of a plugin

# Get the new version
version=$(<VERSION)

# Bump the version
bump2version --config-file .bumpversion.cfg --new-version ${version} --allow-dirty part

# Build the container
./build-docker.sh

# Push to dockerhub
docker push polusai/feature-subsetting-tool:${version}

# Run pytests
python -m pytest -s tests