Progress Log

Week 1

  • Created image-subregion-extractor for quickly saving sub-regions from images to create training and test sets [Scott]
  • Start TensorFlow image detection pipeline [Ben]
  • Blob detection using differential operators [Lina]
  • Kingshuk attended as a guest and made the following suggestions
    • Make full use of biological knowledge, including 3D if available
    • Watershed is difficult to make work well in his experience

Group meeting actions

  • Use systematic approach to pipeline stage evaluation [Lina]
  • Prepare tutorial on TensorFlow (Ben)
  • Evaluate classification pipeline (Scott)

Week 2

  • Update on marker spreadsheet by Susan
  • Revisions to blob detection pipeline [Lina]
  • Cascade classifier performs very poorly (Scott)
  • Created image-subregion-detector for real-time image segmentation (Scott)
  • Created mapping of symbolic color names to HSV space (Scott)
  • Presented TensorFlow demo (Ben)
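
The mapping of symbolic color names to HSV space noted above might look something like the following minimal sketch. The color names and the hue/saturation/value bounds are illustrative assumptions, not the values used in image-subregion-detector.

```python
import colorsys

# Hypothetical HSV ranges: hue in degrees [0, 360], saturation and value in [0, 1].
# These bounds are illustrative assumptions only. Red wraps around hue 0,
# so it is described by two ranges.
HSV_RANGES = {
    "red": [((0, 20), (0.3, 1.0), (0.2, 1.0)), ((340, 360), (0.3, 1.0), (0.2, 1.0))],
    "green": [((80, 160), (0.3, 1.0), (0.2, 1.0))],
    "blue": [((200, 260), (0.3, 1.0), (0.2, 1.0))],
    "white": [((0, 360), (0.0, 0.15), (0.8, 1.0))],
    "black": [((0, 360), (0.0, 1.0), (0.0, 0.2))],
}


def color_names(r, g, b):
    """Return every symbolic name whose HSV range contains the 8-bit RGB pixel."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hue = h * 360.0
    matches = []
    for name, ranges in HSV_RANGES.items():
        for (h_lo, h_hi), (s_lo, s_hi), (v_lo, v_hi) in ranges:
            if h_lo <= hue <= h_hi and s_lo <= s <= s_hi and v_lo <= v <= v_hi:
                matches.append(name)
                break
    return matches
```

Keeping the ranges in a plain dictionary means a color definition (for example, a wider 'black' range) can be adjusted without touching the matching logic.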

Group meeting actions

  • Develop list of segmentation targets and characteristics [Lina]
  • Manual feature selection/reduction for classification of acinar tubules [Lina]
  • Implement digit recognition from Google Street View pipeline for TensorFlow (Ben)
  • Add features to image-subregion-detector (Scott)
    • API for region detection pipeline plugins
    • Accept/Reject detected image regions
    • Consider how to integrate image metadata to improve region detection
    • Consider selection of positive and negative regions to start
    • Consider iterative improvement of detected regions (reinforcement learning)

Week 3

  • Updates to image-subregion-detector (Scott)
    • Expanded definition for 'black' HSV color range
    • Added functionality to specify (multiple) background color ranges
    • Remove detected sub-regions by right-clicking on them
    • BUGFIXES: properly clear user-drawn rects, support 16-bit RGB TIFs (by downsampling to 8-bit)
  • Explored segmentation targets (cells and anatomical structures as below) and their characteristics (Lina) (brief descriptions are in this pdf)
    • Pericytes
    • Club cells
    • Ciliated cells
    • Bronchioles
    • Acinar tubules (some thoughts on acinar tubule criteria)
  • Continuing research into the use of machine learning algorithms implemented via TensorFlow (Ben)
    • If we were to build a training set (with Scott's tool), we'd have about one-tenth the usual amount of machine learning training data
    • Currently implementing the digit recognition algorithm with low sample counts to simulate our situation and get a feel for the "deterioration" of accuracy with small training sets
    • Spiking on the following machine learning techniques in TensorFlow:
      • ConvNets
      • Regularization
      • Dropout
      • ReLUs (Rectified Linear Units)
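
As a reminder of what two of the techniques listed above do, here is a minimal plain-Python sketch of ReLU and (inverted) dropout. It does not use the TensorFlow API; the function names and the choice of the inverted-dropout variant are assumptions made for illustration.

```python
import random


def relu(values):
    """Rectified linear unit: negative activations are clamped to zero."""
    return [max(0.0, v) for v in values]


def dropout(values, rate, rng):
    """Inverted dropout: zero each activation with probability `rate` and
    scale the survivors by 1 / (1 - rate) so the expected sum is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


activations = relu([-2.0, -0.5, 0.0, 1.5, 3.0])
# A seeded generator keeps the example reproducible.
dropped = dropout(activations, rate=0.5, rng=random.Random(0))
```

Dropout is only applied during training; at evaluation time the activations are passed through unchanged.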

Week 4

  • More updates to image-subregion-detector (Scott)
    • [Sub-region detector presentation](./scott/Sub-region Detector Algo Summary.pdf)
    • Added interactive preview image for easier navigation around the main canvas
    • Added some stats about the detected regions (count, min/max/avg size)
    • Display color range % for user drawn rectangle (helps choose appropriate bg colors)
    • Better region detection for single cells (via new pre-erosion option)
    • BUGFIXES:
      • Clear sub-regions when selecting a new image
      • Fix typo in maximum area label
      • Fix error thrown if zero regions are detected
    • New screenshot: Detected_week4
  • Built two neural network algorithms (TensorFlow) to examine impact of sample size on accuracy (Ben)
  • Literature reading: summarized potentially useful features that may facilitate structure classification, and identified blood vessels (Lina)
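
The region statistics mentioned above (count, min/max/avg size) can be sketched as follows. This is an assumed helper, not the detector's actual code; it also shows one way to avoid an error when zero regions are detected.

```python
def region_stats(areas):
    """Summarize detected region areas (in pixels); safe for an empty list."""
    if not areas:
        # No regions detected: report a zero count instead of raising.
        return {"count": 0, "min": None, "max": None, "avg": None}
    return {
        "count": len(areas),
        "min": min(areas),
        "max": max(areas),
        "avg": sum(areas) / len(areas),
    }
```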

Week of Oct 3rd

  • Created script to parse sub-region detector output (Scott)
    • Output is a JSON file compatible with TensorBox
    • Also created a Jupyter notebook to verify the location of the JSON bounding boxes
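
A minimal sketch of such a conversion is shown below. It assumes TensorBox's annotation layout is a list of records with an `image_path` and a list of `rects` given as `x1`, `y1`, `x2`, `y2`; the input structure (a dictionary of image paths to `(x, y, width, height)` boxes) and the file name are hypothetical stand-ins for the detector's real output.

```python
import json


def to_tensorbox(regions_by_image):
    """Convert {image_path: [(x, y, width, height), ...]} to TensorBox-style records."""
    records = []
    for image_path, boxes in sorted(regions_by_image.items()):
        rects = [
            {"x1": float(x), "y1": float(y), "x2": float(x + w), "y2": float(y + h)}
            for x, y, w, h in boxes
        ]
        records.append({"image_path": image_path, "rects": rects})
    return records


# Hypothetical detector output: one image with two bounding boxes.
detected = {"images/example_20x.png": [(10, 20, 30, 40), (100, 120, 50, 60)]}
annotation_json = json.dumps(to_tensorbox(detected), indent=2)
```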

Week of Oct 10th

  • Extracted sub-regions for acinar tubules from experiment 73 - all 20x images at E16.5 (Scott)
    • exp 73 has labelled proteins: Acta2 (white), Sftpc (red), and Sox9 (green)
    • extracted distal acinar tubules as contours using the detector
    • extracted proximal acinar tubules as sub-regions using the extractor
    • extracted acinar tubule negative set of sub-regions using the extractor
  • Began looking at other experiments with 20x E16.5 images (Scott)
    • Focused on experiment 41 - has labelled proteins: Sox2 (green), Nkx2-1 (red), Acta2 (white)
    • Noticed the red probe was different than the red used in exp 73
    • For red, exp 73's Sftpc used Alexa Fluor 568 and exp 41's Nkx2-1 used Alexa Fluor 555
    • Is there any way to get the fluorophore from the API?
  • Aligned first pipeline from lungmap images to TensorBox. (Ben)
    • The results seem promising, so we will continue to examine this tool for creating a single algorithm capable of segmenting multiple anatomical objects within one image. Reaching this goal will require examining and experimenting with many model iterations. To help keep track of all algorithms, a separate repository (just for machine learning algorithms) is now up and available for review here. Next week, we plan to:
    • Re-run the first model with 10,000,000 iterations (the default setting for this pipeline)
    • This means we will need to set up a VM to host the computation, so we will work on getting that infrastructure in place.
    • Train a new model that only considers the grayscale versions of the images. This lets us experiment with two possibilities: that computer vision doesn't need the stains, and/or that more complicated algorithms could first identify general anatomical structures (e.g. acinar tubule) and then feed into other algorithms that distinguish between distal and proximal.
  • Began extracting sub-regions for blood vessels at all developmental stages (Lina)
  • Added functions for image features (Lina)
    • perimeter_area ratio (may be useful for distinguishing bronchioles from blood vessels)
    • entropy (needs improvement: it should return a single value per sub-region instead of a value for each pixel)
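
The two features above could be sketched in plain Python as follows. This is not Lina's implementation: the perimeter is approximated by counting exposed pixel edges, and the entropy is taken over the intensity histogram of the whole sub-region so that a single value is returned. Both choices are assumptions.

```python
import math
from collections import Counter


def perimeter_area_ratio(mask):
    """Ratio of boundary edge count to pixel count for a binary mask (list of rows).

    The perimeter is approximated by counting pixel edges that face background
    or the image border (4-connectivity)."""
    rows, cols = len(mask), len(mask[0])
    area = perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            area += 1
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols) or not mask[nr][nc]:
                    perimeter += 1
    if area == 0:
        raise ValueError("mask contains no foreground pixels")
    return perimeter / area


def region_entropy(gray):
    """Shannon entropy (bits) of the intensity histogram: one value per sub-region."""
    pixels = [p for row in gray for p in row]
    counts = Counter(pixels)
    total = len(pixels)
    return sum((n / total) * math.log2(total / n) for n in counts.values())
```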

Summary for August 2016

Software

Algorithms

  • Investigation of Google's TensorFlow library for image classification using deep learning. Initial impression is that TensorFlow applications may require a larger training set than we have in order to give accurate classification, but this is still being looked into.
  • [Rapid two-stage image segmentation pipeline](./scott/Sub-region Detector Algo Summary.pdf) based on blob detection by immunofluorescent color foreground/background partitioning and blob classification by size constraints developed as prototype engine for image-subregion-detector.
  • [Review of literature](https://github.com/duke-lungmap-team/lungmap-scratch/tree/master/Lina/Commonly used features for analyzing Histology Images.pdf) for statistical features (color, texture, morphology and architecture) useful for classifying histology images
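
The two-stage idea described above (detect blobs from a foreground/background partition, then classify them by size) can be illustrated with the simplified sketch below. It assumes the color partitioning has already produced a binary foreground mask, and it uses a plain breadth-first flood fill; the actual prototype is described in the linked summary and may differ.

```python
from collections import deque


def detect_blobs(foreground):
    """Stage 1: group foreground pixels into 4-connected blobs (lists of (row, col))."""
    rows, cols = len(foreground), len(foreground[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not foreground[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, blob = deque([(r, c)]), []
            while queue:
                cr, cc = queue.popleft()
                blob.append((cr, cc))
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and foreground[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            blobs.append(blob)
    return blobs


def filter_by_size(blobs, min_area, max_area):
    """Stage 2: keep blobs whose pixel count lies within [min_area, max_area]."""
    return [b for b in blobs if min_area <= len(b) <= max_area]


# Toy foreground mask (1 = pixel matched a foreground color, 0 = background).
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
]
blobs = detect_blobs(mask)
kept = filter_by_size(blobs, min_area=3, max_area=10)
```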

Knowledge Base

Summary for September 2016

Software

  • Added ability of image-subregion-extractor to capture bit masks as numpy arrays
  • Built plug-play-algorithm-app to evaluate object recognition algorithms as plug-ins (successful demo using Haar cascade classifier plug-in for real-time face recognition)
  • Script to run TensorBox training and evaluation completed - worked out issues with specifying training sets and evaluated on test data
  • Script to extract bounding boxes for extracted images in training set in format required for TensorBox

Algorithms

  • Use of TensorBox for object detection and segmentation

Summary for October 2016

  • Complete code for extracting image segments and bounding boxes
  • Extract training sets for acinar tubules and bronchioles

Image classification Progress

  • Move to label-free system for robust classification
  • Initially used manual feature extraction
  • Now use wndcharm for extraction
  • Over 4000 features per image
  • Initial evaluation on 3 classes extracted by Scott (background, proximal and distal acinar tubules)
  • Use of sklearn to construct pipelines for classification
    • over 95% cross-validation accuracy seen with classifiers out-of-box (almost no tuning done)
    • See Classification.ipynb notebook in cliburn folder for code

Running objectives

  1. Heuristic algorithm to extract sub-images for training set (stage 1: create blobs from feature colors, stage 2: filter for blobs that are similar to exemplar)
  • Graphical user interface
  • Object recognition using single exemplar
  • Export images to numpy arrays as training sets
  • Create JSON file with bounding boxes for target locations
  • Object recognition using multiple exemplars
  1. Build positive and negative training sets for anatomical objects
  1. Evaluate summary image features for classification
  • Define list of features for evaluation on training sets
  • Evaluate performance of individual features for classification accuracy
  • Evaluate performance of combined features for classification accuracy
  • Integrate features found into stage 2 of image extractor
  1. Evaluate deep learning for in-image object recognition and segmentation
  • Evaluate how TensorFlow library works
  • Graphical user interface to plug-in algorithms
  • Train and test on standard data sets
  • Train and test on LungMAP IHC positive and negative training sets
  1. Construct a formal knowledge base of interesting anatomical structures and cells and their statistical features in the developing mouse lung (with Anna Maria)
  • Create a table with rows containing (name, stage, feature, measurement, statistic, value) e.g. (proximal tubule, E16.5, SOCS-9, area, min, 20 $\mu$m)
  • Use of knowledge base to provide sensible default parameters (e.g. # erosions) for known targets so as to increase sensitivity of blob detection (stage 1)
  • Use of knowledge base to create filters based on feature statistics so as to increase specificity of blob classification (stage 2)
  1. Explore patterns with statistical analysis of discovered image segments (with Kingshuk)
  • Cross-sectional analysis of counts and distributions
  • Longitudinal analysis of counts and distributions
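
The knowledge base table proposed in the objectives above could be prototyped as a list of records before committing to a formal store. In the sketch below, the first row repeats the example given above; the second row is a made-up placeholder, and the helper names are hypothetical.

```python
from collections import namedtuple

# One row per (name, stage, feature, measurement, statistic, value).
KBRow = namedtuple("KBRow", "name stage feature measurement statistic value")

KNOWLEDGE_BASE = [
    # Example row from the objectives list (value in micrometers).
    KBRow("proximal tubule", "E16.5", "SOCS-9", "area", "min", 20.0),
    # Placeholder row for illustration only; not a measured value.
    KBRow("proximal tubule", "E16.5", "SOCS-9", "area", "max", 400.0),
]


def lookup(rows, name, stage, measurement, statistic):
    """Return the first matching value, or None when the knowledge base has no entry."""
    key = (name, stage, measurement, statistic)
    for row in rows:
        if (row.name, row.stage, row.measurement, row.statistic) == key:
            return row.value
    return None


def passes_area_filter(rows, name, stage, area):
    """Stage 2 style filter: reject blobs outside the known [min, max] area, if known."""
    lo = lookup(rows, name, stage, "area", "min")
    hi = lookup(rows, name, stage, "area", "max")
    return (lo is None or area >= lo) and (hi is None or area <= hi)
```

Unknown targets pass the filter by default, so a missing knowledge base entry never silently discards candidates.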

Summary for December 2016

  • Improved candidate search by using both the hue and saturation channels of the HSV image (Scott)

Week of Jan 16th

  • Began investigating replacing wnd-charm with custom features; results so far are good and extraction is much faster (Scott)

Week of Jan 23rd

  • Augmented image training set of bronchioles and blood vessels (Lina)
    • Method 1: Rotated images by 90°, 180°, and 270°.
    • Method 2: Applied random transformations (shift, rescale, shear, zoom, flip, rotation) using features in Keras.
  • Trained classifier with the augmented training set.
  • Replaced wnd-charm in the Tkinter identifier application, much faster now & seems just as accurate (Scott)
  • Began work on web version of the image sub-region identifier app (Scott)
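
Method 1 of the augmentation above (fixed rotations) is simple enough to sketch in plain Python on a 2D image stored as a list of rows. This is an illustration only; the Keras-based random transformations of Method 2 are not reproduced here, and the function names are assumptions.

```python
def rotate90(image):
    """Rotate a 2D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotation_augment(image):
    """Return the image rotated by 90, 180, and 270 degrees clockwise."""
    rotated, current = [], image
    for _ in range(3):
        current = rotate90(current)
        rotated.append(current)
    return rotated
```

Each training image then contributes three extra samples with the same class label.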

Week of Jan 30th

  • Finished a working prototype of the web identifier that is feature complete relative to the Tkinter version (Scott)

Week of Feb 6th

  • Began investigating custom features to better differentiate bronchioles from the "open" blood vessels in the E16.5 mouse images (Scott)

    • One possibility is to incorporate categorical features: Cliburn recommended using the OneHotEncoder from sklearn, which looks promising
  • Dockerized a version of the wa-isit app. Previously this app was deployed using Flask's default (development) web server. As we saw during a live test, and as documented by Flask, this was not a good idea. Took time to update the Docker container to serve the application using uWSGI, so the live demo is much more robust. (Ben)

    • working on porting (to Angular) a prototype web application (goldmill) that will allow us to obtain better training data and make better use of the LungMap ontology. (Ben)
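
To make the categorical-feature idea noted above concrete, here is a plain-Python sketch of one-hot encoding. It illustrates the concept behind sklearn's OneHotEncoder but is not that class's API, and the example category labels are hypothetical.

```python
def fit_categories(values):
    """Collect the sorted set of distinct category labels."""
    return sorted(set(values))


def one_hot(value, categories):
    """Encode one label as a 0/1 vector with a single 1 at the label's index."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


# Hypothetical categorical feature: whether a region's lumen looks open or closed.
labels = ["open", "closed", "open"]
categories = fit_categories(labels)
encoded = [one_hot(v, categories) for v in labels]
```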

Week of Feb 13th

  • Drafted a roadmap for delivery of the final product to be built over the next year (Scott)

Week of Feb 20th

  • Began investigating improving the training set using polygon segmentations of target regions (Scott)
  • Began investigating potential criteria for excluding "outliers" (Lina)
    • "Outliers" are sub-regions that do not belong to any anatomical structure class in the training set but are forced into one of them by our classifier.
    • Methods under investigation include distance-based methods and kernel-based novelty detection.

Update prior to Mar 1st meeting

  • Investigating polygon training set (Scott)

    • Finished a working version of the poly extractor
      • Nice feature of the tool is that it extracts and saves both the bounding rectangle and the grayscale polygon mask, so it is easy to compare methods using the same training set on either the masked or original rectangular regions.
    • Used the poly extractor to re-create training data for experiment 73
      • Segmented 426 regions from the 4 images...yeah, this is tedious work and takes a while!
    • Updated lung-map-utils to take optional user-specified mask and optional suffix for sig file names.
    • Updated notebook for generating signature files
    • Created notebook comparing accuracy of custom features vs custom masked features vs wndcharm
    • TODO: If we want to continue with the masked region strategy, the next step is to solve the contour filling issue with the candidates.
  • Researching application technology stack (Ben)

    • Need an ORM - spiked on Django, SQLAlchemy. Ready to begin data modeling phase for application.
    • Need a JavaScript framework to help interact with the DOM (using Angular v1). Will plan to have a sketch of the training creator frontend.
    • Spiked on docker-compose. Will plan to build out the infrastructure via docker by next week.
    • Spiked on authentication via restful python framework. Ready to begin building out endpoints.
  • Investigating outlier detector (Lina)

    • Used the "rectangular" training sub-regions with Scott's customized features to train the classifier.
    • Predicted the class of the sub-region candidates.
    • Did Z normalization using the mean and standard deviation from their predicted class.
    • Counted the number of features that do not fall in the 3 standard deviations range (i.e. (-3, 3) after normalization).
    • Planned to investigate other criteria that may be helpful for detecting outliers.
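
The outlier criterion described above (z-normalize a candidate against its predicted class, then count features outside the 3 standard deviation range) can be sketched as follows. Using the sample standard deviation and treating constant features specially are assumptions of this sketch, not details from the notes.

```python
import statistics


def class_stats(feature_rows):
    """Per-feature (mean, standard deviation) over the training rows of one class."""
    columns = list(zip(*feature_rows))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]


def count_outlying_features(candidate, stats, limit=3.0):
    """Count features whose z-score, relative to the predicted class,
    falls outside the open interval (-limit, limit)."""
    count = 0
    for value, (mean, std) in zip(candidate, stats):
        if std == 0:
            # Constant feature in the training data: any deviation counts as outlying.
            outlying = value != mean
        else:
            outlying = abs((value - mean) / std) >= limit
        count += int(outlying)
    return count
```

A candidate could then be flagged as an outlier when this count exceeds some threshold, which would need to be chosen empirically.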

Update prior to Mar 8th meeting

  • Began solving contour filling issue with candidates on the boundary of an image (Scott)

Update prior to Mar 15th meeting

  • Solved contour filling for boundary candidates, in the process of updating the processing pipeline (Scott)
  • Spiked on ImageField in Django and prototyped adding images to the database while at the same time calculating a SHA1 hash and deriving JPEG images. (Ben)
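
The hashing step mentioned above can be sketched independently of Django. The example below computes a SHA-1 digest from any binary file-like object by reading it in chunks; the function name and chunk size are assumptions, and an in-memory buffer stands in for an uploaded image.

```python
import hashlib
import io


def sha1_of_stream(stream, chunk_size=65536):
    """Compute the SHA-1 hex digest of a binary file-like object, reading in chunks."""
    digest = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# In-memory stand-in for an uploaded image file.
upload = io.BytesIO(b"abc")
checksum = sha1_of_stream(upload)
```

Storing the digest alongside each image gives a cheap way to detect duplicate uploads.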

Update prior to Mar 22nd meeting

  • Refactored prototype django app to align better with the django-rest-framework (remove admin) and also remodeled the data model so that the LungmapImage table needs to be populated within the app and not through fixtures. (Ben)
  • Started to map out the RESTful API interface, will need feedback. (Ben)
  • Used PCA to visualize whether the randomly picked (polygon-shaped) sub-regions belong to a class. (Lina)
  • Started to create a function to calculate posterior probability for ridge regression. (Lina)

Update prior to Mar 29th meeting

  • Created a function to calculate posterior probability for multinomial logistic regression. (Lina)
  • Processing pipeline successfully updated to handle polygon regions (Scott)
    • created notebook for comparing polygon vs rectangular sub-regions
  • Reviewed and merged PR from Ben for the Django app (Scott)
    • Experiment & Probe models created
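
For the multinomial logistic regression posterior mentioned above, the class probabilities are a softmax over per-class linear scores. The sketch below is a generic plain-Python version, not Lina's function; the parameter layout (one weight vector and one intercept per class) is an assumption.

```python
import math


def posterior_probabilities(features, weights, biases):
    """Class posteriors P(class | features) for multinomial logistic regression.

    `weights` holds one coefficient vector per class; `biases` one intercept per class."""
    scores = [
        sum(w * x for w, x in zip(w_k, features)) + b_k
        for w_k, b_k in zip(weights, biases)
    ]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

A low maximum posterior is one possible signal that a candidate sub-region does not belong to any of the trained classes.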

Update prior to Apr 5th meeting

  • Updated REST API for POSTing a new experiment (Scott)
  • Received prototype ontology from Anna Maria, having trouble parsing the relationships (Scott)