Retina encoder: biological encoder for vision WIP #691

Open
wants to merge 56 commits into base: master

Conversation

@breznak breznak (Member) commented Sep 26, 2019

eye.py is a biological implementation of a retina (encoder).
WIP

For #682

@breznak breznak added the in_progress, encoder, and research (new functionality of HTM theory, research idea) labels on Sep 26, 2019
@breznak breznak (Member, Author) commented Sep 26, 2019

FYI @ctrl-z-9000-times, if you have some insights, please share; I'll slowly try to work on this.

@breznak breznak (Member, Author) left a review comment

Steps:

  • make this run locally
    • add a unit test
    • validate the encoder on the MNIST classification task
  • fix dependencies (incl. ChannelEncoder)
  • refactoring
    • should inherit from an Encoder; blocked by #704 (Provide bindings for BaseEncoder for python encoders)
    • split Sensor (eye) from motor control? -> motor control removed; the eye now accepts
      position, rotation, scale.
    • visualizations only optional
    • parameters need work
    • split ChannelEncoder into a standalone file?
  • fix the newly added parts: better split of parvo/magno cell counts, ...
    • restore the sparsity computation for 3D (cube root)
    • fix the sparsity ratio between P and M cells (3:1)? - TODO I cannot find what the ratio is
      • the ratio is approx. 8:1
      • replace mode: parvo/magno/both with a p_m_ratio, or provide sparsityP and sparsityM? (see the sketch after this list)
    • resolve the level of motion control (x, y, rotation, scale): micro (= saccadic step) vs. macro (where to look in the scene) -> using micro/saccades for this PR
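A rough sketch of the sparsity-split idea from the bullet above; split_sparsity, total_active_bits and p_m_ratio are hypothetical names for discussion, not existing eye.py parameters:

# Hypothetical sketch: split a target number of active bits between the
# parvo and magno populations given a p_m_ratio (approx. 8:1 per the note above).
def split_sparsity(total_active_bits, p_m_ratio=8.0):
    """Return (parvo_bits, magno_bits) with parvo_bits / magno_bits ~= p_m_ratio."""
    magno_bits = int(round(total_active_bits / (p_m_ratio + 1.0)))
    parvo_bits = total_active_bits - magno_bits
    return parvo_bits, magno_bits

print(split_sparsity(900, 8.0))  # -> (800, 100)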

@breznak breznak closed this Sep 26, 2019
@breznak breznak reopened this Sep 26, 2019
@ctrl-z-9000-times (Collaborator) commented

Thank you Breznak for getting the ball rolling on this PR.

I did some basic cleanup. You should now be able to run this locally (without installing my old research repo).

The encoder is now runnable from the command line. It requires a single argument: the image file path or a directory.
$ python3 py/htm/encoders/eye.py ~/Pictures/

@breznak breznak (Member, Author) left a review comment

Please have a look at the progress. The main changes are:

  • removed the Sampler code

Open issues:

  • how to integrate n saccadic steps into the final image SDR? (logical AND?) - see the sketch after this list
    • related: the output SDR seems too dense
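To make the question concrete, here is a minimal sketch of the two obvious ways to fold n saccadic steps into one SDR. It uses plain numpy with random stand-ins for the per-saccade outputs rather than real Eye.compute() results, so it only illustrates the density behaviour:

import numpy as np

# Stand-ins for the dense outputs of 4 saccadic steps, ~10% active bits each.
rng = np.random.default_rng(0)
saccades = [rng.random(1000) < 0.10 for _ in range(4)]

# Logical AND (intersection): only bits active in every step survive,
# so the combined SDR gets sparser as the number of saccades grows.
and_sdr = np.logical_and.reduce(saccades)

# Logical OR (union): bits active in any step survive,
# so the combined SDR gets denser as the number of saccades grows
# (likely the "too dense" symptom above).
or_sdr = np.logical_or.reduce(saccades)

print(and_sdr.mean(), or_sdr.mean())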

@breznak breznak (Member, Author) commented Oct 2, 2019

validate the encoder on the MNIST classification task

I'm running into an import problem with from htm.encoders.eye import Eye, which is strange since GridCellEncoder imports fine.

@breznak breznak (Member, Author) commented Oct 22, 2019

Overall this looks good. I think it's close to good enough for this PR.

Thank you for reviewing! I addressed some of your concerns. I have more cleanups in another PR, but I broke something there, so it's probably a good idea to get this one into a shippable shape, resolve it, and then do follow-ups.

@breznak breznak (Member, Author) commented Oct 22, 2019

TODOs:

@breznak breznak (Member, Author) commented Oct 23, 2019

log-polar transform vs. retinal log sampling:
opencv/opencv_contrib#2305

  • both achieve "highlight the fovea, make the background less important"
  • log-polar has the property that rotation and scale are represented as translations, and thus get similar representations in the encoder (overlap) - but what about translation itself?
  • retinal log sampling is biologically plausible by construction of the model.
  • both suffer from having to (?) crop the image to the ROI and apply the transform there (with the fovea in the center of the image/ROI); this cuts off the information outside the ROI, which would otherwise have been reduced anyway, but continuously

TL;DR: can we use the Retina's useRetinaLogSampling instead of the manual log-polar transform?
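For reference, a rough sketch of the two approaches side by side. The Retina_create keyword names are taken from the snippet further down in this PR, the logPolar scale factor is just one plausible choice, and exact OpenCV function names may differ between versions (newer releases deprecate cv2.logPolar in favour of cv2.warpPolar):

import cv2
import numpy as np

img = cv2.imread("some_image.png")        # any test image
d = 200                                   # retina diameter for this sketch
roi = cv2.resize(img, (d, d))

# (a) manual log-polar warp, roughly what the encoder does now
center = (d / 2, d / 2)
M = d / np.log(d / 2)                     # scale so the ROI edge maps to the output edge
logpolar = cv2.logPolar(roi, center, M, cv2.INTER_LINEAR + cv2.WARP_FILL_OUTLIERS)

# (b) let the bioinspired Retina do the log sampling itself
retina = cv2.bioinspired.Retina_create(
    inputSize=(d, d),
    colorMode=True,
    colorSamplingMethod=cv2.bioinspired.RETINA_COLOR_BAYER,
    useRetinaLogSampling=True)
retina.run(roi)
parvo = retina.getParvo()                 # high-resolution "what" pathway
magno = retina.getMagno()                 # coarse, motion-sensitive "where" pathway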

@ctrl-z-9000-times (Collaborator) commented

log-polar has the property that rotation and scale are represented as translations, and thus get similar representations in the encoder (overlap) - but what about translation itself?

Yes. Assuming the motion is small, the eye's output should still have semantic similarity.

both suffer from having to (?) crop the image to the ROI and apply the transform there (with the fovea in the center of the image/ROI); this cuts off the information outside the ROI, which would otherwise have been reduced anyway, but continuously.

No. The area outside of the ROI is lost, not reduced. The things outside of the ROI are outside of the eye's field of view. The peripheral vision needs to be included inside of the ROI.

@breznak breznak (Member, Author) commented Oct 23, 2019

The things outside of the ROI are outside of the eye's field of view. The peripheral vision needs to be included inside of the ROI.

OK, there might be some miscommunication of terms on my side, but imagine this case:
"right now I'm reading on a notebook screen with the room in the background":

  • this is my field of view (FOV, about 160° horizontally), which could be expressed as a photo taken from my position
  • my focus (fovea) is on the screen I read text from, which is only a tiny fraction of the area in the picture
  • Q: ROI == FOV vs. ROI == "fovea" (the screen)?

This should illustrate that, even for the ROI (as implemented: the area of the image that gets processed, the rest is lost):

  • if the ROI is the FOV: we need to be able to specify the position and diameter of the retinal fovea.
  • if the ROI is the "fovea": I think this is a bad design, as it leads either to too much high detail (the whole room in the fovea) or to almost no peripheral vision (only the "screen" and the corners that fit into its bounding box); in reality I can notice motion over a much larger area.

To summarize:

The peripheral vision needs to be included inside of the ROI.

If this is true, we need to be able to specify the fovea/peripheral ratio better.

@ctrl-z-9000-times (Collaborator) commented

The ROI is the entire field of view.
The fovea is a small area at the center of the ROI.

we need to be able to specify the fovea/peripheral ratio better

This is one of the many tuning parameters; IIRC it is self.fovea_scale.

@breznak breznak (Member, Author) left a review comment

Please test this out;
especially look at:

  • scaling - how to make it work
  • whether the custom log-polar can be replaced by the Retina's

@@ -115,7 +120,10 @@ def main(parameters=default_parameters, argv=None, verbose=True):
     # Training Loop
     for i in range(len(train_images)):
         img, lbl = random.choice(training_data)
-        encode(img, enc)
+        encoder.new_image(img)
+        (enc, _) = encoder.compute()
@breznak (Member, Author) commented:

WIP on MNIST, not yet tuned. I should revert these changes for now.
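For context, this is roughly how the Eye slots into the MNIST loop after the change above. It is a sketch only: encoder, training_data, and the elided SP/classifier wiring are assumptions, and the second value returned by compute() is discarded as in the diff.

import random

def train(encoder, training_data, n_steps):
    for _ in range(n_steps):
        img, lbl = random.choice(training_data)   # (image, label) pairs - assumed format
        encoder.new_image(img)                    # load the image into the eye
        enc, _ = encoder.compute()                # one saccadic step -> SDR enc
        # ... feed enc into the SpatialPooler / classifier here ...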

self.retina_diameter = int(self.resolution_factor * output_diameter)
# Argument fovea_scale ... proportion of the image (ROI) which will be covered (seen) by
# high-res fovea (parvo pathway)
self.fovea_scale = 0.177
@breznak (Member, Author) commented:

I have previously misinterpreted this and self.scale. Should we rename and change this to fovea_diameter to be clearer? A sketch of the relationship follows.
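A sketch of what the rename could look like, for discussion only: fovea_diameter is not an existing attribute, the example numbers are made up, and the assumption is that the fovea covers a fovea_scale fraction of the retina's diameter.

# Hypothetical rename sketch, values for illustration only.
resolution_factor = 3
output_diameter   = 100
retina_diameter = int(resolution_factor * output_diameter)    # 300 px
fovea_scale     = 0.177
fovea_diameter  = int(fovea_scale * retina_diameter)          # ~53 px of high-res parvo coverage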

inputSize = (self.retina_diameter, self.retina_diameter),
colorMode = color,
colorSamplingMethod = cv2.bioinspired.RETINA_COLOR_BAYER,
useRetinaLogSampling = True,)
@breznak (Member, Author) commented:

@ctrl-z-9000-times please compare with this switched on and off. Can it replace our manual log-polar transformation?

roi.resize( (self.retina_diameter, self.retina_diameter, 3))

# Mask out areas the eye can't see by drawing a circle border.
# This represents the "shape" of the sensor/eye (comment out to leave it rectangular).
@breznak (Member, Author) commented:

OK to crop to the circular region here (and not only in the visualization)? This makes the encoder see only the ROI inside the inner circle. A sketch of such a mask follows.
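A minimal sketch of the circular crop being discussed; the names and sizes are illustrative, not the actual eye.py variables.

import cv2
import numpy as np

d = 200
roi = np.full((d, d, 3), 255, dtype=np.uint8)         # placeholder for the resized square ROI

mask = np.zeros((d, d), dtype=np.uint8)
cv2.circle(mask, (d // 2, d // 2), d // 2, color=255, thickness=-1)
circular_roi = cv2.bitwise_and(roi, roi, mask=mask)   # everything outside the circle -> black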


# apply field of view (FOV), rotation
self.roi = self._crop_roi()
self.roi = self.rotate_(self.roi, self.orientation)
@breznak (Member, Author) commented:

Apply the rotation to the image itself, instead of separately to the output, visualizations, etc. (see the sketch below).
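Roughly what such an image-side rotation amounts to; a sketch only, the actual rotate_ helper may differ.

import cv2
import numpy as np

def rotate_image(image, angle_degrees):
    # Rotate around the image center, keeping the original size.
    h, w = image.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle_degrees, 1.0)
    return cv2.warpAffine(image, M, (w, h))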

where the plot was broken with fractional scaling.
Using cv2.resize() rather than numpy's roi.resize() fixes the issue
(numerical problems).
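A tiny sketch of why the two resizes behave differently: numpy's resize only truncates, repeats, or zero-pads the raw buffer and never interpolates, while cv2.resize actually rescales the picture.

import cv2
import numpy as np

img = np.arange(16, dtype=np.uint8).reshape(4, 4) * 16   # tiny 4x4 "image"

np_resized = np.resize(img, (6, 6))    # repeats/truncates the flat data -> garbled image
cv_resized = cv2.resize(img, (6, 6), interpolation=cv2.INTER_LINEAR)  # properly interpolated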
Labels: encoder, ready, research (new functionality of HTM theory, research idea)
Projects: None yet
5 participants