[DeepVision Port] SegFormer and Mix-Transformers #1946
Conversation
Just to update you @ianstenbit - the port is going well, but it took me a bit longer than anticipated to get used to Keras Core + the new API 😅 I ran into a small blocker and documented it here, since I'm not sure what the intended usage is when returning tensors and non-tensors from a layer's call().
Thanks David! I'm taking a look at the non-tensor return issue.
We have a few options for workarounds, including:
- Computing the shape outside of the layer (because, from my cursory view, the returned shape is just the input shape / stride) - see the sketch below.
- Making the PatchingEmbeddingLayer offer a new method that computes these values, which callers can then use.
That said, I think we should be able to make this work. I'm taking a look at your issue on Keras Core to see if I can get a working fix.
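For the first option, here's a minimal sketch (illustrative names only, not the actual KerasCV API) of deriving the patch-grid shape outside the layer from the input shape and the patching stride, so the layer's call() only ever returns tensors:

```python
# Minimal sketch of the first workaround (illustrative names, not the actual
# KerasCV API): compute the patch-grid shape outside the layer so the layer's
# call() only ever returns tensors.

def patch_grid_shape(image_height, image_width, stride):
    """Spatial shape of the patch grid after patching with the given stride."""
    return image_height // stride, image_width // stride


# Assumed usage inside the backbone's functional graph:
#   x = PatchingEmbeddingLayer(project_dim, stride)(x)  # returns only a tensor
#   H, W = patch_grid_shape(224, 224, stride)           # shape derived outside
#   x = ops.reshape(x, (-1, H, W, project_dim))
```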
Thanks! I opened it as an issue since I'm not sure if it's the intended usage. If so, I'd go with computing the values outside the layer or with an extra method.
class MiTBackbone(Backbone):
    def __init__(
        self,
        input_shape=None,
Default to (None, None, 3) so that channel dims can be known at build time for conv layers.
Actually, this'll have to default to (224, 224, 3), since the input shape will have to be known at instantiation time.
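As a rough illustration of that constraint (a toy model following the functional-subclassing pattern, not the real MiTBackbone code), the graph is built in __init__, so the input shape must be fully specified at instantiation time:

```python
# Toy sketch (assumed to mirror the Backbone pattern, not the real MiTBackbone):
# the functional graph is traced in __init__, so a concrete default like
# (224, 224, 3) is needed rather than (None, None, 3).
import keras_core as keras


class ToyBackbone(keras.Model):
    def __init__(self, input_shape=(224, 224, 3), **kwargs):
        inputs = keras.Input(shape=input_shape)
        # With known spatial dims, downstream reshapes (e.g. patch merging)
        # can be computed at build time.
        x = keras.layers.Conv2D(32, 3, strides=2, padding="same")(inputs)
        super().__init__(inputs=inputs, outputs=x, **kwargs)


model = ToyBackbone()       # uses the (224, 224, 3) default
print(model.output_shape)   # (None, 112, 112, 32)
```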
@ianstenbit Looks like the MiTs are shaped up. Here's a demo notebook showcasing the components, input/output shapes, and pyramid levels. There are a couple of weird-looking bits. 99% of the work is the MiTs - SegFormer is just MiT + a segmentation top. Could you please review the backbone while I shape up SegFormer? With a green light, I'll write up the unit tests and add proper docstrings.
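A hedged usage sketch of that relationship, where the preset name and exact constructor arguments are assumptions rather than anything confirmed in this thread:

```python
# Hedged sketch: "SegFormer is just MiT + a segmentation top". The preset name
# and constructor signature are assumptions for illustration only.
import keras_cv

backbone = keras_cv.models.MiTBackbone.from_preset("mit_b0")
model = keras_cv.models.SegFormer(backbone=backbone, num_classes=19)
model.summary()
```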
Generally looks good! Left you a few minor comments.
Just a few things to clean up -- in the meantime I am seeing if the tests need any fixing.
class SegFormerB0(SegFormer):
    def __new__(
        cls,
        num_classes=19,
We shouldn't specify a default for num_classes, as it needs to be user-specified. A silent default could be very confusing.
Removed. Should it be made a mandatory arg?
Yes, I think this should be required at init time.
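A toy sketch of the agreed-upon change (simplified classes, not the actual KerasCV aliases, which dispatch via __new__): dropping the num_classes default makes the argument mandatory at init time.

```python
# Toy sketch (simplified; the real aliases use __new__): removing the silent
# num_classes=19 default makes it a required init-time argument.
class SegFormer:
    def __init__(self, num_classes, **kwargs):
        self.num_classes = num_classes


class SegFormerB0(SegFormer):
    def __init__(self, num_classes, **kwargs):
        # No default: forgetting num_classes raises a TypeError instead of
        # silently building a 19-class head.
        super().__init__(num_classes=num_classes, **kwargs)


SegFormerB0(num_classes=11)   # OK
# SegFormerB0()               # TypeError: missing required argument 'num_classes'
```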
/gcbrun
Looks like tests are passing locally, but on CI (for TF) it will depend on us getting a new release of Keras Core which includes keras-team/keras-core#722. In the meantime, @DavidLandup0, I left a few review comments for you to take a look at -- thanks!
Awesome, thanks! Getting to these soon. Thanks for the review pass! :)
Thank you David!
I think this PR is basically all set; I just need to merge #2037 to fix CI.
/gcbrun
Awesome, thank you for the help in the final stretch! @ianstenbit 🎉
It looks like the GCB failure is because I need to update the Docker image of our GCB runners to use the newest Keras Core version -- doing that now.
CI failures are unrelated -- seems like DeepLab + YOLOV8 have some breakages with the latest Keras Core version. I'll open a separate PR for those.
Need a hand with DLV3 or YOLO?
You're welcome to look if you'd like -- for DeepLab it's a deserialization issue. Haven't looked at YOLO yet. You can repro by installing the latest Keras Core version and running the large tests of those models with the TF backend.
Sure! Sign me up for YOLO if it's not too urgent then :)
* initial dump
* add all basic layers, port roughly to keras core ops
* updated .gitignore
* segformer head and formatting
* cleanup
* remove tf call
* remove tf
* migrating to more keras ops
* cleanups and fixes
* fix reshaping
* comments
* from presets api, keras.ops -> ops
* embed_dims -> embedding_dims
* addressing some PR comments
* docstrings, argument update
* depths arg
* sync
* compute output shapes
* segformer progress
* head
* softmax
* remove softmax
* undo compute_output_shapes()
* efficientmultiheadattention -> segformermultiheadattention
* docstrings
* softmax output
* segformer presets
* updating segformer presets
* segformer presets
* import aliases
* refactoring
* pr comments
* pr comments
* add aliases
* aliases ot init
* refactor fix
* import keras_cv_export
* fix presets/aliases and add copyright
* linter warnings
* linter errors
* consistency in presets
* return config
* fix serialization
* Some cleanup + more tests
* Fix DropPath layer (need to update tests + add shim for tf.keras
* Finish DropPath layer
* Use static shape in backbone
* Formatting
* Switch back to ops.shape
* documentation
* documentation
* remove default num classes
* fix docs
---------
Co-authored-by: ianjjohnson <[email protected]>
What does this PR do?
As discussed in #1933 - setting up a draft PR for porting SegFormer and associated layers into KCV. Draft PR for now, with a placeholder main model dump; layers and tests incoming soon. Will tag once ready for review.
Demo Notebooks
- from_preset() usage and training: https://colab.research.google.com/drive/1Q3m9-LKICrFzuUhVMIPd7pY2l9Z3BLhg?usp=sharing
- from_preset() usage and training: #TODO
Questions and API Considerations
- SegFormerMultiHeadAttention sounds like a mouthful.
- Should it take a backbone argument? Just SegFormer.from_preset()?
Before submitting
- … Pull Request section?
- … link to it if that's the case: Porting DeepVision into KerasCV #1933
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@ianstenbit just tagging so you can follow the progress as it comes in. Otherwise, no need to spend time until it's un-drafted for review :)