VideoSwin Backbone #1808
base: keras-hub
Conversation
* Agg Vgg16 backbone
  * update names
  * update tests
  * update test
  * add image classifier
  * incorporate review comments
  * Update test case
  * update backbone test
  * add image classifier
  * classifier cleanup
  * code reformat
  * add vgg16 image classifier
  * make vgg generic
  * update doc string
  * update docstring
  * add classifier test
  * update tests
  * update docstring
  * address review comments
  * code reformat
  * update the configs
  * address review comments
  * fix task saved model test
  * update init
  * code reformatted
* Add ResNetV1 and ResNetV2
  * Address comments
* Add CSP DarkNet
  * Add CSP DarkNet
  * snake_case function names
  * change use_depthwise to block_type
* …Backbone` (keras-team#1769)
  * Add FeaturePyramidBackbone and update ResNetBackbone
  * Simplify the implementation
  * Fix CI
  * Make ResNetBackbone compatible with timm and add FeaturePyramidBackbone
  * Add conversion implementation
  * Update docstrings
  * Address comments
* Add DenseNet
  * fix testcase
  * address comments
  * nit
  * fix lint errors
  * move description
The input variable names are not fixed yet for the image classifier backbones.
Also, in the classifier I have used `ImageClassifier`, which needs to be updated to a video classifier.
We can ignore the nightly failure, but please …
Any way is fine, but it might be easiest to start with the backbone, and just add the task as a follow-up. Can you include a colab showing that we have correct numerics? E.g., load a reference implementation, assign the weights over, and show our implementation matches?
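A minimal sketch of such a numerics check, assuming the PR's backbone class is named `VideoSwinBackbone` and the reference weights have already been copied over; `reference_forward` is a hypothetical wrapper around whichever reference implementation gets ported:

```python
import numpy as np

def reference_forward(video):
    """Hypothetical stand-in: run the reference (e.g. PyTorch) VideoSwin
    on `video` and return its features as a channels-last numpy array."""
    raise NotImplementedError  # wire up the actual reference model here

# Class name assumed from the PR title; weight transfer omitted here.
backbone = VideoSwinBackbone(image_shape=(32, 224, 224, 3))

video = np.random.uniform(size=(1, 32, 224, 224, 3)).astype("float32")
# The two implementations should agree to within float tolerance.
np.testing.assert_allclose(
    backbone.predict(video), reference_forward(video), atol=1e-4
)
```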
Left some initial comments. Make sure to take time to check the style of things in the repo and do some rewriting of the source code as we bring it over. We want to make sure this is a proper port that leaves us with a uniform UX for all the models we have checked in.
```python
include_rescaling=False,
image_shape=(32, 224, 224, 3),
embed_dim=96,
patch_size=[2, 4, 4],
```
We should probably call the arguments that vary per layer either `layerwise_patch_size` or `stackwise_patch_size`, etc., for consistency with other models.
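A hedged sketch of what that convention might look like in the signature (names and values here are illustrative, not the final API):

```python
def __init__(
    self,
    include_rescaling=False,
    image_shape=(None, None, None, 3),
    patch_size=(2, 4, 4),
    # Per-stack values carry the `stackwise_` prefix, matching other models:
    stackwise_depths=(2, 2, 6, 2),
    stackwise_num_heads=(3, 6, 12, 24),
    **kwargs,
):
    ...
```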
```python
def __init__(
    self,
    include_rescaling=False,
    image_shape=(32, 224, 224, 3),
```
If this model is capable of taking in different numbers of frames or different image sizes, this should probably be `(None, None, None, 3)`.
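For reference, a flexible input spec in the functional model would look something like this (a sketch, not the PR's code):

```python
import keras

# `None` in the frame/height/width positions lets the same backbone accept
# any clip length and spatial resolution; only the channel count is fixed.
inputs = keras.Input(shape=(None, None, None, 3))  # (frames, h, w, channels)
```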
```python
def __init__(
    self,
    include_rescaling=False,
```
Remove defaults that are specific to a given model size. `patch_size`, `window_size`, `depths`, and `num_heads` should all have no default here. The defaults will be in the individual preset configs.
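For context, those defaults would then live in the preset configs; a hypothetical entry for the tiny variant (the preset schema is assumed from the repo's pattern, and the values are the published Video Swin-T hyperparameters):

```python
# Hypothetical preset entry; the real schema follows the repo's presets.
backbone_presets = {
    "videoswin_tiny": {
        "config": {
            "patch_size": [2, 4, 4],
            "window_size": [8, 7, 7],
            "depths": [2, 2, 6, 2],
            "num_heads": [3, 6, 12, 24],
            "embed_dim": 96,
        },
    },
}
```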
```python
):

    # === Functional Model ===
```
remove empty newline
```python
image_shape (tuple[int], optional): The size of the input video in
    `(depth, height, width, channel)` format.
    Defaults to `(32, 224, 224, 3)`.
include_rescaling (bool, optional): Whether to rescale the inputs. If
```
format lines to 80 characters. This looks like we are going over.
```python
self.embed_dim = embed_dim
self.norm_layer = norm_layer

def __compute_padding(self, dim, patch_size):
```
Why not one underscore? Double underscores are usually reserved for Python builtins.
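Concretely, the double underscore also triggers Python's name mangling, which single-underscore helpers avoid (class name here is illustrative):

```python
class PatchEmbedding:
    def __compute_padding(self, dim, patch_size):
        # Name-mangled to `_PatchEmbedding__compute_padding`; awkward to
        # reach from subclasses and in debuggers.
        pass

    def _compute_padding(self, dim, patch_size):
        # Single underscore: internal by convention, no mangling.
        pass
```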
```python
**kwargs,
):
    super().__init__(**kwargs)
    # variables
```
config? `variables` usually refers to trainable weights (these are not)
```python
**kwargs,
):
    super().__init__(**kwargs)
    # variables
```
same here, these aren't variables
```python
super().__init__(**kwargs)
self.output_dim = output_dim
self.hidden_dim = hidden_dim
self._activation_identifier = activation
```
I see this in a few places; we should not do it. In `__init__`:
`self.activation = keras.activations.get(activation)`
and in `get_config`:
`"activation": keras.activations.serialize(self.activation),`
```python
    patch_size=(4, 4, 4), embed_dim=96
)
config = patch_embedding_model.get_config()
assert isinstance(config, dict)
```
rewrite all of these tests to match the repo style. We do not use `assert` anywhere.
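For example, the config check above might read like this in the repo's style (the `TestCase` import path is assumed, and `PatchEmbedding` stands in for the actual layer class in the PR):

```python
from keras_cv.tests.test_case import TestCase  # repo's TestCase (path assumed)

class PatchEmbeddingTest(TestCase):
    def test_config(self):
        patch_embedding_model = PatchEmbedding(
            patch_size=(4, 4, 4), embed_dim=96
        )
        config = patch_embedding_model.get_config()
        # Repo style: TestCase assertions instead of bare `assert`.
        self.assertIsInstance(config, dict)
```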
Force-pushed from `753047d` to `a5e5d8f`.
No description provided.