When looking at the image of the architecture overview, I noticed two things that differ from the code.
The noise that is added to Xi in the figure is not present in the code. Am I missing something here?
The cubes that represent the shape of F are misleading, because they imply that the channel dimension doubles when merging hat{Fs} and Fc and then stays doubled. However, the linear layer in the code halves the number of channels of the combined feature maps again, so the number of channels ends up at 512.
If I am not mistaken or have missed something, would it be possible to fix those issues?
Besides those minor flaws, the graphic is really beautiful and provides a great overview of the network's architecture.
Yes, you are right: in the code we didn't introduce the noise explicitly. Since Xi is a subset of images, shuffling Xi is a way to introduce noise implicitly, which was our original intuition. I agree with you that the noise-injection arrow in the figure might mislead people.
Yes, F is the concatenation of hat{Fs} and Fc along the channel dimension, which ends up with 1024 channels. What is missing in the figure is the linear layer between F and G that projects the 1024 channels back down to 512.
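The merge described above can be sketched as follows. This is not the repository's actual code, just a minimal numpy illustration of the shapes involved; the batch size and the per-branch channel count of 512 for hat{Fs} and Fc are taken from the discussion, everything else (names, random weights) is assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n, c = 4, 512                             # assumed batch size; 512 channels per branch
Fs_hat = rng.standard_normal((n, c))      # shuffled branch features, 512 channels
Fc = rng.standard_normal((n, c))          # content branch features, 512 channels

# Concatenate along the channel dimension: 512 + 512 -> 1024 channels.
F = np.concatenate([Fs_hat, Fc], axis=1)

# The linear layer missing from the figure: projects 1024 channels back to 512.
W = rng.standard_normal((2 * c, c))       # weight of shape (1024, 512)
b = np.zeros(c)
G = F @ W + b

print(F.shape)  # (4, 1024)
print(G.shape)  # (4, 512)
```

So the cubes in the figure would be accurate if the concatenated 1024-channel block were followed by this projection back to 512 channels before G.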
We will try to update the Figure in the next version, cheers:-)