Probable FLAVA multimodal encoder bug #530

rishabhm12 · 2024-10-14T11:28:42Z

🚀 The feature, motivation and pitch

In the FLAVA multimodal encoder, why is no attention mask passed to mask out the '[PAD]' embeddings coming from the text encoder? Is this a bug or intentional?

multimodal/torchmultimodal/models/flava/model.py

Line 197 in e4d288b

multimodal_outputs = self.encode_mm(
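To make the question concrete, here is a minimal sketch of the kind of mask I have in mind. It is not the torchmultimodal API: `build_fused_attention_mask` and `masked_attention_weights` are hypothetical helpers, and the sketch assumes the fused sequence is laid out as one prefix (CLS-style) token, then the image tokens, then the text tokens, with a BERT-style pad token id of 0.

```python
import torch


def build_fused_attention_mask(
    text_input_ids: torch.Tensor,
    num_image_tokens: int,
    pad_token_id: int = 0,       # assumption: BERT-style [PAD] id
    num_prefix_tokens: int = 1,  # assumption: one CLS-style token is prepended
) -> torch.Tensor:
    """Hypothetical helper: True marks keys that may be attended to."""
    batch_size = text_input_ids.shape[0]
    # Text positions are visible unless they hold the pad token.
    text_mask = text_input_ids != pad_token_id
    # Prefix and image tokens are never padding, so they are always visible.
    always_visible = torch.ones(
        batch_size,
        num_prefix_tokens + num_image_tokens,
        dtype=torch.bool,
        device=text_input_ids.device,
    )
    # Assumed fused layout: [prefix, image tokens, text tokens].
    return torch.cat([always_visible, text_mask], dim=1)


def masked_attention_weights(scores: torch.Tensor, key_mask: torch.Tensor) -> torch.Tensor:
    """Softmax over keys after setting masked-out keys to -inf.

    scores: (batch, queries, keys); key_mask: (batch, keys), True = visible.
    """
    scores = scores.masked_fill(~key_mask[:, None, :], float("-inf"))
    return scores.softmax(dim=-1)


# Toy batch: the first sequence ends with two [PAD] tokens, the second has none.
text_input_ids = torch.tensor(
    [[101, 7592, 102, 0, 0],
     [101, 7592, 2088, 999, 102]]
)
mask = build_fused_attention_mask(text_input_ids, num_image_tokens=3)

# Uniform scores make the effect easy to see: padded keys receive zero weight.
scores = torch.zeros(2, mask.shape[1], mask.shape[1])
weights = masked_attention_weights(scores, mask)
```

Without such a mask, the '[PAD]' positions of the first sequence would receive the same attention weight as real tokens in this toy setup, which is the behavior I am asking about.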

Alternatives

No response

Additional context

No response

rishabhm12 · 2024-10-15T04:47:11Z

@ebsmothers

rishabhm12 changed the title ~~FLAVA multimodal encoder bug~~ Probable FLAVA multimodal encoder bug Oct 15, 2024
