
Inconsistency between Training and Inference in LLAVA-OneVision: Input Embedding Truncation #316

Open
xieck13 opened this issue Oct 21, 2024 · 0 comments


Hello,

I have noticed a potential inconsistency in the LLAVA-OV implementation regarding input embedding truncation.

During training, the code truncates the merged input embeddings to the tokenizer's maximum length (https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/llava/model/llava_arch.py#L499). However, in the sglang inference code, the input embeddings are not truncated (https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/llava.py#L378); sglang only checks the context length in the tokenizer manager (https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/managers/tokenizer_manager.py#L226).
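
To make the difference concrete, here is a minimal sketch of the two code paths. The function and variable names (`truncate_for_training`, `check_for_inference`, `tokenizer_model_max_length`, `context_len`) are illustrative, not the actual upstream identifiers:

```python
# Minimal sketch of the two behaviors described above (illustrative only).

def truncate_for_training(new_input_embeds, tokenizer_model_max_length=None):
    # Training side (llava_arch.py): each sample's merged text+image embedding
    # sequence is truncated to the tokenizer's maximum length before batching.
    if tokenizer_model_max_length is not None:
        new_input_embeds = [x[:tokenizer_model_max_length] for x in new_input_embeds]
    return new_input_embeds

def check_for_inference(input_ids, context_len):
    # Inference side (sglang tokenizer manager): the request is only validated
    # against the model's context length; the merged embeddings are never cut.
    if len(input_ids) >= context_len:
        raise ValueError(
            f"The input ({len(input_ids)} tokens) is longer than "
            f"the model's context length ({context_len} tokens)."
        )
```

So a long multimodal prompt that would be silently truncated during training is instead rejected (or passed through untruncated) at inference time.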

Could this lead to discrepancies between training and inference?

Thank you for your attention to this matter.

Best regards
