
Inconsistency between Training and Inference in LLAVA-OneVision: Input Embedding Truncation #316

Open
xieck13 opened this issue Oct 21, 2024 · 0 comments


Hello,

I have noticed a potential inconsistency in the LLAVA-OV implementation regarding input embedding truncation.

During training, the code truncates the merged input embeddings to the tokenizer's maximum length (https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/llava/model/llava_arch.py#L499). However, in the sglang inference code, the input embeddings are not truncated (https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/llava.py#L378); sglang only checks the context length in the tokenizer manager (https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/managers/tokenizer_manager.py#L226).
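
To make the difference concrete, here is a minimal sketch of the two code paths. The function and variable names (`truncate_for_training`, `check_for_inference`, `tokenizer_model_max_length`, `context_len`) are illustrative, not the actual upstream identifiers:

```python
# Minimal sketch of the two behaviors described above (illustrative only).

def truncate_for_training(new_input_embeds, tokenizer_model_max_length=None):
    # Training side (llava_arch.py): each sample's merged text+image embedding
    # sequence is truncated to the tokenizer's maximum length before batching.
    if tokenizer_model_max_length is not None:
        new_input_embeds = [x[:tokenizer_model_max_length] for x in new_input_embeds]
    return new_input_embeds

def check_for_inference(input_ids, context_len):
    # Inference side (sglang tokenizer manager): the request is only validated
    # against the model's context length; the merged embeddings are never cut.
    if len(input_ids) >= context_len:
        raise ValueError(
            f"The input ({len(input_ids)} tokens) is longer than "
            f"the model's context length ({context_len} tokens)."
        )
```

So a long multimodal prompt that would be silently truncated during training is instead rejected (or passed through untruncated) at inference time.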

Could this lead to discrepancies between training and inference?

Thank you for your attention to this matter.

Best regards
