
Unexpected response with long-context model (Phi-3) #651

Open
2 of 4 tasks
prd-tuong-nguyen opened this issue Oct 17, 2024 · 0 comments
System Info

ghcr.io/predibase/lorax:f1ef0ee

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

```shell
  --port 80 \
  --model-id microsoft/Phi-3-mini-128k-instruct \
  --cuda-memory-fraction 0.8 \
  --sharded false \
  --max-waiting-tokens 20 \
  --max-input-length 4096 \
  --max-total-tokens 8192 \
  --hostname 0.0.0.0 \
  --max-concurrent-requests 512 \
  --max-best-of 1 \
  --max-batch-prefill-tokens $BATCH_TOKEN \
  --max-active-adapters 10 \
  --adapter-source local \
  --adapter-cycle-time-s 2 \
  --json-output \
  --disable-custom-kernels \
  --dtype float16
```
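To make the LoRAX and vLLM outputs directly comparable, it helps to send the same prompt with greedy decoding to both servers. The sketch below builds a request body for LoRAX's TGI-style `/generate` endpoint; the endpoint path, payload shape, and prompt text here are assumptions for illustration, not taken from the report, so adjust them to your deployment.

```python
import json

# Hypothetical endpoint; adjust host/port to match the launcher flags above.
LORAX_URL = "http://localhost:80/generate"

def build_lorax_payload(prompt: str, max_new_tokens: int = 256) -> str:
    """Build a JSON request body for LoRAX's TGI-style /generate endpoint.

    Greedy decoding (do_sample=False) removes sampling noise, so any
    remaining difference between servers points at the model/kernel path.
    """
    return json.dumps({
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "do_sample": False,
        },
    })

# Phi-3-instruct chat template markers, per the model card.
payload = build_lorax_payload("<|user|>\nHello<|end|>\n<|assistant|>")
print(json.loads(payload)["parameters"]["do_sample"])  # prints False
```

The same prompt and `max_new_tokens` can then be posted to the vLLM deployment, and the two completions diffed token by token.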

### Expected behavior

When running LoRAX with the model microsoft/Phi-3-mini-128k-instruct, I encountered unexpected behavior with the following configurations:

Configuration A:
- max-input-length = 4096
- max-total-tokens = 8192
- Prompt Length: Approximately 1000 tokens
In this configuration, the generated response differs significantly from what vLLM produces for the same prompt.

Configuration B:
- max-input-length = 4090
- max-total-tokens = 4096
This configuration works well and produces the expected results.

Additionally, I tested the model microsoft/Phi-3-mini-4k-instruct, and it also functioned correctly.

It seems there may be an issue with handling long contexts when using microsoft/Phi-3-mini-128k-instruct.

Could you please investigate this issue? I found a related discussion here: [Hugging Face Discussion](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/discussions/85). Thank you!