Strange Behaviors in Executors #858

Open

sangyuxiaowu opened this issue Jul 19, 2024 · 0 comments

sangyuxiaowu (Contributor) commented Jul 19, 2024

Description

While developing LLamaWorker, I used InteractiveExecutor together with ChatHistory. During testing with qwen2, I noticed that the responses often ended with the strange character Ċ, which looks like the GPT-2-style byte-level BPE symbol for a newline (byte 0x0A rendered as U+010A) leaking through without being decoded.

You can view the related code here:
LLamaWorker v1.0.38 - LLmModelService.cs

Later, to make it easier to add function_call support and to avoid the strict constraints of ChatHistory, I tried using InteractiveExecutor directly. During testing, however, the stop word (antiprompt) handling seemed to have issues:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "你好!有什么我可以帮助你的吗?Ċ<|im_start|>"
      },
      "index": 0,
      "finish_reason": null
    }
  ],
  "id": "chatcmpl-4e3bd819f6ad46a49442c5a7e41571db",
  "created": 1721355611,
  "model": "gpt",
  "usage": {
    "prompt_tokens": 1,
    "completion_tokens": 10,
    "total_tokens": 11
  }
}

In addition to the strange character Ċ, the output contains the stop marker <|im_start|> itself, which should have been stripped.
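
For reference, here is a minimal sketch of how InteractiveExecutor is being driven directly. The model path, prompt, and stop words below are illustrative, not the exact LLamaWorker code:

using System;
using System.Collections.Generic;
using LLama;
using LLama.Common;

// Minimal sketch, assuming a qwen2 GGUF model; all values are illustrative.
var parameters = new ModelParams("qwen2.gguf") { ContextSize = 32768 };
using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);
var executor = new InteractiveExecutor(context);

var inferenceParams = new InferenceParams
{
    // Stop words that should end generation; <|im_start|> nevertheless leaks into the output.
    AntiPrompts = new List<string> { "<|im_end|>", "<|im_start|>" },
    MaxTokens = 512
};

var prompt = "<|im_start|>user\n你好<|im_end|>\n<|im_start|>assistant\n";
await foreach (var text in executor.InferAsync(prompt, inferenceParams))
{
    Console.Write(text);
}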

Subsequently, I tried using StatelessExecutor, which seemed to provide perfect results. However, for each inference request, the following log entries were printed twice:

llama_new_context_with_model: n_ctx      = 32768
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 1
llama_new_context_with_model: freq_base  = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      CUDA0 KV buffer size =  1792.00 MiB
llama_new_context_with_model: KV self size  = 1792.00 MiB, K (f16):  896.00 MiB, V (f16):  896.00 MiB
llama_new_context_with_model:  CUDA_Host  output buffer size =     0.58 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =   304.00 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =    71.01 MiB
llama_new_context_with_model: graph nodes  = 875
llama_new_context_with_model: graph splits = 2

Upon examining the source code, I found that StatelessExecutor creates and immediately disposes of a Context:

public StatelessExecutor(LLamaWeights weights, IContextParams @params, ILogger? logger = null)
{
    Images = new List<byte[]>();
    _weights = weights;
    _params = @params;
    _logger = logger;
    _batch = new LLamaBatch();

    // A context is created here and then disposed immediately:
    Context = _weights.CreateContext(_params, logger);
    Context.Dispose();
}
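
If a new StatelessExecutor is constructed for every request, as LLamaWorker does, this would explain the doubled log above: one context is created (and immediately disposed) by the constructor, and a second is created inside InferAsync. A sketch of that pattern, reusing weights, parameters, and inferenceParams from the earlier sketch:

// Sketch: one StatelessExecutor per request appears to create two contexts,
// assuming InferAsync builds its own context internally.
async Task<string> HandleRequestAsync(string prompt)
{
    var executor = new StatelessExecutor(weights, parameters);                // 1st context (constructor)
    var sb = new System.Text.StringBuilder();
    await foreach (var text in executor.InferAsync(prompt, inferenceParams))  // 2nd context (inference)
        sb.Append(text);
    return sb.ToString();
}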

This create-then-dispose seems unnecessary. Moreover, after constructing a StatelessExecutor, its Context property cannot be used for token counting, because the underlying context has already been disposed. Would it be beneficial for ILLamaExecutor to expose a property such as PromptTokens that reports the number of input tokens?

You can view the related code here:
LLamaWorker - MyStatelessExecutor.cs
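
As a rough illustration of the PromptTokens idea, the wrapper below counts the input tokens with a short-lived context before delegating to StatelessExecutor. The class and property names are hypothetical, not an existing LLamaSharp API:

using System.Collections.Generic;
using LLama;
using LLama.Abstractions;

// Hypothetical wrapper: records the number of input tokens of the last request.
public class CountingStatelessExecutor
{
    private readonly LLamaWeights _weights;
    private readonly IContextParams _params;

    public int PromptTokens { get; private set; }

    public CountingStatelessExecutor(LLamaWeights weights, IContextParams @params)
    {
        _weights = weights;
        _params = @params;
    }

    public async IAsyncEnumerable<string> InferAsync(string prompt, IInferenceParams inferenceParams)
    {
        // Count prompt tokens up front. This costs an extra context, so a real
        // implementation would ideally reuse the context StatelessExecutor creates.
        using (var context = _weights.CreateContext(_params))
            PromptTokens = context.Tokenize(prompt).Length;

        var executor = new StatelessExecutor(_weights, _params);
        await foreach (var text in executor.InferAsync(prompt, inferenceParams))
            yield return text;
    }
}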

In summary:

  1. The handling of stop words seems inconsistent across different executors.
  2. StatelessExecutor appears to have some redundant code, potentially reducing processing efficiency.