Strange Behaviors in Executors #858

Open

sangyuxiaowu opened this issue Jul 19, 2024 · 0 comments

sangyuxiaowu (Contributor) commented Jul 19, 2024

Description

While developing LLamaWorker, I used InteractiveExecutor together with ChatHistory. During testing with qwen2, I noticed that the responses often ended with the strange character Ċ, which looks like the GPT-2-style byte-level BPE symbol for a newline (byte 0x0A rendered as U+010A) leaking through without being decoded.

You can view the related code here:
LLamaWorker v1.0.38 - LLmModelService.cs

Later, to make it easier to add function_call support and to avoid the strict constraints of ChatHistory, I tried using InteractiveExecutor directly. During testing, however, the stop word (antiprompt) handling seemed to have issues:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "你好!有什么我可以帮助你的吗?Ċ<|im_start|>"
      },
      "index": 0,
      "finish_reason": null
    }
  ],
  "id": "chatcmpl-4e3bd819f6ad46a49442c5a7e41571db",
  "created": 1721355611,
  "model": "gpt",
  "usage": {
    "prompt_tokens": 1,
    "completion_tokens": 10,
    "total_tokens": 11
  }
}

In addition to the strange character Ċ, the output contains the stop marker <|im_start|> itself, which should have been stripped.
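
For reference, here is a minimal sketch of how InteractiveExecutor is being driven directly. The model path, prompt, and stop words below are illustrative, not the exact LLamaWorker code:

using System;
using System.Collections.Generic;
using LLama;
using LLama.Common;

// Minimal sketch, assuming a qwen2 GGUF model; all values are illustrative.
var parameters = new ModelParams("qwen2.gguf") { ContextSize = 32768 };
using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);
var executor = new InteractiveExecutor(context);

var inferenceParams = new InferenceParams
{
    // Stop words that should end generation; <|im_start|> nevertheless leaks into the output.
    AntiPrompts = new List<string> { "<|im_end|>", "<|im_start|>" },
    MaxTokens = 512
};

var prompt = "<|im_start|>user\n你好<|im_end|>\n<|im_start|>assistant\n";
await foreach (var text in executor.InferAsync(prompt, inferenceParams))
{
    Console.Write(text);
}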

Subsequently, I tried using StatelessExecutor, which seemed to provide perfect results. However, for each inference request, the following log entries were printed twice:

llama_new_context_with_model: n_ctx      = 32768
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 1
llama_new_context_with_model: freq_base  = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      CUDA0 KV buffer size =  1792.00 MiB
llama_new_context_with_model: KV self size  = 1792.00 MiB, K (f16):  896.00 MiB, V (f16):  896.00 MiB
llama_new_context_with_model:  CUDA_Host  output buffer size =     0.58 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =   304.00 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =    71.01 MiB
llama_new_context_with_model: graph nodes  = 875
llama_new_context_with_model: graph splits = 2

Upon examining the source code, I found that StatelessExecutor creates and immediately disposes of a Context:

public StatelessExecutor(LLamaWeights weights, IContextParams @params, ILogger? logger = null)
{
    Images = new List<byte[]>();
    _weights = weights;
    _params = @params;
    _logger = logger;
    _batch = new LLamaBatch();

    // A context is created here and then disposed immediately:
    Context = _weights.CreateContext(_params, logger);
    Context.Dispose();
}
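
If a new StatelessExecutor is constructed for every request, as LLamaWorker does, this would explain the doubled log above: one context is created (and immediately disposed) by the constructor, and a second is created inside InferAsync. A sketch of that pattern, reusing weights, parameters, and inferenceParams from the earlier sketch:

// Sketch: one StatelessExecutor per request appears to create two contexts,
// assuming InferAsync builds its own context internally.
async Task<string> HandleRequestAsync(string prompt)
{
    var executor = new StatelessExecutor(weights, parameters);                // 1st context (constructor)
    var sb = new System.Text.StringBuilder();
    await foreach (var text in executor.InferAsync(prompt, inferenceParams))  // 2nd context (inference)
        sb.Append(text);
    return sb.ToString();
}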

This create-then-dispose seems unnecessary. Moreover, after constructing a StatelessExecutor, its Context property cannot be used for token counting, because the underlying context has already been disposed. Would it be beneficial for ILLamaExecutor to expose a property such as PromptTokens that reports the number of input tokens?

You can view the related code here:
LLamaWorker - MyStatelessExecutor.cs
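
As a rough illustration of the PromptTokens idea, the wrapper below counts the input tokens with a short-lived context before delegating to StatelessExecutor. The class and property names are hypothetical, not an existing LLamaSharp API:

using System.Collections.Generic;
using LLama;
using LLama.Abstractions;

// Hypothetical wrapper: records the number of input tokens of the last request.
public class CountingStatelessExecutor
{
    private readonly LLamaWeights _weights;
    private readonly IContextParams _params;

    public int PromptTokens { get; private set; }

    public CountingStatelessExecutor(LLamaWeights weights, IContextParams @params)
    {
        _weights = weights;
        _params = @params;
    }

    public async IAsyncEnumerable<string> InferAsync(string prompt, IInferenceParams inferenceParams)
    {
        // Count prompt tokens up front. This costs an extra context, so a real
        // implementation would ideally reuse the context StatelessExecutor creates.
        using (var context = _weights.CreateContext(_params))
            PromptTokens = context.Tokenize(prompt).Length;

        var executor = new StatelessExecutor(_weights, _params);
        await foreach (var text in executor.InferAsync(prompt, inferenceParams))
            yield return text;
    }
}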

In summary:

  1. The handling of stop words seems inconsistent across different executors.
  2. StatelessExecutor appears to have some redundant code, potentially reducing processing efficiency.