
Edge Case: Occasional response ending prematurely while not reaching max tokens #2169

Open
jorgecolonconsulting opened this issue Oct 27, 2024 · 0 comments

Issue

I hit a weird bug where the architect command was ending way before reaching the max tokens set in .aider.model.metadata.json and explicitly supported by the provider. I would get the following error:

Model together_ai/Qwen/Qwen2.5-72B-Instruct-Turbo has hit a token limit!
Token counts below are approximate.

Input tokens: ~4,688 of 32,768
Output tokens: ~2,022 of 32,768
Total tokens: ~6,710 of 32,768

I modified aider to output debug info from litellm and realized that the response was coming back with finish_reason: 'length'. Although it isn't documented in the Together AI docs, it looks like the default max_tokens is 2048 when the request doesn't set one. If I change max_tokens to 32768 I get a different error, something like inputs tokens + max_new_tokens must be <= 32769. If I set max_tokens: 20000 under extra_params in .aider.model.settings.yml, everything works just fine and I get past 2048 tokens without a problem. I believe I've had this problem in the past with another model; I just don't remember which one.
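
To double-check, hitting the model directly through litellm should show the same cutoff (a rough sketch, assuming TOGETHERAI_API_KEY is set in the environment; the prompt is just a placeholder):

import litellm

# No max_tokens in the request, so the provider's default (apparently 2048
# for Together AI) caps the completion.
resp = litellm.completion(
    model="together_ai/Qwen/Qwen2.5-72B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Explain TCP congestion control in exhaustive detail."}],
)

print(resp.choices[0].finish_reason)   # 'length' when the completion gets cut off
print(resp.usage.completion_tokens)    # should land near 2048 if the default is the culprit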

I found that base_coder has the following:

except FinishReasonLength:
    # We hit the output limit!
    if not self.main_model.info.get("supports_assistant_prefill"):
        exhausted = True
        break

# ...

if exhausted:
    self.show_exhausted_error()
    self.num_exhausted_context_windows += 1
    return

# ...

def show_exhausted_error(self):
    # ...
    res.append(f"Model {self.main_model.name} has hit a token limit!")
    res.append("Token counts below are approximate.")

Setting a fixed max_tokens works for this specific case, but it seems like a more dynamic approach would be ideal, where max_tokens shrinks based on the amount of tokenized input. I'm not sure that's the best approach though, since it's more common for models to have a large context window but a fixed max completion size of 4096 or 8192 tokens. Perhaps a use_max_dynamic_tokens setting in .aider.model.settings.yml that requires context_size?
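
Roughly what I'm picturing (just a sketch; dynamic_max_tokens and reserve are made-up names here, not existing aider or litellm settings):

def dynamic_max_tokens(input_tokens: int, context_size: int, reserve: int = 256) -> int:
    # Cap the completion so input + output stays inside the context window,
    # keeping a small reserve for approximate token counting.
    return max(context_size - input_tokens - reserve, 1)

# With the numbers from the error above: 32768 - 4688 - 256 = 27824,
# which is well past the provider's 2048 default.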

I'm willing to do the work for this; I just need direction on what to look at, since I'm not familiar with the aider or litellm codebases.

Version and model info

Aider v0.60.1
Model: together_ai/Qwen/Qwen2.5-72B-Instruct-Turbo
.aider.model.settings.yml:

- name: "together_ai/Qwen/Qwen2.5-72B-Instruct-Turbo"
  edit_format: "diff"
  use_repo_map: true
  examples_as_sys_msg: true
  reminder: "sys"
  streaming: false

  # extra_params:
  #   max_tokens: 20000 # workaround to get around the issue

.aider.model.metadata.json:

"together_ai/Qwen/Qwen2.5-72B-Instruct-Turbo": {
    "max_tokens": 32768,
    "max_input_tokens": 32768,
    "max_output_tokens": 32768,
    "input_cost_per_token": 0.0000012,
    "output_cost_per_token": 0.0000012,
    "litellm_provider": "together_ai",
    "mode": "chat",
    "supports_function_calling": false,
    "supports_tool_choice": false
}

Rough idea of tokens:

$ 0.0022    1,792 system messages
$ 0.0008      698 repository map                      use --map-tokens to resize
$ 0.0040    3,337 beagledocs/scrape.py                /drop to remove
$ 0.0004      373 .ai/codex/codex.md (read-only)      /drop to remove
$ 0.0001       52 .ai/rules/bug-review.md (read-only) /drop to remove
==================
$ 0.0075    6,252 tokens total
           26,516 tokens remaining in context window
           32,768 tokens max context window size