KeyError model[0] did not exist in tensor? #446

Open
FrozzDay opened this issue Oct 27, 2024 · 3 comments

Comments

@FrozzDay

I am performing a Mega Merge using LLaMA 3.2 3B, with both the base model and fine-tuned/instruction-tuned variants, using the DARE linear method. The first merge completed successfully, but I encountered an error when attempting the second one. The error message:

Traceback (most recent call last):
  File "/usr/local/bin/mergekit-mega", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/mergekit/options.py", line 82, in wrapper
    f(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/mergekit/scripts/megamerge.py", line 187, in main
    merge(m, merge_options, force, out_path)
  File "/usr/local/lib/python3.10/site-packages/mergekit/scripts/megamerge.py", line 81, in merge
    run_merge(
  File "/usr/local/lib/python3.10/site-packages/mergekit/merge.py", line 96, in run_merge
    for _task, value in exec.run(quiet=options.quiet):
  File "/usr/local/lib/python3.10/site-packages/mergekit/graph.py", line 197, in run
    res = task.execute(**arguments)
  File "/usr/local/lib/python3.10/site-packages/mergekit/tokenizer/embed.py", line 54, in execute
    embed_size = tensors[models[0]].shape[1]
KeyError: ModelReference(model=ModelPath(path='unsloth/Llama-3.2-3B', revision=None), lora=None, override_architecture=None)
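For what it's worth, the failing line (`embed_size = tensors[models[0]].shape[1]`) indexes a dict of loaded tensors by model reference, so the KeyError just means no tensor was loaded under the base model's key. A minimal sketch of that failure shape, with hypothetical names and toy shapes (not mergekit's actual API):

```python
# Minimal sketch of the failure mode behind the KeyError above: a dict of
# per-model tensors indexed by a key (the base model) that was never
# populated. Names and shapes are hypothetical, for illustration only.
import numpy as np

models = ["unsloth/Llama-3.2-3B", "model-1", "model-2"]

# Suppose the loader only produced embedding tensors for the fine-tunes,
# skipping the base model (e.g. because its embeddings are tied):
tensors = {
    "model-1": np.zeros((16, 8)),
    "model-2": np.zeros((16, 8)),
}

try:
    embed_size = tensors[models[0]].shape[1]  # same pattern as embed.py line 54
except KeyError as exc:
    print(f"KeyError: {exc!r}")  # the base model's key is simply absent
```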

The config is something like this:

models:
  - model: unsloth/Llama-3.2-3B
  - model: model-1
    parameters:
      weight: 1
  - model: model-2
    parameters:
      weight: 1
merge_method: dare_linear
base_model: unsloth/Llama-3.2-3B
tokenizer_source: model-1
parameters:
dtype: float32
@David-AU-github

David-AU-github commented Nov 12, 2024

Confirming the exact same error; mergekit cannot find the "base_model", including when the path is a local (absolute) Windows path.

The odd thing is that some merge configs work fine with no issue, whereas others fail for the reasons below. Some merges I did in late September 2024 now fail, while others are fine.

Example: L3 (Llama 3) models merge fine, no issue.
Gemmas: now break as noted below, but not all of them.

This works fine:

models:
  - model: G:/9B/gemma-2-9b-it-abliterated
    parameters:
      weight: .4
merge_method: dare_ties
base_model: G:/9B/gemma2-gutenberg-9B
tokenizer_source: union
dtype: bfloat16

BUT THIS DIES:

models:
  - model: G:/9B/Gemma-2-Ataraxy-9B
    parameters:
      weight: [1,1,.75,.5,.25,.25,.05,.01]
  - model: G:/9B/Gemma-2-9B-It-SPPO-Iter3
    parameters:
      weight: [1,1,.75,.5,.25,.25,.05,.01]
  - model: G:/9B/gemma-2-Ifable-9B
    parameters:
      weight: [1,1,.75,.5,.25,.25,.05,.01]
merge_method: dare_ties
base_model: E:/Gemma-Dark-Writer3-mega-ab
dtype: bfloat16

But the exact same setup (3 models plus base, dare_ties) for a Llama 3/3.1 merge works fine.

Other Gemma merges of the same type (3 models plus base, dare_ties) that did work in September 2024 now crash and burn.

Even if I change this:
"base_model: E:/Gemma-Dark-Writer3-mega-ab"

it still dies, no matter what. If I point it at a bad location instead, it gives the normal "not found" error.

Likewise, any "Gemma" merges like the one above that did work fine now crash and burn (specifically: dare_ties, 3 models + base model).

Please advise.

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Program Files\Python312\Scripts\mergekit-yaml.exe\__main__.py", line 7, in <module>
  File "C:\Program Files\Python312\Lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\mergekit3\mergekit\mergekit\options.py", line 82, in wrapper
    f(*args, **kwargs)
  File "F:\mergekit3\mergekit\mergekit\scripts\run_yaml.py", line 47, in main
    run_merge(
  File "F:\mergekit3\mergekit\mergekit\merge.py", line 96, in run_merge
    for _task, value in exec.run(quiet=options.quiet):
  File "F:\mergekit3\mergekit\mergekit\graph.py", line 197, in run
    res = task.execute(**arguments)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\mergekit3\mergekit\mergekit\merge_methods\generalized_task_arithmetic.py", line 126, in execute
    tvs, base = get_task_vectors(
                ^^^^^^^^^^^^^^^^^
  File "F:\mergekit3\mergekit\mergekit\merge_methods\generalized_task_arithmetic.py", line 201, in get_task_vectors
    base = tensors[base_model]
           ~~~~~~~^^^^^^^^^^^^
KeyError: ModelReference(model=ModelPath(path='G:/9B/gemma2-gutenberg-9B', revision=None), lora=None, override_architecture=None)

@cg123
Collaborator

cg123 commented Nov 29, 2024

@FrozzDay @David-AU-github
I'm betting that in all the cases where this happens, your base model has tied weights but one or more of the fine-tuned versions has a separate lm_head (whether from being trained that way or from having been produced by an older version of mergekit that always wrote one out).

If you're able, could you try this merge on a commit from before #429 (if it's Llama) or #406 (if it's Gemma)? I'm working on more robust handling for cases like this but it'd be great to get confirmation that the issue you're experiencing is what I have in mind. Thanks!
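If it helps anyone diagnose their own checkpoints, one way to check whether a model falls into this case is to compare the config's `tie_word_embeddings` flag against the tensors actually on disk. A minimal sketch assuming the standard Hugging Face layout (`config.json` plus a `model.safetensors.index.json` for sharded checkpoints); the function name is mine, not mergekit's:

```python
# Hedged sketch: report whether a local checkpoint's config requests tied
# embeddings and whether a separate lm_head.weight tensor is present on
# disk. A base with tied=True merged against a fine-tune that ships
# lm_head.weight is the mismatch described above.
import json
import os

def embedding_report(model_dir: str) -> dict:
    """Inspect a local HF-layout checkpoint directory."""
    with open(os.path.join(model_dir, "config.json")) as f:
        config = json.load(f)
    tied = config.get("tie_word_embeddings", False)

    # Sharded checkpoints list every tensor name in the safetensors index.
    index_path = os.path.join(model_dir, "model.safetensors.index.json")
    has_lm_head = False
    if os.path.exists(index_path):
        with open(index_path) as f:
            weight_map = json.load(f).get("weight_map", {})
        has_lm_head = "lm_head.weight" in weight_map
    return {"tie_word_embeddings": tied, "separate_lm_head": has_lm_head}
```

Running this over the base model and each fine-tune and comparing the two flags should show whether the models disagree about lm_head.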

@David-AU-github

@cg123 Thank you so much!
