
Enable HF PretrainedModel loading for speculative model training #122

Draft · wants to merge 2 commits into main
Conversation

JRosenkranz
Copy link
Collaborator

This PR enables HF PretrainedModel loading. To use this feature, simply set the architecture to "hf_pretrained" and the variant to a Hugging Face variant (model_id). This was enabled by removing the need to create special adapters: the model is instead wrapped in a HiddenStatesExtractor, which extracts hidden states from the base model. With this new wrapper, the adapters and overridden model classes that used include_embeds are no longer required, and generate can be used from fms main directly.
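The wrapper pattern described above can be sketched as follows. This is a minimal, dependency-free illustration, not the actual fms-extras implementation: `DummyBaseModel` stands in for an HF PretrainedModel, and the class/attribute names besides `HiddenStatesExtractor` are assumptions for demonstration only.

```python
# Hypothetical sketch of the HiddenStatesExtractor wrapper pattern.
# In the real PR the base model would be a torch/HF model; here a plain
# Python stand-in keeps the example self-contained.

class DummyBaseModel:
    """Stand-in for an HF PretrainedModel whose forward() can return
    per-layer hidden states alongside the logits."""

    def forward(self, input_ids, output_hidden_states=False):
        # Fake three "layers": each layer shifts the inputs by its index.
        hidden = [[x + layer for x in input_ids] for layer in range(3)]
        logits = hidden[-1]
        if output_hidden_states:
            return {"logits": logits, "hidden_states": hidden}
        return {"logits": logits}


class HiddenStatesExtractor:
    """Wraps a base model so a speculator can read the last hidden state
    without the base model needing a special include_embeds adapter."""

    def __init__(self, base_model):
        self.base_model = base_model
        self.last_hidden_state = None

    def forward(self, input_ids):
        out = self.base_model.forward(input_ids, output_hidden_states=True)
        # Cache the final layer's hidden state for the speculative model.
        self.last_hidden_state = out["hidden_states"][-1]
        return out["logits"]


extractor = HiddenStatesExtractor(DummyBaseModel())
logits = extractor.forward([1, 2, 3])
```

Because the wrapper only intercepts the forward pass and caches the hidden states, the unmodified base model (and hence fms `generate`) can be used as-is, which is what makes the per-architecture adapters unnecessary.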

…te from fms directly to extract hidden states; removed unnecessary classes
@daviswer
Copy link
Collaborator

Nice, this does make more sense once models are partitioned into headless/head components!

@sahilsuneja1
Copy link
Collaborator

sahilsuneja1 commented Oct 21, 2024

Couldn't follow the reset logic. Everything else looks good!
