
[IMPLEMENTATION] magpie #740

Closed · gabrielmbmb opened this issue Jun 18, 2024 · 2 comments · Fixed by #778
Labels: enhancement (New feature or request)
Milestone: 1.3.0

Comments

@gabrielmbmb (Member)

No description provided.

@gabrielmbmb gabrielmbmb added the enhancement New feature or request label Jun 21, 2024
@gabrielmbmb gabrielmbmb added this to the 1.3.0 milestone Jun 21, 2024
@fpreiss (Contributor) commented Jun 24, 2024

I have tried to implement the prompting strategy of the Magpie paper using distilabel's Ollama integration and noticed that the current implementation does not allow me to override the chat template. I believe the /api/generate endpoint would need to be wrapped instead of the /api/chat endpoint. I had some success with the following:

from typing import Any, Literal

from ollama import Options

# Import paths correspond to distilabel 1.x; adjust if your version differs.
from distilabel.llms import OllamaLLM
from distilabel.llms.typing import GenerateOutput
from distilabel.steps.tasks.typing import StandardInput

# Ollama model tag (assumed; use whatever tag you have pulled locally).
LLAMA3_8B = "llama3:8b"

TEMPLATE_OVERRIDES: dict[str, str] = {
    # https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/#special-tokens-used-with-meta-llama-3
    LLAMA3_8B: "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
}


class OllamaMagpieLLM(OllamaLLM):
    """Magpie compatibility layer for Ollama."""

    async def agenerate(
        self,
        input: StandardInput,
        format: Literal["", "json"] = "",
        # TODO: include relevant options from `Options` in `agenerate` method.
        options: Options | None = None,
        keep_alive: bool | None = None,
    ) -> GenerateOutput:
        """Override of the `OllamaLLM.agenerate` method make Ollama fill the user message.

        The original implementation uses Ollama's chat endpoint instead of the generate endpoint.
        This simplifies implementing multi-turn conversations, but we can't manipulate the prompt template.
        """
        try:
            # TODO: needs some work for multi-turn support.
            prompt = input[0]["content"]
            completion: dict[str, Any] = await self._aclient.generate(
                prompt=prompt,
                model=self.model,
                template=TEMPLATE_OVERRIDES[self.model],
                stream=False,
                format=format,
                options=options,
                keep_alive=keep_alive,
            )
            return [completion["response"]]
        except Exception as e:
            self._logger.warning(
                f"⚠️ Received no response using Ollama client (model: '{self.model_name}')."
                f" Finish reason was: {e}"
            )
            # Return a null generation so callers still get a list back.
            return [None]

Note that, as of writing this, the prompt in the generate call has to be a non-empty string in order to generate the user instructions as outlined in the paper; this seems to be an issue on Ollama's/llama.cpp's side.
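For reference, a minimal usage sketch of the compatibility layer above (the model tag and the single-space placeholder prompt are assumptions of my local setup, with Ollama running on the default host):

import asyncio

llm = OllamaMagpieLLM(model=LLAMA3_8B)
llm.load()

async def sample_instruction() -> str:
    # Pass a single space as the content, since an empty string currently returns nothing.
    generation = await llm.agenerate(input=[{"role": "user", "content": " "}])
    return generation[0]

print(asyncio.run(sample_instruction()))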

@gabrielmbmb gabrielmbmb linked a pull request Jul 11, 2024 that will close this issue
@gabrielmbmb (Member, Author)

Hi @fpreiss, for now we have implemented Magpie for TransformersLLM, InferenceEndpointsLLM and vLLM. We will work on adding compatibility for the rest of the LLMs in the next release.
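For anyone landing here, a rough sketch of the Magpie flow with one of the supported LLMs; class and parameter names such as MagpieGenerator and magpie_pre_query_template follow the API introduced in the linked PR and may differ slightly between releases:

from distilabel.llms import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline
from distilabel.steps.tasks import MagpieGenerator

with Pipeline(name="magpie-demo") as pipeline:
    generator = MagpieGenerator(
        llm=InferenceEndpointsLLM(
            model_id="meta-llama/Meta-Llama-3-8B-Instruct",
            tokenizer_id="meta-llama/Meta-Llama-3-8B-Instruct",
            # Pre-query template that makes the model generate the user turn itself.
            magpie_pre_query_template="llama3",
        ),
        n_turns=1,    # single instruction/response pairs
        num_rows=10,  # number of conversations to generate
    )

if __name__ == "__main__":
    distiset = pipeline.run()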
