DO NOT MERGE, wip audio evals #5616
base: main

Conversation
if system_instruction:
    messages.insert(0, {"role": "system", "content": str(system_instruction)})
return messages
def verbose_generation_info(self) -> str:
    return f"OpenAI invocation parameters: {self.public_invocation_params}"
- async def _async_generate(self, prompt: Union[str, MultimodalPrompt], **kwargs: Any) -> str:
+ async def _async_generate(self, prompt: Union[str, MultimodalPrompt], data_fetcher: Optional[Callable[[str], Audio]] = None, **kwargs: Any) -> str:
It's probably not a good idea to change the interface for `_async_generate`, since it's used widely across Phoenix. A kwarg shouldn't drastically change things, though, so let me think about this.
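For comparison, a minimal sketch of how call sites behave under the kwarg approach (`model`, `prompt`, and `fetch_audio` are stand-ins, not names from this PR):

```python
async def _example(model, prompt, fetch_audio):
    # Existing call sites keep working unchanged, since the new
    # parameter defaults to None:
    text = await model._async_generate(prompt)

    # Only the new audio path opts in by passing the extra kwarg:
    text = await model._async_generate(prompt, data_fetcher=fetch_audio)
    return text
```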
@@ -279,27 +280,43 @@ def _init_rate_limiter(self) -> None:
      )

      def _build_messages(
-         self, prompt: MultimodalPrompt, system_instruction: Optional[str] = None
+         self, prompt: MultimodalPrompt, data_fetcher: Optional[Callable[[str], Audio]] = None, system_instruction: Optional[str] = None
Also note that the type annotation on the callable does not match the docstring. I think it's best if users return a familiar type instead of an internal one.
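A sketch of what that could look like, assuming the fetcher maps a URL to raw `bytes` (the return type and docstring wording here are illustrative, not this PR's actual code):

```python
from typing import Callable, List, Optional

def _build_messages(
    self,
    prompt: "MultimodalPrompt",
    # Annotating with plain `bytes` keeps the signature and the docstring
    # in terms of a type users already know, instead of an internal class.
    data_fetcher: Optional[Callable[[str], bytes]] = None,
    system_instruction: Optional[str] = None,
) -> List[dict]:
    """Build chat messages from the prompt.

    Args:
        data_fetcher: Optional callable that takes an audio URL and
            returns the raw audio bytes found at that location.
    """
    ...
```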
if part.content_type == PromptPartContentType.TEXT:
    messages.append({"role": "system", "content": part.content})
elif part.content_type == PromptPartContentType.AUDIO:
    audio_object = data_fetcher(part.content)
I don't love the fact that we've plumbed an arbitrary callable all the way down to this level; maybe I'm overthinking it.
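One possible alternative: resolve the audio up front so the lower layers keep their original signatures. A rough sketch with hypothetical names (`_resolve_audio_parts` and the `prompt.parts` attribute are assumptions; `PromptPartContentType` is from the snippet above):

```python
from typing import Callable

def _resolve_audio_parts(prompt: "MultimodalPrompt", fetch: Callable[[str], bytes]) -> "MultimodalPrompt":
    # Hypothetical pre-pass: swap each audio part's URL for the fetched
    # bytes before the prompt reaches the model class, so _build_messages
    # and _async_generate never have to see the callable at all.
    for part in prompt.parts:
        if part.content_type == PromptPartContentType.AUDIO:
            part.content = fetch(part.content)
    return prompt
```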
        as the total runtime of each classification (in seconds).
    """
    if not isinstance(dataframe, pd.DataFrame):
        dataframe = pd.DataFrame(dataframe, columns=["audio_url"])
I don't think we should be hard-coding the column name like this. If users pass in an iterable of URLs, we should map the column names to the template variable names.
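A sketch of that mapping (the template's `variables` attribute is hypothetical):

```python
import pandas as pd

def _coerce_to_dataframe(data, template) -> pd.DataFrame:
    """Sketch: name the column after the template's variable rather than
    hard-coding "audio_url"."""
    if isinstance(data, pd.DataFrame):
        return data
    variables = list(template.variables)  # hypothetical attribute
    if len(variables) != 1:
        raise ValueError(
            "A bare iterable can only be mapped onto a single-variable template"
        )
    return pd.DataFrame(data, columns=[variables[0]])
```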
model: BaseModel,
template: Union[ClassificationTemplate, PromptTemplate, str],
rails: List[str],
data_fetcher: Optional[Callable[[str], Audio]],
I still don't think we should require users to return our internal type; it's extra structure they have to learn and import, which feels clunky.
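A sketch of accepting plain `bytes` and doing the wrapping internally (assumes the `Audio` dataclass from this PR is in scope; its `data` field and the base64 encoding are assumptions about what that type stores):

```python
import base64
from typing import Callable

def _resolve_audio(url: str, data_fetcher: Callable[[str], bytes]) -> "Audio":
    # The user-supplied fetcher returns plain bytes; the library wraps
    # them in its internal Audio dataclass itself, so callers never have
    # to import or even know about that type.
    raw = data_fetcher(url)
    return Audio(data=base64.b64encode(raw).decode("ascii"))
```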
"content": [ | ||
{ | ||
"type": "input_audio", | ||
"input_audio": {"data": part.content, "format": audio_format.value}, |
Maybe we should try to infer the format from the file headers instead: make a best effort and fall back to something sensible when the inference doesn't work. We have the bytestring, so the headers should be there.
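A best-effort sniffer over the leading bytes might look like this (the magic numbers are standard file signatures; the `wav` fallback is just one sensible default, not a decision from this PR):

```python
def _infer_audio_format(data: bytes, default: str = "wav") -> str:
    """Best-effort sniff of well-known audio file headers, falling back
    to `default` when nothing matches."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # MP3s start with an ID3 tag or an MPEG frame-sync word.
    if data[:3] == b"ID3" or data[:2] in (b"\xff\xfb", b"\xff\xf3", b"\xff\xf2"):
        return "mp3"
    if data[:4] == b"OggS":
        return "ogg"
    if data[:4] == b"fLaC":
        return "flac"
    return default
```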
@dataclass
class Audio:
It feels pretty hard for a user to know that they have to wrap their output in this wrapper class.
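To illustrate the friction, roughly what a user would have to write today (the import path and field name are guesses at this PR's internals, not a published API):

```python
from urllib.request import urlopen

from phoenix.evals import Audio  # hypothetical import path for the internal type

def my_fetcher(url: str) -> Audio:
    raw = urlopen(url).read()  # the user's own download logic
    return Audio(data=raw)     # the wrapping step that's easy to miss
```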
This PR aims to provide the `audio_classify` function. The use case is similar to `llm_classify`, but geared towards generating classifications on audio data.

Big things left to be taken care of:

- `audio_classify` is a wrapper around `llm_classify` (which is already heavily tested).
- Thinking we have an optional param for `audio_format`, which can either be a `str`, `Series`, or `list` (if they have different audio formats in the same dataset). Otherwise, if they just have an `audio_url` to AWS or GCloud, the file format info is typically present within the metadata.
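Under those assumptions, a call might look like the following sketch (parameter names follow the signature reviewed above; `model`, `template`, and the rail values are placeholders):

```python
import pandas as pd

df = pd.DataFrame(
    {"audio_url": ["s3://bucket/call-1.wav", "s3://bucket/call-2.wav"]}
)

# Sketch of the proposed API, not a released interface.
results = audio_classify(
    dataframe=df,
    model=model,          # an audio-capable model wrapper
    template=template,    # a ClassificationTemplate or prompt string
    rails=["compliant", "non-compliant"],
    audio_format="wav",   # str, Series, or list per the description above
)
```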