
DO NOT MERGE, wip audio evals #5616

Draft · wants to merge 16 commits into main
Conversation

@21ShisodeParth (Contributor) commented Dec 5, 2024

This PR aims to provide the audio_classify function. The use case is similar to llm_classify, but geared towards generating classifications on audio data.

Big things left to be taken care of:

  1. Testing: not extensive, since audio_classify is a wrapper around llm_classify, which is already heavily tested.
  2. What do we do if the user wants to pass in audio bytes or local paths to their audio files?
     The current thinking is an optional audio_format parameter that can be a str, a Series, or a list (if the same dataset mixes audio formats). Otherwise, if the user just has an audio_url pointing to AWS or GCloud, the file format is typically present in the object's metadata. A rough, illustrative sketch of the resulting signature follows below.
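
Concretely, the signature implied by the description might look something like the following. This is a hypothetical sketch only; parameter names, types, and defaults (in particular audio_format and data_fetcher) are assumptions for illustration, not the merged API:

    from typing import Callable, Iterable, List, Optional, Union

    import pandas as pd

    # Hypothetical sketch only: names, types, and defaults are assumptions
    # drawn from the PR description, not the final API.
    def audio_classify(
        dataframe: Union[pd.DataFrame, Iterable[str]],
        model: "BaseModel",
        template: Union["ClassificationTemplate", "PromptTemplate", str],
        rails: List[str],
        data_fetcher: Optional[Callable[[str], "Audio"]] = None,
        # Either one format for the whole dataset, or one entry per row when
        # the dataset mixes audio formats.
        audio_format: Optional[Union[str, "pd.Series", List[str]]] = None,
    ) -> pd.DataFrame:
        ...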

        if system_instruction:
            messages.insert(0, {"role": "system", "content": str(system_instruction)})
        return messages

    def verbose_generation_info(self) -> str:
        return f"OpenAI invocation parameters: {self.public_invocation_params}"

-   async def _async_generate(self, prompt: Union[str, MultimodalPrompt], **kwargs: Any) -> str:
+   async def _async_generate(self, prompt: Union[str, MultimodalPrompt], data_fetcher: Optional[Callable[[str], Audio]] = None, **kwargs: Any) -> str:

Review comment (Contributor):
It's probably not a good idea to change the interface of _async_generate, as it's used widely across Phoenix. A kwarg shouldn't drastically change things, though, so let me think about this.
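
A minimal sketch of the kwarg idea, assuming data_fetcher is popped out of **kwargs so the positional signature stays exactly as it is today (illustrative only, not the code in this PR):

    from typing import Any, Callable, Optional, Union

    # Illustrative only: the positional signature is unchanged, so existing
    # call sites across Phoenix behave exactly as before.
    async def _async_generate(self, prompt: Union[str, "MultimodalPrompt"], **kwargs: Any) -> str:
        data_fetcher: Optional[Callable[[str], "Audio"]] = kwargs.pop("data_fetcher", None)
        messages = self._build_messages(prompt, data_fetcher=data_fetcher)
        ...  # hand `messages` to the existing completion path, unchanged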

@@ -279,27 +280,43 @@ def _init_rate_limiter(self) -> None:
        )

    def _build_messages(
-       self, prompt: MultimodalPrompt, system_instruction: Optional[str] = None
+       self, prompt: MultimodalPrompt, data_fetcher: Optional[Callable[[str], Audio]] = None, system_instruction: Optional[str] = None

Review comment (Contributor):
Also note that the type annotation on the callable does not match the docstring. I think it's best if users return a familiar type instead of an internal one.
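
One way to read that suggestion, as a sketch: have the user-supplied fetcher return plain bytes, a type users already know, and keep any internal wrapping on the library side. The requests call here is just an example transport:

    from typing import Callable, Optional

    import requests

    # Example fetcher returning plain bytes rather than an internal wrapper type.
    def fetch_audio(url: str) -> bytes:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        return response.content

    data_fetcher: Optional[Callable[[str], bytes]] = fetch_audio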

            if part.content_type == PromptPartContentType.TEXT:
                messages.append({"role": "system", "content": part.content})
            elif part.content_type == PromptPartContentType.AUDIO:
                audio_object = data_fetcher(part.content)

Review comment (Contributor):
I don't love the fact that we've plumbed an arbitrary callable down to this level; maybe I'm overthinking it.
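
Purely as an illustration of the alternative that concern hints at (not what this PR does): audio references could be resolved in one place before the prompt reaches the model wrapper, so the low-level message builder never sees a user-supplied callable.

    from typing import Callable, Dict, List

    # Sketch: resolve all audio references up front; downstream code then only
    # deals with ready-to-encode bytes, not with an arbitrary callable.
    def resolve_audio_references(
        audio_urls: List[str], data_fetcher: Callable[[str], bytes]
    ) -> Dict[str, bytes]:
        return {url: data_fetcher(url) for url in audio_urls}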

        as the total runtime of each classification (in seconds).
    """
    if not isinstance(dataframe, pd.DataFrame):
        dataframe = pd.DataFrame(dataframe, columns=["audio_url"])

Review comment (Contributor):
I don't think we should be hard-coding the column name like this. If users are passing in an iterable of URLs, we should map the column names to the template variable names.
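
A sketch of that mapping: name the column after the template's variable instead of hard-coding "audio_url". The template_variables argument stands in for however the prompt template exposes its variable names, which is an assumption here:

    from typing import Iterable, List, Union

    import pandas as pd

    # Sketch: derive the column name from the template variables rather than
    # hard-coding "audio_url".
    def to_dataframe(
        data: Union[pd.DataFrame, Iterable[str]], template_variables: List[str]
    ) -> pd.DataFrame:
        if isinstance(data, pd.DataFrame):
            return data
        if len(template_variables) != 1:
            raise ValueError(
                "An iterable of values can only be mapped onto a single-variable template."
            )
        return pd.DataFrame(list(data), columns=[template_variables[0]])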

    model: BaseModel,
    template: Union[ClassificationTemplate, PromptTemplate, str],
    rails: List[str],
    data_fetcher: Optional[Callable[[str], Audio]],

Review comment (Contributor):
I still don't think we should be requiring the user to return our internal type; it's a structure they would have to learn and import, which feels clunky.

"content": [
{
"type": "input_audio",
"input_audio": {"data": part.content, "format": audio_format.value},

Review comment (Contributor):
Maybe we should be trying to infer the format from the file headers instead. We'd make a best effort and fall back to something sensible if the inference doesn't work; we have the bytestring, so the headers should be there.
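
A best-effort sniffing sketch along those lines, keyed on common container magic bytes and falling back to a default when nothing matches (the returned format strings are assumptions about what the downstream API accepts):

    # Best-effort inference of the audio format from leading bytes; falls back
    # to a default when no known signature matches.
    def infer_audio_format(data: bytes, default: str = "wav") -> str:
        if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
            return "wav"
        if data[:3] == b"ID3" or (len(data) > 1 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0):
            return "mp3"
        if data[:4] == b"fLaC":
            return "flac"
        if data[:4] == b"OggS":
            return "ogg"
        if data[4:8] == b"ftyp":  # MP4/M4A container family
            return "m4a"
        return default

The inferred value could then stand in for audio_format.value in the payload above whenever no explicit format is supplied.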



@dataclass
class Audio:

Review comment (Contributor):
It feels pretty hard for a user to know that they have to wrap their output in this wrapper class.
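
One way to address the discoverability concern, sketched under the assumption that Audio carries a base64-encoded data field (its fields are not shown in this diff): accept plain bytes from the user and construct the wrapper internally, so Audio never has to appear in user code.

    import base64
    from dataclasses import dataclass

    # The Audio fields are not visible in this diff; `data` is an assumed
    # field name, used only to illustrate wrapping on the library side.
    @dataclass
    class Audio:
        data: str  # base64-encoded audio payload

    def wrap_user_bytes(raw: bytes) -> Audio:
        # Users hand back plain bytes; the library builds its own wrapper.
        return Audio(data=base64.b64encode(raw).decode("ascii"))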
