Releases: pipecat-ai/pipecat
v0.0.39
Fixed
- Fixed a regression introduced in 0.0.38 that would cause Daily transcription to stop the Pipeline.
v0.0.38
Added
- Added `force_reload`, `skip_validation` and `trust_repo` to `SileroVAD` and `SileroVADAnalyzer`. This allows caching and various GitHub repo validations (see the sketch after this list).
- Added a `send_initial_empty_metrics` flag to `PipelineParams` to request initial empty metrics (zero values). True by default.
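A minimal sketch of how these new options might be passed, assuming the import paths of this release and that the constructor keywords match the flag names above; this is not taken from the release notes.

```python
# Sketch only: import paths and keyword names are assumptions based on the
# entries above, not confirmed by the release notes.
from pipecat.pipeline.task import PipelineParams
from pipecat.vad.silero import SileroVADAnalyzer

vad = SileroVADAnalyzer(
    force_reload=False,    # reuse the cached torch.hub download
    skip_validation=True,  # skip the GitHub repo validation step
    trust_repo=True,       # suppress the torch.hub trust prompt
)

params = PipelineParams(
    enable_metrics=True,
    send_initial_empty_metrics=False,  # opt out of the initial zero-value metrics
)
```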
Fixed
- Fixed the initial metrics format. It was using the wrong keys (name/time instead of processor/value).
- STT services now use the ISO 8601 time format for transcription frames.
- Fixed an issue that would cause the Daily transport to show a stop transcription error when none actually occurred.
v0.0.37
Added
- Added `RTVIProcessor`, which implements the RTVI-AI standard. See https://github.com/rtvi-ai
- Added `BotInterruptionFrame`, which allows interrupting the bot while it is talking.
- Added `LLMMessagesAppendFrame`, which allows appending messages to the current LLM context (see the sketch after this list).
- Added `LLMMessagesUpdateFrame`, which allows replacing the current LLM context with the one provided in this new frame.
- Added `LLMModelUpdateFrame`, which allows updating the LLM model.
- Added `TTSSpeakFrame`, which causes the bot to say some text. This text will not be part of the LLM context.
- Added `TTSVoiceUpdateFrame`, which allows updating the TTS voice.
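A minimal sketch of how these frames might be queued into a running `PipelineTask`; the frame constructor arguments are assumptions based on the frame names above, not an excerpt from the library.

```python
# Sketch only: frame constructor signatures are assumed from the entries above.
from pipecat.frames.frames import (
    BotInterruptionFrame,
    LLMMessagesAppendFrame,
    TTSSpeakFrame,
)

async def nudge_bot(task):
    # Interrupt whatever the bot is currently saying.
    await task.queue_frame(BotInterruptionFrame())
    # Append a system message to the current LLM context.
    await task.queue_frame(
        LLMMessagesAppendFrame([{"role": "system", "content": "Be brief."}])
    )
    # Say some text directly; it is not added to the LLM context.
    await task.queue_frame(TTSSpeakFrame("One moment, please."))
```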
Removed
- Removed the `LLMResponseStartFrame` and `LLMResponseEndFrame` frames. These were added in the past to properly handle interruptions for the `LLMAssistantContextAggregator`, but the `LLMContextAggregator` is now based on `LLMResponseAggregator`, which handles interruptions properly by just processing the `StartInterruptionFrame`, so there's no need for these extra frames anymore.
Fixed
- Fixed an issue with `StatelessTextTransformer` where it was pushing a string instead of a `TextFrame`.
- `TTSService` end-of-sentence detection has been improved. It now works with acronyms, numbers, hours and more.
- Fixed an issue in `TTSService` that would not properly flush the current aggregated sentence if an `LLMFullResponseEndFrame` was found.
Performance
- `CartesiaTTSService` now uses websockets, which improves speed. It also leverages the new Cartesia contexts, which maintain generated audio prosody when multiple inputs are sent, greatly improving audio quality.
v0.0.36
Added
- Added `GladiaSTTService`. https://docs.gladia.io/chapters/speech-to-text-api/pages/live-speech-recognition
- Added `XTTSService`. This is a local Text-To-Speech service. https://github.com/coqui-ai/TTS
- Added `UserIdleProcessor`. This processor can be used to wait for any interaction with the user. If the user doesn't say anything within a given timeout, a provided callback is called (see the sketch after this list).
- Added `IdleFrameProcessor`. This processor can be used to wait for frames within a given timeout. If no frame is received within the timeout, a provided callback is called.
- Added a new frame, `BotSpeakingFrame`. This frame will be continuously pushed upstream while the bot is talking.
- It is now possible to specify a Silero VAD version when using `SileroVADAnalyzer` or `SileroVAD`.
- Added `AsyncFrameProcessor` and `AsyncAIService`. Some services, like `DeepgramSTTService`, need to process things asynchronously. For example, audio is sent to Deepgram but transcriptions are not returned immediately. In these cases we still require all frames (except system frames) to be pushed downstream from a single task. That's what `AsyncFrameProcessor` is for: it creates a task, and all frames should be pushed from that task. So, whenever a new Deepgram transcription is ready, that transcription will also be pushed from this internal task.
- The `MetricsFrame` now includes processing metrics if metrics are enabled. The processing metrics indicate the time a processor needs to generate all its output. Note that not all processors generate these kinds of metrics.
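A minimal sketch of wiring up `UserIdleProcessor`; the import path, callback signature and keyword names are assumptions (the `17-detect-user-idle.py` example mentioned below shows the real usage).

```python
# Sketch only: callback signature and keyword names are assumed, not confirmed.
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.processors.user_idle_processor import UserIdleProcessor

async def handle_user_idle(processor):
    # Called when the user hasn't said anything within the timeout.
    await processor.push_frame(TTSSpeakFrame("Are you still there?"))

user_idle = UserIdleProcessor(callback=handle_user_idle, timeout=5.0)
```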
Changed
- `WhisperSTTService` model can now also be a string.
- Added missing `*` keyword separators in services.
Fixed
- `WebsocketServerTransport` no longer tries to send frames if the serializer returns `None`.
- Fixed an issue where exceptions that occurred inside frame processors were being swallowed and not displayed.
- Fixed an issue in `FastAPIWebsocketTransport` where it would still try to send data to the websocket after being closed.
Other
- Added a Fly.io deployment example in `examples/deployment/flyio-example`.
- Added a new `17-detect-user-idle.py` example that shows how to use the new `UserIdleProcessor`.
v0.0.35
Changed
- `FastAPIWebsocketParams` now requires a serializer.
- `TwilioFrameSerializer` now requires a `streamSid`.
Fixed
- The Silero VAD number of frames needs to be 512 for a 16000 sample rate or 256 for an 8000 sample rate.
v0.0.34
Fixed
- Fixed an issue with asynchronous STT services (Deepgram and Azure) that could cause interruptions to ignore transcriptions.
- Fixed an issue introduced in 0.0.33 that would cause the LLM to generate shorter output.
v0.0.33
Changed
- Upgraded to Cartesia's new Python library 1.0.0. `CartesiaTTSService` now expects a voice ID instead of a voice name (you can get the voice ID from Cartesia's playground). You can also specify the audio `sample_rate` and `encoding` instead of the previous `output_format` (see the sketch below).
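A hypothetical sketch of configuring `CartesiaTTSService` after this upgrade; the import path and keyword names follow the entry above but are assumptions, not an excerpt from the library.

```python
# Sketch only: constructor keywords are assumed from the entry above.
import os

from pipecat.services.cartesia import CartesiaTTSService

tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    voice_id="your-voice-id",  # a voice ID from Cartesia's playground
    sample_rate=16000,
    encoding="pcm_s16le",
)
```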
Fixed
- Fixed an issue with asynchronous STT services (Deepgram and Azure) that could cause static audio issues and prevent interruptions from working properly when dealing with multiple LLM sentences.
- Fixed an issue that could mix new LLM responses with previous ones when handling interruptions.
- Fixed a Daily transport blocking situation that occurred while reading audio frames after a participant left the room. Needs daily-python >= 0.10.1.
v0.0.32
Added
- Allow specifying a `DeepgramSTTService` url, which allows using on-prem Deepgram.
- Added new `FastAPIWebsocketTransport`. This is a new websocket transport that can be integrated with FastAPI websockets (see the sketch after this list).
- Added new `TwilioFrameSerializer`. This is a new serializer that knows how to serialize and deserialize audio frames from Twilio.
- Added Daily transport event: `on_dialout_answered`. See https://reference-python.daily.co/api_reference.html#daily.EventHandler
- Added new `AzureSTTService`. This allows you to use Azure Speech-To-Text.
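A hypothetical sketch of wiring `FastAPIWebsocketTransport` and `TwilioFrameSerializer` together inside a FastAPI route; the import paths, parameter names and serializer constructor are assumptions based on the entries above (the `twilio-chatbot` example below shows the real integration).

```python
# Sketch only: import paths and keyword names are assumed, not confirmed.
from fastapi import FastAPI, WebSocket
from pipecat.serializers.twilio import TwilioFrameSerializer
from pipecat.transports.network.fastapi_websocket import (
    FastAPIWebsocketParams,
    FastAPIWebsocketTransport,
)

app = FastAPI()

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    stream_sid = "MZ..."  # hypothetical: read the streamSid from Twilio's start message
    transport = FastAPIWebsocketTransport(
        websocket=websocket,
        params=FastAPIWebsocketParams(
            audio_out_enabled=True,
            serializer=TwilioFrameSerializer(stream_sid),
        ),
    )
    # ...build and run a Pipeline using transport.input() / transport.output()
```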
Performance
- Converted `BaseInputTransport` and `BaseOutputTransport` to fully use asyncio and removed the use of threads.
Other
- Added `twilio-chatbot`. This is an example that shows how to integrate Twilio phone numbers with a Pipecat bot.
- Updated `07f-interruptible-azure.py` to use `AzureLLMService`, `AzureSTTService` and `AzureTTSService`.
v0.0.31
Performance
- Break long audio frames into 20ms chunks instead of 10ms.
v0.0.30
Added
- Added `report_only_initial_ttfb` to `PipelineParams`. This will make it so only the initial TTFB metrics after the user stops talking are reported.
- Added `OpenPipeLLMService`. This service will let you run OpenAI through OpenPipe's SDK.
- Allow specifying frame processors' name through a new `name` constructor argument.
- Added `DeepgramSTTService` (see the sketch after this list). This service has an ongoing websocket connection. To handle this, it subclasses `AIService` instead of `STTService`. The output of this service will be pushed from the same task, except for system frames like `StartFrame`, `CancelFrame` or `StartInterruptionFrame`.
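A minimal sketch of constructing the new service with the new `name` argument and the TTFB flag; the import path and keyword names are assumptions based on the entries above.

```python
# Sketch only: keyword names are assumed from the entries above.
import os

from pipecat.pipeline.task import PipelineParams
from pipecat.services.deepgram import DeepgramSTTService

stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    name="deepgram-stt",  # the new `name` constructor argument
)

params = PipelineParams(report_only_initial_ttfb=True)
```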
Changed
- `FrameSerializer.deserialize()` can now return `None` in case it is not possible to deserialize the given data.
- `daily_rest.DailyRoomProperties` now allows extra unknown parameters.
Fixed
- Fixed an issue where `DailyRoomProperties.exp` always had the same old timestamp unless set by the user.
- Fixed a couple of issues with `WebsocketServerTransport`: it needed to use `push_audio_frame()`, and VAD was not working properly.
- Fixed an issue that would cause the LLM aggregator to fail with small `VADParams.stop_secs` values.
- Fixed an issue where `BaseOutputTransport` would send longer audio frames, preventing interruptions.
Other
- Added a new `07h-interruptible-openpipe.py` example. This example shows how to use OpenPipe to run OpenAI LLMs and get the logs stored in OpenPipe.
- Added a new `dialin-chatbot` example. This example shows how to call the bot using a phone number.