# Releases: pipecat-ai/pipecat

## v0.0.49

### Added

- Added the RTVI `on_bot_started` event, which is useful in single-turn interactions.
- Added `DailyTransport` events `dialin-connected`, `dialin-stopped`, `dialin-error` and `dialin-warning`. Needs daily-python >= 0.13.0.
- Added `RimeHttpTTSService` and the `07q-interruptible-rime.py` foundational example.
- Added `STTMuteFilter`, a general-purpose processor that combines STT muting and interruption control. When active, it prevents both transcription and interruptions during bot speech. The processor supports multiple strategies: `FIRST_SPEECH` (mute only during the bot's first speech), `ALWAYS` (mute during all bot speech), or `CUSTOM` (using a provided callback). See the sketch after this list.
- Added `STTMuteFrame`, a control frame that enables/disables speech transcription in STT services.
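
A minimal sketch of wiring `STTMuteFilter` into a pipeline. The `STTMuteConfig`/`STTMuteStrategy` names and the module path are assumptions based on pipecat's conventions, and `transport`, `stt`, `llm` and `tts` are assumed to be constructed elsewhere:

```python
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.filters.stt_mute_filter import (  # assumed module path
    STTMuteConfig,
    STTMuteFilter,
    STTMuteStrategy,
)

# Mute only during the bot's first speech (e.g. its introduction).
stt_mute = STTMuteFilter(config=STTMuteConfig(strategies={STTMuteStrategy.FIRST_SPEECH}))

pipeline = Pipeline([
    transport.input(),
    stt_mute,  # blocks transcription and interruptions while the bot speaks
    stt,
    llm,
    tts,
    transport.output(),
])
```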
## v0.0.48

### Added

- There's now an input queue in each frame processor. When you call `FrameProcessor.push_frame()`, this will internally call `FrameProcessor.queue_frame()` on the next processor (upstream or downstream) and the frame will be internally queued (except system frames). Then, the queued frames will get processed. With this input queue it is also possible for frame processors to block processing more frames by calling `FrameProcessor.pause_processing_frames()`. The way to resume processing frames is by calling `FrameProcessor.resume_processing_frames()`. See the first sketch after this list.
- Added the audio filter `NoisereduceFilter`.
- Introduced input transport audio filters (`BaseAudioFilter`). Audio filters can be used to remove background noise before audio is sent to VAD.
- Introduced output transport audio mixers (`BaseAudioMixer`). Output transport audio mixers can be used, for example, to add background sounds or any other audio mixing functionality before the output audio is actually written to the transport.
- Added `GatedOpenAILLMContextAggregator`. This aggregator keeps the last received OpenAI LLM context frame and doesn't let it through until the notifier is notified. See the second sketch after this list, which combines it with `WakeNotifierFilter` and `EventNotifier`.
- Added `WakeNotifierFilter`. This processor expects a list of frame types and will execute a given callback predicate when a frame of any of those types is being processed. If the callback returns true, the notifier will be notified.
- Added `NullFilter`. A null filter doesn't push any frames upstream or downstream. This is usually used to disable one of the pipelines in `ParallelPipeline`.
- Added `EventNotifier`. This can be used as a very simple synchronization feature between processors.
- Added `TavusVideoService`. This is an integration for Tavus digital twins. (see https://www.tavus.io/)
- Added `DailyTransport.update_subscriptions()`. This allows you to have fine-grained control of what media subscriptions you want for each participant in a room.
- Added the audio filter `KrispFilter`.
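
A minimal sketch of pausing and resuming a processor's input queue. The `pause_processing_frames()`/`resume_processing_frames()` calls are from this release; the processor itself and the `asyncio.sleep()` stand-in for real async work are illustrative:

```python
import asyncio

from pipecat.frames.frames import Frame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

class HoldBriefly(FrameProcessor):
    """Illustrative processor: pause the input queue around some async work,
    then let queued (non-system) frames flow again."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        await self.pause_processing_frames()   # stop draining the input queue
        await asyncio.sleep(0.1)               # stand-in for real async work
        await self.resume_processing_frames()  # continue with queued frames
        await self.push_frame(frame, direction)
```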
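And a sketch of the three notifier pieces working together, loosely following the `22-natural-conversation.py` example mentioned under Other below; the module paths and exact constructor signatures are assumptions:

```python
from pipecat.frames.frames import TranscriptionFrame
from pipecat.processors.aggregators.gated_openai_llm_context import (  # assumed path
    GatedOpenAILLMContextAggregator,
)
from pipecat.processors.filters.wake_notifier_filter import WakeNotifierFilter  # assumed path
from pipecat.sync.event_notifier import EventNotifier  # assumed path

notifier = EventNotifier()

async def wake_check(frame) -> bool:
    # Hypothetical predicate: notify whenever a final transcription arrives.
    return isinstance(frame, TranscriptionFrame)

# Notifies `notifier` when a frame of the listed types passes the predicate.
wake_filter = WakeNotifierFilter(notifier, types=(TranscriptionFrame,), filter=wake_check)

# Holds back the latest OpenAI LLM context frame until `notifier` is notified.
gated_aggregator = GatedOpenAILLMContextAggregator(notifier=notifier)
```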
### Changed

- The following `DailyTransport` functions are now `async`, which means they need to be awaited: `start_dialout`, `stop_dialout`, `start_recording`, `stop_recording`, `capture_participant_transcription` and `capture_participant_video`.
- Changed the default output sample rate to 24000. This changes all TTS services to output at 24000 and also changes the default output transport sample rate. This improves audio quality at the cost of some extra bandwidth.
- `AzureTTSService` now uses Azure websockets instead of HTTP requests.
- The previous `AzureTTSService` HTTP implementation is now `AzureHttpTTSService`.

### Fixed

- Websocket transports (FastAPI and Websocket) now synchronize with time before sending data. This allows interruptions to just work out of the box.
- Improved bot speaking detection for all TTS services by using actual bot audio.
- Fixed an issue that was generating constant bot started/stopped speaking frames for HTTP TTS services.
- Fixed an issue that was causing stuttering with the AWS TTS service.
- Fixed an issue with `PlayHTTTSService` where the TTFB metrics were reporting very small time values.
- Fixed an issue where `AzureTTSService` wasn't initializing the specified language.

### Other

- Added the `23-bot-background-sound.py` foundational example.
- Added a new foundational example, `22-natural-conversation.py`. This example shows how to achieve a more natural conversation by detecting when the user ends a statement.
## v0.0.47

### Added

- Added `AssemblyAISTTService` and the corresponding foundational examples `07o-interruptible-assemblyai.py` and `13d-assemblyai-transcription.py`.
- Added a foundational example for Gladia transcription: `13c-gladia-transcription.py`.

### Changed

- Updated `GladiaSTTService` to use the V2 API.
- Changed the `DailyTransport` transcription model to `nova-2-general`.

### Fixed

- Fixed an issue that would cause an import error when importing `SileroVADAnalyzer` from the old package `pipecat.vad.silero`.
- Fixed `enable_usage_metrics` to control LLM/TTS usage metrics separately from `enable_metrics`.
## v0.0.46

### Added

- Added an `audio_passthrough` parameter to `STTService`. If enabled, it allows audio frames to be pushed downstream in case other processors need them. See the sketch after this list.
- Added input parameter options for `PlayHTTTSService` and `PlayHTHttpTTSService`.
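
A minimal sketch of `audio_passthrough`, assuming a Deepgram STT service; with the flag enabled, raw audio frames keep flowing downstream (e.g. to a recorder) instead of stopping at the STT service:

```python
from pipecat.services.deepgram import DeepgramSTTService  # assumed module path

stt = DeepgramSTTService(
    api_key="...",           # your Deepgram API key
    audio_passthrough=True,  # keep pushing audio frames downstream
)
```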
### Changed

- Changed the `DeepgramSTTService` model to `nova-2-general`.
- Moved the `SileroVAD` audio processor to `processors.audio.vad`.
- Module `utils.audio` is now `audio.utils`. A new `resample_audio` function has been added. See the sketch after this list.
- `PlayHTTTSService` now uses PlayHT websockets instead of HTTP requests.
- The previous `PlayHTTTSService` HTTP implementation is now `PlayHTHttpTTSService`.
- `PlayHTTTSService` and `PlayHTHttpTTSService` now use a `voice_engine` of `PlayHT3.0-mini`, which allows for multi-lingual support.
- Renamed `OpenAILLMServiceRealtimeBeta` to `OpenAIRealtimeBetaLLMService` to match other services.
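
A minimal sketch of the relocated audio utilities; the exact `resample_audio` signature (raw PCM bytes plus source and target rates) is an assumption:

```python
from pipecat.audio.utils import resample_audio

raw_16k: bytes = b"..."  # 16 kHz, 16-bit mono PCM from somewhere upstream
raw_24k = resample_audio(raw_16k, 16000, 24000)  # upsample to 24 kHz
```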
### Deprecated

- `LLMUserResponseAggregator` and `LLMAssistantResponseAggregator` are mostly deprecated; use `OpenAILLMContext` instead.
- The `vad` package is now deprecated and `audio.vad` should be used instead. The `vad` package will get removed in a future release.

### Fixed

- Fixed an issue that would cause an error if no VAD analyzer was passed to `LiveKitTransport` params.
- Fixed the `SileroVAD` processor to support interruptions properly.

### Other

- Added `examples/foundational/07-interruptible-vad.py`. This is the same as `07-interruptible.py` but using the `SileroVAD` processor instead of passing the `VADAnalyzer` in the transport.
## v0.0.45

### Changed

- Metrics messages have moved out from the transport's base output into RTVI.

## v0.0.44

### Added

- Added support for the OpenAI Realtime API with the new `OpenAILLMServiceRealtimeBeta` processor. (see https://platform.openai.com/docs/guides/realtime/overview)
- Added `RTVIBotTranscriptionProcessor`, which will send the RTVI `bot-transcription` protocol message. These are TTS text messages aggregated into sentences.
- Added new input params to the `MarkdownTextFilter` utility. You can set `filter_code` to filter code from text and `filter_tables` to filter tables from text. See the sketch after this list.
- Added `CanonicalMetricsService`. This processor uses the new `AudioBufferProcessor` to capture conversation audio and later send it to Canonical AI. (see https://canonical.chat/)
- Added `AudioBufferProcessor`. This processor can be used to buffer mixed user and bot audio. This can later be saved into an audio file or processed by some audio analyzer.
- Added an `on_first_participant_joined` event to `LiveKitTransport`.
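
A minimal sketch of the new filter params; the module path and the `InputParams` container are assumptions based on how other pipecat utilities take parameters, and the TTS wiring is a placeholder in the style of the example in v0.0.42 below:

```python
from pipecat.utils.text.markdown_text_filter import MarkdownTextFilter  # assumed path

text_filter = MarkdownTextFilter(
    params=MarkdownTextFilter.InputParams(  # assumed params container
        filter_code=True,    # strip code blocks before TTS
        filter_tables=True,  # strip markdown tables before TTS
    )
)

tts = SomeTTSService(..., text_filter=text_filter)  # placeholder service
```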
### Changed

- LLM text responses are now logged properly as unicode characters.
- `UserStartedSpeakingFrame`, `UserStoppedSpeakingFrame`, `BotStartedSpeakingFrame`, `BotStoppedSpeakingFrame`, `BotSpeakingFrame` and `UserImageRequestFrame` are now based on `SystemFrame`.

### Fixed

- Merged `RTVIBotLLMProcessor`/`RTVIBotLLMTextProcessor` and `RTVIBotTTSProcessor`/`RTVIBotTTSTextProcessor` to avoid out-of-order issues.
- Fixed an issue in the RTVI protocol that could cause a `bot-llm-stopped` or `bot-tts-stopped` message to be sent before a `bot-llm-text` or `bot-tts-text` message.
- Fixed `DeepgramSTTService` constructor settings not being merged with default ones.
- Fixed an issue in the Daily transport that would cause tasks to hang if urgent transport messages were being sent from a transport event handler.
- Fixed an issue in `BaseOutputTransport` that would cause `EndFrame` to be pushed down too early and call `FrameProcessor.cleanup()` before letting the transport stop properly.
## v0.0.43

### Added

- Added a new util called `MarkdownTextFilter`, which is a subclass of a new base class called `BaseTextFilter`. This is a configurable utility intended to filter text received by TTS services. See the sketch after this list.
- Added a new `RTVIUserLLMTextProcessor`. This processor will send an RTVI `user-llm-text` message with the user content that was sent to the LLM.
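
A minimal sketch of a custom text filter built on the new base class; the module path and the `filter()` override point are assumptions, so check `BaseTextFilter` before relying on them:

```python
from pipecat.utils.text.base_text_filter import BaseTextFilter  # assumed path

class EmojiStripFilter(BaseTextFilter):
    """Hypothetical filter that drops non-ASCII characters before TTS."""

    def filter(self, text: str) -> str:  # assumed override point
        return text.encode("ascii", errors="ignore").decode("ascii")
```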
### Changed

- `TransportMessageFrame` doesn't have an `urgent` field anymore; instead there's now a `TransportMessageUrgentFrame`, which is a `SystemFrame` and therefore skips all internal queuing.
- For TTS services, input languages are now converted to match each service's language format.

### Fixed

- Fixed an issue where changing a language with the Deepgram STT service wouldn't apply the change. This was fixed by disconnecting and reconnecting when the language changes.
## v0.0.42

### Added

- `SentryMetrics` has been added to report frame processor metrics to Sentry. This is now possible because `FrameProcessorMetrics` can now be passed to `FrameProcessor`.
- Added a Google TTS service and the corresponding foundational example `07n-interruptible-google.py`.
- Added AWS Polly TTS support and `07m-interruptible-aws.py` as an example.
- Added `InputParams` to the Azure TTS service.
- Added `LivekitTransport` (audio-only for now).
- RTVI 0.2.0 is now supported.
- All `FrameProcessor`s can now register event handlers:

  ```python
  tts = SomeTTSService(...)

  @tts.event_handler("on_connected")
  async def on_connected(processor):
      ...
  ```

- Added `AsyncGeneratorProcessor`. This processor can be used together with a `FrameSerializer` as an async generator. It provides a `generator()` function that returns an `AsyncGenerator` and that yields serialized frames.
- Added `EndTaskFrame` and `CancelTaskFrame`. These are new frames that are meant to be pushed upstream to tell the pipeline task to stop nicely or immediately, respectively. See the first sketch after this list.
- Added configurable LLM parameters (e.g., temperature, top_p, max_tokens, seed) for OpenAI, Anthropic, and Together AI services along with corresponding setter functions.
- Added `sample_rate` as a constructor parameter for TTS services.
- Pipecat has a pipeline-based architecture. The pipeline consists of frame processors linked to each other. The elements traveling across the pipeline are called frames. To have deterministic behavior, the frames traveling through the pipeline should always be ordered, except system frames, which are out-of-band frames. To achieve that, each frame processor should only output frames from a single task. In this version all frame processors have their own task to push frames. That is, when `push_frame()` is called the given frame will be put into an internal queue (with the exception of system frames) and a frame processor task will push it out.
- Added pipeline clocks. A pipeline clock is used by the output transport to know when a frame needs to be presented. For that, all frames now have an optional `pts` field (presentation timestamp). There's currently just one clock implementation, `SystemClock`, and the `pts` field is currently only used for `TextFrame`s (audio and image frames will be next).
- A clock can now be specified to `PipelineTask` (defaults to `SystemClock`). This clock will be passed to each frame processor via the `StartFrame`. See the second sketch after this list.
- Added `CartesiaHttpTTSService`.
- `DailyTransport` now supports setting the audio bitrate to improve audio quality through the `DailyParams.audio_out_bitrate` parameter. The new default is 96kbps.
- `DailyTransport` now uses the number of audio output channels (1 or 2) to set mono or stereo audio when needed.
- Interruption support has been added to `TwilioFrameSerializer` when using `FastAPIWebsocketTransport`.
- Added a new `LmntTTSService` text-to-speech service. (see https://www.lmnt.com/)
- Added `TTSModelUpdateFrame`, `TTSLanguageUpdateFrame`, `STTModelUpdateFrame`, and `STTLanguageUpdateFrame` frames to allow you to switch models, languages and voices in TTS and STT services.
- Added a new `transcriptions.Language` enum.
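
A minimal sketch of ending a pipeline task gracefully from inside a processor by pushing `EndTaskFrame` upstream; the surrounding processor and its goodbye check are illustrative:

```python
from pipecat.frames.frames import EndTaskFrame, Frame, TranscriptionFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

class StopOnGoodbye(FrameProcessor):
    """Hypothetical processor that asks the pipeline task to finish nicely
    when the user says goodbye."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, TranscriptionFrame) and "goodbye" in frame.text.lower():
            # Upstream, so it reaches the PipelineTask, which then stops nicely.
            await self.push_frame(EndTaskFrame(), FrameDirection.UPSTREAM)
        else:
            await self.push_frame(frame, direction)
```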
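And a sketch of passing a clock to `PipelineTask` and stamping a `TextFrame` with a presentation timestamp; the `clock` keyword name, the module path and the nanosecond unit are assumptions:

```python
from pipecat.clocks.system_clock import SystemClock  # assumed module path
from pipecat.frames.frames import TextFrame
from pipecat.pipeline.task import PipelineTask

task = PipelineTask(pipeline, clock=SystemClock())  # pipeline built elsewhere

frame = TextFrame("Hello there!")
frame.pts = 500_000_000  # present ~0.5 s after the clock starts (assumed nanoseconds)
```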
### Changed

- Context frames are now pushed downstream from assistant context aggregators.
- Removed the Silero VAD torch dependency.
- Merged the individual update-settings frame classes into a single `ServiceUpdateSettingsFrame` class.
- We now distinguish between input and output audio and image frames. We introduce `InputAudioRawFrame`, `OutputAudioRawFrame`, `InputImageRawFrame` and `OutputImageRawFrame` (and other subclasses of those). The input frames usually come from an input transport and are meant to be processed inside the pipeline to generate new frames. However, the input frames will not be sent through an output transport. The output frames can also be processed by any frame processor in the pipeline and they are allowed to be sent by the output transport.
- `ParallelTask` has been renamed to `SyncParallelPipeline`. A `SyncParallelPipeline` is a frame processor that contains a list of different pipelines to be executed concurrently. The difference between a `SyncParallelPipeline` and a `ParallelPipeline` is that, given an input frame, the `SyncParallelPipeline` will wait for all the internal pipelines to complete. This is achieved by making sure the last processor in each of the pipelines is synchronous (e.g. an HTTP-based service that waits for the response). See the sketch after this list.
- `StartFrame` is back to being a system frame to make sure it's processed immediately by all processors. `EndFrame` stays a control frame since it needs to be ordered, allowing the frames in the pipeline to be processed.
- Updated the `MoondreamService` revision to `2024-08-26`.
- `CartesiaTTSService` and `ElevenLabsTTSService` now add presentation timestamps to their text output. This allows the output transport to push the text frames downstream at almost the same time the words are spoken. We say "almost" because currently the audio frames don't have presentation timestamps, but they should be played at roughly the same time.
- The `DailyTransport.on_joined` event now returns the full session data instead of just the participant.
- `CartesiaTTSService` is now a subclass of `TTSService`.
- `DeepgramSTTService` is now a subclass of `STTService`.
- `WhisperSTTService` is now a subclass of `SegmentedSTTService`. A `SegmentedSTTService` is an `STTService` where the provided audio is given in one big chunk (i.e. from when the user starts speaking until the user stops speaking) instead of a continuous stream.
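
A minimal sketch of running two HTTP-based branches in lockstep; the branch services are placeholders in the style of the event-handler example above:

```python
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.sync_parallel_pipeline import SyncParallelPipeline  # assumed path

# image_gen and http_tts are placeholder HTTP-based services built elsewhere.
# For each input frame, both branches run concurrently and SyncParallelPipeline
# waits until both have produced their responses before moving on.
sync_branches = SyncParallelPipeline(
    [image_gen],
    [http_tts],
)

pipeline = Pipeline([transport.input(), llm, sync_branches, transport.output()])
```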
### Fixed

- Fixed OpenAI multiple function calls.
- Fixed a Cartesia TTS issue that would cause audio to be truncated in some cases.
- Fixed a `BaseOutputTransport` issue that would stop audio and video rendering tasks (after receiving an `EndFrame`) before the internal queue was emptied, causing the pipeline to finish prematurely.
- `StartFrame` should be the first frame every processor receives, to avoid situations where things are not initialized (because initialization happens on `StartFrame`) and other frames come in, resulting in undesired behavior.

### Performance

- `obj_id()` and `obj_count()` now use `itertools.count`, avoiding the need for `threading.Lock`.

### Other

- Pipecat now uses Ruff as its formatter (https://github.com/astral-sh/ruff).
## v0.0.41

### Added

- Added the `LivekitFrameSerializer` audio frame serializer.

### Fixed

- Fixed a `FastAPIWebsocketOutputTransport` variable name clash with a subclass.
- Fixed an `AnthropicLLMService` issue with empty arguments in function calling.

### Other

- Fixed `studypal` example errors.
## v0.0.40

### Added

- VAD parameters can now be dynamically updated using the `VADParamsUpdateFrame`.
- `ErrorFrame` now has a `fatal` field to indicate the bot should exit if a fatal error is pushed upstream (false by default). A new `FatalErrorFrame` that sets this flag to true has been added.
- `AnthropicLLMService` now supports function calling and has initial support for prompt caching. (see https://www.anthropic.com/news/prompt-caching)
- `ElevenLabsTTSService` can now specify ElevenLabs input parameters such as `output_format`.
- `TwilioFrameSerializer` can now specify Twilio's and Pipecat's desired sample rates to use.
- Added a new `on_participant_updated` event to `DailyTransport`.
- Added `DailyRESTHelper.delete_room_by_name()` and `DailyRESTHelper.delete_room_by_url()`.
- Added LLM and TTS usage metrics. These are enabled when `PipelineParams.enable_usage_metrics` is True.
- `AudioRawFrame`s are now pushed downstream from the base output transport. This allows capturing the exact words the bot says by adding an STT service at the end of the pipeline.
- Added a new `GStreamerPipelineSource`. This processor can generate image or audio frames from a GStreamer pipeline (e.g. reading an MP4 file, an RTP stream or anything else supported by GStreamer).
- Added `TransportParams.audio_out_is_live`. This flag is False by default and is useful to indicate we should not synchronize audio with sporadic images.
- Added new `BotStartedSpeakingFrame` and `BotStoppedSpeakingFrame` control frames. These frames are pushed upstream and they should wrap `BotSpeakingFrame`.
- Transports now allow you to register event handlers without decorators. See the sketch after this list.
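
A minimal sketch of decorator-free registration, mirroring the decorator-based example in v0.0.42 above; the exact `add_event_handler` signature is an assumption, and the transport construction is a placeholder:

```python
async def on_first_participant_joined(transport, participant):
    print(f"participant joined: {participant}")

transport = DailyTransport(...)  # placeholder construction
transport.add_event_handler("on_first_participant_joined", on_first_participant_joined)
```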
### Changed

- Added support for RTVI message protocol 0.1. This includes new messages, support for message responses, support for actions, configuration, webhooks and a bunch of new cool stuff. (see https://docs.rtvi.ai/)
- The `SileroVAD` dependency is now imported via pip's `silero-vad` package.
- `ElevenLabsTTSService` now uses the `eleven_turbo_v2_5` model by default.
- `BotSpeakingFrame` is now a control frame.
- `StartFrame` is now a control frame similar to `EndFrame`.
- `DeepgramTTSService` is now more customizable. You can adjust the encoding and sample rate.

### Fixed

- `TTSStartFrame` and `TTSStopFrame` are now sent when TTS really starts and stops. This allows knowing when the bot starts and stops speaking, even with asynchronous services (like Cartesia).
- Fixed `AzureSTTService` transcription frame timestamps.
- Fixed an issue with `DailyRESTHelper.create_room()` expirations which would cause this function to stop working after the initial expiration elapsed.
- Improved `EndFrame` and `CancelFrame` handling. `EndFrame` should end things gracefully while a `CancelFrame` should cancel all running tasks as soon as possible.
- Fixed an issue in `AIService` that would cause a yielded `None` value to be processed.
- RTVI's `bot-ready` message is now sent when the RTVI pipeline is ready and a first participant joins.
- Fixed a `BaseInputTransport` issue that was causing incoming system frames to be queued instead of being pushed immediately.
- Fixed a `BaseInputTransport` issue that was causing start/stop interruption frames to not cancel tasks and be processed properly.
### Other

- Added the `studypal` example (thanks to the Cartesia folks!).
- Most examples now use Cartesia.
- Added examples `foundational/19a-tools-anthropic.py`, `foundational/19b-tools-video-anthropic.py` and `foundational/19a-tools-togetherai.py`.
- Added examples `foundational/18-gstreamer-filesrc.py` and `foundational/18a-gstreamer-videotestsrc.py` that show how to use `GStreamerPipelineSource`.
- Removed `requests` library usage.
- Cleaned up examples and switched them to `DailyRESTHelper`.