✅ WebRTC handling and handshakes with the Simli avatar are working
✅ Avatar video stream is working
✅ Avatar audio stream is working
❌ Sending audio from LiveKit to Simli (LiveKit fails to capture the TTS audio stream that should be forwarded; a possible solution is to clone the LiveKit Python API or build a wrapper)
🪄 most of the magic is in the agent-1/main.py file
These commands will install LiveKit server on your machine and run it in dev mode. Dev mode uses a specific API key and secret pair.
brew install livekit
livekit-server --dev
Usually you'd run the agent(s) first and then start a session and the agent(s) would automatically join. Turns out that isn't how it works for multi-agent at the moment. So what we're going to do is have the human join the meeting first, and then explicitly have the agents join the room.
cd meet
pnpm i
cp .env.example .env.local
pnpm dev
- open `localhost:3000` in a browser and click 'Start Meeting' - note the room name in your browser address bar: `http://localhost:3000/rooms/<room-name>`
python main.py connect --room demoroom
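The room name is the last path segment of the meeting URL. If you would rather not copy it by hand, a small helper along these lines could derive the `connect` command from the URL. Both function names are hypothetical, and the sketch assumes the `/rooms/<room-name>` path shape shown above.

```python
from urllib.parse import unquote, urlparse


def room_name_from_url(url: str) -> str:
    """Extract <room-name> from a URL shaped like http://host/rooms/<room-name>."""
    segments = [segment for segment in urlparse(url).path.split("/") if segment]
    if len(segments) != 2 or segments[0] != "rooms":
        raise ValueError(f"not a meeting URL: {url!r}")
    return unquote(segments[1])


def connect_command(url: str) -> list:
    """Build the argument list for the agent's connect command."""
    return ["python", "main.py", "connect", "--room", room_name_from_url(url)]


if __name__ == "__main__":
    print(" ".join(connect_command("http://localhost:3000/rooms/demoroom")))
```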
The `main.py` script in the `agent-1` directory is designed to facilitate a WebRTC session using a voice assistant and video processing capabilities.
- `prewarm(proc: JobProcess)`:
  - Loads the voice activity detection (VAD) model from the `silero` plugin and stores it in the process's user data for later use.
- `SilentAudioTrack` and `BlackVideoTrack` classes:
  - Purpose: Generate silent audio and black video frames, respectively.
  - Usage: These tracks are used as placeholders or default tracks in the WebRTC session.
- `entrypoint(ctx: JobContext)`:
  - Purpose: Main function to set up and manage the WebRTC session.
  - Steps:
    - Initializes a `VoiceAssistant` with VAD, STT, and TTS capabilities.
    - Connects to the LiveKit room and starts the voice assistant.
    - Simli API Setup:
      - Retrieves Simli API keys (`SIMLI_API_KEY` and `SIMLI_FACE_ID`) from environment variables.
      - Starts an audio-to-video session with Simli to obtain a session token using `start_audio_to_video_session`.
    - Configures ICE servers and creates an `RTCPeerConnection`.
    - Sets up a `DataChannel` to send the session token.
    - Subscribes to incoming media tracks (audio and video) and relays them using `MediaRelay`.
    - Creates and sends a WebRTC offer to Simli, then sets the remote description with the received answer using `start_webrtc_session`.
    - Monitors ICE connection state changes and logs relevant information.
- `start_audio_to_video_session(api_key, face_id)`:
  - Purpose: Initiates an audio-to-video session with Simli's API.
  - Process:
    - Sends a POST request to Simli's API endpoint with the `faceId`, `apiKey`, and other parameters.
  - Returns: A session token if successful, or logs an error if not.
- `start_webrtc_session(offer_sdp, offer_type, api_key, session_token)`:
  - Purpose: Sends a WebRTC offer to Simli and receives an SDP answer.
  - Process:
    - Sends a POST request to Simli's WebRTC API endpoint with the SDP offer, type, API key, and session token.
  - Returns: The SDP answer if successful, or logs an error if not.
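The two Simli helpers described above both come down to a JSON POST followed by an error check. The following is a rough standard-library sketch of that pattern, not the code in `main.py`. The `faceId` and `apiKey` field names come from the description above; the WebRTC payload field names (`sdp`, `type`, `session_token`), the `session_token` response key, and the `opener` parameter (added so the functions can be exercised without network access) are assumptions. The "other parameters" the real request sends are omitted because they are not listed here.

```python
import json
import logging
import urllib.request

logger = logging.getLogger("simli-sketch")

AUDIO_TO_VIDEO_URL = "https://api.simli.ai/startAudioToVideoSession"
WEBRTC_URL = "https://api.simli.ai/StartWebRTCSession"


def build_json_post(url, payload):
    """Create a POST request carrying the payload as JSON."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(request, opener=urllib.request.urlopen):
    """Send the request; return the decoded JSON body, or None after logging an error."""
    try:
        with opener(request, timeout=10) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError) as exc:  # network/HTTP errors, or invalid JSON
        logger.error("Request to %s failed: %s", request.full_url, exc)
        return None


def start_audio_to_video_session(api_key, face_id, opener=urllib.request.urlopen):
    """Return a session token, or None on failure."""
    request = build_json_post(AUDIO_TO_VIDEO_URL, {"faceId": face_id, "apiKey": api_key})
    body = post_json(request, opener)
    # ASSUMPTION: the token is returned under the "session_token" key.
    return body.get("session_token") if isinstance(body, dict) else None


def start_webrtc_session(offer_sdp, offer_type, api_key, session_token,
                         opener=urllib.request.urlopen):
    """Return the decoded answer body, or None on failure."""
    # ASSUMPTION: these payload field names are illustrative, not confirmed.
    payload = {
        "sdp": offer_sdp,
        "type": offer_type,
        "apiKey": api_key,
        "session_token": session_token,
    }
    return post_json(build_json_post(WEBRTC_URL, payload), opener)
```

Returning `None` after logging mirrors the "logs an error if not" behaviour described above, so callers need to check the result before continuing with the handshake.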
- ICE Servers: Configured using Google's public STUN servers to facilitate NAT traversal.
- RTCPeerConnection:
  - Created with the specified ICE servers.
  - Manages the connection and media tracks between the local and remote peers.
- DataChannel:
  - Used to send the session token to the remote peer once the channel is open.
- Track Handling:
  - Subscribes to incoming tracks and relays them using `MediaRelay`.
  - Forwards video and audio frames to LiveKit's media sources.
- SDP Offer/Answer:
  - Creates an SDP offer and waits for ICE gathering to complete.
  - Sends the offer to Simli and sets the remote description with the received answer.
- ICE Connection State Monitoring:
  - Logs changes in the ICE connection state to provide insights into the connection status.
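The "waits for ICE gathering to complete" step can be sketched as below. This is an illustration rather than the code in `main.py`: it assumes a peer connection object that exposes an `iceGatheringState` attribute and an `on("icegatheringstatechange")` decorator, as aiortc's `RTCPeerConnection` does. `FakePeerConnection` is a hypothetical stand-in used only so the example runs without aiortc installed.

```python
import asyncio


async def wait_for_ice_gathering(pc, timeout=10.0):
    """Return once pc.iceGatheringState is "complete"; raise asyncio.TimeoutError otherwise."""
    if pc.iceGatheringState == "complete":
        return
    done = asyncio.Event()

    @pc.on("icegatheringstatechange")
    def _on_change():
        if pc.iceGatheringState == "complete":
            done.set()

    await asyncio.wait_for(done.wait(), timeout)


class FakePeerConnection:
    """Hypothetical stand-in exposing only what the helper touches."""

    def __init__(self):
        self.iceGatheringState = "new"
        self._handlers = {}

    def on(self, event):
        def register(handler):
            self._handlers.setdefault(event, []).append(handler)
            return handler
        return register

    def set_state(self, state):
        self.iceGatheringState = state
        for handler in self._handlers.get("icegatheringstatechange", []):
            handler()


async def demo():
    pc = FakePeerConnection()
    loop = asyncio.get_running_loop()
    # Simulate the state transitions a real connection would report.
    loop.call_later(0.01, pc.set_state, "gathering")
    loop.call_later(0.02, pc.set_state, "complete")
    await wait_for_ice_gathering(pc, timeout=1.0)
    return pc.iceGatheringState


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Waiting before sending matters because the offer is posted to Simli in a single HTTP request, so the SDP needs to already contain the gathered candidates.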
- **Environment Variables (in `.env` file)**:
  - `SIMLI_API_KEY`: Your API key for accessing Simli's services.
  - `SIMLI_FACE_ID`: The ID of the face to be used in the audio-to-video session.
- API Endpoints:
  - Audio-to-Video Session: `https://api.simli.ai/startAudioToVideoSession`
  - WebRTC Session: `https://api.simli.ai/StartWebRTCSession`
- Network Configuration:
  - Ensure that your environment can make outbound HTTP requests to Simli's API endpoints.
cd agent-1
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
- add values for keys in `.env`
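Before starting the agent, it can help to confirm that the two Simli keys named above actually have values. The following is a small sketch that parses a `.env`-style file with the standard library only. The helper names are hypothetical, the parser handles only simple `KEY=value` lines, and your `.env` may need further keys that are not listed in this README.

```python
from pathlib import Path

# The two keys this README names; other keys in .env.example are not checked.
REQUIRED_KEYS = ("SIMLI_API_KEY", "SIMLI_FACE_ID")


def parse_env_file(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    found = parse_env_file(env_path.read_text()) if env_path.exists() else {}
    problems = missing_keys(found)
    print("missing: " + ", ".join(problems) if problems else "all Simli keys set")
```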
python main.py connect --room <room-name>
cd agent-2
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp ../agent-1/.env .
python main.py connect --room <room-name>