
Port Enhancements from EA #472

Conversation


@davidvonthenen commented Oct 18, 2024

Proposed changes

This ports enhancements from the in-progress Agent EA work. Since that work will probably take some time to reach GA, I'm backporting what I can now.

Changes:

Types of changes

What types of changes does your code introduce to the community Python SDK?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update or tests (if none of the other choices apply)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING doc
  • I have linted all of my code using repo standards
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

NA

Summary by CodeRabbit

  • New Features

    • Enhanced microphone control in the application, allowing users to mute and manage audio input effectively.
    • Added support for microphone functionality in the AsyncSpeakWSClient and SpeakWSClient classes.
    • Introduced a new speech-to-text functionality in the example application.
    • Simplified audio data handling in example applications for improved user experience.
  • Bug Fixes

    • Improved logging mechanisms to prevent potential errors related to thread handling and HTTP responses.
  • Documentation

    • Updated example scripts to reflect changes in audio handling and configuration options.


coderabbitai bot commented Oct 18, 2024

Walkthrough

The pull request introduces enhancements across multiple classes related to audio processing. The Microphone class now supports improved audio stream initialization and state management, including a new method for checking the muted state. The Speaker class integrates microphone control, allowing for better synchronization during audio playback. Logging mechanisms in websocket client classes are fortified to prevent errors. Additionally, example scripts for text-to-speech have been updated for better audio data handling and connection management, with some features removed for simplification.

Changes

File Path and Change Summary

  • deepgram/audio/microphone/microphone.py: Updated Microphone class to improve audio stream initialization and state management. Added is_muted method and logging enhancements.
  • deepgram/audio/speaker/speaker.py: Enhanced Speaker class to support microphone control. Added _microphone attribute and wait_for_complete_with_mute method.
  • deepgram/clients/common/v1/abstract_async_websocket.py: Modified finish method to enhance logging by checking the thread object before accessing its name.
  • deepgram/clients/common/v1/abstract_sync_websocket.py: Updated finish method for improved logging robustness with checks for the thread before accessing its name.
  • deepgram/clients/speak/v1/websocket/async_client.py: Added support for a microphone in AsyncSpeakWSClient with a new optional parameter and improved initialization logic.
  • deepgram/clients/speak/v1/websocket/client.py: Updated SpeakWSClient to support microphone input with a new optional parameter and improved cleanup in the finish method.
  • examples/text-to-speech/websocket/output_to_wav/main.py: Modified to include the wave module and an AUDIO_FILE constant. Simplified audio data handling and event management.
  • examples/text-to-speech/websocket/simple/main.py: Removed the wave module and file handling. Updated DeepgramClient configuration and added new event handlers for websocket management.
  • deepgram/clients/common/v1/abstract_async_rest.py: Added error handling in the _handle_request method to check for None responses and raise DeepgramError.
  • deepgram/clients/common/v1/abstract_sync_rest.py: Introduced error handling in the _handle_request method to ensure response validity and raise DeepgramError if conditions are not met.
  • deepgram/clients/listen/v1/helpers.py: Deleted file containing utility functions for URL manipulation: append_query_params and convert_to_websocket_url.
  • deepgram/clients/speak/v1/websocket/helpers.py: Deleted file containing utility functions for URL manipulation: append_query_params and convert_to_websocket_url.
  • examples/speech-to-text/websocket/replay/main.py: Added new file implementing speech-to-text functionality using the Deepgram SDK, with websocket connection and transcription processing.


Suggested reviewers

  • lukeocodes
  • SandraRodgers
  • jpvajda
  • naomi-lgbt



@coderabbitai bot left a comment


Actionable comments posted: 12

🧹 Outside diff range and nitpick comments (7)
deepgram/clients/common/v1/abstract_sync_websocket.py (1)

448-451: Approved: Improved error handling in thread logging

The changes enhance the robustness of the logging mechanism by adding null checks for the thread object and its name attribute. This prevents potential AttributeError exceptions and improves the overall reliability of the SDK.

Consider using a more descriptive message for the unknown thread name case:

-                self._logger.debug("after running thread: unknown_thread_name")
+                self._logger.debug("after running thread: thread or thread name is None")

This change would provide more precise information about why the thread name is unknown.

deepgram/clients/common/v1/abstract_async_websocket.py (1)

451-454: Approve changes with a minor suggestion for consistency.

The added null checks for thread and thread.name improve the robustness of the logging mechanism, preventing potential AttributeError exceptions. This is a good defensive programming practice, especially in a library or SDK context.

For consistency with the error logging throughout the rest of the file, consider using self._logger.debug() instead of directly accessing self._logger.debug. This minor change would align with the logging style used elsewhere in the class.

Here's a suggested modification:

if thread is not None and thread.name is not None:
    self._logger.debug("after running thread: %s", thread.name)
else:
    self._logger.debug("after running thread: unknown_thread_name")

This change is optional and doesn't affect functionality, but it maintains consistency with the logging style used throughout the rest of the class.

examples/text-to-speech/websocket/simple/main.py (1)

43-49: Update the message to reflect the current configuration.

The message in on_binary_data suggests setting speaker_playback to true, but this option is already set in your DeepgramClientOptions at lines 28-29. This could confuse users reviewing the example.

Consider updating or removing the message to accurately reflect the current configuration:

             print("Received binary data")
             print("You can do something with the binary data here")
             print("OR")
-            print(
-                "If you want to simply play the audio, set speaker_playback to true in the options for DeepgramClientOptions"
-            )
+            print("Ensure that 'speaker_playback' is configured correctly in DeepgramClientOptions.")
deepgram/audio/speaker/speaker.py (1)

357-371: Implement microphone mute/unmute logic during playback

The added code in the _play method correctly manages the microphone's mute state based on audio playback. It unmutes the microphone after a period of silence exceeding _last_play_delta_in_ms milliseconds and mutes it when new audio data is detected. This ensures that the microphone is not capturing audio while playback is occurring.

Consider adding inline comments to explain the logic steps for better readability and future maintenance.
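
A standalone sketch of the mute/unmute logic described above, with the kind of inline comments the suggestion asks for. This is not the SDK's actual _play implementation; the helper and its parameters are hypothetical, and only the microphone methods (mute, unmute, is_muted) and the _last_play_delta_in_ms threshold come from the review itself.

import queue
import time

def play_loop(audio_queue, write_chunk, microphone=None,
              last_play_delta_in_ms=2000, should_stop=lambda: False):
    """Drain audio_queue to the speaker, muting the mic during playback."""
    last_played = time.time()
    while not should_stop():
        try:
            data = audio_queue.get(timeout=0.1)
        except queue.Empty:
            # Silence: once it lasts longer than the delta, re-enable the mic.
            silent_ms = (time.time() - last_played) * 1000
            if microphone is not None and silent_ms > last_play_delta_in_ms:
                microphone.unmute()
            continue

        # New audio arrived: mute the mic so it does not capture the speaker.
        if microphone is not None and not microphone.is_muted():
            microphone.mute()
        write_chunk(data)  # e.g., stream.write(data)
        last_played = time.time()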

examples/text-to-speech/websocket/output_to_wav/main.py (3)

23-29: Remove commented-out code for clarity

The commented-out code from lines 23 to 28 may no longer be needed. Removing unnecessary code can improve readability and maintainability.


38-38: Limit console output in on_binary_data

The print("Received binary data") statement inside on_binary_data may generate excessive console output, especially if the data stream is large. Consider removing this print statement or using a logging mechanism with an appropriate log level.


76-76: Re-evaluate the necessity of time.sleep(7)

The time.sleep(7) introduces a delay before prompting the user to press Enter. Consider whether this delay is needed, as input() will wait for the user's action.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 84fc18b and 5754a1b.

📒 Files selected for processing (8)
  • deepgram/audio/microphone/microphone.py (4 hunks)
  • deepgram/audio/speaker/speaker.py (9 hunks)
  • deepgram/clients/common/v1/abstract_async_websocket.py (1 hunks)
  • deepgram/clients/common/v1/abstract_sync_websocket.py (1 hunks)
  • deepgram/clients/speak/v1/websocket/async_client.py (5 hunks)
  • deepgram/clients/speak/v1/websocket/client.py (5 hunks)
  • examples/text-to-speech/websocket/output_to_wav/main.py (4 hunks)
  • examples/text-to-speech/websocket/simple/main.py (4 hunks)
🧰 Additional context used
🔇 Additional comments (21)
examples/text-to-speech/websocket/simple/main.py (1)

63-71: Great addition of event handlers for warnings and errors.

Adding on_warning, on_error, and on_unhandled event handlers enhances the robustness of your application by allowing it to gracefully handle unexpected situations and provide meaningful feedback.

deepgram/audio/microphone/microphone.py (6)

25-26: Appropriate use of Optional typing for _audio and _stream

The addition of Optional typing and initializing _audio and _stream to None enhances clarity about their initialization state and potential absence.


148-158: Proper check before opening the stream

The condition if self._audio is not None: ensures that _audio.open() is only called when _audio is properly initialized, preventing potential AttributeErrors.


160-163: Error handling when _stream is None

Adding a check for self._stream is None with appropriate error logging and returning False improves robustness by handling cases where the stream could not be created.


166-167: Conditional start of the stream

Starting the stream only if self._stream is not None prevents exceptions and ensures that the code doesn't attempt to start a non-existent stream.


180-184: Enhanced logging in mute method

Adding verbose logging at the entry and exit of the mute method improves traceability and aids in debugging.
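
For illustration, a minimal sketch of entry/exit verbose logging using the verboselogs package mentioned later in this review; the class and attribute names below are stand-ins, not the SDK's actual Microphone implementation.

import threading
import verboselogs

class MicrophoneSketch:
    """Illustrative stand-in showing the logging pattern only."""

    def __init__(self):
        self._logger = verboselogs.VerboseLogger(__name__)
        self._lock = threading.Lock()
        self._is_muted = False

    def mute(self) -> bool:
        # VERBOSE sits between DEBUG and INFO, so these trace lines stay
        # quiet unless verbose logging is explicitly enabled.
        self._logger.verbose("Microphone.mute ENTER")
        with self._lock:
            self._is_muted = True
        self._logger.verbose("Microphone.mute LEAVE")
        return True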


262-267: Proper resource cleanup in finish method

Setting _stream and _asyncio_thread to None after stopping ensures resources are properly released and prevents potential issues with lingering references.

deepgram/audio/speaker/speaker.py (7)

18-19: Import statement for Microphone module is correctly added

The import statement from ..microphone import Microphone is appropriate and necessary for integrating with the Microphone class.


61-62: Adding _microphone attribute to manage microphone state

The addition of the _microphone attribute of type Optional[Microphone] allows the Speaker class to interact with the Microphone, enabling coordination between audio playback and microphone mute status.


88-89: Initialize _microphone attribute in constructor

The assignment self._microphone = microphone correctly initializes the _microphone attribute with the provided microphone instance, enabling the speaker to control the microphone's mute state.


153-163: Add check for self._audio before opening audio stream

The added condition if self._audio is not None: ensures that the audio stream is only opened if the PyAudio instance exists. This prevents potential AttributeError exceptions if self._audio is None.


164-167: Add error handling for failed stream initialization

The check if self._stream is None: properly handles cases where the audio stream could not be created, logging an error message and returning False to indicate the failure.


179-180: Ensure self._stream exists before starting

By verifying if self._stream is not None: before calling self._stream.start_stream(), you prevent possible exceptions that could occur if the stream was not successfully created.


330-344: Enhance resource cleanup in finish method

The updates to the finish method ensure that threads (_thread and _receiver_thread) are properly joined, and resources like the audio queue and streams are cleared and set to None. This improves resource management and prevents potential threading issues or memory leaks.
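
For readers unfamiliar with the pattern, a rough sketch of that cleanup sequence; the helper below is hypothetical and only mirrors the ordering described in the comment (join threads, drain the queue, stop and release the stream).

import queue

def cleanup_speaker(threads, audio_queue, stream):
    """Sketch of an orderly teardown: join workers, drain audio, close stream."""
    # 1. Wait for worker threads so nothing touches the stream after teardown.
    for t in threads:
        if t is not None:
            t.join()

    # 2. Discard any queued audio that was never played.
    while True:
        try:
            audio_queue.get_nowait()
        except queue.Empty:
            break

    # 3. Stop and close the PyAudio stream, then drop the reference.
    if stream is not None:
        stream.stop_stream()
        stream.close()
    return None  # callers can assign this result to clear their reference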

deepgram/clients/speak/v1/websocket/client.py (6)

67-69: Introduction of New Instance Variables for Speaker and Microphone

New instance variables _speaker_created: bool = False, _speaker: Optional[Speaker] = None, and _microphone: Optional[Microphone] = None have been added to manage the state of the speaker and microphone within the SpeakWSClient class. This enhancement is well-implemented and logically sound.


71-73: Addition of Optional 'microphone' Parameter to Constructor

The __init__ method now includes an optional microphone parameter with a default value of None. This change maintains backward compatibility while allowing users to provide a Microphone instance when needed.


91-93: Assignment of Microphone Instance Variable

The self._microphone variable is correctly assigned from the constructor parameter. This ensures that the Microphone instance is available throughout the class for any microphone-related functionality.


118-119: Setting '_speaker_created' Flag After Speaker Initialization

The _speaker_created flag is set to True after initializing the Speaker. This flag effectively tracks the initialization state of the speaker, which is important for proper resource management during the lifecycle of the SpeakWSClient.


127-127: Passing 'microphone' to Speaker Initialization

The self._microphone instance is passed to the Speaker during initialization at lines 127 and 135. This integration allows the Speaker to interact with the Microphone, enabling features such as automatic microphone muting during playback.

Also applies to: 135-135


639-642: ⚠️ Potential issue

Avoid Redundant Calls to 'self._speaker.finish()'

In the finish method, self._speaker.finish() is called within the if block when self._speaker_created is True (lines 639-641). Later in the same method, self._speaker.finish() is called again unconditionally during the cleanup process. This could result in finish() being called twice on the same Speaker instance, which may cause errors if finish() is not idempotent.

Consider modifying the code to ensure that self._speaker.finish() is only called once. Here's a possible fix:

if self._speaker is not None and self._speaker_created:
    self._speaker.finish()
    self._speaker_created = False

    # Remove the redundant call to self._speaker.finish() below
    - self._speaker.finish()
    self._speaker = None
    self._logger.notice("speaker stopped")

This change prevents potential errors by ensuring that finish() is called only once on the Speaker instance.

Likely invalid or redundant comment.

examples/text-to-speech/websocket/output_to_wav/main.py (1)

7-7: LGTM!

The import of the wave module is necessary for handling WAV file operations.

@davidvonthenen force-pushed the enhancement-usability-for-stt-and-tts-ws branch 2 times, most recently from 37c7b40 to ae2c305 on October 18, 2024 at 00:30
@davidvonthenen force-pushed the enhancement-usability-for-stt-and-tts-ws branch from ae2c305 to f3cc7a5 on October 18, 2024 at 16:13

@coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (17)
examples/text-to-speech/websocket/output_to_wav/main.py (3)

23-29: LGTM: Simplified client setup

The removal of commented-out configuration code and the simplified DeepgramClient instantiation improve the clarity of the example. This change makes it easier for users to understand the basic setup.

Consider adding a brief comment explaining why the default configuration is used, to provide context for users who might be looking for more advanced setups.


50-56: LGTM: WAV header generation

The addition of WAV header generation is crucial for creating a valid audio file. The code correctly sets the audio parameters using the wave module.

Consider adding a brief comment explaining why we're generating the header separately, to provide context for users who might be unfamiliar with audio file formats.
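
For context, a minimal sketch of writing such a header with the standard-library wave module. The format values below are assumptions (16-bit mono linear PCM at 48 kHz) and need to match whatever encoding the example requests; the AUDIO_FILE name mirrors the constant mentioned in the summary.

import wave

AUDIO_FILE = "output.wav"

# Write an empty WAV container first so the raw PCM chunks appended later
# sit behind a valid RIFF/WAVE header.
with wave.open(AUDIO_FILE, "wb") as header:
    header.setnchannels(1)       # mono
    header.setsampwidth(2)       # 16-bit samples
    header.setframerate(48000)   # 48 kHz sample rate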


76-76: LGTM: Improved connection handling

The addition of a sleep timer before closing the connection is a good approach to ensure the audio generation completes. This change, along with the removal of the wait_for_complete method, simplifies the example.

Consider adding a brief comment explaining the purpose of the sleep timer, to help users understand why this delay is necessary.

examples/speech-to-text/websocket/replay/main.py (1)

5-16: Remove unused imports to improve code clarity.

The following imported modules are not used in the visible code:

  • httpx (line 5)
  • logging (line 7)
  • threading (line 9)

Consider removing these imports if they are not used elsewhere in the file or in imported modules.

Also applies to: 18-18

examples/text-to-speech/websocket/simple/main.py (3)

25-32: LGTM: Configuration updates enhance example usability.

The addition of the 'speaker_playback' option and simplification of logging improve the example. However, consider adding a comment explaining the empty string in the DeepgramClient initialization, e.g.:

# Replace empty string with your Deepgram API key in a real application
deepgram: DeepgramClient = DeepgramClient("", config)

This would provide clarity for users adapting this example for their own use.


51-58: LGTM: New event handlers improve example comprehensiveness.

The addition of on_metadata, on_flush, and on_clear event handlers enhances the example's coverage of Deepgram SDK capabilities. For consistency, consider adding a brief comment above each handler explaining its purpose, similar to other handlers in the file.


96-96: LGTM: Improved connection management with wait_for_complete().

The replacement of the sleep function with wait_for_complete() is a significant improvement, providing a more robust way to ensure the operation has finished. Consider adding a brief comment explaining the purpose of wait_for_complete() for users who might be unfamiliar with its functionality.

Also applies to: 101-102
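
As a point of reference, the call sequence being discussed looks roughly like the sketch below. Method names follow the review, but the exact signatures are assumptions, so treat this as an outline rather than the SDK's definitive API.

def speak_and_wait(dg_connection, text):
    """Send text for synthesis, then block until the SDK reports completion."""
    dg_connection.send_text(text)
    dg_connection.flush()
    # Replaces a fixed time.sleep(): returns once the flush/playback cycle
    # has finished, which is more robust than guessing a delay.
    dg_connection.wait_for_complete()
    dg_connection.finish()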

deepgram/audio/microphone/microphone.py (2)

148-163: LGTM: Improved error handling in start method

The additional checks for self._audio and self._stream being None improve the robustness of the start method. The error logging for stream creation failure is also a good practice.

Consider adding a log message when self._audio is None:

 if self._audio is not None:
     self._stream = self._audio.open(
         # ... (existing parameters)
     )
+else:
+    self._logger.error("start failed. Audio interface not initialized.")

This would provide more specific information about why the stream creation failed.


213-235: LGTM: Added is_muted method, but consider clarifying documentation

The addition of the is_muted method is a good improvement, providing a way to check the mute state of the microphone.

Consider clarifying the documentation to explicitly state the behavior when the stream is None:

 def is_muted(self) -> bool:
     """
     is_muted - returns the state of the stream

     Args:
         None

     Returns:
-        True if the stream is muted, False otherwise
+        True if the stream is muted, False if the stream is unmuted or not initialized
     """

This would make the method's behavior more explicit and help prevent potential confusion for users of the class.

deepgram/clients/common/v1/abstract_sync_rest.py (1)

228-234: Approve changes with a minor suggestion for consistency.

The new error handling for None responses is a good addition that improves the robustness of the _handle_request method. It prevents potential NoneType exceptions and provides a clear, actionable error message to the user.

For consistency with the rest of the codebase, consider using a multi-line string for the error message. Here's a suggested minor improvement:

 if response is None or response.text is None:
     raise DeepgramError(
-        "Response is not available yet. Please try again later."
+        """
+        Response is not available yet. Please try again later.
+        """
     )

This change maintains consistency with other multi-line strings used in error messages throughout the SDK.

deepgram/clients/common/v1/abstract_async_rest.py (1)

232-237: Improved error handling for None responses. Consider adding logging.

The addition of this check for None responses is a good improvement to the error handling. It prevents potential NoneType exceptions and provides a clear error message.

Consider adding a debug log before raising the exception. This could help with troubleshooting in the future. For example:

import logging

# ... (existing code)

if response is None or response.text is None:
    logging.debug("Received None response or response text")
    raise DeepgramError(
        "Response is not available yet. Please try again later."
    )

This logging could provide valuable context for debugging without changing the behavior of the code.

deepgram/audio/speaker/speaker.py (2)

73-73: Update docstring for new microphone parameter

The microphone parameter has been correctly added to the __init__ method with the appropriate type. However, the method's docstring should be updated to include documentation for this new parameter.

Please update the docstring to include:

microphone (Optional[Microphone], optional): The microphone instance to control its mute state during playback.

359-373: LGTM: Enhanced microphone control in _play method

The additions to the _play method provide sophisticated microphone control during playback:

  1. It correctly handles cases where no microphone is associated.
  2. The muting/unmuting logic based on sound detection and time thresholds is well-implemented.

These changes significantly improve the speaker's functionality with regard to microphone management.

Consider the performance impact of frequent debug logging in a production environment. You might want to add a flag to enable/disable detailed logging or use a more performant logging method for these frequent operations.
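
One lightweight way to act on that logging note, sketched with an assumed module-level switch; neither the flag nor the helper exists in the SDK.

import logging

DETAILED_AUDIO_TRACE = False  # hypothetical switch for hot-path tracing
logger = logging.getLogger(__name__)

def trace_playback(message, *args):
    """Emit per-chunk playback/mute logs only when tracing is enabled."""
    if DETAILED_AUDIO_TRACE and logger.isEnabledFor(logging.DEBUG):
        logger.debug(message, *args)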

deepgram/clients/speak/v1/websocket/client.py (2)

67-69: LGTM: New instance variables for audio support.

The new instance variables _speaker_created, _speaker, and _microphone are appropriate for managing the new audio functionality.

Consider adding a comment explaining the purpose of the _speaker_created flag for better code readability.


118-119: LGTM: Speaker creation and cleanup logic.

The changes for speaker creation and cleanup are well-implemented:

  • Setting the _speaker_created flag ensures proper tracking of speaker initialization.
  • Passing the microphone to the Speaker constructor allows for integration between components.
  • The cleanup logic in the finish method correctly uses the _speaker_created flag.

Consider adding error handling for the speaker initialization. For example:

try:
    self._speaker = Speaker(
        # ... existing parameters ...
        microphone=self._microphone,
    )
    self._speaker_created = True
except Exception as e:
    self._logger.error(f"Failed to initialize speaker: {e}")
    self._speaker_created = False

This would provide more robust error handling and logging if speaker initialization fails.

Also applies to: 127-127, 135-135, 639-642

deepgram/clients/speak/v1/websocket/async_client.py (2)

70-72: LGTM! Constructor updated for microphone support.

The constructor has been updated to accept an optional microphone parameter and initialize the _microphone attribute. The _speaker initialization now includes the microphone parameter, enabling the new feature of automatic microphone muting by the speaker.

Consider updating the constructor's docstring to include information about the new microphone parameter:

def __init__(
    self, config: DeepgramClientOptions, microphone: Optional[Microphone] = None
):
    """
    Initialize the AsyncSpeakWSClient.

    Args:
        config (DeepgramClientOptions): All the options for the client.
        microphone (Optional[Microphone]): An optional Microphone instance to be used by the client.
    """
    # ... rest of the method

Also applies to: 88-90, 124-124, 132-132


Line range hint 1-738: Overall, excellent implementation of new audio features!

The changes in this file successfully implement the new features described in the PR objectives:

  1. Support for an optional microphone instance.
  2. Integration of microphone control with the speaker.
  3. Proper cleanup of audio resources.

These enhancements improve the SDK's functionality for real-time text-to-speech synthesis and audio processing. The code is well-structured and maintains consistency with the existing implementation.

Consider adding unit tests to verify the new microphone and speaker integration, especially focusing on the automatic muting feature and proper resource cleanup.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 5754a1b and f3cc7a5.

⛔ Files ignored due to path filters (1)
  • examples/speech-to-text/websocket/replay/microsoft_headquarters.wav is excluded by !**/*.wav
📒 Files selected for processing (13)
  • deepgram/audio/microphone/microphone.py (4 hunks)
  • deepgram/audio/speaker/speaker.py (9 hunks)
  • deepgram/clients/common/v1/abstract_async_rest.py (1 hunks)
  • deepgram/clients/common/v1/abstract_async_websocket.py (3 hunks)
  • deepgram/clients/common/v1/abstract_sync_rest.py (1 hunks)
  • deepgram/clients/common/v1/abstract_sync_websocket.py (3 hunks)
  • deepgram/clients/listen/v1/helpers.py (0 hunks)
  • deepgram/clients/speak/v1/websocket/async_client.py (5 hunks)
  • deepgram/clients/speak/v1/websocket/client.py (5 hunks)
  • deepgram/clients/speak/v1/websocket/helpers.py (0 hunks)
  • examples/speech-to-text/websocket/replay/main.py (1 hunks)
  • examples/text-to-speech/websocket/output_to_wav/main.py (4 hunks)
  • examples/text-to-speech/websocket/simple/main.py (4 hunks)
💤 Files with no reviewable changes (2)
  • deepgram/clients/listen/v1/helpers.py
  • deepgram/clients/speak/v1/websocket/helpers.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • deepgram/clients/common/v1/abstract_async_websocket.py
  • deepgram/clients/common/v1/abstract_sync_websocket.py
🧰 Additional context used
📓 Learnings (3)
deepgram/audio/microphone/microphone.py (1)
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#472
File: deepgram/audio/microphone/microphone.py:274-301
Timestamp: 2024-10-18T00:26:54.280Z
Learning: In the `deepgram/audio/microphone/microphone.py` file, within the `Microphone` class's `_callback` method, re-raising exceptions is intended behavior.
deepgram/clients/speak/v1/websocket/async_client.py (1)
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#472
File: deepgram/clients/speak/v1/websocket/async_client.py:643-646
Timestamp: 2024-10-18T00:29:32.961Z
Learning: In `deepgram/clients/speak/v1/websocket/async_client.py`, the double call to `self._speaker.finish()` in the `finish` method of the `AsyncSpeakWSClient` class is intentional and required for proper cleanup.
examples/text-to-speech/websocket/output_to_wav/main.py (2)
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#472
File: examples/text-to-speech/websocket/output_to_wav/main.py:16-17
Timestamp: 2024-10-18T00:30:20.224Z
Learning: In `examples/text-to-speech/websocket/output_to_wav/main.py`, the hardcoded values for `AUDIO_FILE` and `TTS_TEXT` are intentional and should remain as is.
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#472
File: examples/text-to-speech/websocket/output_to_wav/main.py:38-41
Timestamp: 2024-10-18T00:30:01.884Z
Learning: In the `examples/text-to-speech/websocket/output_to_wav/main.py` file, the code is intended as a simple example, and additional error handling for file operations is not required.
🔇 Additional comments (24)
examples/text-to-speech/websocket/output_to_wav/main.py (2)

7-7: LGTM: Import and constant updates

The addition of the wave module import and the new AUDIO_FILE constant, along with the updated TTS_TEXT, are appropriate changes that support the enhanced functionality of the script.

Also applies to: 16-17


38-41: LGTM: Streamlined audio data handling

The updated on_binary_data function efficiently writes the received audio data to the file. The use of a with statement ensures proper file handling, which is a good practice.
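
A minimal sketch of that handler pattern, assuming the (self, data, **kwargs) callback signature used elsewhere in the examples and the AUDIO_FILE constant noted in the summary:

AUDIO_FILE = "output.wav"

def on_binary_data(self, data, **kwargs):
    """Append each received audio chunk to the output file."""
    print("Received binary data")
    # "ab" preserves the previously written WAV header and earlier chunks.
    with open(AUDIO_FILE, "ab") as audio_file:
        audio_file.write(data)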

examples/speech-to-text/websocket/replay/main.py (1)

1-100: Overall assessment: Good implementation with room for improvements.

This example script effectively demonstrates the usage of the Deepgram SDK for live transcription via websocket. It covers the essential steps: client initialization, websocket connection setup, event handling, and audio data processing.

Key strengths:

  1. Clear structure and flow of operations.
  2. Proper use of the Deepgram SDK's features.
  3. Event-driven architecture for handling websocket events.

Areas for improvement:

  1. Enhanced error handling, particularly for connection initialization and file I/O.
  2. More robust API key management.
  3. Removal of unused imports.
  4. Potential for adding retry mechanisms for increased reliability.

Implementing the suggested changes will result in a more robust and maintainable example that better showcases best practices in using the Deepgram SDK.

examples/text-to-speech/websocket/simple/main.py (5)

7-7: LGTM: Changes enhance example clarity.

The shorter TTS_TEXT is more suitable for a simple example. The introduction of warning_notice as a global variable, while generally discouraged in production code, is acceptable in this context as it simplifies the example's flow. These changes align with the intended functionality of the example.

Also applies to: 16-19


41-50: LGTM: Enhanced binary data handling explanation.

The updated on_binary_data function now provides clear, informative messages about binary data handling options. This improves the educational value of the example, helping users understand the implications of the 'speaker_playback' option.


63-70: LGTM: Additional event handlers improve error handling demonstration.

The new on_warning, on_error, and on_unhandled event handlers provide a comprehensive demonstration of error and edge case handling in the Deepgram SDK. This addition enhances the educational value of the example.


74-80: LGTM: Comprehensive event registration enhances example.

The addition of event registrations for metadata, flushing, clearing, warnings, errors, and unhandled events provides a complete demonstration of the Deepgram SDK's event handling capabilities. This comprehensive approach will be valuable for users learning to work with the SDK.


Line range hint 1-120: Overall: Excellent enhancements to the example file.

This update significantly improves the main.py example by:

  1. Introducing comprehensive event handling (metadata, flushing, clearing, warnings, errors).
  2. Improving connection management with wait_for_complete().
  3. Enhancing the educational value through informative print statements.
  4. Simplifying the example for better clarity.

These changes align well with the PR objectives and provide a more robust demonstration of the Deepgram SDK's capabilities. The example now offers a more complete picture of working with the SDK, which will be valuable for users.

Great job on these enhancements!

deepgram/audio/microphone/microphone.py (5)

25-26: LGTM: Improved type annotations for _audio and _stream

The use of Optional types for _audio and _stream attributes is a good practice. It clearly indicates that these attributes can be None, which improves type safety and makes the code more self-documenting.


166-167: LGTM: Added null check before starting stream

The additional check to ensure self._stream is not None before calling start_stream() is a good defensive programming practice. It prevents potential null pointer exceptions and improves the overall robustness of the code.


180-180: LGTM: Refined log levels for better granularity

The change from debug to verbose log levels in the mute and unmute methods is a good improvement. It allows for more granular control over logging output and is consistent with the use of the verboselogs library. This change will help in debugging and monitoring the application's behavior more effectively.

Also applies to: 184-184, 190-190, 200-200, 204-204, 210-210


262-267: LGTM: Improved cleanup in finish method

The updates to the asyncio loop stopping process and the addition of setting _stream and _asyncio_thread to None after cleanup are good improvements. These changes ensure a more thorough cleanup process and help prevent accidental use of closed resources.


274-301: LGTM: Added _callback method for stream processing

The addition of the _callback method is a good implementation for handling stream data processing. It correctly handles the muting functionality and properly manages the stream state.

The exception handling, including re-raising exceptions, aligns with the intended behavior as per previous discussions. This approach allows for proper error propagation while ensuring errors are logged.
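
For illustration only, a PyAudio-style input callback that honours a mute flag and re-raises after logging, as the comment describes; the factory, forwarding hook, and flag accessor are hypothetical rather than the SDK's actual attributes.

import pyaudio

def make_callback(push_audio, is_muted, logger):
    """Build a PyAudio stream callback that drops audio while muted."""
    def _callback(input_data, frame_count, time_info, status):
        try:
            if not is_muted():
                push_audio(input_data)  # forward raw bytes downstream
        except Exception as e:
            logger.error("callback failed: %s", e)
            raise  # re-raise so the failure still propagates to the caller
        return input_data, pyaudio.paContinue

    return _callback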

deepgram/clients/common/v1/abstract_sync_rest.py (1)

Line range hint 1-390: Overall assessment: Changes improve error handling and align with PR objectives.

The modifications to the _handle_request method enhance the robustness of the AbstractSyncRestClient class by adding a check for None responses. This change aligns well with the PR objectives of improving the SDK's functionality and is a valuable addition to the error handling mechanism.

The rest of the file remains unchanged, maintaining its existing functionality for making various types of HTTP requests. The new error handling integrates seamlessly with the existing code structure.

deepgram/audio/speaker/speaker.py (4)

18-18: LGTM: Import statement for Microphone

The import statement for Microphone is correctly placed and necessary for the new microphone control functionality.


61-61: LGTM: Addition of _microphone attribute

The _microphone attribute is correctly typed as Optional[Microphone] and initialized to None. This is appropriate for the new optional microphone control functionality.


88-88: LGTM: Initialization of _microphone attribute

The _microphone attribute is correctly initialized with the microphone parameter passed to the __init__ method.


153-167: LGTM: Improved robustness in start method

The changes to the start method enhance its robustness:

  1. The new checks for self._audio and self._stream prevent potential errors.
  2. Moving the stream start inside a check for self._stream ensures it's only called when a stream exists.

These modifications improve the method's error handling and reliability.

Also applies to: 179-180

deepgram/clients/speak/v1/websocket/client.py (4)

30-31: LGTM: New imports for audio functionality.

The new imports for Microphone, Speaker, and related constants are appropriate for the added audio functionality.


71-73: LGTM: Constructor updated for microphone support.

The addition of the optional microphone parameter to the constructor aligns with the PR objectives and allows for flexible microphone integration.


91-93: LGTM: Microphone initialization.

The initialization of the _microphone instance variable with the provided microphone is correct and straightforward.


Line range hint 1-672: Overall assessment: Changes successfully implement microphone and speaker integration.

The modifications to the SpeakWSClient class effectively implement the required functionality for microphone support and speaker integration. Key points:

  1. New imports and instance variables are correctly added.
  2. The constructor is updated to accept an optional microphone parameter.
  3. Speaker creation and cleanup logic is well-implemented.
  4. The changes align with the PR objectives of backporting enhancements from the Agent EA work.

The code is generally well-structured and follows good practices. Minor suggestions for improvement have been made in previous comments.

deepgram/clients/speak/v1/websocket/async_client.py (2)

30-32: LGTM! New imports and attributes for audio functionality.

The new imports and class attributes (_speaker_created, _speaker, and _microphone) align with the PR objectives of adding microphone and speaker functionality to the SDK. These changes provide the necessary foundation for implementing the new features.

Also applies to: 66-69


643-646: LGTM! Proper cleanup of speaker instance.

The addition of this block ensures that the speaker instance is properly finished and reset when the client is done. The double call to self._speaker.finish() is intentional:

  1. The first call (line 644) is specific to speaker cleanup.
  2. The second call (in the existing code) is part of the general cleanup process.

This approach ensures thorough cleanup of the speaker resources.

@davidvonthenen merged commit eb4ed92 into deepgram:main on Oct 18, 2024
5 checks passed
@davidvonthenen deleted the enhancement-usability-for-stt-and-tts-ws branch on October 18, 2024 at 16:49