Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(integrations): Add llama_parse integration for document parsing. #823

Merged
merged 7 commits into from
Nov 10, 2024

Conversation

Vedantsahai18
Copy link
Contributor

@Vedantsahai18 Vedantsahai18 commented Nov 9, 2024

Important

Add Llama document parsing integration with new classes, configuration updates, and tests.

  • Integration:
    • Add LlamaParseIntegrationDef and LlamaParseIntegrationDefUpdate classes in Tools.py for Llama document parsing.
    • Add LlamaParseFetchArguments, LlamaParseFetchArgumentsUpdate, LlamaParseSetup, and LlamaParseSetupUpdate classes for handling setup and arguments.
    • Implement parse() function in llama_parse.py for parsing documents using LlamaParse.
  • Configuration:
    • Add llama-index and llama-parse dependencies to pyproject.toml.
  • Testing:
    • Add MockLlamaParseClient in mocks/llama_parse.py for testing.
    • Add test_llama_parse_provider() in test_providers.py to test LlamaParse provider configuration.
    • Update conftest.py to include LlamaParse in mocked services.

This description was created by Ellipsis for 239e6c7. It will automatically update as commits are pushed.

@creatorrr creatorrr marked this pull request as ready for review November 10, 2024 05:43
@creatorrr
Copy link
Contributor

lgtm

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to a0781c8 in 1 minute and 0 seconds

More details
  • Looked at 1110 lines of code in 15 files
  • Skipped 1 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. agents-api/agents_api/autogen/Tools.py:1020
  • Draft comment:
    The file field is required in LlamaParseFetchArguments but optional in LlamaParseFetchArgumentsUpdate. Ensure this is intentional to avoid unexpected behavior.
  • Reason this comment was not posted:
    Comment did not seem useful.
2. integrations-service/integrations/autogen/Tools.py:1021
  • Draft comment:
    The file field is required in LlamaParseFetchArguments but optional in LlamaParseFetchArgumentsUpdate. Ensure this is intentional to avoid unexpected behavior.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_NYxAFUDKgAtV27FC


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 239e6c7 in 29 seconds

More details
  • Looked at 145 lines of code in 5 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. agents-api/agents_api/common/protocol/remote.py:96
  • Draft comment:
    Use BaseRemoteModel instead of a string for the return type hint in unload_all for clarity and consistency.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The unload_all method in BaseRemoteModel and RemoteList returns self, which is a good practice for method chaining. However, the return type hint in BaseRemoteModel is a string, which is unnecessary since the class is already defined. It should be updated to use the class itself for clarity and consistency.
2. agents-api/agents_api/common/storage_handler.py:83
  • Draft comment:
    Consider refactoring the handling of BaseModel and BaseRemoteModel instances in load_args to reduce code repetition and improve maintainability. This applies to similar code blocks in the function.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    In agents-api/agents_api/common/storage_handler.py, the load_args function has repetitive code for handling BaseModel and BaseRemoteModel instances. This can be refactored to improve readability and maintainability.
3. cookbooks/01-Website_Crawler_using_Spider.py:40
  • Draft comment:
    Avoid directly embedding sensitive information like spider_api_key in strings. Consider using environment variables or a secure vault to manage sensitive data.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    In cookbooks/01-Website_Crawler_using_Spider.py, the spider_api_key is directly inserted into the YAML string using an f-string. This can lead to security issues if the key is logged or exposed. It's better to use a more secure method to handle sensitive information.

Workflow ID: wflow_RDzLiSz7qZkHlBIH


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipped PR review on 6724b7f because no changed files had a supported extension. If you think this was in error, please contact us and we'll fix it right away.

@creatorrr creatorrr changed the title F/integrations llama parse: added Llama document parsing. feat(integrations): Add llama_parse integration for document parsing. Nov 10, 2024
@creatorrr creatorrr merged commit b6308f1 into dev Nov 10, 2024
16 checks passed
@creatorrr creatorrr deleted the f/integrations-llama-parse branch November 10, 2024 09:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants