Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[New Feature] Add_Data_IO_Functions #281

Merged
merged 18 commits into from
Sep 13, 2023
Merged

[New Feature] Add_Data_IO_Functions #281

merged 18 commits into from
Sep 13, 2023

Conversation

Wendong-Fan
Copy link
Member

💡 User Story:
As an AI engineer,
I want to add data input capability to CAMEL
So that our user can take different kinds of files as input

🛒 Types of Changes:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)
  • Example (update in the folder of example)

🚀 Acceptance Criteria:

  1. Functional code can handle with:
    .docx
    .pdf
    .txt
    .html
    .json
    Convert information from all these file types into string

  2. Apply the functional code to CAMEL master branch successfully, and pass all tests.

🔀 Related User Stories:
No story related

🛠️ Technical Details:
Use BytesIO, reference Langchain data input module

📑 Tasks:
.docx
.pdf
.txt
.html
.json

🔑 Dependencies:
No dependencies needed

🤝 Required Prerequisites:

@Wendong-Fan Wendong-Fan self-assigned this Sep 7, 2023
@Wendong-Fan Wendong-Fan changed the title [New Feature] add_data_io_functions [New Feature] Add_Data_IO_Functions Sep 7, 2023
@Wendong-Fan
Copy link
Member Author

Hi @lightaime , I have finished the first version data I/O module, for the linting test, I used # type: ignore to pass the test (see my last two commits) since I don't know how can I fix it in another way. Please feel free to review my code and give me suggestions or comments, thank you~

Copy link
Collaborator

@Obs01ete Obs01ete left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is good overall. You bring in some binaries for the purpose of testing. They are smaller than 1 MB and it is fine. However test/data_samples/test_hello_multi.pdf is quite big, 300kb. Let it be. Also I am not sure what to do with this poetry lock. Is it possible to revert all hunks except for the newly added packages?

from io import BytesIO
from typing import Any, Dict, List, Optional

import docx2txt # type: ignore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove # type: ignore, instead add them to mypy exceptions in pyproject.toml.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Dmitrii, thank you for your review, I have made the adjustment as you suggested and I used poetry lock to update the poetry.lock file

Copy link
Collaborator

@dandansamax dandansamax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useful feature, Thanks! Just leave some comments of docs and code style. Committing poetry.lock is necessary since this PR update dependencies

camel/functions/data_io_functions.py Show resolved Hide resolved
camel/functions/data_io_functions.py Show resolved Hide resolved
Copy link
Collaborator

@dandansamax dandansamax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Wendong. Thank you for your high quality code. Please feel free to merge it if no more comments.

Copy link
Collaborator

@Obs01ete Obs01ete left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good. Approved.

@Obs01ete Obs01ete merged commit 03b7d27 into master Sep 13, 2023
8 checks passed
@Obs01ete Obs01ete deleted the data_i/o_wendong branch September 13, 2023 09:49
@Obs01ete Obs01ete mentioned this pull request Sep 13, 2023
14 tasks
@dandansamax dandansamax mentioned this pull request Sep 13, 2023
8 tasks
@Wendong-Fan Wendong-Fan mentioned this pull request Sep 23, 2023
10 tasks
@lightaime
Copy link
Member

The docstring format of this PR is not consistent with others. We may need to fix the format.

@Wendong-Fan Wendong-Fan mentioned this pull request Oct 28, 2023
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants