Feature/link scraper as an action #430

Open
wants to merge 7 commits into
base: main
from

Conversation

Eyobyb
Collaborator

@Eyobyb Eyobyb commented Oct 7, 2024

Description

  • Add a link scraper action that scrapes a URL and summarizes the content, returning at most 3000 tokens.
  • Add a test for the link scraper action.
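One way to picture the 3000-token cap, as a rough sketch only. The helper below approximates tokens by whitespace-separated words, which is an assumption; the PR may count tokens with a model tokenizer instead. The name `truncate_to_token_budget` is hypothetical and does not come from the PR.

```python
def truncate_to_token_budget(text: str, max_tokens: int = 3000) -> str:
    """Keep at most max_tokens whitespace-separated words of text.

    Assumption: one word is treated as one token, which only
    approximates what a real tokenizer would count.
    """
    words = text.split()
    if len(words) <= max_tokens:
        # Already within budget: return the text unchanged.
        return text
    # Over budget: keep the leading words and drop the rest.
    return " ".join(words[:max_tokens])
```

A summarizer could apply such a cap to its final output so the action result stays within the stated budget.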

Type of change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Maintenance
  • New release

Related issues

Mention related GitHub and Linear issues. E.g. Closes #xxx or Fixes #xxx. Otherwise delete this section.

Checklists

To speed up the review process, please follow these checklists:

Development

  • The Pull Request is small and focused on one topic
  • Lint rules pass locally (make format && make lint)
  • The code changed/added as part of this pull request has been covered with tests
  • All tests related to the changed code pass in development (make test)
  • The changes generate no new warnings (or explain any new warnings and why they're ok)
  • Commit messages are detailed
  • Changed code is self-explanatory and/or I added comments
  • I updated the documentation (docstrings, /docs)
    See the testing guidelines for help on tests, especially those involving web services.

Code review

  • This pull request has a descriptive title and information useful to a reviewer. There may be a screenshot or screencast attached.
  • I have performed a self-review of my code
  • Issue from task tracker has a link to this pull request

💔 Thank you for submitting a pull request!

This commit refactors the `tools.py` file by organizing imports and removing unused imports. It also adds a new class `LinkScraperTool` which allows accessing the content of a link and extracting information from it. This tool can be used when there is a need to scrape data from a link.
}
usage: str = "Simple link scraper that scrapes the data from url and returns string"
scraper_tool: LinkScraperTool = Field(default_factory=LinkScraperTool)
print(scraper_tool)
Collaborator

remove print

raise ValueError("Scraper returned no content.")
return str(result)
except Exception as e:
print(f"Error during scraping: {e}")
Collaborator

use logger.debug instead
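
A minimal sketch of what this suggestion could look like, assuming a module-level logger from the standard `logging` module. The function name `scrape_or_none` and the `scraper` callable are hypothetical stand-ins for illustration, not the PR's actual code:

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def scrape_or_none(scraper: Callable[[str], object], url: str) -> Optional[str]:
    """Run a scraper callable and log failures instead of printing them."""
    try:
        result = scraper(url)
        if not result:
            raise ValueError("Scraper returned no content.")
        return str(result)
    except Exception as e:
        # logger.debug is silent unless debug logging is enabled,
        # so failures no longer appear on stdout the way print() does.
        logger.debug("Error during scraping: %s", e)
        return None
```

With this pattern, the message is only shown when the application enables debug logging, for example via `logging.basicConfig(level=logging.DEBUG)`.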

scraped_data = {"data": "", "status": 404}
else:
scraped_data = scrape_with_url(link)
print("%m" * 70)
Collaborator

remove print


final_summary.append({"data": chunk_summary, "link": link})
else:
print("Scraping failed")
Collaborator

use logger.debug

@amirfz amirfz removed the request for review from 20001LastOrder October 11, 2024 12:47
{"input": [{"text": "You are a **question answering assistant** who solves user questions and offers a detailed solution.\\n\\nYour name is QA Agent.\\n\\nGiven your specialized expertise, historical context, and your mission to facilitate Machine-Learning-based solutions, determine which action and its corresponding arguments would be the most scientifically sound and efficient approach to achieve the described task.\\n\\n**Task Description**: what is unique about Ethiopia calendar? based on these links https://www.ethiopiancalendar.net/\\n\\n**Possible Actions**:\\n{\\n \"name\": \"Link Scraper\",\\n \"args\": {\\n \"url\": \"the url to be scrapped\"\\n },\\n \"usage\": \"Simple link scraper that scrapes the data from url and returns string\"\\n}\\n\\n**Task Context**:\\nwhat is unique about Ethiopia calendar? based on these links https://www.ethiopiancalendar.net/\\n\\n\\n**History of Previous Actions**:\\n\\n\\nYou should only select the actions specified in **Possible Actions**\\nYou should only respond in JSON format as described below without any extra text.\\nResponse Format:\\n{\\n \"command\": {\\n \"name\": \"tool/command name you choose\",\\n \"args\": {\\n \"arg name\": \"value\"\\n }\\n }\\n}\\nEnsure the response can be parsed by Python json.loads\\n\\nFollow the described format strictly.\\n\\n", "agent": "human"}], "output": "{\\n \"command\": {\\n \"name\": \"Link Scraper\",\\n \"args\": {\\n \"url\": \"https://www.ethiopiancalendar.net/\"\\n }\\n }\\n}", "llm_name": "gpt-3.5-turbo"}
{"input": [{"text": "You are a **question answering assistant** who solves user questions and offers a detailed solution.\\n\\nYour name is QA Agent.\\n\\nGiven your specialized expertise, historical context, and your mission to facilitate Machine-Learning-based solutions, determine which action and its corresponding arguments would be the most scientifically sound and efficient approach to achieve the described task.\\n\\n**Task Description**: what is unique about Ethiopia calendar? based on these links https://www.ethiopiancalendar.net/\\n\\n**Possible Actions**:\\n{\\n \"name\": \"Link Scraper\",\\n \"args\": {\\n \"url\": \"the url to be scrapped\"\\n },\\n \"usage\": \"Simple link scraper that scrapes the data from url and returns string\"\\n}\\n\\n**Task Context**:\\nwhat is unique about Ethiopia calendar? based on these links https://www.ethiopiancalendar.net/\\n\\n\\n**History of Previous Actions**:\\nAction: Link Scraper{'url': 'https://www.ethiopiancalendar.net/'}\\nOutput: \\n\\nYou should only select the actions specified in **Possible Actions**\\nYou should only respond in JSON format as described below without any extra text.\\nResponse Format:\\n{\\n \"command\": {\\n \"name\": \"tool/command name you choose\",\\n \"args\": {\\n \"arg name\": \"value\"\\n }\\n }\\n}\\nEnsure the response can be parsed by Python json.loads\\n\\nFollow the described format strictly.\\n\\n", "agent": "human"}], "output": "{\\n \"command\": {\\n \"name\": \"Link Scraper\",\\n \"args\": {\\n \"url\": \"https://www.ethiopiancalendar.net/\"\\n }\\n }\\n}", "llm_name": "gpt-3.5-turbo"}
{"input": [{"text": "You are a **question answering assistant** who solves user questions and offers a detailed solution.\\n\\nYour name is QA Agent.\\n\\nGiven your specialized expertise, historical context, and your mission to facilitate Machine-Learning-based solutions, determine which action and its corresponding arguments would be the most scientifically sound and efficient approach to achieve the described task.\\n\\n**Task Description**: what is unique about Ethiopia calendar? based on these links https://www.ethiopiancalendar.net/\\n\\n**Possible Actions**:\\n{\\n \"name\": \"Link Scraper\",\\n \"args\": {\\n \"url\": \"the url to be scrapped\"\\n },\\n \"usage\": \"Simple link scraper that scrapes the data from url and returns string\"\\n}\\n\\n**Task Context**:\\nwhat is unique about Ethiopia calendar? based on these links https://www.ethiopiancalendar.net/\\n\\n\\n**History of Previous Actions**:\\nAction: Link Scraper{'url': 'https://www.ethiopiancalendar.net/'}\\nOutput: \\nAction: Link Scraper{'url': 'https://www.ethiopiancalendar.net/'}\\nOutput: \\n\\nYou should only select the actions specified in **Possible Actions**\\nYou should only respond in JSON format as described below without any extra text.\\nResponse Format:\\n{\\n \"command\": {\\n \"name\": \"tool/command name you choose\",\\n \"args\": {\\n \"arg name\": \"value\"\\n }\\n }\\n}\\nEnsure the response can be parsed by Python json.loads\\n\\nFollow the described format strictly.\\n\\n", "agent": "human"}], "output": "{\\n \"command\": {\\n \"name\": \"Link Scraper\",\\n \"args\": {\\n \"url\": \"https://www.ethiopiancalendar.net/\"\\n }\\n }\\n}", "llm_name": "gpt-3.5-turbo"}
{"input": [{"text": "You are a **question answering assistant** who solves user questions and offers a detailed solution.\\n\\nYour name is QA Agent.\\n\\nContext: what is unique about Ethiopia calendar? based on these links https://www.ethiopiancalendar.net/\\n\\n\\nAction - Result History:\\nAction: Link Scraper{'url': 'https://www.ethiopiancalendar.net/'}\\nOutput: \\nAction: Link Scraper{'url': 'https://www.ethiopiancalendar.net/'}\\nOutput: \\nAction: Link Scraper{'url': 'https://www.ethiopiancalendar.net/'}\\nOutput: \\n\\nGiven the context and the action-result history, please complete the task mentioned. Include any links you used from the context and history in the result.\\nTask: what is unique about Ethiopia calendar? based on these links https://www.ethiopiancalendar.net/\\nResult:\\n", "agent": "human"}], "output": "The Ethiopian calendar is unique because it follows a different system than the Gregorian calendar commonly used in the Western world. The Ethiopian calendar has 13 months, with 12 months having 30 days each and a 13th month called Pagum\u0113 with 5 or 6 days, depending on whether it is a leap year. This makes the Ethiopian year shorter than the Gregorian year by about 7-8 years.\\n\\nAdditionally, the Ethiopian calendar is about 7-8 years behind the Gregorian calendar. For example, while the year 2021 is recognized in the Gregorian calendar, it is currently the year 2013 in the Ethiopian calendar.\\n\\nThe Ethiopian calendar also has its own unique New Year celebration called Enkutatash, which falls on September 11th (or September 12th in leap years) according to the Gregorian calendar.\\n\\nFor more information, you can visit the official website of the Ethiopian calendar at https://www.ethiopiancalendar.net/.", "llm_name": "gpt-3.5-turbo"}
Collaborator

What is this file? Is it required?

Collaborator Author

Yes, it's auto-generated.
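
For context on the recorded file discussed in this thread: each line is a JSON object with `input`, `output`, and `llm_name` fields, and the prompt asks the model to reply with a `command` object that can be parsed by `json.loads`. A small sketch of reading such a reply follows; `parse_command` is a hypothetical helper for illustration, not a function from the PR:

```python
import json
from typing import Any, Dict, Tuple


def parse_command(output: str) -> Tuple[str, Dict[str, Any]]:
    """Extract the action name and arguments from a model reply.

    Assumes the reply follows the response format shown in the prompt:
    {"command": {"name": ..., "args": {...}}}
    """
    data = json.loads(output)
    command = data["command"]
    # Default to an empty dict when the reply carries no arguments.
    return command["name"], command.get("args", {})


name, args = parse_command(
    '{"command": {"name": "Link Scraper", '
    '"args": {"url": "https://www.ethiopiancalendar.net/"}}}'
)
```

In the recorded runs, the `Output:` entries in the action history are empty after each `Link Scraper` call, which may be worth checking when reviewing the fixture.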

3 participants