Feature/link scraper as an action #430
Open
Eyobyb wants to merge 7 commits into main from feature/link_scraper_as_an_action
7 commits
80ec85e Refactor tools.py and add LinkScraperTool (Eyobyb)
68fe6b4 Refactor utils.py and add get_links_from_text function (Eyobyb)
9062a0a Add link scraper action (Eyobyb)
389f593 Add integration test for LinkScraperAction (Eyobyb)
c7e6a18 Refactor LinkScraperAction and LinkScraperTool (Eyobyb)
420b684 Merge branch 'main' into feature/link_scraper_as_an_action (amirfz)
b078df3 Merge branch 'main' into feature/link_scraper_as_an_action (amirfz)
@@ -0,0 +1,30 @@
from typing import Any
from sherpa_ai.actions.base import BaseAction
from sherpa_ai.tools import LinkScraperTool
from pydantic import BaseModel, Field


class LinkScraperAction(BaseAction, BaseModel):
    llm: Any
    name: str = "Link Scraper"
    args: dict = {
        "url": "the url to be scrapped",
    }
    usage: str = "Simple link scraper that scrapes the data from url and returns string"
    scraper_tool: LinkScraperTool = Field(default_factory=LinkScraperTool)

    def execute(self, url: str, **kwargs) -> str:
        """
        Runs the scraper tool and returns the scraped data.
        Args:
            url: The URL to scrape.
        Returns:
            str: The scraped data, or "" if scraping fails or finds nothing.
        """
        try:
            result = self.scraper_tool._run(url, self.llm)
        except Exception:
            # Best-effort: any scraper failure is reported as an empty string.
            return ""
        # An empty or missing result is also reported as an empty string.
        return str(result) if result else ""
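The error-handling contract of `execute` can be tried out without installing `sherpa_ai`. The following is a minimal sketch under stated assumptions: `FakeScraperTool` and `SketchLinkScraperAction` are hypothetical stand-ins written for illustration, not classes from this PR, and the URLs are placeholders. It only shows that both a failing scrape and an empty result surface to the caller as an empty string.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeScraperTool:
    """Hypothetical stand-in for LinkScraperTool; not part of sherpa_ai."""

    pages: dict = field(default_factory=dict)

    def _run(self, url: str, llm: Any) -> str:
        # Simulate a network failure for unknown URLs.
        if url not in self.pages:
            raise ConnectionError(f"cannot reach {url}")
        return self.pages[url]


@dataclass
class SketchLinkScraperAction:
    """Hypothetical action mirroring the execute() contract shown above."""

    llm: Any = None
    scraper_tool: FakeScraperTool = field(default_factory=FakeScraperTool)

    def execute(self, url: str, **kwargs) -> str:
        try:
            result = self.scraper_tool._run(url, self.llm)
        except Exception:
            # Any failure is reported as an empty string.
            return ""
        # Empty content is reported the same way.
        return str(result) if result else ""


tool = FakeScraperTool(
    pages={"https://example.com/": "hello", "https://example.com/empty": ""}
)
action = SketchLinkScraperAction(scraper_tool=tool)

ok = action.execute("https://example.com/")
empty = action.execute("https://example.com/empty")
failed = action.execute("https://example.com/missing")
```

One consequence of this design is that a caller cannot tell a failed request from a page with no content, since both yield `""`. That may be relevant to the empty `Output:` entries recorded in the cached prompts below.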
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
{"input": [{"text": "You are a **question answering assistant** who solves user questions and offers a detailed solution.\\n\\nYour name is QA Agent.\\n\\nGiven your specialized expertise, historical context, and your mission to facilitate Machine-Learning-based solutions, determine which action and its corresponding arguments would be the most scientifically sound and efficient approach to achieve the described task.\\n\\n**Task Description**: what is unique about Ethiopia calendar? based on these links https://www.ethiopiancalendar.net/\\n\\n**Possible Actions**:\\n{\\n \"name\": \"Link Scraper\",\\n \"args\": {\\n \"url\": \"the url to be scrapped\"\\n },\\n \"usage\": \"Simple link scraper that scrapes the data from url and returns string\"\\n}\\n\\n**Task Context**:\\nwhat is unique about Ethiopia calendar? based on these links https://www.ethiopiancalendar.net/\\n\\n\\n**History of Previous Actions**:\\n\\n\\nYou should only select the actions specified in **Possible Actions**\\nYou should only respond in JSON format as described below without any extra text.\\nResponse Format:\\n{\\n \"command\": {\\n \"name\": \"tool/command name you choose\",\\n \"args\": {\\n \"arg name\": \"value\"\\n }\\n }\\n}\\nEnsure the response can be parsed by Python json.loads\\n\\nFollow the described format strictly.\\n\\n", "agent": "human"}], "output": "{\\n \"command\": {\\n \"name\": \"Link Scraper\",\\n \"args\": {\\n \"url\": \"https://www.ethiopiancalendar.net/\"\\n }\\n }\\n}", "llm_name": "gpt-3.5-turbo"} | ||
{"input": [{"text": "You are a **question answering assistant** who solves user questions and offers a detailed solution.\\n\\nYour name is QA Agent.\\n\\nGiven your specialized expertise, historical context, and your mission to facilitate Machine-Learning-based solutions, determine which action and its corresponding arguments would be the most scientifically sound and efficient approach to achieve the described task.\\n\\n**Task Description**: what is unique about Ethiopia calendar? based on these links https://www.ethiopiancalendar.net/\\n\\n**Possible Actions**:\\n{\\n \"name\": \"Link Scraper\",\\n \"args\": {\\n \"url\": \"the url to be scrapped\"\\n },\\n \"usage\": \"Simple link scraper that scrapes the data from url and returns string\"\\n}\\n\\n**Task Context**:\\nwhat is unique about Ethiopia calendar? based on these links https://www.ethiopiancalendar.net/\\n\\n\\n**History of Previous Actions**:\\nAction: Link Scraper{'url': 'https://www.ethiopiancalendar.net/'}\\nOutput: \\n\\nYou should only select the actions specified in **Possible Actions**\\nYou should only respond in JSON format as described below without any extra text.\\nResponse Format:\\n{\\n \"command\": {\\n \"name\": \"tool/command name you choose\",\\n \"args\": {\\n \"arg name\": \"value\"\\n }\\n }\\n}\\nEnsure the response can be parsed by Python json.loads\\n\\nFollow the described format strictly.\\n\\n", "agent": "human"}], "output": "{\\n \"command\": {\\n \"name\": \"Link Scraper\",\\n \"args\": {\\n \"url\": \"https://www.ethiopiancalendar.net/\"\\n }\\n }\\n}", "llm_name": "gpt-3.5-turbo"} | ||
{"input": [{"text": "You are a **question answering assistant** who solves user questions and offers a detailed solution.\\n\\nYour name is QA Agent.\\n\\nGiven your specialized expertise, historical context, and your mission to facilitate Machine-Learning-based solutions, determine which action and its corresponding arguments would be the most scientifically sound and efficient approach to achieve the described task.\\n\\n**Task Description**: what is unique about Ethiopia calendar? based on these links https://www.ethiopiancalendar.net/\\n\\n**Possible Actions**:\\n{\\n \"name\": \"Link Scraper\",\\n \"args\": {\\n \"url\": \"the url to be scrapped\"\\n },\\n \"usage\": \"Simple link scraper that scrapes the data from url and returns string\"\\n}\\n\\n**Task Context**:\\nwhat is unique about Ethiopia calendar? based on these links https://www.ethiopiancalendar.net/\\n\\n\\n**History of Previous Actions**:\\nAction: Link Scraper{'url': 'https://www.ethiopiancalendar.net/'}\\nOutput: \\nAction: Link Scraper{'url': 'https://www.ethiopiancalendar.net/'}\\nOutput: \\n\\nYou should only select the actions specified in **Possible Actions**\\nYou should only respond in JSON format as described below without any extra text.\\nResponse Format:\\n{\\n \"command\": {\\n \"name\": \"tool/command name you choose\",\\n \"args\": {\\n \"arg name\": \"value\"\\n }\\n }\\n}\\nEnsure the response can be parsed by Python json.loads\\n\\nFollow the described format strictly.\\n\\n", "agent": "human"}], "output": "{\\n \"command\": {\\n \"name\": \"Link Scraper\",\\n \"args\": {\\n \"url\": \"https://www.ethiopiancalendar.net/\"\\n }\\n }\\n}", "llm_name": "gpt-3.5-turbo"} | ||
{"input": [{"text": "You are a **question answering assistant** who solves user questions and offers a detailed solution.\\n\\nYour name is QA Agent.\\n\\nContext: what is unique about Ethiopia calendar? based on these links https://www.ethiopiancalendar.net/\\n\\n\\nAction - Result History:\\nAction: Link Scraper{'url': 'https://www.ethiopiancalendar.net/'}\\nOutput: \\nAction: Link Scraper{'url': 'https://www.ethiopiancalendar.net/'}\\nOutput: \\nAction: Link Scraper{'url': 'https://www.ethiopiancalendar.net/'}\\nOutput: \\n\\nGiven the context and the action-result history, please complete the task mentioned. Include any links you used from the context and history in the result.\\nTask: what is unique about Ethiopia calendar? based on these links https://www.ethiopiancalendar.net/\\nResult:\\n", "agent": "human"}], "output": "The Ethiopian calendar is unique because it follows a different system than the Gregorian calendar commonly used in the Western world. The Ethiopian calendar has 13 months, with 12 months having 30 days each and a 13th month called Pagum\u0113 with 5 or 6 days, depending on whether it is a leap year. This makes the Ethiopian year shorter than the Gregorian year by about 7-8 years.\\n\\nAdditionally, the Ethiopian calendar is about 7-8 years behind the Gregorian calendar. For example, while the year 2021 is recognized in the Gregorian calendar, it is currently the year 2013 in the Ethiopian calendar.\\n\\nThe Ethiopian calendar also has its own unique New Year celebration called Enkutatash, which falls on September 11th (or September 12th in leap years) according to the Gregorian calendar.\\n\\nFor more information, you can visit the official website of the Ethiopian calendar at https://www.ethiopiancalendar.net/.", "llm_name": "gpt-3.5-turbo"} | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
{"input": [{"text": "You are a **question answering assistant** who solves user questions and offers a detailed solution.\\n\\nYour name is QA Agent.\\n\\nGiven your specialized expertise, historical context, and your mission to facilitate Machine-Learning-based solutions, determine which action and its corresponding arguments would be the most scientifically sound and efficient approach to achieve the described task.\\n\\n**Task Description**: what is unique about Ethiopia calendar? based on this link https://en.wikipedia.org/wiki/Ethiopian_calendar\\n\\n**Possible Actions**:\\n{\\n \"name\": \"Link Scraper\",\\n \"args\": {\\n \"url\": \"the url to be scrapped\"\\n },\\n \"usage\": \"Simple link scraper that scrapes the data from url and returns string\"\\n}\\n\\n**Task Context**:\\nwhat is unique about Ethiopia calendar? based on this link https://en.wikipedia.org/wiki/Ethiopian_calendar\\n\\n\\n**History of Previous Actions**:\\n\\n\\nYou should only select the actions specified in **Possible Actions**\\nYou should only respond in JSON format as described below without any extra text.\\nResponse Format:\\n{\\n \"command\": {\\n \"name\": \"tool/command name you choose\",\\n \"args\": {\\n \"arg name\": \"value\"\\n }\\n }\\n}\\nEnsure the response can be parsed by Python json.loads\\n\\nFollow the described format strictly.\\n\\n", "agent": "human"}], "output": "{\\n \"command\": {\\n \"name\": \"Link Scraper\",\\n \"args\": {\\n \"url\": \"https://en.wikipedia.org/wiki/Ethiopian_calendar\"\\n }\\n }\\n}", "llm_name": "gpt-3.5-turbo"} | ||
{"input": [{"text": "You are a **question answering assistant** who solves user questions and offers a detailed solution.\\n\\nYour name is QA Agent.\\n\\nGiven your specialized expertise, historical context, and your mission to facilitate Machine-Learning-based solutions, determine which action and its corresponding arguments would be the most scientifically sound and efficient approach to achieve the described task.\\n\\n**Task Description**: what is unique about Ethiopia calendar? based on this link https://en.wikipedia.org/wiki/Ethiopian_calendar\\n\\n**Possible Actions**:\\n{\\n \"name\": \"Link Scraper\",\\n \"args\": {\\n \"url\": \"the url to be scrapped\"\\n },\\n \"usage\": \"Simple link scraper that scrapes the data from url and returns string\"\\n}\\n\\n**Task Context**:\\nwhat is unique about Ethiopia calendar? based on this link https://en.wikipedia.org/wiki/Ethiopian_calendar\\n\\n\\n**History of Previous Actions**:\\nAction: Link Scraper{'url': 'https://en.wikipedia.org/wiki/Ethiopian_calendar'}\\nOutput: \\n\\nYou should only select the actions specified in **Possible Actions**\\nYou should only respond in JSON format as described below without any extra text.\\nResponse Format:\\n{\\n \"command\": {\\n \"name\": \"tool/command name you choose\",\\n \"args\": {\\n \"arg name\": \"value\"\\n }\\n }\\n}\\nEnsure the response can be parsed by Python json.loads\\n\\nFollow the described format strictly.\\n\\n", "agent": "human"}], "output": "{\\n \"command\": {\\n \"name\": \"Link Scraper\",\\n \"args\": {\\n \"url\": \"https://en.wikipedia.org/wiki/Ethiopian_calendar\"\\n }\\n }\\n}", "llm_name": "gpt-3.5-turbo"} | ||
{"input": [{"text": "You are a **question answering assistant** who solves user questions and offers a detailed solution.\\n\\nYour name is QA Agent.\\n\\nGiven your specialized expertise, historical context, and your mission to facilitate Machine-Learning-based solutions, determine which action and its corresponding arguments would be the most scientifically sound and efficient approach to achieve the described task.\\n\\n**Task Description**: what is unique about Ethiopia calendar? based on this link https://en.wikipedia.org/wiki/Ethiopian_calendar\\n\\n**Possible Actions**:\\n{\\n \"name\": \"Link Scraper\",\\n \"args\": {\\n \"url\": \"the url to be scrapped\"\\n },\\n \"usage\": \"Simple link scraper that scrapes the data from url and returns string\"\\n}\\n\\n**Task Context**:\\nwhat is unique about Ethiopia calendar? based on this link https://en.wikipedia.org/wiki/Ethiopian_calendar\\n\\n\\n**History of Previous Actions**:\\nAction: Link Scraper{'url': 'https://en.wikipedia.org/wiki/Ethiopian_calendar'}\\nOutput: \\nAction: Link Scraper{'url': 'https://en.wikipedia.org/wiki/Ethiopian_calendar'}\\nOutput: \\n\\nYou should only select the actions specified in **Possible Actions**\\nYou should only respond in JSON format as described below without any extra text.\\nResponse Format:\\n{\\n \"command\": {\\n \"name\": \"tool/command name you choose\",\\n \"args\": {\\n \"arg name\": \"value\"\\n }\\n }\\n}\\nEnsure the response can be parsed by Python json.loads\\n\\nFollow the described format strictly.\\n\\n", "agent": "human"}], "output": "{\\n \"command\": {\\n \"name\": \"Link Scraper\",\\n \"args\": {\\n \"url\": \"https://en.wikipedia.org/wiki/Ethiopian_calendar\"\\n }\\n }\\n}", "llm_name": "gpt-3.5-turbo"} | ||
{"input": [{"text": "You are a **question answering assistant** who solves user questions and offers a detailed solution.\\n\\nYour name is QA Agent.\\n\\nContext: what is unique about Ethiopia calendar? based on this link https://en.wikipedia.org/wiki/Ethiopian_calendar\\n\\n\\nAction - Result History:\\nAction: Link Scraper{'url': 'https://en.wikipedia.org/wiki/Ethiopian_calendar'}\\nOutput: \\nAction: Link Scraper{'url': 'https://en.wikipedia.org/wiki/Ethiopian_calendar'}\\nOutput: \\nAction: Link Scraper{'url': 'https://en.wikipedia.org/wiki/Ethiopian_calendar'}\\nOutput: \\n\\nGiven the context and the action-result history, please complete the task mentioned. Include any links you used from the context and history in the result.\\nTask: what is unique about Ethiopia calendar? based on this link https://en.wikipedia.org/wiki/Ethiopian_calendar\\nResult:\\n", "agent": "human"}], "output": "The Ethiopian calendar is unique in that it follows the ancient Coptic calendar, which is based on the Egyptian calendar. One of the main differences is that the Ethiopian calendar has 13 months, with 12 months having 30 days each and a 13th month called Pagum\u0113 having 5 or 6 days (depending on whether it's a leap year). This results in the Ethiopian calendar being roughly 7-8 years behind the Gregorian calendar. Additionally, the Ethiopian New Year falls on September 11 or 12 (depending on whether it's a leap year), which is different from the January 1st New Year in the Gregorian calendar.\\n\\nSource: [Wikipedia - Ethiopian calendar](https://en.wikipedia.org/wiki/Ethiopian_calendar)", "llm_name": "gpt-3.5-turbo"} |
What is this file? Is it required?
Yes, it's auto-generated.
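For reviewers who want to inspect such a file: each line in the two data files in this diff is a standalone JSON object with `input`, `output`, and `llm_name` keys. The sketch below assumes that JSON Lines layout and uses only the standard library. `load_records` and `sample` are hypothetical names introduced here, and how `sherpa_ai` itself produces or consumes these files is not shown in this PR.

```python
import json

# Two made-up records in the same shape as the lines in the diff.
sample = "\n".join(
    [
        json.dumps(
            {
                "input": [{"text": "prompt one", "agent": "human"}],
                "output": "reply one",
                "llm_name": "gpt-3.5-turbo",
            }
        ),
        json.dumps(
            {
                "input": [{"text": "prompt two", "agent": "human"}],
                "output": "reply two",
                "llm_name": "gpt-3.5-turbo",
            }
        ),
    ]
)


def load_records(text: str) -> list:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = load_records(sample)

# Pair each prompt with the recorded model output for a quick review.
pairs = [(r["input"][0]["text"], r["output"]) for r in records]
```

Reading the records this way makes it easy to check what the recorded prompts contain before deciding whether the file belongs in the PR.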