-
Notifications
You must be signed in to change notification settings - Fork 544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds support for instructables #1392
Conversation
return [normalize_string(tag.get_text()) for tag in ingredients_tags] | ||
|
||
def instructions(self): | ||
steps = self.soup.find_all("section", class_="step") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make sure to exclude/filter-out the ingredients
(supplies) step, otherwise we're duplicating it in the instructions.
instructions.append( | ||
f"{normalize_string(step_title.get_text())}: {normalize_string(step_body.get_text())}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this scraper, since the Step n: ...
text is somewhat descriptive (not simply step 1
, step 2
, ...), I think that the results might be more comprehensible with separate instruction entries for those.
e.g. instead of ["Step 1: Preparation: do some things", "Step 2: Begin cooking: some more things"]
, I would suggest `["Step 1: Preparation", "Do some things", "Step 2: Begin cooking", "some more things"].
The way I think about this is that I imagine the text of the instructions being spoken aloud by cooking assistant software that allows scrolling up/down by pressing the arrow keys.
I'd find it annoying/distracting to scroll up and hear step 3
, because that wouldn't tell me anything about what the step is or whether I'd found the part of the instructions I'm looking for. In contrast, step 3: combine the sauces
could be useful.
Ah, hm. So different recipes could have significantly different layouts? |
Unfortunately yes, It seems like maybe the pages are generated via a user created format. I checked a dozen or so recipes and it seems to vary across most. |
@@ -0,0 +1,20 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm.. the ingredients are inside the instructions
. Not sure what to do about this. That means that if you read the entire instructions, you'd know the entire recipe -- but users of this library might not begin by accessing the (entire) instructions
.
Not sure how to proceed here yet.
@hhursev any thoughts about how to deal with less-structured recipe websites? I can think of reasons to try to include them (inclusiveness, best-effort provision of content for users, pushing ourselves to improve retrieval accuracy) and also reasons not to (potential for low-quality results, difficult support burden to debug/repair, less clarity on what is acceptable to include in the library).. so.. I'm not sure yet. |
Yeaah that website is all over the place.. I'd say we decline a support for it. No point to bother with maintaining inconsistent unstructured HTML format given how many other well-formatted wonderful recipe sites we support. As you've said there are pros and cons but more often than not our efforts should be geared towards providing reliable scraping. While we aim to be inclusive, we need to balance that with the quality and reliability of our library. |
7/7 from #1384
not sure we'll be able to support this site. Seems like the format is user determined which would make it difficult to keep up.