Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds support for instructables #1392

Closed
wants to merge 1 commit into from
Closed

Conversation

jknndy
Copy link
Collaborator

@jknndy jknndy commented Nov 18, 2024

7/7 from #1384

not sure we'll be able to support this site. Seems like the format is user determined which would make it difficult to keep up.

@jknndy jknndy marked this pull request as ready for review November 18, 2024 01:22
return [normalize_string(tag.get_text()) for tag in ingredients_tags]

def instructions(self):
steps = self.soup.find_all("section", class_="step")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make sure to exclude/filter-out the ingredients (supplies) step, otherwise we're duplicating it in the instructions.

Comment on lines +42 to +44
instructions.append(
f"{normalize_string(step_title.get_text())}: {normalize_string(step_body.get_text())}"
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this scraper, since the Step n: ... text is somewhat descriptive (not simply step 1, step 2, ...), I think that the results might be more comprehensible with separate instruction entries for those.

e.g. instead of ["Step 1: Preparation: do some things", "Step 2: Begin cooking: some more things"], I would suggest `["Step 1: Preparation", "Do some things", "Step 2: Begin cooking", "some more things"].

The way I think about this is that I imagine the text of the instructions being spoken aloud by cooking assistant software that allows scrolling up/down by pressing the arrow keys.

I'd find it annoying/distracting to scroll up and hear step 3, because that wouldn't tell me anything about what the step is or whether I'd found the part of the instructions I'm looking for. In contrast, step 3: combine the sauces could be useful.

@jayaddison
Copy link
Collaborator

not sure we'll be able to support this site. Seems like the format is user determined which would make it difficult to keep up.

Ah, hm. So different recipes could have significantly different layouts?

@jknndy
Copy link
Collaborator Author

jknndy commented Nov 19, 2024

Unfortunately yes, It seems like maybe the pages are generated via a user created format. I checked a dozen or so recipes and it seems to vary across most.

@@ -0,0 +1,20 @@
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm.. the ingredients are inside the instructions. Not sure what to do about this. That means that if you read the entire instructions, you'd know the entire recipe -- but users of this library might not begin by accessing the (entire) instructions.

Not sure how to proceed here yet.

@jayaddison
Copy link
Collaborator

@hhursev any thoughts about how to deal with less-structured recipe websites? I can think of reasons to try to include them (inclusiveness, best-effort provision of content for users, pushing ourselves to improve retrieval accuracy) and also reasons not to (potential for low-quality results, difficult support burden to debug/repair, less clarity on what is acceptable to include in the library).. so.. I'm not sure yet.

@hhursev
Copy link
Owner

hhursev commented Dec 7, 2024

Yeaah that website is all over the place.. I'd say we decline a support for it. No point to bother with maintaining inconsistent unstructured HTML format given how many other well-formatted wonderful recipe sites we support.

As you've said there are pros and cons but more often than not our efforts should be geared towards providing reliable scraping. While we aim to be inclusive, we need to balance that with the quality and reliability of our library.

@jknndy jknndy closed this Dec 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants