Determining which code snippet to show, if any #362

Open
gleitz opened this issue Apr 2, 2021 · 6 comments

@gleitz
Owner

gleitz commented Apr 2, 2021

Consider the following:

I think that it is true because the variable was set

vs.

class Error():
   pass

We would want to return the larger code block. Some signals to consider:

  • How many newlines?
  • How many codeblocks?
  • Perhaps never return <code> and only go with <pre>?
  • Length of the answer
@ykskks
Contributor

ykskks commented Sep 18, 2021

Hello, can you elaborate on this? I would like to work on this.

It seems pretty difficult to me to determine which code blocks to show with a rule-based approach.
Larger doesn't mean better. Sometimes people give a "lengthy and not efficient answer" and "the shorter and cleaner answer" at the same time. Returning the larger code block might result in people getting "bad" code.

If you have new ideas about this topic, I would gladly hear about it.

@gleitz
Owner Author

gleitz commented Sep 18, 2021

Thanks for the help @ykskks. I'll explain the current logic and what we might do to improve it.

Today, we take the first answer and try to pull out any <pre> or <code> blocks. We prefer that order because pre is usually for multi-line content (the class Error above) and code is for inline content (true in the example above). If we don't find any of those blocks, we just return the entire answer.
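
For reference, that logic can be sketched roughly as follows. This is not howdoi's actual code: it uses only the standard library's html.parser, the names BlockCollector and extract_snippet are hypothetical, and it assumes "pull out" means returning the first matching block.

```python
from html.parser import HTMLParser


class BlockCollector(HTMLParser):
    """Collect the text of <pre> blocks, inline <code> blocks, and all text."""

    def __init__(self):
        super().__init__()
        self.pre_blocks = []
        self.code_blocks = []  # <code> elements that are not nested inside <pre>
        self.all_text = []
        self._stack = []       # currently open pre/code tags
        self._buffer = []      # text of the outermost open block

    def handle_starttag(self, tag, attrs):
        if tag in ("pre", "code"):
            if not self._stack:
                self._buffer = []
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in ("pre", "code") and self._stack and self._stack[-1] == tag:
            self._stack.pop()
            if not self._stack:
                # Only the outermost tag decides which list the text goes to.
                target = self.pre_blocks if tag == "pre" else self.code_blocks
                target.append("".join(self._buffer))

    def handle_data(self, data):
        self.all_text.append(data)
        if self._stack:
            self._buffer.append(data)


def extract_snippet(answer_html):
    """Prefer the first <pre>, then the first inline <code>, else the whole answer."""
    collector = BlockCollector()
    collector.feed(answer_html)
    collector.close()
    if collector.pre_blocks:
        return collector.pre_blocks[0]
    if collector.code_blocks:
        return collector.code_blocks[0]
    return "".join(collector.all_text).strip()
```

With this sketch, an answer whose only markup is an inline <code>true</code> comes back as just "true", which is the problem case this issue is about.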

I was thinking we could try to develop some heuristic for when to show the entire answer vs. one (or more) of these pre or code blocks. The ideas above (how many code blocks the answer contains, the ratio of the length of the block to the overall number of characters in the answer, just dropping inline code blocks, etc.) were a starting point for how we might build that logic.
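
A minimal sketch of how those signals could be computed, assuming the pre and inline code blocks have already been extracted as plain strings. BlockFeatures, compute_features, should_show_whole_answer, and the 0.2 threshold are all made up for illustration, not tuned values or existing howdoi code.

```python
from dataclasses import dataclass


@dataclass
class BlockFeatures:
    pre_count: int                # how many <pre> blocks the answer contains
    inline_code_count: int        # how many inline <code> blocks
    longest_block_newlines: int   # newlines in the longest block
    longest_block_ratio: float    # len(longest block) / len(whole answer)


def compute_features(answer_text, pre_blocks, inline_code_blocks):
    """Compute the signals listed above for one answer."""
    blocks = list(pre_blocks) + list(inline_code_blocks)
    longest = max(blocks, key=len, default="")
    total = len(answer_text)
    return BlockFeatures(
        pre_count=len(pre_blocks),
        inline_code_count=len(inline_code_blocks),
        longest_block_newlines=longest.count("\n"),
        longest_block_ratio=len(longest) / total if total else 0.0,
    )


def should_show_whole_answer(features, min_ratio=0.2):
    """Show the whole answer when there is no <pre> and the code is a tiny fraction.

    min_ratio is an arbitrary placeholder threshold.
    """
    if features.pre_count == 0:
        return features.longest_block_ratio < min_ratio
    return False
```

For the first example in this issue, the only block is the inline true, which is 4 of the sentence's characters, so this sketch would fall back to showing the whole answer.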

For what it's worth, I don't think showing the whole answer is that bad. It's probably better than extracting some code block (true) and returning only that, because on its own it makes no sense.

A list was started with some problem queries and some thoughts on how we might build the answer selection logic. This list is a little old, so some of the answers are not the same as what they used to be, but feel free to edit that or use it for inspiration.

@ykskks
Contributor

ykskks commented Sep 19, 2021

Thank you for the detailed answer!

I will collect more data from the current version of howdoi and see how I can improve it.

Should I first share ideas with you in an issue or submit a PR directly?

@gleitz
Owner Author

gleitz commented Sep 19, 2021

Thanks! We can use this thread to post ideas - others can see it and help out too

@gleitz
Owner Author

gleitz commented Sep 19, 2021

But when you're ready to submit a PR that's great too because I haven't really thought through how the "rules" should be applied in a clear and extensible way. So even if we don't have the perfect set of rules we can at least have the scaffolding to apply them as they grow and change.
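
One possible shape for that scaffolding, as a sketch only: each rule is a function that either returns the text to show or None to pass, and the rules are tried in order with the whole answer as the fallback. Answer, Rule, first_pre_block, select_output, and DEFAULT_RULES are hypothetical names, not existing howdoi code.

```python
from typing import Callable, List, NamedTuple, Optional


class Answer(NamedTuple):
    text: str                      # the whole answer as plain text
    pre_blocks: List[str]          # contents of <pre> blocks, in order
    inline_code_blocks: List[str]  # contents of inline <code> blocks, in order


# A rule inspects an answer and returns the text to show, or None to pass.
Rule = Callable[[Answer], Optional[str]]


def first_pre_block(answer):
    """Example rule: show the first <pre> block if there is one."""
    return answer.pre_blocks[0] if answer.pre_blocks else None


def select_output(answer, rules):
    """Apply rules in order; fall back to the whole answer if none match."""
    for rule in rules:
        result = rule(answer)
        if result is not None:
            return result
    return answer.text


DEFAULT_RULES = [first_pre_block]
```

New rules can then be added, removed, or reordered by editing the list, without touching select_output.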

@ykskks
Contributor

ykskks commented Sep 20, 2021

I tried some queries myself and found that only showing the first pre is indeed not a great idea.

The rule-based approaches that you described might work, but it is hard to come up with the rules or to actually evaluate them.
Ideally, we could collect data and use ML (I am thinking of simple models like logistic regression on the text and possibly hand-crafted features) to predict which one to return.

However, it needs labeling work, which most people don't want to do.
So for now, below is my idea to somewhat mitigate the problem.

  1. only one pre exists → return the pre
  2. 2 or more pre exist → return all pre, each with its preceding sentence or paragraph
  3. only one code exists → return the sentence or paragraph containing the code
  4. 2 or more code exist → return all sentences or paragraphs containing code

This way, we can cut out unnecessary parts and still return meaningful content.
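
A rough sketch of the four rules, assuming the answer has already been split into an ordered list of paragraphs and pre blocks. Block and select_blocks are hypothetical names, and the sketch makes three assumptions not stated in the list: pre takes priority over inline code, context is taken per paragraph rather than per sentence, and an answer with no code at all is returned whole.

```python
from typing import NamedTuple


class Block(NamedTuple):
    kind: str                     # "p" for a paragraph, "pre" for a <pre> block
    text: str
    has_inline_code: bool = False  # True if a paragraph contains inline <code>


def select_blocks(blocks):
    """Apply rules 1-4 to an ordered list of Block items and return text."""
    pre_indexes = [i for i, b in enumerate(blocks) if b.kind == "pre"]

    # Rule 1: exactly one pre, return it alone.
    if len(pre_indexes) == 1:
        return blocks[pre_indexes[0]].text

    # Rule 2: several pre blocks, return each with its preceding paragraph.
    if len(pre_indexes) >= 2:
        parts = []
        for i in pre_indexes:
            if i > 0 and blocks[i - 1].kind == "p":
                parts.append(blocks[i - 1].text)
            parts.append(blocks[i].text)
        return "\n\n".join(parts)

    # Rules 3 and 4: at paragraph granularity both reduce to
    # "return every paragraph that contains inline code".
    with_code = [b.text for b in blocks if b.kind == "p" and b.has_inline_code]
    if with_code:
        return "\n\n".join(with_code)

    # Assumed fallback: no code anywhere, return the whole answer.
    return "\n\n".join(b.text for b in blocks)
```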


2 participants