Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add better sorting of completions #1165

Closed
wants to merge 11 commits into from
Closed

Conversation

schmeic
Copy link

@schmeic schmeic commented Jun 13, 2024

Description

Sort completions by match point, then match length, then item text.

Note that match length is equal to the length of the search text for non fuzzy matches, and equal to the length of the match group for fuzzy matches.

This means that matches that start with the search text will be displayed first, followed by matches that contain the search text, and then fuzzy matches.

Sorting by item text when both match point and match length are the same for matched completions ensures that completions that start with the search text will be in alphabetical order. For example, searching for fr will display FROM before FROM_UNIXTIME.

Checklist

  • I've added this contribution to the changelog.md.
  • I've added my name to the AUTHORS file (or it's already there).

@rolandwalker
Copy link
Contributor

Wow, you weren't joking about the tests.

From the English description, the algorithm sounds great.

I may have time to experiment with this on the weekend.

@schmeic
Copy link
Author

schmeic commented Jun 13, 2024

Yeah, that's why I wanted to make sure the algorithm would be accepted before working on the tests. From my experiments with this new sorting, I've found that it has always put what I consider the most relevant results at the top.

The funny thing was that the find_matches(...) function was already storing all 3 of the parameters I used for sorting, and then ignoring everything but the item text at the end when creating the completions generator. That's what gave me the idea for the sorting algorithm.

@rolandwalker
Copy link
Contributor

Let's let the PR hang around for some days and see if it attracts any objections.

@schmeic
Copy link
Author

schmeic commented Jun 13, 2024

Ok. Maybe you'll have some time to experiment with it as well. I'm hoping it will attract some "likes"...

@rolandwalker
Copy link
Contributor

Should the proposed change only apply when if not smart_completion: ? It would see that the point of smart_completion is to give categories in category order.


def sorted_completions(matches):
# sort by match point, then match length, then item text
matches = sorted(matches, key=lambda m: (m[1], m[0], m[2].lower()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as the sort is casefolding it might as well also do

m[2].lower().strip('`')

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can certainly add that. Pretty sure I've seen someone else complain in another issue about the back tick messing with sorting.

@schmeic
Copy link
Author

schmeic commented Jun 13, 2024

I don't think so, I personally would rather have a completion at the top that best matches my search text regardless of the category. Have to tried it out yet? Maybe you will agree after some experimenting.

I actually first tried doing the same sort in the find_matches(...) function, so per category, and the results were not ideal.

@rolandwalker
Copy link
Contributor

I'm trying it now. It would seem that the config file lies when it claims

# Enables context sensitive auto-completion. If this is disabled then all
# possible completions will be listed.
smart_completion = True

because when set to False, table names and such are simply not returned.

It is rather helpful to see the completions in the "smart" ordering when nothing has yet been typed, like maybe

            if len(word_before_cursor) != 0 and word_before_cursor[-1] != '(':
                matches = sorted(matches, key=lambda m: (m[1], m[0], m[2].lower().strip('`')))

@schmeic schmeic closed this Jun 13, 2024
@schmeic
Copy link
Author

schmeic commented Jun 13, 2024

Regarding the smart_completion = False setting, I think what you're seeing is a bug. You need to first do:

use <db name>

and then you should see table and column names. It doesn't work if you use a DSN to select the db.

By the way, I think it would be really useful to do fuzzy matching when not doing smart completion. Maybe this can be implemented as a setting?

@rolandwalker
Copy link
Contributor

Why closed?

@schmeic
Copy link
Author

schmeic commented Jun 13, 2024

Sorry, didn't realize I did that. I deleted a comment and maybe that closed it?

How can I reopen?

@schmeic schmeic reopened this Jun 13, 2024
@schmeic
Copy link
Author

schmeic commented Jun 13, 2024

Accidentally closed.

@rolandwalker
Copy link
Contributor

Regarding the smart_completion = False setting, I think what you're seeing is a bug. You need to first do:
use
and then you should see table and column names. It doesn't work if you use a DSN to select the db.

Your suggestion works! But we really need to fix that bug.

By the way, I think it would be really useful to do fuzzy matching when not doing smart completion. Maybe this can be implemented as a setting?

Yes, though one issue is that I adore settings, and @amjith really does not. For example for the entire functionality in this PR, I think it would be fantastic if ~/.myclirc added

completion_aggresive_sort = True

or any other property name that makes sense.

I personally would rather have a completion at the top that best matches my search text regardless of the category.

Right. I personally would also prefer to use your algorithm. But we have to think of the most general usecase and the default expectation. That is why I proposed to enable it outside of the smart_completion codepath.

@amjith
Copy link
Member

amjith commented Jun 14, 2024

Let me catch up on the PR.

I'm not opposed to adding a config option but if the new completion behavior feels logical we can switch to it by default. Let me read through the proposal and try it out locally.

@schmeic
Copy link
Author

schmeic commented Jun 14, 2024

I think this new sort is much better than the current default, so it seems like most people would prefer it. By the way, I just upgraded to the latest pip version of 1.27.2, and the sorting is now much worse than before. Now it appears to be returning results in alphabetical order, where as before the order was more by relevance (previous version I used was 1.26.1).

So now in 1.27.2, when I do this:
SELECT * FROM e
I see a bunch of tables starting with a at the top of the results. Previously, the top results were tables starting with e.

Also, there are other open issues I've seen where people complain about the fuzzy matching and want an option to turn it off because the fuzzy matches are often at the top of the results. My algorithm will also fix this since the fuzzy matches will always be below exact matches.

@amjith
Copy link
Member

amjith commented Jun 15, 2024

I'm still testing things out. Notes so far, I like that FROM is listed about FROM_UNIXTIME. There is a crash that happens when I type SOURCE . Haven't dug into it yet, if you have the time please take a look.

@schmeic
Copy link
Author

schmeic commented Jun 17, 2024

I'm still testing things out. Notes so far, I like that FROM is listed about FROM_UNIXTIME. There is a crash that happens when I type SOURCE . Haven't dug into it yet, if you have the time please take a look.

Ok, I see the issue. The find_files() method is returning a Completion generator and the code is expecting an array of match tuples.

Any idea what happened to the sorting in the latest version (1.27.1) ? Why is it now alphabetical?

@schmeic
Copy link
Author

schmeic commented Jun 27, 2024

Still waiting on feedback for this one...

@amjith
Copy link
Member

amjith commented Jun 27, 2024

I've tested it locally and I'm fine with the ordering. Happy to merge if you can fix the tests.

@schmeic
Copy link
Author

schmeic commented Jun 27, 2024

Cool, I'll work on fixing the tests.

@schmeic
Copy link
Author

schmeic commented Jul 9, 2024

Is there a way to run tests individually? I'm having a bit of trouble understanding the test setup.

@amjith
Copy link
Member

amjith commented Jul 9, 2024

We just use pytest. You can run pytest -k <name of the test> to run just a single test.

For example:

pytest -k test_suggested_multiple_column_names_with_dot will run only that one single test.

You can also specify partial names to select multiple tests.

pytest -k test_suggested will select all the tests that start with that name.

@schmeic
Copy link
Author

schmeic commented Jul 10, 2024

Cool, thanks! I don't use python that much, so this is all pretty new to me. First test fixed...

@schmeic
Copy link
Author

schmeic commented Aug 7, 2024

I've fixed all of the unit tests, but when I pushed my changes, I get an email saying the "PR run failed ...".

How do I fix this?

@schmeic
Copy link
Author

schmeic commented Aug 19, 2024

Looks like the error is related to the test command, which is not directly related to this pull request.

"The test command is disabled and references to it are deprecated."

What's the status of this pull request?

@rolandwalker
Copy link
Contributor

What happens if you change the line setuptools in requirements-dev.txt to setuptools<=71.1.0 ?

@schmeic
Copy link
Author

schmeic commented Aug 20, 2024

Better, but why are most of these checks getting cancelled? Also, the one that failed had no errors:

8 features passed, 0 failed, 0 skipped
32 scenarios passed, 0 failed, 0 skipped
139 steps passed, 0 failed, 0 skipped, 0 undefined
Took 0m16.305s
Error: Process completed with exit code 1.

@rolandwalker
Copy link
Contributor

First pytest is run, then behave is run. Then the CI run exits with a failure if either one failed. In this case, pytest is showing failures:

test/test_clistyle.py ss                                                 [  0%]
test/test_completion_engine.py ......................................... [ 15%]
...................x...................................................  [ 41%]
test/test_completion_refresher.py ....                                   [ 42%]
test/test_config.py ..............                                       [ 47%]
test/test_dbspecial.py ....                                              [ 49%]
test/test_main.py ...................                                    [ 56%]
test/test_naive_completion.py .....                                      [ 57%]
test/test_parseutils.py .............X........................           [ 71%]
test/test_prompt_utils.py .                                              [ 72%]
test/test_smart_completion_public_schema_only.py .................F....  [ 80%]
test/test_special_iocommands.py ...................                      [ 86%]
test/test_sqlexecute.py ...................................              [ 99%]
test/test_tabular_output.py F                                            [100%]

I believe the other CI runs in the matrix were canceled as soon as one failed.

@schmeic
Copy link
Author

schmeic commented Aug 21, 2024

I realized that the reason one of the tests is failing is because my sort algorithm is comparing lower cased matches, which doesn't work for the special commands starting with \, since these are case sensitive. How should I handle this?

I feel like there shouldn't even be empty string completion, especially since it isn't really empty string, but rather blank string and requires entering something like a space to trigger it.

It's also strange that when I run ./setup.py test I get an error, but when I run the following command, it passes:
pytest -vv -k test_empty_string_completion test/test_smart_completion_public_schema_only.py

@schmeic
Copy link
Author

schmeic commented Sep 12, 2024

Still waiting to hear back on how to handle this one failing unit test. Doesn't seem worth complicating the sort method just to support these special commands, and fix a single unit test.

@schmeic
Copy link
Author

schmeic commented Oct 9, 2024

Pretty disappointed not to get any feedback on this, I spent a lot of time trying to fix all these unit tests and I think it's really close.

I wanted to contribute to this project, but I guess I can just use my own branch.

@amjith
Copy link
Member

amjith commented Nov 3, 2024

Sorry @schmeic. Life got ridiculous for the past few months. I'm taking a look now.

def sorted_completions(matches):
# sort by match point, then match length, then item text
matches = sorted(list(matches), key=lambda m:
(m[1], m[0], m[2].lower().strip('`'), m[2].startswith('`')))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this strip is causing the tests to fail. The backticks are used to escape the table name if it happens to be a keyword. For instance if someone names the table table then it going to conflict with the keyword since table is a keyword. This adds backticks around it to escape the string to avoid the conflict.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comment by @rolandwalker on June 13. The idea is that a table/field surrounded by backticks should match the same one without backticks. I've tried the change the tests to work with this.

@amjith
Copy link
Member

amjith commented Nov 3, 2024

I noticed that the sort order seems to have messed up the ordering when it completes column names in a table.

This is the completion from your branch. The columns are not listed at the top but instead pushed down below the keywords:
Cursor_and_0_0_python3_11_-mycli-u_root__Users_amjith_Dropbox_code_python_mycli

This is the completion order from the main branch, the columns from the table are listed at the top.

Cursor_and_0_0_python3_11_-mycli-u_root__Users_amjith_Dropbox_code_python_mycli

@amjith
Copy link
Member

amjith commented Nov 4, 2024

@schmeic I went down a rabbit hole here. The reason we don't sort the completion candidates is because we have the keywords listed in a curated order that is based on popularity instead of lexicography.

But I went back to your original motivation for this PR. When you type SELECT * FR it should list FROM instead of FROM_UNIXTIMESTAMP. I modified the completion engine a little bit to allow only keywords (not functions) which fixes the issue superficially.

I'll open a new PR and tag you on it so you could take it for a spin and let me know what you think.

@amjith
Copy link
Member

amjith commented Nov 4, 2024

#1175

@schmeic
Copy link
Author

schmeic commented Nov 4, 2024

@schmeic I went down a rabbit hole here. The reason we don't sort the completion candidates is because we have the keywords listed in a curated order that is based on popularity instead of lexicography.

But I went back to your original motivation for this PR. When you type SELECT * FR it should list FROM instead of FROM_UNIXTIMESTAMP. I modified the completion engine a little bit to allow only keywords (not functions) which fixes the issue superficially.

I'll open a new PR and tag you on it so you could take it for a spin and let me know what you think.

The FROM coming after FROM_UNIXTIMESTAMP was just one example of what I thought was wrong with the completions. Why should columns be at the top? There are plenty of times when I'm looking for a keyword, and don't want to see all those unsorted column names at the top. I think the completions are a lot more intuitive when everything is sorted and works the same.

@amjith
Copy link
Member

amjith commented Nov 5, 2024

The FROM coming after FROM_UNIXTIMESTAMP was just one example of what I thought was wrong with the completions.

I understand that. That's why I said my PR is a superficial fix.

Why should columns be at the top?

Imagine this scenario:
You just connected to a fresh DB and you don't know the tables or the columns in each table. You start typing SELECT * FROM and you get the list of tables at the top so you navigate using arrow keys to get an idea of the tables. Then you pick the table from the suggestion and continue with your query SELECT * FROM users WHERE , you don't know the columns in the table but after the WHERE clause you get a list of column names from the users table. You can find all the columns using the arrow keys and pick the one that makes the most sense and so on. If we take the approach you've suggested the user has to know the columns in the table in order to start typing a few letters to get the desired completion. It thwarts the ability discover the schema.

I think the current implementation is a more human centric approach with UX. In fact, the keywords are ordered (I think I mentioned this in my previous comment as well) to make sure the most common keywords are listed at the top. This is also meant to facilitate a user to pick things from the completion menu rather than continue to type to get to their desired keyword.

My goal was to ensure we can provide a user friendly UX for people who pick from a menu without compromising the efficiency of someone who likes to type (or knows the schema full well). But you pointed out a very valid flaw with the FROM and FROM_UNIXTIMESTAMP that was annoying for a user who prefers to type rather than use arrows. So I decided to fix that tactically.

I hope my counterpoint convinces you of the reason for sticking with the current approach. As always I'm open to revisiting the design if there are compelling reasons to do so.

@schmeic
Copy link
Author

schmeic commented Nov 5, 2024

Ok, I guess I see your point about "discovering the schema", but I would argue that if you want to do that, use:
DESC <table name>

Once you start typing any characters, the current approach will typically provide worse results than my approach, especially when you want a keyword instead of a column. So, I still feel like my approach is better, since it will provide more relevant results the majority of the time. It seems pretty inefficient to try and discover the schema using autocomplete.

As far as your point about keyword ordering, they may be listed with more common ones at the top, but what if I'm looking for a less common one? My approach will find it much more efficiently. However, I think the keyword search should be doing fuzzy searching when using my approach. Currently, when I search for WHERE, I need to type the first 4 chars to get past WHEN. If we enable fuzzy searching, I would be able to just type wr.

@amjith
Copy link
Member

amjith commented Nov 7, 2024

Ok, I guess I see your point about "discovering the schema", but I would argue that if you want to do that, use:
DESC

The whole point of mycli is to not break your flow. When I am typing a query I don't want to stop in the middle and run a DESC table to continue my query. This becomes especially relevant when writing JOIN queries.

Adding fuzzy matching for keywords is not a bad idea. Let me see if I can prototype that in a PR.

@amjith
Copy link
Member

amjith commented Nov 7, 2024

I've updated #1175 to enable fuzzy matching for keywords. Can you give it a try?

You can install it as follows:

pip install -U https://github.com/dbcli/mycli/archive/refs/heads/fix-select-star.zip

@schmeic
Copy link
Author

schmeic commented Nov 7, 2024

Sounds like you prefer good discovery over relevant results when text is typed. I want to find the completion I'm looking for with the fewest keystrokes. I'm not sure if it's possible to have both, let me know if you have any ideas on how to make it work.

I actually added fuzzy keyword matching on my branch, it works better.

@amjith
Copy link
Member

amjith commented Nov 10, 2024

Sounds like you prefer good discovery over relevant results when text is typed.

I'm not doing a good job of explaining myself. I value both I think we can achieve the outcome you're looking for without compromising discoverability. Let me explain it slightly differently. I have a users table with columns that have similar names user_name and user_name_normalized.

When I type SELECT * FROM users WHERE u I want to see user_name on the top and user_name_normalized at the bottom because the first one is shorter and if the user wants to select the second one they can type the suffix that is unique to the longer name (like zed).

But what I don't want to see is SELECT * FROM users WHERE u listing the functions ucase and unix_timestamp before listing the column names. Because that interrupts my flow to abandon my query and go run a DESC users to find the column names.

I think enabling fuzzy on keywords has fixed the root of your complaint (listing FROM_UNIXTIME before FROM). For that I'm grateful to you for proposing a solution and forcing me to think hard about the issue. But unfortunately I don't agree that we should sort all the entries without prioritization. That hampers discoverability.

I am sorry this PR is not getting merged but I want to thank you for prompting me to do better with the UX.

@amjith amjith closed this Nov 10, 2024
@schmeic
Copy link
Author

schmeic commented Nov 12, 2024

Just to be clear, the root of my complaint was not listing FROM_UNIXTIME before FROM, that was just one example of what I considered to be undesirable completion results.

I want to move the completion term that I'm looking for to the top with the minimum amount of keystrokes, and prioritizing column names over other completions hinders that.

I wish we could have had this discussion before you told me to fix all the unit tests and you would accept the pull request. I wasted a lot of time working on them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants