Improved caching and autocomplete known package names. #16

Open
nickl- opened this issue Oct 20, 2012 · 5 comments

@nickl-
Member

nickl- commented Oct 20, 2012

Among the known issues on the develop branch is the fact that there is no suitable redundancy in the cache storage to enable searching across multiple package managers.

The results are currently stored by package name alone, which means that once you have searched for abc, that cached result is what you will get for abc from then on, whichever package manager you ask. This affects both the info and the search functionality.

The workaround is to supply the -i or --invalidate-cache option to start fresh.

When resolving these shortcomings, we should also consider storing the information in such a fashion that it may be used to facilitate autocompleting package names.
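As a rough illustration of the autocomplete idea, here is a minimal sketch that assumes the known package names are available as a sorted list. The `complete` helper and the sample names are hypothetical and not part of aero.

```python
from bisect import bisect_left


def complete(names, prefix):
    """Return the known names that start with prefix.

    names must be sorted, so all matches sit next to each other.
    """
    start = bisect_left(names, prefix)
    matches = []
    for name in names[start:]:
        if not name.startswith(prefix):
            break  # sorted input: no later name can match
        matches.append(name)
    return matches


# Sample data only; in practice the names would come from the stored results.
known = sorted({"django", "docutils", "flask", "django-extensions"})
suggestions = complete(known, "dj")
```

This only works if the stored data can be listed by package name, which is what the current per-request cache does not offer.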

@nickl-
Member Author

nickl- commented Oct 23, 2012

Should we consider using memoize instead of beaker, or do we replace the cache completely with a SQLite db?

The latter would allow us to query the results, which gives us the opportunity to return the same results, but filtered. For example, if you searched for a package through all managers and then decide to prefix the manager and only look at the results for pip, we can't do that as easily with a key-based cache.

With a cache we would generally attach some data to a key derived from the query. Currently we simply attach the results to the package name, which is what is lacking and the reason for this issue. We would probably combine command + [manager] + package into the key, which would fix the problems, but then we could not use the results for anything other than that specific request.
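To make the trade-off concrete, here is a minimal sketch of such a combined key, using a plain dict in place of the real cache backend. The `cache_key` helper and the sample results are hypothetical.

```python
def cache_key(command, package, manager=None):
    # command + [manager] + package; manager stays None when all managers are searched
    return (command, manager, package)


cache = {}

# A search for "abc" across all managers stores one combined result set.
cache[cache_key("search", "abc")] = [
    {"manager": "pip", "package": "abc"},
    {"manager": "npm", "package": "abc"},
]

# Asking for pip only produces a different key, so it misses,
# even though the pip result is already sitting in the entry above.
pip_only = cache.get(cache_key("search", "abc", manager="pip"))
```

The keys no longer collide between managers, but each entry is only reachable through the exact request that created it.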

Alternatively, we could store the information in a DB with indexes on package_name, package_manager_name, version and description, maybe even homepage and author lookups, as well as the linked detailed information.

Something like this perhaps:

                  +-----------+     +----------------+
                  | search    |     | search_package |
                  |-----------|     |----------------|
                  | pk        |-----| search_id      |
                  | package   |     | package_id     |
                  | manager   |     +----------------+
                  | timestamp |         \|/
                  +-----------+          |    +---------------+
                                         |    |               |
                  +---------+         +---------+             |
                  | author  |         | package |            /|\
                  |---------|         |---------|      +------------+
                  | pk      |         | pk      |      | version    |
                  | author  |   +-----| package |      |------------|
                  | email   |   |     | manager |      | pk         |
                  +---------+   |     | descr   |      | package_id |
                       |        |     | url     |      | version    |
                      /|\      /|\    | info    |      | timestamp  |
                  +----------------+  +---+-----+      +----------+-+
                  | package_author |      |   +---------------+   |
                  |----------------|      |   | rating        |   |
                  | package_id     |      |   |---------------|   |
                  | author_id      |      |   | pk            |   |
                  +----------------+      |   | package_id    |   |
                                          |   | version_id    |   |
                                          +-->| user_rating   |   |
                                              | release_freq  |<--+
                                              | searches      |
                                              | installations |
                                              +---------------+
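
A minimal sketch of how part of this could look with Python's built-in sqlite3 module. It covers only the package and version tables; the column types, index names, `find` helper and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory db, just for the sketch
conn.executescript("""
CREATE TABLE package (
    pk      INTEGER PRIMARY KEY,
    package TEXT NOT NULL,
    manager TEXT NOT NULL,
    descr   TEXT,
    url     TEXT,
    info    TEXT,
    UNIQUE (package, manager)
);
CREATE TABLE version (
    pk         INTEGER PRIMARY KEY,
    package_id INTEGER NOT NULL REFERENCES package(pk),
    version    TEXT NOT NULL,
    timestamp  TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_package_name ON package (package);
CREATE INDEX idx_package_manager ON package (manager);
""")

# Sample rows only: the same package name known to two managers.
conn.executemany(
    "INSERT INTO package (package, manager, descr) VALUES (?, ?, ?)",
    [("abc", "pip", "hypothetical pip package"),
     ("abc", "npm", "hypothetical npm package")],
)


def find(conn, name, manager=None):
    """Look up a package by name, optionally narrowed to one manager."""
    sql = "SELECT manager, package, descr FROM package WHERE package = ?"
    params = [name]
    if manager is not None:
        sql += " AND manager = ?"
        params.append(manager)
    sql += " ORDER BY manager"
    return conn.execute(sql, params).fetchall()


all_hits = find(conn, "abc")          # results from every manager
pip_hits = find(conn, "abc", "pip")   # same stored data, filtered to pip
```

The stored rows answer both the all-managers search and the pip-only one, which is the filtering a key-based cache makes awkward.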

@nickl-
Member Author

nickl- commented Oct 23, 2012

Damn that looks crappy!

This is more what we were after actually

aero erd

@nickl-
Member Author

nickl- commented Oct 24, 2012

Ahh wicked, here we can edit the ERD on asciiflow.

Now if only it generated an image of that which I could link back to here...

@jaysonsantos
Member

What about indexing all packages in all managers, instead of caching what the user searches for?
Would that be good?

@nickl-
Member Author

nickl- commented Oct 25, 2012

@jaysonsantos while I was doing the db design I was thinking the same, which looking at it now is rather obvious, heheh. Glad we are on the same wavelength. =)

I was thinking that we start the processes for searching as the user requested, while quickly showing what we already have for them and updating the results as the up-to-date information becomes available. Maybe p2p makes sense too, to sync with the datasets on your network, or other UDP requests.
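
A rough sketch of that flow, assuming one search callable per manager and a callback that redraws the results. The `search_all` function, the `searchers` dict and the fake lambdas are all hypothetical stand-ins, not aero code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def search_all(package, searchers, cached, on_update):
    """Show cached results first, then refresh them per manager in parallel.

    searchers maps a manager name to a callable taking the package name.
    cached maps a manager name to the results we already have stored.
    """
    results = dict(cached)
    on_update(dict(results))  # what we already have, shown immediately
    with ThreadPoolExecutor(max_workers=max(1, len(searchers))) as pool:
        futures = {
            pool.submit(fn, package): manager
            for manager, fn in searchers.items()
        }
        for future in as_completed(futures):
            manager = futures[future]
            try:
                results[manager] = future.result()
            except Exception:
                continue  # keep the stale entry if this manager failed
            on_update(dict(results))  # fresh data for one more manager
    return results


# Fake searchers standing in for the real package manager adapters.
searchers = {
    "pip": lambda name: [name + " 1.1"],
    "npm": lambda name: [name + " 0.3"],
}
cached = {"pip": ["abc 1.0"]}
snapshots = []
final = search_all("abc", searchers, cached, snapshots.append)
```

The first snapshot holds only the stored pip result; each later one adds whichever manager finished next.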

As you can see, I was concentrating on being flexible but not storing useless information. We only need to know the details of the latest package, for example, or whether your e-mail has changed. We keep the dates and versions, but the rest can be updated and overwrite the legacy cruft. Knowing what you've searched for, and when, may be useful too when deciding what to go look for.

But yes, I think we need to loosen the threads and start thinking parallel processing. What about alternative sources? Damn, did I battle to find this just now, but here it is: http://freecode.com/ and there are others that are similar.

If we could work out what it is you are actually looking for, we could give you the best-ranked software without asking anything from you, like voting, just from information we already know.

We need to find a clear focus here: what are we really trying to accomplish? I am already using aero as my no. 1 tool, and the fact that I get similar information and use the same functions is why I go to it. Before, I was thinking things like we should install package managers and install this and that, but I don't think we are doing much about installing.

What are you thinking?
