
Fix Specter cache #136

Open · wants to merge 3 commits into master
Conversation

haroldrubio (Member)

This PR no longer attempts to find and remove all old instances of publications in the Redis cache, and instead sets an expiration date whenever inserting into Redis.

@haroldrubio haroldrubio self-assigned this Nov 10, 2022
@@ -116,6 +116,7 @@ def _maybe_print_to_console_and_file(self,
paper_id = prediction_json['paper_id']
cache_key = paper_id + "_" + str(self._metadata[paper_id]['mdate'])
self._redis_con.tensorset(key=cache_key, tensor=np.array(prediction_json['embedding']))
self._redis_con.expire(cache_key, 2629746) ## Expire after 1 month
Member:

can we set this in the config file?

Member (Author):

Hmm, it can be set in the model_params of config.json, but I don't think the models will have access to the Flask config.
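
For illustration, a minimal sketch of what reading the expiration from model_params could look like; the cache_expiration_seconds key and the self._model_params attribute are assumptions here, not part of the existing code:

# Hypothetical: pull the TTL from model_params in config.json instead of hard-coding it.
DEFAULT_CACHE_TTL = 2629746  # ~1 month in seconds
ttl = int(self._model_params.get('cache_expiration_seconds', DEFAULT_CACHE_TTL))
self._redis_con.tensorset(key=cache_key, tensor=np.array(prediction_json['embedding']))
self._redis_con.expire(cache_key, ttl)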

Member:

ok, let's keep it as it is

@@ -211,7 +212,6 @@ def set_archives_dataset(self, archives_dataset):
"authors": [profile_id],
"mdate": pub_mdate
}
self._remove_keys_from_cache(publication["id"])
Member:

so this is the call that is taking time? tensorset is fast enough?

Member (Author):

Yes, tensorset is fast, but the _remove_keys_from_cache function scans through all the keys to find matches, which is quite slow.
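
For context, a key-matching removal like that would look roughly like the sketch below; this is an assumption about what _remove_keys_from_cache does, not its actual body:

def _remove_keys_from_cache(self, paper_id):
    # Assumed sketch: iterate the whole keyspace and delete keys whose prefix
    # matches this paper id. SCAN visits every key in the database, so the cost
    # grows with the total number of cached embeddings, not with the number of matches.
    for key in self._redis_con.scan_iter(match=paper_id + '_*'):
        self._redis_con.delete(key)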

Member:

How slow is it? I'm still confused as to why Redis is slow here. Are there too many keys in the database?

I'm not sure what the self._remove_keys_from_cache method does, but its performance also depends on whether SCAN or KEYS is used.
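
For reference, the two options look roughly like this in redis-py (redis_con stands in for whatever client is used here):

# KEYS blocks the Redis server while it walks the entire keyspace in a single call.
matches = redis_con.keys(paper_id + '_*')

# SCAN iterates in small batches and does not block the server, but it still has to
# visit every key to find the matches, so a large keyspace stays slow either way.
matches = list(redis_con.scan_iter(match=paper_id + '_*', count=1000))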
