Fix Specter cache #136
base: master
@@ -116,6 +116,7 @@ def _maybe_print_to_console_and_file(self,
 paper_id = prediction_json['paper_id']
 cache_key = paper_id + "_" + str(self._metadata[paper_id]['mdate'])
 self._redis_con.tensorset(key=cache_key, tensor=np.array(prediction_json['embedding']))
+self._redis_con.expire(cache_key, 2629746) ## Expire after 1 month
Can we set this in the config file?
Hmm, it can be set in the model_params of the config.json, but I don't think the models will have access to the Flask config.
ok, let's keep it as it is
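For reference, a minimal sketch of what reading the expiry from model_params could look like if this is revisited later. The `cache_ttl_seconds` key and the `cache_ttl_from_config` helper are hypothetical names, not part of this PR; the default mirrors the value hard-coded in the diff.

```python
# Value hard-coded in this PR: one average month, in seconds.
DEFAULT_CACHE_TTL_SECONDS = 2629746


def cache_ttl_from_config(config: dict) -> int:
    """Return the cache TTL from a parsed config.json dict.

    Looks for a hypothetical "cache_ttl_seconds" entry under "model_params"
    and falls back to the hard-coded default when it is missing.
    """
    model_params = config.get("model_params", {})
    ttl = int(model_params.get("cache_ttl_seconds", DEFAULT_CACHE_TTL_SECONDS))
    if ttl <= 0:
        raise ValueError("cache_ttl_seconds must be a positive number of seconds")
    return ttl
```

The model would then pass the returned value to `expire` instead of the literal, without needing access to the Flask config.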
@@ -211,7 +212,6 @@ def set_archives_dataset(self, archives_dataset):
 "authors": [profile_id],
 "mdate": pub_mdate
 }
-self._remove_keys_from_cache(publication["id"])
So this is the call that is taking time? And tensorset is fast enough?
Yes, tensorset is fast, but the _remove_keys_from_cache function scans through all the keys to find matches, which is quite slow.
How slow is it? I'm still confused as to why Redis is slow here. Are there too many keys in the database?
I'm not sure what the self._remove_keys_from_cache method does, but its performance also depends on whether SCAN or KEYS is used.
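To make the SCAN vs. KEYS point concrete, here is a rough sketch of a SCAN-based removal. It is not the actual `_remove_keys_from_cache` implementation, which is not shown in this PR; `remove_keys_for_paper` is a hypothetical name, and `client` is assumed to be a redis-py style connection exposing `scan_iter` and `delete`. The match pattern assumes the `<paper_id>_<mdate>` key format from the first hunk.

```python
def remove_keys_for_paper(client, paper_id: str, batch_size: int = 500) -> int:
    """Delete every cache key of the form '<paper_id>_<mdate>'.

    Uses incremental SCAN iteration (scan_iter) rather than KEYS, so the
    server is not blocked by one long command. Returns the number of keys
    actually deleted.
    """
    pattern = f"{paper_id}_*"
    deleted = 0
    batch = []
    for key in client.scan_iter(match=pattern, count=batch_size):
        batch.append(key)
        # Delete in batches to limit round trips to the server.
        if len(batch) >= batch_size:
            deleted += client.delete(*batch)
            batch = []
    if batch:
        deleted += client.delete(*batch)
    return deleted
```

Even with SCAN, the MATCH filter is applied after keys are read, so each call still walks the whole keyspace. Calling this once per publication stays proportional to the total number of keys, which is consistent with the slowness described above.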
This PR no longer attempts to find and remove all old instances of publications in the Redis cache, and instead sets an expiration date whenever inserting into Redis.
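A small sketch of why expiry can stand in for explicit removal, assuming the key format from the diff. The helper names `make_cache_key` and `cache_embedding` are illustrative only, and `client` is assumed to be the RedisAI connection used in the diff (exposing `tensorset` and `expire`).

```python
# One average month in seconds, the value used in this PR.
CACHE_TTL_SECONDS = 2629746


def make_cache_key(paper_id: str, mdate) -> str:
    """Build the '<paper_id>_<mdate>' key used for cached embeddings."""
    return paper_id + "_" + str(mdate)


def cache_embedding(client, paper_id: str, mdate, tensor,
                    ttl_seconds: int = CACHE_TTL_SECONDS) -> str:
    """Store an embedding and give it a time-to-live.

    Because the key includes mdate, an updated publication is written under
    a new key. The stale entry is never looked up again and is dropped by
    Redis once its TTL elapses, so no key scan is needed on update.
    """
    key = make_cache_key(paper_id, mdate)
    client.tensorset(key=key, tensor=tensor)
    client.expire(key, ttl_seconds)
    return key
```

The trade-off is that a superseded embedding can occupy memory for up to one TTL period before it is evicted.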