Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Usage with Wagtail Personalisation #17

Open
drcongo opened this issue Jan 17, 2020 · 6 comments
Open

Usage with Wagtail Personalisation #17

drcongo opened this issue Jan 17, 2020 · 6 comments

Comments

@drcongo
Copy link
Contributor

drcongo commented Jan 17, 2020

Hello. I'm using Wagtail-Cache on a site that also uses Wagtail-Personalisation which allows different users to see different versions of pages based on segmentation. Obviously I need to cache the segmented versions of the page and serve the correct one to each user, but what I'm getting is whichever version of a page was first matched saved in the cache and then every user seeing that version.

Given that Wagtail-Cache uses vary on cookie, I figured that I should be able to store the segmentation rules that a user matches in a cookie and each page version would then get cached separately, but unfortunately it's still serving the first requested version. I don't know Django's cache mechanisms particularly well, do you know if there are some rules around which cookies the vary checks against?

Plan B is to try overriding Wagtail-Cache's middleware to add checks against segmentation in there but I'd like to keep a clean Wagtail-Cache install if possible.

Thanks, and sorry it's more of a question than an issue.

edit: I've also tried adding a custom x-segmentation header and using patch_vary_headers to add that to the vary list. I'm starting to think this is all just down to the order that things are happening. If I find a solution I'll report back.

@vsalvino
Copy link
Contributor

This is an important use-case you raised, and it would make sense for wagtail-cache to support this out of the box.

Regarding the vary headers, it only ignores the cache in this case if the request did not have a cookie, but the response set a cookie. This part could probably use some improvements.

The relevant code that decides to set the cache for a response is here. If you can think of a way to modify it to work better with wagtail personalization that would be great. I know WordPress uses some special cookie names which generally if present means the page should not be cached. We could probably do something similar here. You could of course test this out by making a hook in your project is_response_cacheable (which will fully override wagtail-cache's decision).

# Check if the response is cacheable
# Don't cache private or no-cache responses.
# Do cache 200, 301, 302, 304, and 404 codes so that wagtail doesn't
# have to repeatedly look up these URLs in the database.
# Don't cache streaming responses.
is_cacheable = \
'no-cache' not in response.get('Cache-Control', ()) and \
'private' not in response.get('Cache-Control', ()) and \
response.status_code in (200, 301, 302, 304, 404) and \
not response.streaming
# Don't cache 200 responses that set a user-specific cookie in response to
# a cookie-less request (e.g. CSRF tokens).
if is_cacheable and response.status_code == 200:
is_cacheable = not (
not request.COOKIES and response.cookies and has_vary_header(response, 'Cookie')
)

The actual cache key which identifies the entry in the cache is a bit more opaque because it is based on django internals django.utils.cache.learn_cache_key and django.utils.cache.get_cache_key. I have been thinking of replacing this with our own mechanism for generating cache keys, which will probably be necessary at some point to support more edge cases.

@drcongo
Copy link
Contributor Author

drcongo commented Jan 17, 2020

Thanks @vsalvino - If I fork and have a poke at adding some kind of configurability to vary headers do you think that might be a good way to add support for these kinds of use cases without tying it specifically to Wagtail-Personalisation?

This was the middleware I added to try to get it to work, but it didn't seem to succeed no matter where I placed it in the MIDDLEWARE settings.

class SegmentCookieMiddleware(MiddlewareMixin):

    """Sets segmentation matches in a cookie so that Wagtail Cache
    returns different results for different cookie settings.
    """

    def process_response(self, request, response):
        patch_vary_headers(response, ['x-segmentation'])
        try:
            user_segs = []
            segments = request.session.get('segments', [])
            if len(segments):
                for segment in segments:
                    user_segs.append(segment.get('encoded_name', ''))

            seg_str = '_'.join(user_segs)
            hashed = md5(seg_str.encode('utf-8')).hexdigest()
            response.set_cookie('segmentation', hashed)
            response['x-segmentation'] = hashed
        except Exception:
            pass  # Fail silently

        return response

@vsalvino
Copy link
Contributor

Yes - I would recommend forking and then:

  1. add some debug/print statements around django.utils.cache.learn_cache_key and django.utils.cache.get_cache_key to see how it is keying the cache. It is highly possible that those are just too naive to key the cache including specific headers etc. Hashing the cache key based on response headers would probably be smart if the built-in django utils are not already doing that.

  2. Play with identifying vary headers better in those lines I highlighted above.

I have a little code snippet here which has also been helpful for me in the past to debug the cache. Essentially you create another cache entry to save a list of all cache keys so you can see them #2 (comment)

@drcongo
Copy link
Contributor Author

drcongo commented Jan 24, 2020

Update: No matter what I tried, I couldn't get Django's cache key functions to take my custom headers into account. However, I have found an elegant / hacky (delete as applicable) solution to the original problem, which is to use the cache prefix setting, which was previously always None in Wagtail-Cache. So, I've added the ability to a) set a key_prefix as a new setting WAGTAIL_CACHE_PREFIX, and b) to have that be optionally a callable. What this means is in the project I'm working on with Wagtail-Personalisation I've been able to do this...

def gen_cache_prefix(request):
    seg_str = None
    try:
        user_segs = []
        segments = request.session.get('segments', [])
        if len(segments):
            for segment in segments:
                user_segs.append(segment.get('encoded_name', ''))

        seg_str = '_'.join(user_segs)
    except Exception:
        pass  # Fail silently

    return seg_str

WAGTAIL_CACHE_PREFIX = gen_cache_prefix

which generates a cache key prefix taking the user segmentation into account. This seems to work perfectly.

I don't know if doing this conflicts with future plans for the prefix functionality though, so happy to run this in our fork if so. If not, and you feel like this could be useful to merge, let me know and I'll add documentation and put in a pull request.

@vsalvino
Copy link
Contributor

The behavior your describe with the django cache keys is kind of what I expected. Similar to how we had to write our own middleware, I think we will need to write our own keying mechanism to support these types of use-cases.

Your use of a prefix is a clever workaround. I wouldn't be opposed merging it in as a possibly useful feature. But I think the real "fix" is to improve the keys and possibly make that user-configurable to determine what headers/cookies/etc. will be considered by wagtail-cache. This would be especially relevant for sites that do internationalization based on browser language rather than unique URLs. It would also improve reliability of handling CSRF tokens.

I would like to get some feedback from other wagtail-cache users - will try posting in the slack.

@vsalvino
Copy link
Contributor

vsalvino commented Jul 12, 2022

Following up on this. If you're using a custom header, then you will want to patch your response object with Vary: Cookie, <header-name>. For example, if your header is X-AB-Test, then it should be Vary: Cookie, X-AB-Test.

With that setup, the caching middleware should in fact take your header into account and cache separate versions of those pages. You can set this on the response itself from the page/view/etc. No need to do this in middleware. The middleware should preserve it and cache appropriately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants