Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patterns deployment with hashes #1089

Closed
wants to merge 2 commits into from
Closed

Conversation

acelaya
Copy link
Contributor

@acelaya acelaya commented Jun 9, 2023

This PR introduces a number of changes with the intention to fix how Cloudflare caches the pattern library assets, trying to be as optimal as possible, and affecting building as little as possible.

  • The index.html entry point used both during dev and in production, is now a template file, where the references to JS and CSS bundles is set dynamically.
  • In order to resolve references to JS and CSS bundles we now use the generateManifest function from frontend-build, which brings the ability to have content hashes in URLs, without affecting the actual building of those assets.
  • The nginx.conf file used in production now includes instructions to make sure proper Cache-Control headers are exposed to Cloudflare. Thos instructions imply that:
    • CSS/JS bundles are cached for 1 year, since the URLs will contain different hashes when their content changes
    • Images are cached for 1 day. This is an arbitrary value picked by me that we can change if needed.
    • The index file is not cached, as it contains references to bundles.
  • Serving the pattern library locally now depends on bundling CSS and JS, and parsing the index template first, which results in some changes and new tasks being introduced in the gulpfile.

Testing steps

There are a couple of things that need to be checked for regressions, and new logic to be tested.

  • Dev server:
    • Delete all assets inside build folder.
    • Run make dev
    • Check that assets are built, and then dev server is started.
    • Visit http://localhost:4001 and confirm the patterns library site works.
    • Edit some file wait for the watch task to rebuild the bundle, and make sure the changes are reflected in http://localhost:4001
  • Prod site:
    • Build the docker image: docker build . -t hypothesis_frontend_shared.
    • Run the image on a port of your choice. For example, port 89: docker run --rm -p 89:5001 hypothesis_frontend_shared
    • Visit http://localhost:89, and check in the console's network tab, that it loaded the CSS and JS bundles. Check that all those contain the response headers: Cache-Control: public and Cache-Control: max-age=31536000.
    • If you refresh the page, you should see how it directly loads resources from cache (for the bundles). The bundle requests should also not show in nginx logs, as the browser did not make the requests.

This PR closes #1084


RUN rm -r /usr/share/nginx/html && rm /etc/nginx/conf.d/default.conf
# Copy pattern library static assets, and put them in nginx document root folder
COPY --from=builder /frontend-shared/build /usr/share/nginx/html
COPY ./templates/index.html /usr/share/nginx/html/index.html
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer needed, because the index file is now built and placed in the build directory, so the step above includes it.

try_files $uri $uri/ /index.html$is_args$args;
# The index file references CSS and JS bundles, so we need to make sure it's always up to date
expires -1;
try_files $uri $uri/ /index.html;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed $is_args$args, because those actually make no sense when loading a static file. They are meant to be used in fast-cgi contexts, like with PHP and such, where there might be some server-side logic run before nginx returns the result.

templateContent
);

fs.writeFileSync('build/index.html', processedTemplateContent);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered introducing mustache back, but the use case is so simple that it felt unnecessary, but I'm open to do it. No strong opinion here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two most important things IMO are:

  1. It is obvious what is/is not a variable in the template
  2. Appropriate escaping is applied

I think we can get away with fake templates for now, although it does mean that someone will have to know to change this if they decide to add other dynamic content in future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although it does mean that someone will have to know to change this if they decide to add other dynamic content in future.

The main challenge here is mapping the manifest entries with their corresponding template variable, in a way that we don't couple the logic with the contents of the manifest itself (as it is above)

However, this is true even if we use a proper templating system. For example, with mustache this could look something like this:

const templateMap = [
  ['css_bundle', 'styles/pattern-library.css'],
  ['js_bundle', 'scripts/pattern-library.bundle.js'],
];
const manifest = await generateManifest({
  pattern: 'build/{scripts,styles}/pattern-library*.{css,js}',
});
const templateContent = fs
  .readFileSync('templates/index.template.html')
  .toString();
const viewParams = templateMap.reduce(
    (view, [pattern, fileName]) => {
      view[pattern] = manifest[fileName] ?? pattern
      return view;
    },
    {}
  );

const processedTemplateContent = Mustache.render(templateContent, viewParams);

fs.writeFileSync('build/index.html', processedTemplateContent);

The obvious benefit of a templating system is that the replacement of variables is less fragile (extra/missing spaces and such).

@@ -4,10 +4,10 @@
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>UI component pattern library</title>
<link rel="stylesheet" href="/styles/pattern-library.css">
<link rel="stylesheet" href="/{{ css_bundle }}">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest the whole path should be a variable, including the leading /. If the file was moved for example, that part might need to change.

@robertknight
Copy link
Member

robertknight commented Jun 12, 2023

If using query-string cache busters like h-assets, there are a couple of issues to be aware of:

  1. If the asset is fetched for any reason without specifying the cache buster, you don't want to set a super-long-lived cache header.

  2. There is a risk that after a deployment, a browser requests an old version of an asset, but gets the new version.

    1. Browser fetches page which contains links to asset version X
    2. Site is updated with asset version Y
    3. Browser fetches asset version X links from step 1
    4. Origin site ignores the cache buster and serves asset version Y
    5. Cloudflare now caches asset version Y under the key for asset version X

    This problem applies with many other things as well, eg. if API changes or resource URLs are changed in a new deployment. This has not been a major problem in practice, but h-assets does mitigate it partially by returning a 404 if the cache buster is present but incorrect. At least this makes it more obvious that something is wrong, and avoids confusion for developers.

In h-assets we mitigate issue (1) and also other potential problems with caching by not setting any custom cache headers instead just leaving Cloudflare to use its default value of 30 minutes. This "solution" also limits how long issue (2) can last for. I would be inclined to do the same here. The main function of the hashes is to function as a cache buster after deployment.

In the Hypothesis client, the asset URLs do have query-string cache busters, but that is just an accident of implementation as asset URLs include the client version. We also keep the assets for every client release around indefinitely and we don't make braking API changes to stable Hypothesis APIs. These effectively avoid both issues (1) and (2).

@acelaya
Copy link
Contributor Author

acelaya commented Jun 12, 2023

Ugh, you are right. Those two points are a problem.

I implemented this with the wrong assumption that the hash in the query string would be equivalent to having it in the file name, but it clearly isn't.

I think I'm going to park this for now, as I don't know how much time is reasonable to spend on this.

@acelaya acelaya closed this Jun 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve Cloudflare cache handling in patterns deployment
2 participants