This repository has been archived by the owner on Jun 28, 2024. It is now read-only.
static-checks: Try multiple user agents
Make the URL checker cycle through a list of user agent values until we hit one the remote server is happy with.

This is required since, unfortunately, we really, really want to check these URLs, but some sites block clients based on their `User-Agent` (UA) request header value. And of course, each site is different and can change its behaviour at any time.

Our strategy therefore is to try various UAs until we find one the server accepts:

- No explicit UA (use `curl`'s default).
- Explicitly no UA.
- A blank UA.
- Partial UA values for various CLI tools.
- Partial UA values for various console web browsers.
- Partial UA for Emacs's built-in browser.
- The existing UA, which is used as a "last ditch" attempt where the UA implies multiple platforms and browsers.

> **Notes:**
>
> - The "partial UA" values specify the UA "product" but not the
>   UA "product version" (we specify `foo` and not `foo/1.2.3`). We do
>   this since most sites tested appear not to care about the version.
>   This is as expected given that the version is strictly optional (see `[*]`).
>
> - We now treat URLs that the server reports as HTTP 401, HTTP 402 or
>   HTTP 403 as *valid*. See the comments in the code.
>
> - We now log all errors and display a summary on error in addition to
>   the simple list of the URLs we believe to be invalid. This should make
>   future debugging simpler.

`[*]` - https://www.rfc-editor.org/rfc/rfc9110#section-10.1.5

Fixes: #5800

Signed-off-by: James O. D. Hunt <[email protected]>
Signed-off-by: Chelsea Mafrica <[email protected]>
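
The retry strategy described in the commit message can be sketched as follows. This is a minimal illustration in Python, not the code from this commit: the candidate UA strings, the `DEFAULT_UA` and `NO_UA` sentinels, the `probe` callback and the assumption that any 2xx or 3xx status counts as success are all hypothetical. Only the ordering of the attempts and the handling of HTTP 401, 402 and 403 come from the message itself.

```python
from typing import Callable, List, Sequence, Tuple

# Sentinels for the two "no UA" cases (hypothetical names).
DEFAULT_UA = object()  # pass no UA option at all: use the client's default
NO_UA = object()       # explicitly suppress the User-Agent header

# Illustrative candidates, ordered as in the commit message.
# The strings are placeholders, not the values used by the real checker.
UA_CANDIDATES: Sequence[object] = (
    DEFAULT_UA,                 # no explicit UA
    NO_UA,                      # explicitly no UA
    "",                         # blank UA
    "curl", "wget",             # partial UAs: CLI tools (product, no version)
    "lynx", "w3m",              # partial UAs: console web browsers
    "emacs-browser",            # partial UA: Emacs's built-in browser
    "last-ditch-full-ua",       # stand-in for the existing multi-platform UA
)

# Status codes the commit message says are now treated as valid.
ACCEPTED_AUTH_CODES = {401, 402, 403}


def is_valid(status: int) -> bool:
    """Assumption: 2xx/3xx mean the URL exists; 401-403 also count as valid."""
    return 200 <= status < 400 or status in ACCEPTED_AUTH_CODES


def check_url(
    url: str,
    probe: Callable[[str, object], int],
    candidates: Sequence[object] = UA_CANDIDATES,
) -> Tuple[bool, List[Tuple[object, int]]]:
    """Try each UA in turn until the server accepts one.

    `probe(url, ua)` is a caller-supplied function returning the HTTP
    status for one attempt. Every failed attempt is recorded so that a
    summary can be shown if the URL is finally judged invalid.
    """
    errors: List[Tuple[object, int]] = []
    for ua in candidates:
        status = probe(url, ua)
        if is_valid(status):
            return True, errors
        errors.append((ua, status))
    return False, errors
```

Passing `probe` in as a parameter keeps the loop independent of the HTTP client, so the same logic can wrap a `curl` invocation in practice or a stub in tests. The returned `errors` list corresponds to the per-attempt error log mentioned in the notes.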