This repository has been archived by the owner on Jun 28, 2024. It is now read-only.

static-checks: Try multiple user agents #5801

Merged 2 commits on Dec 7, 2023

Commits on Dec 6, 2023

  1. static-checks: Move curl to a separate function

    Split the call to `curl` in the URL checker out into a new
    `run_url_check_cmd()` function to make `check_url()` slightly clearer.
    
    Fixes: kata-containers#5800
    
    Signed-off-by: James O. D. Hunt <[email protected]>
    Signed-off-by: Chelsea Mafrica <[email protected]>
    jodh-intel authored and Chelsea Mafrica committed Dec 6, 2023
    Commit 3a6e5b6
  2. static-checks: Try multiple user agents

    Make the URL checker cycle through a list of user agent values until we
    hit one the remote server is happy with.
    
    This is required since, unfortunately, we really, really want to check
    these URLs, but some sites block clients based on their `User-Agent`
    (UA) request header value. And of course, each site is different and can
    change its behaviour at any time.
    
    Our strategy therefore is to try various UAs until we find one the
    server accepts:
    
    - No explicit UA (use `curl`'s default).
    - Explicitly no UA.
    - A blank UA.
    - Partial UA values for various CLI tools.
    - Partial UA values for various console web browsers.
    - Partial UA for Emacs's built-in browser.
    - The existing UA, which implies multiple platforms and browsers, as a
      "last ditch" attempt.
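
The strategy above could be sketched roughly as follows. This is a minimal illustration and not the code from this commit: the `user_agents` values, the `fetch_status()` and `check_url_with_agents()` names, the `fetch_cmd` variable, and the "2xx means accepted" test are all assumptions made for the example. The `curl` options themselves are documented: `-H 'User-Agent:'` removes the header, `-H 'User-Agent;'` sends it with an empty value, and `-A` sets it.

```bash
#!/usr/bin/env bash

# Hypothetical UA list, tried in order. "default", "none" and "blank" are
# special tokens invented for this sketch; the rest are partial UA values
# (product name only, no version).
user_agents=(default none blank curl wget lynx Emacs)

# Hypothetical stand-in for the URL check command: prints the HTTP status
# code for a HEAD request, passing any extra arguments straight to curl.
fetch_status() {
    local url="$1"; shift
    curl -sIL --max-time 20 -o /dev/null -w '%{http_code}' "$@" "$url"
}

# Command used to fetch a status; overridable so the loop can be tested offline.
fetch_cmd=fetch_status

# Try each UA in turn. Prints the first UA token the server accepted and
# returns 0, or returns 1 if every attempt failed.
check_url_with_agents() {
    local url="$1" ua status
    local -a ua_args
    for ua in "${user_agents[@]}"; do
        case "$ua" in
            default) ua_args=() ;;                  # let curl send its own UA
            none)    ua_args=(-H 'User-Agent:') ;;  # remove the header entirely
            blank)   ua_args=(-H 'User-Agent;') ;;  # send the header with no value
            *)       ua_args=(-A "$ua") ;;          # partial UA: product only
        esac
        status=$("$fetch_cmd" "$url" "${ua_args[@]}") || continue
        if [[ "$status" =~ ^2[0-9][0-9]$ ]]; then   # assumption: 2xx means accepted
            echo "$ua"
            return 0
        fi
    done
    return 1
}
```

Calling `check_url_with_agents "https://example.com"` would then report which UA token first produced a 2xx response. Ordering the list from least to most browser-like keeps the common case cheap, since most servers accept the first attempt.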
    
    > **Notes:**
    >
    > - The "partial UA" values specify the UA "product" but not the
    >   UA "product version" (we specify `foo` and not `foo/1.2.3`). We do
    >   this since most sites tested appear not to care about the version.
    >   This is as expected given that the version is strictly optional (see `[*]`).
    >
    > - We now treat URLs that the server reports as HTTP 401, HTTP 402 or
    >   HTTP 403 as *valid*. See the comments in the code.
    >
    > - We now log all errors and display a summary on error in addition to
    >   the simple list of the URLs we believe to be invalid. This should make
    >   future debugging simpler.
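
The status handling and error summary described in these notes might look something like the sketch below. The function and variable names (`classify_status()`, `record_result()`, `print_summary()`, `invalid_urls`, `error_log`) are hypothetical, and the stated reason for accepting 401, 402 and 403 (the server answered, so the URL most likely exists) is one plausible reading rather than the rationale given in the code comments.

```bash
#!/usr/bin/env bash

# Classify an HTTP status code for link-checking purposes.
# Prints "valid" or "invalid".
classify_status() {
    local status="$1"
    case "$status" in
        2[0-9][0-9]) echo "valid" ;;
        # Assumed rationale: the server responded, so the URL probably
        # exists but needs authentication, payment or permission.
        401|402|403) echo "valid" ;;
        *)           echo "invalid" ;;
    esac
}

declare -a invalid_urls=()   # plain list of URLs believed to be invalid
declare -a error_log=()      # one detail line per failure

# Record the outcome of checking one URL.
record_result() {
    local url="$1" status="$2"
    if [ "$(classify_status "$status")" = "invalid" ]; then
        invalid_urls+=("$url")
        error_log+=("$url: HTTP $status")
    fi
}

# Print the invalid URL list plus the logged details.
# Returns 1 if anything failed, 0 otherwise.
print_summary() {
    [ "${#invalid_urls[@]}" -eq 0 ] && return 0
    echo "Invalid URLs:"
    printf '  %s\n' "${invalid_urls[@]}"
    echo "Error details:"
    printf '  %s\n' "${error_log[@]}"
    return 1
}
```

Keeping the per-URL detail lines separate from the plain URL list means the summary can show both without re-running any checks.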
    
    `[*]` - https://www.rfc-editor.org/rfc/rfc9110#section-10.1.5
    
    Fixes: kata-containers#5800
    
    Signed-off-by: James O. D. Hunt <[email protected]>
    Signed-off-by: Chelsea Mafrica <[email protected]>
    jodh-intel authored and Chelsea Mafrica committed Dec 6, 2023
    Commit da945ba