From 247f12ad3e20ac0b49ef517ec7ad6e9d1f5fc59a Mon Sep 17 00:00:00 2001
From: Diogo Castro
Date: Mon, 7 Nov 2022 10:26:39 +0000
Subject: [PATCH 1/2] [Chore] Fix dockerhub tags

Problem: We have a pipeline step to tag docker images on dockerhub
whenever a new version is released:
https://github.com/serokell/xrefcheck/blob/7dd5c4c3c954a531b5cad89857f31b27245f0ef9/.buildkite/pipeline.yml#L51-L56

However, this doesn't seem to be working, dockerhub only contains the
`latest` tag: https://hub.docker.com/r/serokell/xrefcheck/tags

The problem *seems* to be that the CI step is only triggered when it
builds a branch with a name matching the regex `/^v[0-9]+.*/`.
But we never use that format for branch names, so it's never triggered.

Solution:
1. Change the CI step to trigger when it detects a tag with a version number
2. Enable the "Build tags" option in buildkite:
   https://buildkite.com/serokell/xrefcheck/settings/repository
---
 .buildkite/pipeline.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.buildkite/pipeline.yml b/.buildkite/pipeline.yml
index 6b291364..e5c2a25c 100644
--- a/.buildkite/pipeline.yml
+++ b/.buildkite/pipeline.yml
@@ -56,4 +56,4 @@ steps:
    - nix run -f ci.nix pkgs.skopeo -c ./scripts/upload-docker-image.sh "docker-archive:$(readlink result)" "docker://docker.io/serokell/xrefcheck:${BUILDKITE_BRANCH}"
    label: Push release to dockerhub
    if: |
-     build.branch =~ /^v[0-9]+.*/
+     build.tag =~ /^v[0-9]+.*/

From 2fd11bfeb19ecd41f9408f8c9e9d478dd3675816 Mon Sep 17 00:00:00 2001
From: Diogo Castro
Date: Mon, 7 Nov 2022 13:42:30 +0000
Subject: [PATCH 2/2] [#147] Improve readme
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Improved the readme and fixed several problems:

* Mention support for GitLab - this is important and wasn't mentioned
  anywhere.
* Add a FAQ clarifying how xrefcheck behaves in some important situations.
* We don't need to get into a lot of detail about the syntax of the
  `xrefcheck: ignore` annotations, where they're allowed and where they're not.
  A general idea and a couple of examples are more than enough.
* Added the backlink `[↑](#xrefcheck)` where it was missing.
* Fixed inconsistent level headers: we were using `###` where we should be
  using `##`
* `nix run` should now be `nix shell`
* Add a link to `tests/configs/github-config.yaml` which contains a list of
  all supported config options.
* Instead of mentioning GitHub Actions in the "usage" section and nix in a
  separate section, mention everything in the "usage" section.
* Fixed link to `stack2cabal`
* Fixed typos and rephrased some bits.
---
 README.md | 213 ++++++++++++++++++++++++------------------------------
 1 file changed, 93 insertions(+), 120 deletions(-)

diff --git a/README.md b/README.md
index 6f878c7e..8422dc7b 100644
--- a/README.md
+++ b/README.md
@@ -8,44 +8,43 @@
 
 [![Build status](https://badge.buildkite.com/75461331a6058b334383cdfca1071dc1f908b70cf069d857b7.svg?branch=master)](https://buildkite.com/serokell/xrefcheck)
 
-Xrefcheck is a tool for verifying local and external references in repository documentation that is quick, easy to setup, and suitable to be added to CI.
+Xrefcheck is a tool for verifying local and external references in a repository's documentation that is quick, easy to set up, and suitable to be run on a CI pipeline.
 
 Output sample
 
-### Motivation
+## Motivation [↑](#xrefcheck)
 
-As the project evolves, links in documentation have a tendency to get broken. This is usually because of:
-1. File movements;
-2. Markdown header renames;
-3. Outer sites ceasing their existence.
+As a project evolves, links in markdown documentation have a tendency to become broken. This is usually because:
+1. A file has been moved;
+2. A markdown header has been renamed;
+3. An external site has ceased to exist.
 
 This tool will help you to keep references in order.
+You can run `xrefcheck` continuously in your CI pipeline,
+and it will let you know when it finds a broken link.
 
-### Aims
+## Aims [↑](#xrefcheck)
 
 Comparing to alternative solutions, this tool tries to achieve the following points:
 
-* Quickness - local references are verified instantly even for moderately-sized repositories.
-* Easy setup - no extra actions required, just run the tool in the repository root.
-Both relative and absolute local links are supported out of the box.
+* Quickness
+  * References are verified in parallel.
+  * References with the same target URI are only verified once.
+  * It first attempts to verify external links with a `HEAD` request; only when that fails does it try a `GET` request.
+* Resilience
+  * When you have many links to the same domain, the service is likely to start replying with "429 Too Many Requests".
+    When this happens, `xrefcheck` will wait the requested amount of seconds before retrying.
+* Easy setup - no extra actions required, just run `xrefcheck` in the repository root.
 * Conservative verifier allows using this tool in CI, no false positives (e.g. on sites which require authentication) should be reported.
 
-### A comparison with other solutions
+## Features [↑](#xrefcheck)
 
-* [linky](https://github.com/mattias-p/linky) - a well-configurable verifier written in Rust, scans one specified file at a time and works good in pair with system utilities like `find`.
-  This tool requires some configuring before it can be applied to a repository or added to CI.
-* [awesome_bot](https://github.com/dkhamsing/awesome_bot) - a solution written in Ruby that can be easily included in CI or integrated into GitHub.
-  Its features include duplicated URLs detection, specifying allowed HTTP error codes and reporting generation.
-  At the moment of writing, it scans only external references and checking anchors is not possible.
-* [remark-validate-links](https://github.com/remarkjs/remark-validate-links) and [remark-lint-no-dead-urls](https://github.com/davidtheclark/remark-lint-no-dead-urls) - highly configurable Javascript solution for checking local and remote links resp.
-  It is able to check multiple repositores at once if they are gathered in one folder.
-  Being written on JavaScript, it is fairly slow on large repositories.
-* [markdown-link-check](https://github.com/tcort/markdown-link-check) - another checker written in JavaScript, scans one specific file at a time.
-  Supports `mailto:` link resolution.
-* [url-checker](https://github.com/paramt/url-checker) - GitHub action which checks links in specified files.
-* [linkcheck](https://github.com/filiph/linkcheck) - advanced site crawler, checks for `HTML` files. There are other solutions for this particular task which we don't mention here.
-
-At the moment of writing, the listed solutions don't support ftp/ftps links.
+* Supports both GitHub and GitLab flavored markdown.
+* Supports Windows and Unix systems.
+* Supports relative and absolute local links.
+* Supports external links (`http`, `https`, `ftp` and `ftps`).
+* Detects broken and ambiguous anchors in local links.
+* Integration with GitHub Actions.
 
 ## Dependencies [↑](#xrefcheck)
 
@@ -55,23 +54,27 @@ Xrefcheck requires you to have `git` version 2.18.0 or later in your PATH.
 
 We provide the following ways for you to use xrefcheck:
 
-- [GitHub action](https://github.com/marketplace/actions/xrefcheck)
-- [statically linked binaries](https://github.com/serokell/xrefcheck/releases)
+- [GitHub Actions](https://github.com/marketplace/actions/xrefcheck)
+- [Statically linked binaries](https://github.com/serokell/xrefcheck/releases)
 - [Docker image](https://hub.docker.com/r/serokell/xrefcheck)
-- [building from source](#build-instructions-)
+- [Building from source](#build-instructions-)
+- Nix
+  ```
+  nix shell -f https://github.com/serokell/xrefcheck/archive/master.tar.gz -c xrefcheck
+  ```
 
 If none of those are suitable for you, please open an issue!
 
-To find all broken links in a repository, run from within its folder:
+To find all broken links in a repository, simply run `xrefcheck` from its root folder:
 
 ```sh
 xrefcheck
 ```
 
-To also display all found links and anchors:
+To also display a list of all links and anchors:
 
 ```sh
-xrefcheck -v
+xrefcheck --verbose
 ```
 
 For description of other options:
@@ -80,113 +83,83 @@ For description of other options:
 xrefcheck --help
 ```
 
+To configure `xrefcheck`, run:
 
-### Special functionality
-
-
-  Ignoring external links
-
-  If you want some external links to not be verified, you can use one of the following ways to ignore those links:
-
-1. Add the regular expression that matches the ignoring link to the `ignoreRefs` parameter of your config file.
-
-    For example:
-    ```yaml
-    ignoreRefs:
-      - https://bad.reference.(org|com)(/?)
-    ```
-    allows to ignore both `https://bad.reference.org` and `https://bad.reference.com` with or without last "/".
-
-2. Add right in-place annotation using one of the following ignoring modes (each mode is just a comment with a certain syntax).
-
-    * Ignore the link:
-
-      There are several ways to add this annotation:
-
-      * Just add it like a regular text before the ignoring link.
-
-        ```markdown
-        Bad ['com' reference](https://bad.reference.com) and bad ['org' reference](https://bad.reference.org)
-        ```
-
-      * Separate the ignoring link from the annotation and the following text with single new lines.
-
-        ```markdown
-        Bad ['com' reference](https://bad.reference.com) and bad
-        ['org'](https://bad.reference.org)
-        reference
-        ```
-
-        Therefore only `https://bad.reference.org` will be ignored.
-
-      * If the ignoring link is the first in a paragraph, then the annotation can also be added before a paragraph.
-
-        ```markdown
-
-        [Bad 'org' reference](https://bad.reference.org)
-        [Bad 'com' reference](https://bad.reference.com)
-        ```
-
-        It is still the same `https://bad.reference.org` will be ignored in this case.
-
-    * Ignore the paragraph:
+```sh
+xrefcheck dump-config --type GitHub
+```
 
-      ```markdown
-
-      Bad ['org' reference](https://bad.reference.org)
-      Bad ['com' reference](https://bad.reference.com)
+This will create a `.xrefcheck.yaml` file with all the configuration
+options, [here's an example](tests/configs/github-config.yaml).
+This file should be committed to your repository.
 
-      Bad ['io' reference](https://bad.reference.io)
-      ```
+## Build instructions [↑](#xrefcheck)
 
-      In this way, `https://bad.reference.org` and `https://bad.reference.com` will be ignored and `https://bad.reference.io` will still be verified.
+Run `stack install` to build everything and install the executable.
+If you wish to use `cabal`, you need to run [`stack2cabal`](https://hackage.haskell.org/package/stack2cabal) first!
 
-    * Ignore the whole file:
-      ```markdown
-
-
+## FAQ [↑](#xrefcheck)
 
-
-      ...the rest of the file...
-      ```
+1. How do I ignore specific files?
+    * To ignore a specific file, you can either use the `--ignore ` command-line option,
+      or the `ignore` list in the config file. Links _to_ those files will be reported as errors, links _from_ those files will not be verified.
 
-      Using this you can ignore the whole file.
-
+1. How do I ignore specific links?
+    * Add an entry to the `ignoreLocalRefsTo` or `ignoreExternalRefsTo` lists in the config file.
+    * Alternatively, add a `` annotation before the link:
+      ```md
+
+      Link to some [invalid resource](https://fictitious.uri/).
+      ```
+      ```md
+      A [valid link](https://www.google.com)
+      followed by an [invalid link](https://fictitious.uri/).
+      ```
+    * You can also use a `` annotation to ignore all links in a paragraph.
 
-## Configuring
+1. How do I ignore all links from a specific markdown file?
+    * Add a glob pattern to the `ignoreRefsFrom` list in the config file.
+    * Or add a `` at the top of the file.
 
-Configuration template (with all options explained) can be dumped with:
+1. How do I ignore all external links?
+    * If you wish to ignore all http/ftp links, you can use `--mode local-only`.
 
-```sh
-xrefcheck dump-config -t GitHub
-```
+1. How does `xrefcheck` handle links that require authentication?
+    * It's common for projects to contain links to protected resources.
+      By default, when `xrefcheck` attempts to verify a link and is faced with a `403 Forbidden` or a `401 Unauthorized`, it assumes the link is valid.
+    * This behavior can be disabled by setting `ignoreAuthFailures: false` in the config file.
 
-Currently supported options include:
-* Timeout for checking external references;
-* List of ignored files.
+1. How does `xrefcheck` handle redirects?
+    * `xrefcheck` follows up to 10 HTTP redirects.
 
-## Build instructions [↑](#xrefcheck)
-
-Run `stack install` to build everything and install the executable.
-If you want to use cabal, you need to run (`stack2cabal`)[https://hackage.haskell.org/package/stack2cabal] first!
-
-### CI and nix [↑](#xrefcheck)
-
-To build only the executables, run `nix-build`. You can use this line on your CI to use xrefcheck:
-```
-nix run -f https://github.com/serokell/xrefcheck/archive/master.tar.gz -c xrefcheck
-```
+1. How does `xrefcheck` handle localhost links?
+    * By default, `xrefcheck` will ignore links to localhost.
+    * This behavior can be disabled by removing the corresponding entry from the `ignoreExternalRefsTo` list in the config file.
 
-Our CI uses `nix-build xrefcheck.nix` to build the whole project, including tests and Haddock.
-It is based on the [`haskell.nix`](https://input-output-hk.github.io/haskell.nix/) project.
-You can do that too if you wish.
-
-## For further work [↑](#xrefcheck)
+## Further work [↑](#xrefcheck)
 
 - [ ] Support for non-Unix systems.
 - [ ] Support link detection in different languages, not only Markdown.
   - [ ] Haskell Haddock is first in turn.
 
+## A comparison with other solutions [↑](#xrefcheck)
+
+* [linky](https://github.com/mattias-p/linky) - a well-configurable verifier written in Rust, scans one specified file at a time and works well with system utilities like `find`.
+  This tool requires some configuring before it can be applied to a repository or added to CI.
+* [awesome_bot](https://github.com/dkhamsing/awesome_bot) - a solution written in Ruby that can be easily included in CI or integrated into GitHub.
+  Its features include duplicated URLs detection, specifying allowed HTTP error codes and reporting generation.
+  At the moment of writing, it scans only external references and checking anchors is not possible.
+* [remark-validate-links](https://github.com/remarkjs/remark-validate-links) and [remark-lint-no-dead-urls](https://github.com/davidtheclark/remark-lint-no-dead-urls) - highly configurable JavaScript solution for checking local and external links respectively.
+  It is able to check multiple repositories at once if they are gathered in one folder.
+  Doesn't handle "429 Too Many Requests", so false positives are likely when you have many links to the same domain.
+* [markdown-link-check](https://github.com/tcort/markdown-link-check) - another checker written in JavaScript, scans one specific file at a time.
+  Supports `mailto:` link resolution.
+* [url-checker](https://github.com/paramt/url-checker) - GitHub Action which checks external links in specified files.
+  Does not check local links.
+* [linkcheck](https://github.com/filiph/linkcheck) - advanced site crawler, verifies links in `HTML` files. There are other solutions for this particular task which we don't mention here.
+
+At the moment of writing, the listed solutions don't support ftp/ftps links.
+
 ## Issue tracker [↑](#xrefcheck)
 
 We use GitHub issues as our issue tracker.