This is a spellchecker that recursively fetches HTML pages, converts them to plain text (using pandoc), and spellchecks them with hunspell. Unknown words will be printed to stdout, which makes the tool a good candidate for CI pipelines where you might want to take action when a spelling error is found on a web page.
Words that are not in the dictionary for the given language (inferred from the lang attribute of the HTML document's root element) can be added to a personal dictionary, which will mark the word as correctly spelled.
- The following command will retrieve the HTML document at https://example.com, spellcheck it, and not print anything because there are no errors:
  $ httpspell https://example.com
  The exit code is 0.
- The following command will spellcheck the README of this project as rendered by GitHub, and print a list of unknown words. Note that we set the language to en_US because GitHub declares 'en' as the document language, but the installed dictionaries usually refer to a specific language variant like en_US:
  $ httpspell https://github.com/suhlig/httpspell/blob/master/README.markdown --language en_US
  suhlig
  Permalink
  httpspell
  sloc
  pandoc
  hunspell
  ...
  The exit code is 1.
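
Since unknown words go to stdout and the exit code tells success from failure, a CI step could wrap the call roughly as follows. This is a minimal sketch, not part of httpspell: the function name report_spelling and the message format are made up for illustration, and any non-zero exit code is treated as "unknown words found".

```shell
# Hypothetical CI helper (not shipped with httpspell).
# Runs the command given as arguments, captures what it prints to stdout,
# and turns a non-zero exit code into a readable report.
report_spelling() {
  # The if/else keeps this safe under `set -e`.
  if words=$("$@"); then
    status=0
  else
    status=$?
  fi

  if [ "$status" -eq 0 ]; then
    echo "spelling OK"
  else
    echo "unknown words found:"
    # One unknown word per line, formatted as a list.
    printf '%s\n' "$words" | sed 's/^/  - /'
  fi

  # Pass the original exit code on so the CI job fails when the check fails.
  return "$status"
}
```

A pipeline step would then call something like `report_spelling httpspell https://example.com` and let the returned exit code decide whether the build fails.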
- When spidering a site, httpspell will skip all responses with a content-type header other than text/html (unless it is pointed at a file, in which case it accepts anything).
- Before converting, httpspell removes the following nodes from the HTML DOM as they are not a good target for spellchecking:
  - code
  - pre
  - Elements with spellcheck='false' (this is how HTML5 allows tagging elements as being a target for spellchecking or not)
  If you produce content with kramdown (e.g. using Jekyll), an Inline Attribute List can be used to set spellcheck='false' for an element by adding this line after the element (e.g. heading):
  {: spellcheck="false"}
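
The content-type rule used when spidering can be pictured as a small predicate. The sketch below is an illustration only and does not show how httpspell implements the check; the function name is_html is hypothetical, and ignoring header parameters such as the charset is an assumption.

```shell
# Hypothetical predicate (not httpspell's own code): succeeds when a
# Content-Type header value denotes text/html.
is_html() {
  # Keep only the media type: drop parameters after ';', remove spaces
  # and tabs, and lower-case the rest so that "TEXT/HTML" also matches.
  media_type=$(printf '%s' "$1" | cut -d';' -f1 | tr -d ' \t' | tr '[:upper:]' '[:lower:]')
  [ "$media_type" = "text/html" ]
}
```

With this, `is_html "text/html; charset=utf-8"` succeeds, while `is_html "application/pdf"` fails, so such a response would be skipped.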
Hunspell uses the system dictionary paths; on the Mac this is ~/Library/Spelling/. Get some dictionaries as explained in the hunspell project:
$ wget -O ~/Library/Spelling/en_US.aff https://cgit.freedesktop.org/libreoffice/dictionaries/plain/en/en_US.aff
$ wget -O ~/Library/Spelling/en_US.dic https://cgit.freedesktop.org/libreoffice/dictionaries/plain/en/en_US.dic
German:
$ wget -O ~/Library/Spelling/de_DE.dic https://cgit.freedesktop.org/libreoffice/dictionaries/plain/de/de_DE_frami.dic
$ wget -O ~/Library/Spelling/de_DE.aff https://cgit.freedesktop.org/libreoffice/dictionaries/plain/de/de_DE_frami.aff
Italian (for integration tests):
$ wget -O ~/Library/Spelling/it_IT.dic https://cgit.freedesktop.org/libreoffice/dictionaries/plain/it_IT/it_IT.dic
$ wget -O ~/Library/Spelling/it_IT.aff https://cgit.freedesktop.org/libreoffice/dictionaries/plain/it_IT/it_IT.aff
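
Each language needs both an .aff and a .dic file, so it can be worth checking that the downloads above are complete before running a spellcheck. The helpers below are a sketch; the names has_dictionary and missing_dictionaries are hypothetical and not part of httpspell or hunspell.

```shell
# Hypothetical helper: succeeds only if both hunspell files for a language
# exist in the given directory.
# $1 = dictionary directory, $2 = language code such as en_US
has_dictionary() {
  [ -f "$1/$2.aff" ] && [ -f "$1/$2.dic" ]
}

# Hypothetical helper: prints every language whose files are incomplete
# and returns 1 if at least one is missing, 0 otherwise.
# $1 = dictionary directory, remaining arguments = language codes
missing_dictionaries() {
  dict_dir=$1
  shift
  any_missing=0
  for dict_lang in "$@"; do
    if ! has_dictionary "$dict_dir" "$dict_lang"; then
      echo "missing dictionary: $dict_lang"
      any_missing=1
    fi
  done
  return "$any_missing"
}
```

For the dictionaries listed above, a call could look like `missing_dictionaries ~/Library/Spelling en_US de_DE it_IT`.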