index.bs

<pre class="metadata">
Title: HTML Sanitizer API
Status: CG-DRAFT
Group: WICG
URL: https://wicg.github.io/sanitizer-api/
Repository: WICG/sanitizer-api
Shortname: sanitizer-api
Level: 1
Editor: Frederik Braun 68466, Mozilla, fbraun@mozilla.com, https://frederik-braun.com
Editor: Mario Heiderich, Cure53, mario@cure53.de, https://cure53.de
Editor: Daniel Vogelheim, Google LLC, vogelheim@google.com, https://www.google.com
Abstract:
  This document specifies a set of APIs which allow developers to take
  untrusted HTML input and sanitize it for safe insertion into a document's
  DOM.
Indent: 2
Work Status: exploring
Boilerplate: omit conformance
Markup Shorthands: css off, markdown on
</pre>
<pre class="link-defaults">
spec:html; type:attribute; text: innerHTML
spec:dom; type:method; text: createDocumentFragment
spec:html; type:dfn; text: template contents
</pre>
<pre class="anchors">
text: window.toStaticHTML(); type: method; url: https://msdn.microsoft.com/en-us/library/cc848922(v=vs.85).aspx
text: internal slot; type:dfn; url: https://tc39.es/ecma262/#sec-ordinary-object-internal-methods-and-internal-slots
text: parse HTML from a string; type: dfn; url: https://html.spec.whatwg.org/#parse-html-from-a-string
</pre>
<pre class="biblio">
{
  "DOMPURIFY": {
    "href": "https://github.com/cure53/DOMPurify",
    "title": "DOMPurify",
    "publisher": "Cure53"
  },
  "MXSS": {
    "href": "https://cure53.de/fp170.pdf",
    "title": "mXSS Attacks: Attacking well-secured Web-Applications by using innerHTML Mutations",
    "publisher": "Ruhr-Universität Bochum"
  }
}
</pre>

# Introduction # {#intro}

<em>This section is not normative.</em>

Web applications often need to work with strings of HTML on the client side,
perhaps as part of a client-side templating solution, perhaps as part of
rendering user generated content, etc. It is difficult to do so in a safe way.
The naive approach of joining strings together and stuffing them into
an {{Element}}'s {{Element/innerHTML}} is fraught with risk, as it can cause
JavaScript execution in a number of unexpected ways.

Libraries like [[DOMPURIFY]] attempt to manage this problem by carefully
parsing and sanitizing strings before insertion, by constructing a DOM and
filtering its members through an allow-list. This has proven to be a fragile
approach, as the parsing APIs exposed to the web don't always map in
reasonable ways to the browser's behavior when actually rendering a string as
HTML in the "real" DOM. Moreover, the libraries need to keep on top of
browsers' changing behavior over time; things that once were safe may turn
into time-bombs based on new platform-level features.

The browser has a fairly good idea of when it is going to
execute code. We can improve upon the user-space libraries by teaching the
browser how to render HTML from an arbitrary string in a safe manner, and do
so in a way that is much more likely to be maintained and updated along with
the browser's own changing parser implementation. This document outlines an
API which aims to do just that.

## Goals ## {#goals}

*   Mitigate the risk of DOM-based cross-site scripting attacks by providing
    developers with mechanisms for handling user-controlled HTML which prevent
    direct script execution upon injection.

*   Make HTML output safe for use within the current user agent, taking into
    account its current understanding of HTML.

*   Allow developers to override the default set of elements and attributes.
    Adding certain elements and attributes can prevent
    <a href="https://github.com/google/security-research-pocs/tree/master/script-gadgets">script gadget</a>
    attacks.

## API Summary ## {#api-summary}

The Sanitizer API offers functionality to parse a string containing HTML into
a DOM tree, and to filter the resulting tree according to a user-supplied
configuration. The methods come in two by two flavours:

* Safe and unsafe: The "safe" methods will not generate any markup that executes
  script. That is, they should be safe from XSS. The "unsafe" methods will parse
  and filter whatever they're supposed to.
* Context: Methods are defined on {{Element}} and {{ShadowRoot}} and will
  replace these {{Node}}'s children, and are largely analogous to {{Element/innerHTML}}.
  There are also static methods on the {{Document}}, which parse an entire
  document are largely analogous to {{DOMParser}}.{{parseFromString()}}.


# Framework # {#framework}

## Sanitizer API ## {#sanitizer-api}

The {{Element}} interface defines two methods, {{Element/setHTML()}} and
{{Element/setHTMLUnsafe()}}. Both of these take a {{DOMString}} with HTML
markup, and an optional configuration.

<pre class="idl extract">
partial interface Element {
  [CEReactions] undefined setHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLOptions options = {});
  [CEReactions] undefined setHTML(DOMString html, optional SetHTMLOptions options = {});
};
</pre>

<div algorithm>
{{Element}}'s <dfn for="Element" export>setHTMLUnsafe</dfn>(|html|, |options|) method steps are:

1. Let |compliantHTML| be the result of invoking the [$Get Trusted Type compliant string$] algorithm with
   {{TrustedHTML}}, [=this=]'s [=relevant global object=], |html|, "Element setHTMLUnsafe", and "script".
1. Let |target| be [=this=]'s [=template contents=] if [=this=] is a
   {{HTMLTemplateElement|template}} element; otherwise [=this=].
1. [=Set and filter HTML=] given |target|, [=this=], |compliantHTML|, |options|, and false.

</div>

<div algorithm>
{{Element}}'s <dfn for="Element" export>setHTML</dfn>(|html|, |options|) method steps are:

1. Let |target| be [=this=]'s [=template contents=] if [=this=] is a
   {{HTMLTemplateElement|template}}; otherwise [=this=].
1. [=Set and filter HTML=] given |target|, [=this=], |html|, |options|, and true.

</div>

<pre class="idl extract">
partial interface ShadowRoot {
  [CEReactions] undefined setHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLOptions options = {});
  [CEReactions] undefined setHTML(DOMString html, optional SetHTMLOptions options = {});
};
</pre>

These methods are mirrored on the {{ShadowRoot}}:

<div algorithm>
{{ShadowRoot}}'s <dfn for="ShadowRoot" export>setHTMLUnsafe</dfn>(|html|, |options|) method steps are:

1. Let |compliantHTML| be the result of invoking the [$Get Trusted Type compliant string$] algorithm with
   {{TrustedHTML}}, [=this=]'s [=relevant global object=], |html|, "ShadowRoot setHTMLUnsafe", and "script".
1. [=Set and filter HTML=] using [=this=],
   [=this=]'s [=shadow host=] (as context element),
   |compliantHTML|, |options|, and false.

</div>

<div algorithm>
{{ShadowRoot}}'s <dfn for="ShadowRoot" export>setHTML</dfn>(|html|, |options|)</dfn> method steps are:

1. [=Set and filter HTML=] using [=this=] (as target), [=this=] (as context element),
   |html|, |options|, and true.

</div>

The {{Document}} interface gains two new methods which parse an entire {{Document}}:

<pre class="idl extract">
partial interface Document {
  static Document parseHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLOptions options = {});
  static Document parseHTML(DOMString html, optional SetHTMLOptions options = {});
};
</pre>

<div algorithm>
The <dfn for="Document" export>parseHTMLUnsafe</dfn>(|html|, |options|) method steps are:

1. Let |compliantHTML| be the result of invoking the [$Get Trusted Type compliant string$] algorithm with
   {{TrustedHTML}}, [=this=]'s [=relevant global object=], |html|, "Document parseHTMLUnsafe", and "script".
1. Let |document| be a new {{Document}}, whose [=Document/content type=] is "text/html".

   Note: Since |document| does not have a browsing context, scripting is disabled.
1. Set |document|'s [=allow declarative shadow roots=] to true.
1. [=Parse HTML from a string=] given |document| and |compliantHTML|.
1. Let |config| be the result of calling [=get a sanitizer config from options=]
   with |options| and false.
1. If |config| is not [=list/empty=],
   then call [=sanitize=] on |document|'s [=tree/root|root node=] with |config|.
1. Return |document|.

</div>


<div algorithm>
The <dfn for="Document" export>parseHTML</dfn>(|html|, |options|) method steps are:

1. Let |document| be a new {{Document}}, whose [=Document/content type=] is "text/html".

   Note: Since |document| does not have a browsing context, scripting is disabled.
1. Set |document|'s [=allow declarative shadow roots=] to true.
1. [=Parse HTML from a string=] given |document| and |html|.
1. Let |config| be the result of calling [=get a sanitizer config from options=]
   with |options| and true.
1. Call [=sanitize=] on |document|'s [=tree/root|root node=] with |config|.
1. Return |document|.

</div>

## SetHTML options and the configuration object. ## {#configobject}

The family of {{Element/setHTML()}}-like methods all accept an options
dictionary. Right now, only one member of this dictionary is defined:

<pre class=idl>
dictionary SetHTMLOptions {
  (Sanitizer or SanitizerConfig) sanitizer = {};
};
</pre>

The {{Sanitizer}} configuration object encapsulates a filter configuration.
The same config can be used with both safe or unsafe methods. The intent is
that one (or a few) configurations will be built-up early on in a page's
lifetime, and can then be used whenever needed. This allows implementations
to pre-process configurations.

The configuration object is also query-able and can return
[=SanitizerConfig/canonical=] configuration dictionaries,
in both safe and unsafe variants. This allows a
page to query and predict what effect a given configuration will have, or
to build a new configuration based on an existing one.

<pre class=idl>
[Exposed=(Window,Worker)]
interface Sanitizer {
  constructor(optional SanitizerConfig config = {});
  SanitizerConfig get();
  SanitizerConfig getUnsafe();
};
</pre>

<div algorithm>
The <dfn for="Sanitizer" export>constructor</dfn>(|config|)
method steps are:

1. Store |config| in [=this=]'s [=internal slot=].

</div>

<div algorithm>
The <dfn for="Sanitizer" export>get</dfn>() method steps are:

1. Return the result of [=canonicalize a configuration=] with the value of
   [=this=]'s [=internal slot=] and true.

</div>

<div algorithm>
The <dfn for="Sanitizer" export>getUnsafe</dfn>() method steps are:

1. Return the result of [=canonicalize a configuration=] with the value of
   [=this=]'s [=internal slot=] and false.

</div>

## The Configuration Dictionary ## {#config}

<pre class=idl>
dictionary SanitizerElementNamespace {
  required DOMString name;
  DOMString? _namespace = "http://www.w3.org/1999/xhtml";
};

// Used by "elements"
dictionary SanitizerElementNamespaceWithAttributes : SanitizerElementNamespace {
  sequence&lt;SanitizerAttribute> attributes;
  sequence&lt;SanitizerAttribute> removeAttributes;
};

typedef (DOMString or SanitizerElementNamespace) SanitizerElement;
typedef (DOMString or SanitizerElementNamespaceWithAttributes) SanitizerElementWithAttributes;

dictionary SanitizerAttributeNamespace {
  required DOMString name;
  DOMString? _namespace = null;
};
typedef (DOMString or SanitizerAttributeNamespace) SanitizerAttribute;

dictionary SanitizerConfig {
  sequence&lt;SanitizerElementWithAttributes> elements;
  sequence&lt;SanitizerElement> removeElements;
  sequence&lt;SanitizerElement> replaceWithChildrenElements;

  sequence&lt;SanitizerAttribute> attributes;
  sequence&lt;SanitizerAttribute> removeAttributes;

  boolean comments;
  boolean dataAttributes;
};
</pre>


# Algorithms # {#algorithms}

<div algorithm>
To <dfn>set and filter HTML</dfn>, given an {{Element}} or {{DocumentFragment}}
|target|, an {{Element}} |contextElement|, a [=string=] |html|, and a
[=dictionary=] |options|, and a [=boolean=] |safe|:

1. If |safe| and |contextElement|'s [=Element/local name=] is "`script`" and
   |contextElement|'s [=Element/namespace=] is the [=HTML namespace=] or the
   [=SVG namespace=], then return.
1. Let |config| be the result of calling [=get a sanitizer config from options=]
   with |options| and |safe|.
1. Let |newChildren| be the result of the HTML [=fragment parsing algorithm steps=]
   given |contextElement|, |html|, and true.
1. Let |fragment| be a new {{DocumentFragment}} whose [=node document=] is |contextElement|'s [=node document=].
1. [=list/iterate|For each=] |node| in |newChildren|, [=list/append=] |node| to |fragment|.
1. If |config| is not [=list/empty=], then run [=sanitize=] on |fragment| using |config|.
1. [=Replace all=] with |fragment| within |target|.

</div>

<div algorithm>
To <dfn for="SanitizerConfig">get a sanitizer config from options</dfn> for
an options dictionary |options| and a boolean |safe|, do:

1. Assert: |options| is a [=dictionary=].
1. If |options|["`sanitizer`"] doesn't [=map/exist=], then return undefined.
1. Assert: |options|["`sanitizer`"] is either a {{Sanitizer}} instance
   or a [=dictionary=].
1. If |options|["`sanitizer`"] is a {{Sanitizer}} instance:
   1. Then let |config| be the value of |options|["`sanitizer`"]'s [=internal slot=].
   1. Otherwise let |config| be the value of |options|["`sanitizer`"].
1. Return the result of calling [=canonicalize a configuration=] on
   |config| and |safe|.

</div>

## Sanitization Algorithms ## {#sanitization}

<div algorithm="sanitize">
For the main <dfn>sanitize</dfn> operation, using a {{ParentNode}} |node|, a
[=SanitizerConfig/canonical=] {{SanitizerConfig}} |config|, run these steps:

1. [=Assert=]: |config| is [=SanitizerConfig/canonical=].
1. Let |current| be |node|.
1. [=list/iterate|For each=] |child| in |current|'s [=tree/children=]:
  1. [=Assert=]: |child| [=implements=] {{Text}}, {{Comment}}, or {{Element}}.

     Note: Currently, this algorithm is only called on output of the HTML
           parser for which this assertion should hold. If in the future
           this algorithm will be used in different contexts, this assumption
           needs to be re-examined.
  1. If |child| [=implements=] {{Text}}:
    1. [=continue=].
  1. else if |child| [=implements=] {{Comment}}:
    1. If |config|'s {{SanitizerConfig/comments}} is not true:
      1. [=/remove=] |child|.
  1. else:
    1. Let |elementName| be a {{SanitizerElementNamespace}} with |child|'s
       [=Element/local name=] and [=Element/namespace=].
    1. If |config|["{{SanitizerConfig/elements}}"] exists and
       |config|["{{SanitizerConfig/elements}}"] does not [=SanitizerConfig/contain=]
       [|elementName|]:
       1. [=/remove=] |child|.
    1. else if |config|["{{SanitizerConfig/removeElements}}"] exists and
       |config|["{{SanitizerConfig/removeElements}}"] [=SanitizerConfig/contains=]
       [|elementName|]:
       1. [=/remove=] |child|.
    1. If |config|["{{SanitizerConfig/replaceWithChildrenElements}}"] exists and |config|["{{SanitizerConfig/replaceWithChildrenElements}}"] [=SanitizerConfig/contains=] |elementName|:
      1. Call [=sanitize=] on |child| with |config|.
      1. Call [=replace all=] with |child|'s [=tree/children=] within |child|.
    1. If |elementName| [=equals=] &laquo;[ "`name`" &rightarrow; "`template`",
       "`namespace`" &rightarrow; [=HTML namespace=] ]&raquo;
      1. Then call [=sanitize=] on |child|'s [=template contents=] with |config|.
    1. If |child| is a [=shadow host=]:
      1. Then call [=sanitize=] on |child|'s [=Element/shadow root=] with |config|.
    1. [=list/iterate|For each=] |attr| in |current|'s [=Element/attribute list=]:
      1. Let |attrName| be a {{SanitizerAttributeNamespace}} with |attr|'s
         [=Attr/local name=] and [=Attr/namespace=].
      1. If |config|["{{SanitizerConfig/attributes}}"] exists and
         |config|["{{SanitizerConfig/attributes}}"] does not [=SanitizerConfig/contain=]
         |attrName|:
         1. If "data-" is a [=code unit prefix=] of [=Attr/local name=] and
            if [=Attr/namespace=] is `null` and
            if |config|["{{SanitizerConfig/dataAttributes}}"] exists and is false:
            1. Remove |attr| from |child|.
      1. else if |config|["{{SanitizerConfig/removeAttributes}}"] exists and
         |config|["{{SanitizerConfig/removeAttributes}}"] [=SanitizerConfig/contains=]
         |attrName|:
         1. Remove |attr| from |child|.
      1. If |config|["{{SanitizerConfig/elements}}"][|elementName|] exists,
         and if
         |config|["{{SanitizerConfig/elements}}"][|elementName|]["{{SanitizerElementNamespaceWithAttributes/attributes}}"]
         exists, and if
         |config|["{{SanitizerConfig/elements}}"][|elementName|]["{{SanitizerElementNamespaceWithAttributes/attributes}}"]
         does not [=SanitizerConfig/contain=] |attrName|:
         1. Remove |attr| from |child|.
      1. If |config|["{{SanitizerConfig/elements}}"][|elementName|] exists,
         and if
         |config|["{{SanitizerConfig/elements}}"][|elementName|]["{{SanitizerElementNamespaceWithAttributes/removeAttributes}}"]
         exists, and if
         |config|["{{SanitizerConfig/elements}}"][|elementName|]["{{SanitizerElementNamespaceWithAttributes/removeAttributes}}"]
         [=SanitizerConfig/contains=] |attrName|:
         1. Remove |attr| from |child|.
      1. If &laquo;[|elementName|, |attrName|]&raquo; matches an entry in the
         [=navigating URL attributes list=], and if |attr|'s [=protocol=] is
         "`javascript:`":
         1. Then remove |attr| from |child|.
       1. Call [=sanitize=] on |child|'s [=Element/shadow root=] with |config|.
    1. else:
      1. [=/remove=] |child|.

</div>

## Configuration Processing ## {#configuration-processing}

<div algorithm>
A |config| is <dfn for="SanitizerConfig">valid</dfn> if all these conditions are met:

1. |config| is a [=dictionary=]
1. |config|'s [=map/keys|key set=] does not [=list/contain=] both
   "{{SanitizerConfig/elements}}" and "{{SanitizerConfig/removeElements}}"
1. |config|'s [=map/keys|key set=] does not [=list/contain=] both
   "{{SanitizerConfig/removeAttributes}}" and "{{SanitizerConfig/attributes}}".
1. [=list/iterate|For any=] |key| of &laquo;[
   "{{SanitizerConfig/elements}}",
   "{{SanitizerConfig/removeElements}}",
   "{{SanitizerConfig/replaceWithChildrenElements}}",
   "{{SanitizerConfig/attributes}}",
   "{{SanitizerConfig/removeAttributes}}"
    ]&raquo; where |config|[|key|] [=map/exists=]:
   1. |config|[|key|] is [=SanitizerNameList/valid=].
1. If |config|["{{SanitizerConfig/elements}}"] exists, then
   [=list/iterate|for any=] |element| in |config|[|key|] that is a [=dictionary=]:
   1. |element| does not [=list/contain=] both
      "{{SanitizerElementNamespaceWithAttributes/attributes}}" and
      "{{SanitizerElementNamespaceWithAttributes/removeAttributes}}".
   1. If either |element|["{{SanitizerElementNamespaceWithAttributes/attributes}}"]
      or |element|["{{SanitizerElementNamespaceWithAttributes/removeAttributes}}"]
      [=map/exists=], then it is [=SanitizerNameList/valid=].
   1. Let |tmp| be a [=dictionary=], and for any |key| &laquo;[
      "{{SanitizerConfig/elements}}",
      "{{SanitizerConfig/removeElements}}",
      "{{SanitizerConfig/replaceWithChildrenElements}}",
      "{{SanitizerConfig/attributes}}",
      "{{SanitizerConfig/removeAttributes}}"
      ]&raquo; |tmp|[|key|] is set to the result of [=canonicalize a sanitizer
      element list=] called on |config|[|key|], and [=HTML namespace=] as default
      namespace for the element lists, and `null` as default namespace for the
      attributes lists.

      Note: The intent here is to assert about list elements, but without regard
            to whether the string shortcut syntax or the explicit dictionary
            syntax is used. For example, having "img" in `elements` and
            `{ name: "img" }` in `removeElements`. An implementation might well
            do this without explicitly canonicalizing the lists at this point.

      1. Given theses canonicalized name lists, all of the following conditions hold:

        1. The [=set/intersection=] between
           |tmp|["{{SanitizerConfig/elements}}"] and
           |tmp|["{{SanitizerConfig/removeElements}}"]
           is [=set/empty=].
        1. The [=set/intersection=] between
           |tmp|["{{SanitizerConfig/removeElements}}"]
           |tmp|["{{SanitizerConfig/replaceWithChildrenElements}}"]
           is [=set/empty=].
        1. The [=set/intersection=] between
           |tmp|["{{SanitizerConfig/replaceWithChildrenElements}}"] and
           |tmp|["{{SanitizerConfig/elements}}"]
           is [=set/empty=].
        1. The [=set/intersection=] between
           |tmp|["{{SanitizerConfig/attributes}}"] and
           |tmp|["{{SanitizerConfig/removeAttributes}}"]
           is [=set/empty=].

    1. Let |tmpattrs| be |tmp|["{{SanitizerConfig/attributes}}"] if it exists,
       and otherwise [=built-in default config=]["{{SanitizerConfig/attributes}}"].
    1. [=list/iterate|For any=] |item| in |tmp|["{{SanitizerConfig/elements}}"]:
       1. If either |item|["{{SanitizerElementNamespaceWithAttributes/attributes}}"]
          or |item|["{{SanitizerElementNamespaceWithAttributes/removeAttributes}}"]
          exists:
          1. Then the [=set/difference=] between it and |tmpattrs| is [=set/empty=].

</div>

<div algorithm>
A |list| of names is <dfn for="SanitizerNameList">valid</dfn> if all these
conditions are met:

1. |list| is a [=/list=].
1. [=list/iterate|For all=] of its members |name|:
   1. |name| is a {{string}} or a [=dictionary=].
   1. If |name| is a [=dictionary=]:
      1. |name|["{{SanitizerElementNamespace/name}}"] [=map/exists=] and is a {{string}}.

</div>

<div algorithm>
A |config| is <dfn for="SanitizerConfig">canonical</dfn> if all these conditions are met:

1. |config| is [=SanitizerConfig/valid=].
1. |config|'s [=map/keys|key set=] is a [=set/subset=] of
   &laquo;[
   "{{SanitizerConfig/elements}}",
   "{{SanitizerConfig/removeElements}}",
   "{{SanitizerConfig/replaceWithChildrenElements}}",
   "{{SanitizerConfig/attributes}}",
   "{{SanitizerConfig/removeAttributes}}",
   "{{SanitizerConfig/comments}}",
   "{{SanitizerConfig/dataAttributes}}"
   ]&raquo;
1. |config|'s [=map/keys|key set=] [=list/contains=] either:
   1. both "{{SanitizerConfig/elements}}" and "{{SanitizerConfig/attributes}}",
      but neither of
      "{{SanitizerConfig/removeElements}}" or "{{SanitizerConfig/removeAttributes}}".
   1. or both
      "{{SanitizerConfig/removeElements}}" and "{{SanitizerConfig/removeAttributes}}",
      but neither of
      "{{SanitizerConfig/elements}}" or "{{SanitizerConfig/attributes}}".
1. For any |key| of &laquo;[
      "{{SanitizerConfig/replaceWithChildrenElements}}",
      "{{SanitizerConfig/removeElements}}",
      "{{SanitizerConfig/attributes}}",
      "{{SanitizerConfig/removeAttributes}}"
      ]&raquo; where |config|[|key|] [=map/exists=]:
   1. |config|[|key|] is [=SanitizerNameList/canonical=].
1. If |config|["{{SanitizerConfig/elements}}"] [=map/exists=]:
   1. |config|["{{SanitizerConfig/elements}}"] is [=SanitizerNameWithAttributesList/canonical=].
1. For any |key| of &laquo;[
   "{{SanitizerConfig/comments}}",
   "{{SanitizerConfig/dataAttributes}}"
   ]&raquo;:
   1. if |config|[|key|] [=map/exists=], |config|[|key|] is a {{boolean}}.

</div>

<div algorithm>
A |list| of names is <dfn for="SanitizerNameList">canonical</dfn> if all these
conditions are met:

1. |list|[|key|] is a [=/list=].
1. [=list/iterate|For all=] of its |list|[|key|]'s members |name|:
   1. |name| is a [=dictionary=].
   1. |name|'s [=map/keys|key set=] [=set/equals=] &laquo;[
    "{{SanitizerElementNamespace/name}}", "{{SanitizerElementNamespace/namespace}}"
     ]&raquo;
  1. |name|'s [=map/values=] are [=string=]s.

</div>

<div algorithm>
A |list| of names is <dfn for="SanitizerNameWithAttributesList">canonical</dfn>
if all these conditions are met:

1. |list|[|key|] is a [=/list=].
1. [=list/iterate|For all=] of its |list|[|key|]'s members |name|:
   1. |name| is a [=dictionary=].
   1. |name|'s [=map/keys|key set=] [=set/equals=] one of:
      1. &laquo;[
         "{{SanitizerElementNamespace/name}}",
         "{{SanitizerElementNamespace/namespace}}"
         ]&raquo;
      1. &laquo;[
         "{{SanitizerElementNamespace/name}}",
         "{{SanitizerElementNamespace/namespace}}",
         "{{SanitizerElementNamespaceWithAttributes/attributes}}"
         ]&raquo;
      1. &laquo;[
         "{{SanitizerElementNamespace/name}}",
         "{{SanitizerElementNamespace/namespace}}",
         "{{SanitizerElementNamespaceWithAttributes/removeAttributes}}"
         ]&raquo;
   1. |name|["{{SanitizerElementNamespace/name}}"] and
      |name|["{{SanitizerElementNamespace/namespace}}"] are [=string=]s.
   1. |name|["{{SanitizerElementNamespaceWithAttributes/attributes}}"] and
      |name|["{{SanitizerElementNamespaceWithAttributes/removeAttributes}}"]
      are [=SanitizerNameList/canonical=] if they [=map/exist=].

</div>


<div algorithm>
To <dfn>canonicalize a configuration</dfn> |config| with a [=boolean=] |safe|:

Note: The initial set of [=assert=]s assert properties of the built-in
      constants, like the [=built-in default config|defaults=] and
      the lists of known [=known elements|elements=] and
      [=known attributes|attributes=].

1. [=Assert=]: [=built-in default config=] is [=SanitizerConfig/canonical=].
1. [=Assert=]: [=built-in default config=]["elements"] is a [=subset=] of [=known elements=].
1. [=Assert=]: [=built-in default config=]["attributes"] is a [=subset=] of [=known attributes=].
1. [=Assert=]: &laquo;[
   "elements" &rightarrow; [=known elements=],
   "attributes" &rightarrow; [=known attributes=],
   ]&raquo; is [=SanitizerConfig/canonical=].
1. If |config| is [=list/empty=] and not |safe|, then return &laquo;[]&raquo;
1. If |config| is not [=SanitizerConfig/valid=], then [=throw=] a {{TypeError}}.
1. Let |result| be a new [=dictionary=].
1. For each |key| of &laquo;[
   "{{SanitizerConfig/elements}}",
   "{{SanitizerConfig/removeElements}}",
   "{{SanitizerConfig/replaceWithChildrenElements}}" ]&raquo;:
  1. If |config|[|key|] exists, set |result|[|key|] to the result of running
     [=canonicalize a sanitizer element list=] on |config|[|key|] with
     [=HTML namespace=] as the default namespace.
1. For each |key| of &laquo;[
   "{{SanitizerConfig/attributes}}",
   "{{SanitizerConfig/removeAttributes}}" ]&raquo;:
  1. If |config|[|key|] exists, set |result|[|key|] to the result of running
     [=canonicalize a sanitizer element list=] on |config|[|key|] with `null` as
     the default namespace.
1. Set |result|["{{SanitizerConfig/comments}}"] to
   |config|["{{SanitizerConfig/comments}}"].
1. Let |default| be the result of [=canonicalizing a configuration=] for the
   [=built-in default config=].
1. If |safe|:
   1. If |config|["{{SanitizerConfig/elements}}"] [=map/exists=]:
      1. Let |elementBlockList| be the [=set/difference=] between
         [=known elements=] |default|["{{SanitizerConfig/elements}}"].

         Note: The "natural" way to enforce the default element list would be
               to intersect with it. But that would also eliminate any unknown
               (i.e., non-HTML supplied element, like &lt;foo&gt;). So we
               construct this helper to be able to use it to subtract any "unsafe"
               elements.
      1. Set |result|["{{SanitizerConfig/elements}}"] to the
         [=set/difference=] of |result|["{{SanitizerConfig/elements}}"] and
         |elementBlockList|.
   1. If |config|["{{SanitizerConfig/removeElements}}"] [=map/exists=]:
       1. Set |result|["{{SanitizerConfig/elements}}"] to the
          [=set/difference=] of |default|["{{SanitizerConfig/elements}}"]
          and |result|["{{SanitizerConfig/removeElements}}"].
       1. [=set/Remove=] "{{SanitizerConfig/removeElements}}" from |result|.
   1. If neither |config|["{{SanitizerConfig/elements}}"] nor
      |config|["{{SanitizerConfig/removeElements}}"] [=map/exist=]:
      1. Set |result|["{{SanitizerConfig/elements}}"] to
         |default|["{{SanitizerConfig/elements}}"].
   1. If |config|["{{SanitizerConfig/attributes}}"] [=map/exists=]:
      1. Let |attributeBlockList| be the [=set/difference=] between
         [=known attributes=] and |default|["{{SanitizerConfig/attributes}}"];
      1. Set |result|["{{SanitizerConfig/attributes}}"] to the
         [=set/difference=] of |result|["{{SanitizerConfig/attributes}}"] and
         |attributeBlockList|.
   1. If |config|["{{SanitizerConfig/removeAttributes}}"] [=map/exists=]:
       1. Set |result|["{{SanitizerConfig/attributes}}"] to the
          [=set/difference=] of |default|["{{SanitizerConfig/attributes}}"]
          and |result|["{{SanitizerConfig/removeAttributes}}"].
       1. [=set/Remove=] "{{SanitizerConfig/removeAttributes}}" from |result|.
   1. If neither |config|["{{SanitizerConfig/attributes}}"] nor
      |config|["{{SanitizerConfig/removeAttributes}}"] [=map/exist=]:
      1. Set |result|["{{SanitizerConfig/attributes}}"] to
         |default|["{{SanitizerConfig/attributes}}"].
1. Else (if not |safe|):
   1. If neither  |config|["{{SanitizerConfig/elements}}"] nor
      |config|["{{SanitizerConfig/removeElements}}"] [=map/exist=]:
      1. Set |result|["{{SanitizerConfig/elements}}"] to
         |default|["{{SanitizerConfig/elements}}"].
   1. If neither  |config|["{{SanitizerConfig/attributes}}"] nor
      |config|["{{SanitizerConfig/removeAttributes}}"] [=map/exist=]:
      1. Set |result|["{{SanitizerConfig/attributes}}"] to
         |default|["{{SanitizerConfig/attributes}}"].
1. [=Assert=]: |result| is [=SanitizerConfig/valid=].
1. [=Assert=]: |result| is [=SanitizerConfig/canonical=].
1. Return |result|.

</div>

<div algorithm>
In order to <dfn>canonicalize a sanitizer element list</dfn> |list|, with a
default namespace |defaultNamespace|, run the following steps:

1. Let |result| be a new [=ordered set=].
2. [=list/iterate|For each=] |name| in |list|, call
   [=canonicalize a sanitizer name=] on |name| with |defaultNamespace| and
   [=set/append=] to |result|.
3. Return |result|.

</div>

<div algorithm>
In order to <dfn>canonicalize a sanitizer name</dfn> |name|, with a default
namespace |defaultNamespace|, run the following steps:

1. [=Assert=]: |name| is either a {{DOMString}} or a [=dictionary=].
1. If |name| is a {{DOMString}}, then return &laquo;[ "`name`" &rightarrow; |name|, "`namespace`" &rightarrow; |defaultNamespace|]&raquo;.
1. [=Assert=]: |name| is a [=dictionary=] and |name|["name"] [=map/exists=].
1. Return &laquo;[ <br>
  "`name`" &rightarrow; |name|["name"], <br>
  "`namespace`" &rightarrow; |name|["namespace"] if it [=map/exists=], otherwise |defaultNamespace| <br>
  ]&raquo;.

</div>

## Supporting Algorithms ## {#alg-support}

<div algorithm>
For the [=canonicalize a sanitizer name|canonicalized=]
{{SanitizerElementNamespace|element}} and {{SanitizerAttributeNamespace|attribute name}} lists
used in this spec, list membership is based on matching both "`name`" and "`namespace`"
entries:
A Sanitizer name |list| <dfn for="SanitizerConfig">contains</dfn> an |item|
if there exists an |entry| of |list| that is an [=ordered map=], and where
|item|["name"] [=equals=] |entry|["name"] and
|item|["namespace"] [=equals=] |entry|["namespace"].

</div>

<div algorithm>
Set difference (or set subtraction) is a clone of a set A, but with all members
removed that occur in a set B:
To compute the <dfn for="set">difference</dfn> of two [=ordered sets=] |A| and |B|:

1. Let |set| be a new [=ordered set=].
1. [=list/iterate|For each=] |item| of |A|:
   1. If |B| does not [=set/contain=] |item|, then [=set/append=] |item|
      to |set|.
1. Return |set|.

</div>

<div algorithm>
Equality for [=ordered sets=] is equality of its members, but without
regard to order:
[=Ordered sets=] |A| and |B| are <dfn for=set>equal</dfn> if both |A| is a
[=superset=] of |B| and |B| is a [=superset=] of |A|.

</div>

## Defaults ## {#sanitization-defaults}

Note: The defaults should follow a certain form, which is checked for at the
      beginning of [=canonicalize a configuration=].

The <dfn>built-in default config</dfn> is as follows:
```
{
  elements: [....],
  attributes: [....],
  comments: true,
}
```

The <dfn>known elements</dfn> are as follows:
```
[
  { name: "div", namespace: "http://www.w3.org/1999/xhtml" },
  ...
]
```

The <dfn>known attributes</dfn> are as follows:
```
[
  { name: "class", namespace: null },
  ...
]
```

Note: The [=known elements=] and [=known attributes=] should be derived from the
      HTML5 specification, rather than being explicitly listed here. Currently,
      there are no mechanics to do so.

<div>
The <dfn>navigating URL attributes list</dfn>, for which "`javascript:`"
navigations are unsafe, are as follows:

&laquo;[
  <br>
  [
    { "`name`" &rightarrow; "`a`", "`namespace`" &rightarrow; "[=HTML namespace=]" },
    { "`name`" &rightarrow; "`href`", "`namespace`" &rightarrow; `null` }
  ],
  <br>
  [
    { "`name`" &rightarrow; "`area`", "`namespace`" &rightarrow; "[=HTML namespace=]" },
    { "`name`" &rightarrow; "`href`", "`namespace`" &rightarrow; `null` }
  ],
  <br>
  [
    { "`name`" &rightarrow; "`form`", "`namespace`" &rightarrow; "[=HTML namespace=]" },
    { "`name`" &rightarrow; "`action`", "`namespace`" &rightarrow; `null` }
  ],
  <br>
  [
    { "`name`" &rightarrow; "`input`", "`namespace`" &rightarrow; "[=HTML namespace=]" },
    { "`name`" &rightarrow; "`formaction`", "`namespace`" &rightarrow; `null` }
  ],
  <br>
  [
    { "`name`" &rightarrow; "`button`", "`namespace`" &rightarrow; "[=HTML namespace=]" },
    { "`name`" &rightarrow; "`formaction`", "`namespace`" &rightarrow; `null` }
  ],
  <br>
]&raquo;
</div>


# Security Considerations # {#security-considerations}

The Sanitizer API is intended to prevent DOM-based Cross-Site Scripting
by traversing a supplied HTML content and removing elements and attributes
according to a configuration. The specified API must not support
the construction of a Sanitizer object that leaves script-capable markup in
and doing so would be a bug in the threat model.

That being said, there are security issues which the correct usage of the
Sanitizer API will not be able to protect against and the scenarios will be
laid out in the following sections.

## Server-Side Reflected and Stored XSS ## {#server-side-xss}

<em>This section is not normative.</em>

The Sanitizer API operates solely in the DOM and adds a capability to traverse
and filter an existing DocumentFragment. The Sanitizer does not address
server-side reflected or stored XSS.

## DOM clobbering ## {#dom-clobbering}

<em>This section is not normative.</em>

DOM clobbering describes an attack in which malicious HTML confuses an
application by naming elements through `id` or `name` attributes such that
properties like `children` of an HTML element in the DOM are overshadowed by
the malicious content.

The Sanitizer API does not protect DOM clobbering attacks in its
default state, but can be configured to remove `id` and `name` attributes.

## XSS with Script gadgets ## {#script-gadgets}

<em>This section is not normative.</em>

Script gadgets are a technique in which an attacker uses existing application
code from popular JavaScript libraries to cause their own code to execute.
This is often done by injecting innocent-looking code or seemingly inert
DOM nodes that is only parsed and interpreted by a framework which then
performs the execution of JavaScript based on that input.

The Sanitizer API can not prevent these attacks, but requires page authors to
explicitly allow unknown elements in general, and authors must additionally
explicitly configure unknown attributes and elements and markup that is known
to be widely used for templating and framework-specific code,
like `data-` and `slot` attributes and elements like `<slot>` and `<template>`.
We believe that these restrictions are not exhaustive and encourage page
authors to examine their third party libraries for this behavior.

## Mutated XSS ## {#mutated-xss}

<em>This section is not normative.</em>

Mutated XSS or mXSS describes an attack based on parser context mismatches
when parsing an HTML snippet without the correct context. In particular,
when a parsed HTML fragment has been serialized to a string, the string is
not guaranteed to be parsed and interpreted exactly the same when inserted
into a different parent element. An example for carrying out such an attack
is by relying on the change of parsing behavior for foreign content or
mis-nested tags.

The Sanitizer API offers only functions that turn a string into a node tree.
The context is supplied implicitly by all sanitizer functions:
`Element.setHTML()` uses the current element; `Document.parseHTML()` creates a
new document. Therefore Sanitizer API is not directly affected by mutated XSS.

If a developer were to retrieve a sanitized node tree as a string, e.g. via
`.innerHTML`, and to then parse it again then mutated XSS may occur.
We discourage this practice. If processing or passing of HTML as a
string should be necessary after all, then any string should be considered
untrusted and should be sanitized (again) when inserting it into the DOM. In
other words, a sanitized and then serialized HTML tree can no
longer be considered as sanitized.

A more complete treatment of mXSS can be found in [[MXSS]].

# Acknowledgements # {#ack}

Cure53's [[DOMPURIFY]] is a clear inspiration for the API this document
describes, as is Internet Explorer's {{window.toStaticHTML()}}.