<pre class="metadata">
Title: HTML Sanitizer API
Status: CG-DRAFT
Group: WICG
URL: https://wicg.github.io/sanitizer-api/
Repository: WICG/sanitizer-api
Shortname: sanitizer-api
Level: 1
Editor: Frederik Braun 68466, Mozilla, [email protected], https://frederik-braun.com
Editor: Mario Heiderich, Cure53, [email protected], https://cure53.de
Editor: Daniel Vogelheim, Google LLC, [email protected], https://www.google.com
Abstract:
This document specifies a set of APIs which allow developers to take
untrusted HTML input and sanitize it for safe insertion into a document's
DOM.
Indent: 2
Work Status: exploring
Boilerplate: omit conformance
Markup Shorthands: css off, markdown on
</pre>
<pre class="link-defaults">
spec:html; type:attribute; text: innerHTML
spec:dom; type:method; text: createDocumentFragment
spec:html; type:dfn; text: template contents
</pre>
<pre class="anchors">
text: window.toStaticHTML(); type: method; url: https://msdn.microsoft.com/en-us/library/cc848922(v=vs.85).aspx
text: internal slot; type:dfn; url: https://tc39.es/ecma262/#sec-ordinary-object-internal-methods-and-internal-slots
text: parse HTML from a string; type: dfn; url: https://html.spec.whatwg.org/#parse-html-from-a-string
</pre>
<pre class="biblio">
{
"DOMPURIFY": {
"href": "https://github.com/cure53/DOMPurify",
"title": "DOMPurify",
"publisher": "Cure53"
},
"MXSS": {
"href": "https://cure53.de/fp170.pdf",
"title": "mXSS Attacks: Attacking well-secured Web-Applications by using innerHTML Mutations",
"publisher": "Ruhr-Universität Bochum"
}
}
</pre>
# Introduction # {#intro}

<em>This section is not normative.</em>

Web applications often need to work with strings of HTML on the client side,
perhaps as part of a client-side templating solution, perhaps as part of
rendering user generated content, etc. It is difficult to do so in a safe way.
The naive approach of joining strings together and stuffing them into
an {{Element}}'s {{Element/innerHTML}} is fraught with risk, as it can cause
JavaScript execution in a number of unexpected ways.

Libraries like [[DOMPURIFY]] attempt to manage this problem by carefully
parsing and sanitizing strings before insertion, by constructing a DOM and
filtering its members through an allow-list. This has proven to be a fragile
approach, as the parsing APIs exposed to the web don't always map in
reasonable ways to the browser's behavior when actually rendering a string as
HTML in the "real" DOM. Moreover, the libraries need to keep on top of
browsers' changing behavior over time; things that once were safe may turn
into time-bombs based on new platform-level features.

The browser has a fairly good idea of when it is going to
execute code. We can improve upon the user-space libraries by teaching the
browser how to render HTML from an arbitrary string in a safe manner, and do
so in a way that is much more likely to be maintained and updated along with
the browser's own changing parser implementation. This document outlines an
API which aims to do just that.
## Goals ## {#goals}
* Mitigate the risk of DOM-based cross-site scripting attacks by providing
developers with mechanisms for handling user-controlled HTML which prevent
direct script execution upon injection.
* Make HTML output safe for use within the current user agent, taking into
account its current understanding of HTML.
* Allow developers to override the default set of elements and attributes.
Adding certain elements and attributes can prevent
<a href="https://github.com/google/security-research-pocs/tree/master/script-gadgets">script gadget</a>
attacks.
## API Summary ## {#api-summary}
The Sanitizer API offers functionality to parse a string containing HTML into
a DOM tree, and to filter the resulting tree according to a user-supplied
configuration. The methods vary along two axes:

* Safe and unsafe: The "safe" methods will not generate any markup that executes
  script. That is, they should be safe from XSS. The "unsafe" methods parse and
  filter exactly as the supplied configuration directs, without enforcing a
  safe baseline.
* Context: Methods are defined on {{Element}} and {{ShadowRoot}}; these replace
  the node's children and are largely analogous to {{Element/innerHTML}}.
  There are also static methods on the {{Document}}, which parse an entire
  document and are largely analogous to {{DOMParser}}.{{parseFromString()}}.
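
As a non-normative illustration of the safe, element-context flavour, the
following sketch feature-detects `setHTML()` and falls back to plain text when
it is unavailable. The helper name `renderUntrusted` and the fallback policy are
assumptions made for this example; they are not part of the API.

```javascript
// Hypothetical helper (not defined by this specification).
// Inserts untrusted markup into `target` using the safe setHTML() method when
// the user agent provides it. Otherwise it assigns textContent, which never
// parses markup, so the input is shown as literal text.
function renderUntrusted(target, untrustedHtml, options = {}) {
  if (typeof target.setHTML === "function") {
    target.setHTML(untrustedHtml, options);
    return "sanitized";
  }
  target.textContent = untrustedHtml;
  return "text-only";
}
```

In a page this might be called as
`renderUntrusted(document.querySelector("#comment"), userInput)`, where the
selector and `userInput` are likewise placeholders.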
# Framework # {#framework}
## Sanitizer API ## {#sanitizer-api}
The {{Element}} interface defines two methods, {{Element/setHTML()}} and
{{Element/setHTMLUnsafe()}}. Both of these take a {{DOMString}} with HTML
markup, and an optional configuration.
<pre class="idl extract">
partial interface Element {
  [CEReactions] undefined setHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLOptions options = {});
  [CEReactions] undefined setHTML(DOMString html, optional SetHTMLOptions options = {});
};
</pre>
<div algorithm>
{{Element}}'s <dfn for="Element" export>setHTMLUnsafe</dfn>(|html|, |options|) method steps are:
1. Let |compliantHTML| be the result of invoking the [$Get Trusted Type compliant string$] algorithm with
{{TrustedHTML}}, [=this=]'s [=relevant global object=], |html|, "Element setHTMLUnsafe", and "script".
1. Let |target| be [=this=]'s [=template contents=] if [=this=] is a
{{HTMLTemplateElement|template}} element; otherwise [=this=].
1. [=Set and filter HTML=] given |target|, [=this=], |compliantHTML|, |options|, and false.
</div>
<div algorithm>
{{Element}}'s <dfn for="Element" export>setHTML</dfn>(|html|, |options|) method steps are:
1. Let |target| be [=this=]'s [=template contents=] if [=this=] is a
{{HTMLTemplateElement|template}}; otherwise [=this=].
1. [=Set and filter HTML=] given |target|, [=this=], |html|, |options|, and true.
</div>
These methods are mirrored on the {{ShadowRoot}}:

<pre class="idl extract">
partial interface ShadowRoot {
  [CEReactions] undefined setHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLOptions options = {});
  [CEReactions] undefined setHTML(DOMString html, optional SetHTMLOptions options = {});
};
</pre>
<div algorithm>
{{ShadowRoot}}'s <dfn for="ShadowRoot" export>setHTMLUnsafe</dfn>(|html|, |options|) method steps are:
1. Let |compliantHTML| be the result of invoking the [$Get Trusted Type compliant string$] algorithm with
{{TrustedHTML}}, [=this=]'s [=relevant global object=], |html|, "ShadowRoot setHTMLUnsafe", and "script".
1. [=Set and filter HTML=] using [=this=],
[=this=]'s [=shadow host=] (as context element),
|compliantHTML|, |options|, and false.
</div>
<div algorithm>
{{ShadowRoot}}'s <dfn for="ShadowRoot" export>setHTML</dfn>(|html|, |options|) method steps are:
1. [=Set and filter HTML=] using [=this=] (as target),
   [=this=]'s [=shadow host=] (as context element),
   |html|, |options|, and true.
</div>
The {{Document}} interface gains two new methods which parse an entire {{Document}}:
<pre class="idl extract">
partial interface Document {
  static Document parseHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLOptions options = {});
  static Document parseHTML(DOMString html, optional SetHTMLOptions options = {});
};
</pre>
<div algorithm>
The <dfn for="Document" export>parseHTMLUnsafe</dfn>(|html|, |options|) method steps are:
1. Let |compliantHTML| be the result of invoking the [$Get Trusted Type compliant string$] algorithm with
   {{TrustedHTML}}, the [=current global object=], |html|, "Document parseHTMLUnsafe", and "script".
1. Let |document| be a new {{Document}}, whose [=Document/content type=] is "text/html".
Note: Since |document| does not have a browsing context, scripting is disabled.
1. Set |document|'s [=allow declarative shadow roots=] to true.
1. [=Parse HTML from a string=] given |document| and |compliantHTML|.
1. Let |config| be the result of calling [=get a sanitizer config from options=]
with |options| and false.
1. If |config| is not [=list/empty=],
then call [=sanitize=] on |document|'s [=tree/root|root node=] with |config|.
1. Return |document|.
</div>
<div algorithm>
The <dfn for="Document" export>parseHTML</dfn>(|html|, |options|) method steps are:
1. Let |document| be a new {{Document}}, whose [=Document/content type=] is "text/html".
Note: Since |document| does not have a browsing context, scripting is disabled.
1. Set |document|'s [=allow declarative shadow roots=] to true.
1. [=Parse HTML from a string=] given |document| and |html|.
1. Let |config| be the result of calling [=get a sanitizer config from options=]
with |options| and true.
1. Call [=sanitize=] on |document|'s [=tree/root|root node=] with |config|.
1. Return |document|.
</div>
## SetHTML options and the configuration object ## {#configobject}
The family of {{Element/setHTML()}}-like methods all accept an options
dictionary. Right now, only one member of this dictionary is defined:
<pre class=idl>
dictionary SetHTMLOptions {
  (Sanitizer or SanitizerConfig) sanitizer = {};
};
</pre>
The {{Sanitizer}} configuration object encapsulates a filter configuration.
The same config can be used with both safe and unsafe methods. The intent is
that one (or a few) configurations will be built up early on in a page's
lifetime, and can then be used whenever needed. This allows implementations
to pre-process configurations.

The configuration object is also queryable and can return
[=SanitizerConfig/canonical=] configuration dictionaries,
in both safe and unsafe variants. This allows a
page to query and predict what effect a given configuration will have, or
to build a new configuration based on an existing one.
<pre class=idl>
[Exposed=(Window,Worker)]
interface Sanitizer {
  constructor(optional SanitizerConfig config = {});
  SanitizerConfig get();
  SanitizerConfig getUnsafe();
};
</pre>
<div algorithm>
The <dfn for="Sanitizer" export>constructor</dfn>(|config|)
method steps are:
1. Store |config| in [=this=]'s [=internal slot=].
</div>
<div algorithm>
The <dfn for="Sanitizer" export>get</dfn>() method steps are:
1. Return the result of [=canonicalize a configuration=] with the value of
[=this=]'s [=internal slot=] and true.
</div>
<div algorithm>
The <dfn for="Sanitizer" export>getUnsafe</dfn>() method steps are:
1. Return the result of [=canonicalize a configuration=] with the value of
[=this=]'s [=internal slot=] and false.
</div>
## The Configuration Dictionary ## {#config}
<pre class=idl>
dictionary SanitizerElementNamespace {
  required DOMString name;
  DOMString? _namespace = "http://www.w3.org/1999/xhtml";
};

// Used by "elements"
dictionary SanitizerElementNamespaceWithAttributes : SanitizerElementNamespace {
  sequence<SanitizerAttribute> attributes;
  sequence<SanitizerAttribute> removeAttributes;
};

typedef (DOMString or SanitizerElementNamespace) SanitizerElement;
typedef (DOMString or SanitizerElementNamespaceWithAttributes) SanitizerElementWithAttributes;

dictionary SanitizerAttributeNamespace {
  required DOMString name;
  DOMString? _namespace = null;
};
typedef (DOMString or SanitizerAttributeNamespace) SanitizerAttribute;

dictionary SanitizerConfig {
  sequence<SanitizerElementWithAttributes> elements;
  sequence<SanitizerElement> removeElements;
  sequence<SanitizerElement> replaceWithChildrenElements;

  sequence<SanitizerAttribute> attributes;
  sequence<SanitizerAttribute> removeAttributes;

  boolean comments;
  boolean dataAttributes;
};
</pre>
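
The dictionaries above accept names either as plain strings or as explicit
name and namespace dictionaries. The following non-normative value sketches both
forms side by side. The particular elements and attributes are illustrative
assumptions only, chosen so that the per-element attribute list is a subset of
the global attribute list.

```javascript
// Illustrative SanitizerConfig value; every name below is an example choice.
const exampleConfig = {
  // Element allow-list. String entries default to the HTML namespace.
  elements: [
    "p",
    "em",
    // Dictionary form, here with a per-element attribute allow-list.
    { name: "a", namespace: "http://www.w3.org/1999/xhtml", attributes: ["href"] },
  ],
  // Elements that are replaced by their children.
  replaceWithChildrenElements: ["span"],
  // Global attribute allow-list. String entries default to the null namespace.
  attributes: ["class", "href", { name: "lang", namespace: null }],
  comments: false,
  dataAttributes: false,
};
```

Because this value uses `elements` and `attributes`, it deliberately omits
`removeElements` and `removeAttributes`.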
# Algorithms # {#algorithms}
<div algorithm>
To <dfn>set and filter HTML</dfn>, given an {{Element}} or {{DocumentFragment}}
|target|, an {{Element}} |contextElement|, a [=string=] |html|, a
[=dictionary=] |options|, and a [=boolean=] |safe|:
1. If |safe| and |contextElement|'s [=Element/local name=] is "`script`" and
|contextElement|'s [=Element/namespace=] is the [=HTML namespace=] or the
[=SVG namespace=], then return.
1. Let |config| be the result of calling [=get a sanitizer config from options=]
with |options| and |safe|.
1. Let |newChildren| be the result of the HTML [=fragment parsing algorithm steps=]
given |contextElement|, |html|, and true.
1. Let |fragment| be a new {{DocumentFragment}} whose [=node document=] is |contextElement|'s [=node document=].
1. [=list/iterate|For each=] |node| in |newChildren|, [=list/append=] |node| to |fragment|.
1. If |config| is not [=list/empty=], then run [=sanitize=] on |fragment| using |config|.
1. [=Replace all=] with |fragment| within |target|.
</div>
<div algorithm>
To <dfn for="SanitizerConfig">get a sanitizer config from options</dfn> for
an options dictionary |options| and a boolean |safe|, do:
1. Assert: |options| is a [=dictionary=].
1. If |options|["`sanitizer`"] doesn't [=map/exist=], then return undefined.
1. Assert: |options|["`sanitizer`"] is either a {{Sanitizer}} instance
   or a [=dictionary=].
1. If |options|["`sanitizer`"] is a {{Sanitizer}} instance:
  1. Then let |config| be the value of |options|["`sanitizer`"]'s [=internal slot=].
1. Otherwise let |config| be the value of |options|["`sanitizer`"].
1. Return the result of calling [=canonicalize a configuration=] on
   |config| and |safe|.
</div>
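
The option lookup above, together with the stored configuration of the
`Sanitizer` interface, can be modelled in a few lines. This is a non-normative
sketch: `SanitizerModel`, `storedConfig`, and `getConfigFromOptions` are
hypothetical names, and the canonicalization step is passed in as a parameter
so that the sketch does not depend on its details.

```javascript
// Model of the Sanitizer interface: construction only stores the config.
class SanitizerModel {
  #config; // stands in for the internal slot
  constructor(config = {}) {
    this.#config = config;
  }
  get storedConfig() {
    return this.#config;
  }
}

// Model of "get a sanitizer config from options".
// `canonicalize(config, safe)` is supplied by the caller.
function getConfigFromOptions(options, safe, canonicalize) {
  if (!("sanitizer" in options)) {
    return undefined;
  }
  const sanitizer = options.sanitizer;
  // A Sanitizer instance contributes its stored config; a dictionary is used as-is.
  const config =
    sanitizer instanceof SanitizerModel ? sanitizer.storedConfig : sanitizer;
  return canonicalize(config, safe);
}
```

Passing either a dictionary or an instance therefore reaches the same
canonicalization step with the same `safe` flag.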
## Sanitization Algorithms ## {#sanitization}
<div algorithm="sanitize">
For the main <dfn>sanitize</dfn> operation, using a {{ParentNode}} |node| and a
[=SanitizerConfig/canonical=] {{SanitizerConfig}} |config|, run these steps:

1. [=Assert=]: |config| is [=SanitizerConfig/canonical=].
1. Let |current| be |node|.
1. [=list/iterate|For each=] |child| in |current|'s [=tree/children=]:
  1. [=Assert=]: |child| [=implements=] {{Text}}, {{Comment}}, or {{Element}}.

     Note: Currently, this algorithm is only called on output of the HTML
     parser, for which this assertion should hold. If this algorithm is
     used in different contexts in the future, this assumption needs to be
     re-examined.

  1. If |child| [=implements=] {{Text}}:
    1. [=continue=].
  1. else if |child| [=implements=] {{Comment}}:
    1. If |config|'s {{SanitizerConfig/comments}} is not true:
      1. [=/remove=] |child|.
  1. else:
    1. Let |elementName| be a {{SanitizerElementNamespace}} with |child|'s
       [=Element/local name=] and [=Element/namespace=].
    1. If |config|["{{SanitizerConfig/elements}}"] exists and
       |config|["{{SanitizerConfig/elements}}"] does not [=SanitizerConfig/contain=]
       |elementName|:
      1. [=/remove=] |child|.
    1. else if |config|["{{SanitizerConfig/removeElements}}"] exists and
       |config|["{{SanitizerConfig/removeElements}}"] [=SanitizerConfig/contains=]
       |elementName|:
      1. [=/remove=] |child|.
    1. If |config|["{{SanitizerConfig/replaceWithChildrenElements}}"] exists and
       |config|["{{SanitizerConfig/replaceWithChildrenElements}}"] [=SanitizerConfig/contains=]
       |elementName|:
      1. Call [=sanitize=] on |child| with |config|.
      1. Call [=replace all=] with |child|'s [=tree/children=] within |child|.
    1. If |elementName| [=equals=] «[ "`name`" → "`template`",
       "`namespace`" → [=HTML namespace=] ]»:
      1. Then call [=sanitize=] on |child|'s [=template contents=] with |config|.
    1. If |child| is a [=shadow host=]:
      1. Then call [=sanitize=] on |child|'s [=Element/shadow root=] with |config|.
    1. [=list/iterate|For each=] |attr| in |child|'s [=Element/attribute list=]:
      1. Let |attrName| be a {{SanitizerAttributeNamespace}} with |attr|'s
         [=Attr/local name=] and [=Attr/namespace=].
      1. If |config|["{{SanitizerConfig/attributes}}"] exists and
         |config|["{{SanitizerConfig/attributes}}"] does not [=SanitizerConfig/contain=]
         |attrName|:
        1. If "data-" is a [=code unit prefix=] of |attr|'s [=Attr/local name=] and
           |attr|'s [=Attr/namespace=] is `null` and
           |config|["{{SanitizerConfig/dataAttributes}}"] exists and is false:
          1. Remove |attr| from |child|.
      1. else if |config|["{{SanitizerConfig/removeAttributes}}"] exists and
         |config|["{{SanitizerConfig/removeAttributes}}"] [=SanitizerConfig/contains=]
         |attrName|:
        1. Remove |attr| from |child|.
      1. If |config|["{{SanitizerConfig/elements}}"][|elementName|] exists,
         and if
         |config|["{{SanitizerConfig/elements}}"][|elementName|]["{{SanitizerElementNamespaceWithAttributes/attributes}}"]
         exists, and if
         |config|["{{SanitizerConfig/elements}}"][|elementName|]["{{SanitizerElementNamespaceWithAttributes/attributes}}"]
         does not [=SanitizerConfig/contain=] |attrName|:
        1. Remove |attr| from |child|.
      1. If |config|["{{SanitizerConfig/elements}}"][|elementName|] exists,
         and if
         |config|["{{SanitizerConfig/elements}}"][|elementName|]["{{SanitizerElementNamespaceWithAttributes/removeAttributes}}"]
         exists, and if
         |config|["{{SanitizerConfig/elements}}"][|elementName|]["{{SanitizerElementNamespaceWithAttributes/removeAttributes}}"]
         [=SanitizerConfig/contains=] |attrName|:
        1. Remove |attr| from |child|.
      1. If «[|elementName|, |attrName|]» matches an entry in the
         [=navigating URL attributes list=], and if |attr|'s [=protocol=] is
         "`javascript:`":
        1. Then remove |attr| from |child|.
    1. Call [=sanitize=] on |child|'s [=Element/shadow root=] with |config|.
  1. else:
    1. [=/remove=] |child|.
</div>
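
To make the shape of the tree walk concrete, the following non-normative sketch
applies the comment, element, and global attribute filters to a tree of plain
objects. It is a deliberate simplification and not the algorithm above: it
ignores `replaceWithChildrenElements`, `dataAttributes`, per-element attribute
lists, templates, shadow roots, and `javascript:` URLs, and it assumes that kept
elements are recursed into. The node shape (`type`, `name`, `namespace`,
`attributes`, `children`) and the function names are assumptions made for this
example.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";

// List membership by matching both name and namespace.
function containsName(list, item) {
  return list.some(
    (entry) => entry.name === item.name && entry.namespace === item.namespace
  );
}

// Simplified model of the filtering walk over `parent.children`.
// `parent` is a plain object, not a DOM node.
function sanitizeTree(parent, config) {
  const kept = [];
  for (const child of parent.children) {
    if (child.type === "text") {
      kept.push(child); // text is always kept
    } else if (child.type === "comment") {
      if (config.comments === true) kept.push(child);
    } else if (child.type === "element") {
      const name = { name: child.name, namespace: child.namespace };
      if (config.elements && !containsName(config.elements, name)) continue;
      if (config.removeElements && containsName(config.removeElements, name)) continue;
      child.attributes = child.attributes.filter((attr) => {
        const attrName = { name: attr.name, namespace: attr.namespace };
        if (config.attributes && !containsName(config.attributes, attrName)) return false;
        if (config.removeAttributes && containsName(config.removeAttributes, attrName)) return false;
        return true;
      });
      sanitizeTree(child, config); // assumption: recurse into kept elements
      kept.push(child);
    }
    // Any other node type is dropped.
  }
  parent.children = kept;
  return parent;
}
```

With an allow-list of just `p` and `class`, a `script` element and an `onclick`
attribute are both dropped while surrounding text is preserved.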
## Configuration Processing ## {#configuration-processing}
<div algorithm>
A |config| is <dfn for="SanitizerConfig">valid</dfn> if all these conditions are met:

1. |config| is a [=dictionary=].
1. |config|'s [=map/keys|key set=] does not [=list/contain=] both
   "{{SanitizerConfig/elements}}" and "{{SanitizerConfig/removeElements}}".
1. |config|'s [=map/keys|key set=] does not [=list/contain=] both
   "{{SanitizerConfig/removeAttributes}}" and "{{SanitizerConfig/attributes}}".
1. [=list/iterate|For any=] |key| of «[
   "{{SanitizerConfig/elements}}",
   "{{SanitizerConfig/removeElements}}",
   "{{SanitizerConfig/replaceWithChildrenElements}}",
   "{{SanitizerConfig/attributes}}",
   "{{SanitizerConfig/removeAttributes}}"
   ]» where |config|[|key|] [=map/exists=]:
  1. |config|[|key|] is [=SanitizerNameList/valid=].
1. If |config|["{{SanitizerConfig/elements}}"] exists, then
   [=list/iterate|for any=] |element| in |config|["{{SanitizerConfig/elements}}"]
   that is a [=dictionary=]:
  1. |element| does not [=list/contain=] both
     "{{SanitizerElementNamespaceWithAttributes/attributes}}" and
     "{{SanitizerElementNamespaceWithAttributes/removeAttributes}}".
  1. If either |element|["{{SanitizerElementNamespaceWithAttributes/attributes}}"]
     or |element|["{{SanitizerElementNamespaceWithAttributes/removeAttributes}}"]
     [=map/exists=], then it is [=SanitizerNameList/valid=].
1. Let |tmp| be a [=dictionary=], and for any |key| of «[
   "{{SanitizerConfig/elements}}",
   "{{SanitizerConfig/removeElements}}",
   "{{SanitizerConfig/replaceWithChildrenElements}}",
   "{{SanitizerConfig/attributes}}",
   "{{SanitizerConfig/removeAttributes}}"
   ]» |tmp|[|key|] is set to the result of [=canonicalize a sanitizer
   element list=] called on |config|[|key|], with [=HTML namespace=] as default
   namespace for the element lists, and `null` as default namespace for the
   attributes lists.

   Note: The intent here is to assert about list elements, but without regard
   to whether the string shortcut syntax or the explicit dictionary
   syntax is used. For example, having "img" in `elements` and
   `{ name: "img" }` in `removeElements`. An implementation might well
   do this without explicitly canonicalizing the lists at this point.

1. Given these canonicalized name lists, all of the following conditions hold:
  1. The [=set/intersection=] between
     |tmp|["{{SanitizerConfig/elements}}"] and
     |tmp|["{{SanitizerConfig/removeElements}}"]
     is [=set/empty=].
  1. The [=set/intersection=] between
     |tmp|["{{SanitizerConfig/removeElements}}"] and
     |tmp|["{{SanitizerConfig/replaceWithChildrenElements}}"]
     is [=set/empty=].
  1. The [=set/intersection=] between
     |tmp|["{{SanitizerConfig/replaceWithChildrenElements}}"] and
     |tmp|["{{SanitizerConfig/elements}}"]
     is [=set/empty=].
  1. The [=set/intersection=] between
     |tmp|["{{SanitizerConfig/attributes}}"] and
     |tmp|["{{SanitizerConfig/removeAttributes}}"]
     is [=set/empty=].
1. Let |tmpattrs| be |tmp|["{{SanitizerConfig/attributes}}"] if it exists,
   and otherwise [=built-in default config=]["{{SanitizerConfig/attributes}}"].
1. [=list/iterate|For any=] |item| in |tmp|["{{SanitizerConfig/elements}}"]:
  1. If either |item|["{{SanitizerElementNamespaceWithAttributes/attributes}}"]
     or |item|["{{SanitizerElementNamespaceWithAttributes/removeAttributes}}"]
     exists:
    1. Then the [=set/difference=] between it and |tmpattrs| is [=set/empty=].
</div>
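
The key-set and intersection conditions above lend themselves to a small
checker. The following non-normative sketch covers only those conditions and
skips the type checks and the per-element attribute conditions. The function
names and the idea of returning a list of problem strings are assumptions made
for this example.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";

// Reduce a string or dictionary entry to a comparable key, so that "img" and
// { name: "img" } compare equal under the same default namespace.
function nameKey(entry, defaultNamespace) {
  if (typeof entry === "string") {
    return JSON.stringify([defaultNamespace, entry]);
  }
  const namespace = "namespace" in entry ? entry.namespace : defaultNamespace;
  return JSON.stringify([namespace, entry.name]);
}

// Entries of `b` whose name also occurs in `a`. Missing lists count as empty.
function overlap(a = [], b = [], defaultNamespace) {
  const keys = new Set(a.map((entry) => nameKey(entry, defaultNamespace)));
  return b.filter((entry) => keys.has(nameKey(entry, defaultNamespace)));
}

// Hypothetical helper: returns human-readable problems, empty when none found.
function findConflicts(config) {
  const problems = [];
  if ("elements" in config && "removeElements" in config) {
    problems.push("elements and removeElements are both present");
  }
  if ("attributes" in config && "removeAttributes" in config) {
    problems.push("attributes and removeAttributes are both present");
  }
  const pairs = [
    ["elements", "removeElements", HTML_NS],
    ["removeElements", "replaceWithChildrenElements", HTML_NS],
    ["replaceWithChildrenElements", "elements", HTML_NS],
    ["attributes", "removeAttributes", null],
  ];
  for (const [first, second, defaultNamespace] of pairs) {
    if (overlap(config[first], config[second], defaultNamespace).length > 0) {
      problems.push(`${first} and ${second} share a name`);
    }
  }
  return problems;
}
```

A caller that wants the behaviour of the canonicalization algorithm could throw
a `TypeError` whenever the returned list is non-empty.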
<div algorithm>
A |list| of names is <dfn for="SanitizerNameList">valid</dfn> if all these
conditions are met:
1. |list| is a [=/list=].
1. [=list/iterate|For all=] of its members |name|:
  1. |name| is a {{string}} or a [=dictionary=].
  1. If |name| is a [=dictionary=]:
    1. |name|["{{SanitizerElementNamespace/name}}"] [=map/exists=] and is a {{string}}.
</div>
<div algorithm>
A |config| is <dfn for="SanitizerConfig">canonical</dfn> if all these conditions are met:

1. |config| is [=SanitizerConfig/valid=].
1. |config|'s [=map/keys|key set=] is a [=set/subset=] of
   «[
   "{{SanitizerConfig/elements}}",
   "{{SanitizerConfig/removeElements}}",
   "{{SanitizerConfig/replaceWithChildrenElements}}",
   "{{SanitizerConfig/attributes}}",
   "{{SanitizerConfig/removeAttributes}}",
   "{{SanitizerConfig/comments}}",
   "{{SanitizerConfig/dataAttributes}}"
   ]».
1. |config|'s [=map/keys|key set=] [=list/contains=] either:
  1. both "{{SanitizerConfig/elements}}" and "{{SanitizerConfig/attributes}}",
     but neither of
     "{{SanitizerConfig/removeElements}}" or "{{SanitizerConfig/removeAttributes}}".
  1. or both
     "{{SanitizerConfig/removeElements}}" and "{{SanitizerConfig/removeAttributes}}",
     but neither of
     "{{SanitizerConfig/elements}}" or "{{SanitizerConfig/attributes}}".
1. For any |key| of «[
   "{{SanitizerConfig/replaceWithChildrenElements}}",
   "{{SanitizerConfig/removeElements}}",
   "{{SanitizerConfig/attributes}}",
   "{{SanitizerConfig/removeAttributes}}"
   ]» where |config|[|key|] [=map/exists=]:
  1. |config|[|key|] is [=SanitizerNameList/canonical=].
1. If |config|["{{SanitizerConfig/elements}}"] [=map/exists=]:
  1. |config|["{{SanitizerConfig/elements}}"] is [=SanitizerNameWithAttributesList/canonical=].
1. For any |key| of «[
   "{{SanitizerConfig/comments}}",
   "{{SanitizerConfig/dataAttributes}}"
   ]»:
  1. if |config|[|key|] [=map/exists=], |config|[|key|] is a {{boolean}}.
</div>
<div algorithm>
A |list| of names is <dfn for="SanitizerNameList">canonical</dfn> if all these
conditions are met:

1. |list| is a [=/list=].
1. [=list/iterate|For all=] of |list|'s members |name|:
  1. |name| is a [=dictionary=].
  1. |name|'s [=map/keys|key set=] [=set/equals=] «[
     "{{SanitizerElementNamespace/name}}", "{{SanitizerElementNamespace/namespace}}"
     ]».
  1. |name|'s [=map/values=] are [=string=]s.
</div>
<div algorithm>
A |list| of names is <dfn for="SanitizerNameWithAttributesList">canonical</dfn>
if all these conditions are met:

1. |list| is a [=/list=].
1. [=list/iterate|For all=] of |list|'s members |name|:
  1. |name| is a [=dictionary=].
  1. |name|'s [=map/keys|key set=] [=set/equals=] one of:
    1. «[
       "{{SanitizerElementNamespace/name}}",
       "{{SanitizerElementNamespace/namespace}}"
       ]»
    1. «[
       "{{SanitizerElementNamespace/name}}",
       "{{SanitizerElementNamespace/namespace}}",
       "{{SanitizerElementNamespaceWithAttributes/attributes}}"
       ]»
    1. «[
       "{{SanitizerElementNamespace/name}}",
       "{{SanitizerElementNamespace/namespace}}",
       "{{SanitizerElementNamespaceWithAttributes/removeAttributes}}"
       ]»
  1. |name|["{{SanitizerElementNamespace/name}}"] and
     |name|["{{SanitizerElementNamespace/namespace}}"] are [=string=]s.
  1. |name|["{{SanitizerElementNamespaceWithAttributes/attributes}}"] and
     |name|["{{SanitizerElementNamespaceWithAttributes/removeAttributes}}"]
     are [=SanitizerNameList/canonical=] if they [=map/exist=].
</div>
<div algorithm>
To <dfn>canonicalize a configuration</dfn> |config| with a [=boolean=] |safe|:

Note: The initial set of [=assert=]s assert properties of the built-in
constants, like the [=built-in default config|defaults=] and
the lists of known [=known elements|elements=] and
[=known attributes|attributes=].

1. [=Assert=]: [=built-in default config=] is [=SanitizerConfig/canonical=].
1. [=Assert=]: [=built-in default config=]["elements"] is a [=subset=] of [=known elements=].
1. [=Assert=]: [=built-in default config=]["attributes"] is a [=subset=] of [=known attributes=].
1. [=Assert=]: «[
   "elements" → [=known elements=],
   "attributes" → [=known attributes=],
   ]» is [=SanitizerConfig/canonical=].
1. If |config| is [=list/empty=] and not |safe|, then return «[]».
1. If |config| is not [=SanitizerConfig/valid=], then [=throw=] a {{TypeError}}.
1. Let |result| be a new [=dictionary=].
1. For each |key| of «[
   "{{SanitizerConfig/elements}}",
   "{{SanitizerConfig/removeElements}}",
   "{{SanitizerConfig/replaceWithChildrenElements}}" ]»:
  1. If |config|[|key|] exists, set |result|[|key|] to the result of running
     [=canonicalize a sanitizer element list=] on |config|[|key|] with
     [=HTML namespace=] as the default namespace.
1. For each |key| of «[
   "{{SanitizerConfig/attributes}}",
   "{{SanitizerConfig/removeAttributes}}" ]»:
  1. If |config|[|key|] exists, set |result|[|key|] to the result of running
     [=canonicalize a sanitizer element list=] on |config|[|key|] with `null` as
     the default namespace.
1. Set |result|["{{SanitizerConfig/comments}}"] to
   |config|["{{SanitizerConfig/comments}}"].
1. Let |default| be the result of [=canonicalizing a configuration=] for the
   [=built-in default config=].
1. If |safe|:
  1. If |config|["{{SanitizerConfig/elements}}"] [=map/exists=]:
    1. Let |elementBlockList| be the [=set/difference=] between
       [=known elements=] and |default|["{{SanitizerConfig/elements}}"].

       Note: The "natural" way to enforce the default element list would be
       to intersect with it. But that would also eliminate any unknown
       elements (i.e., elements not supplied by HTML, like &lt;foo&gt;). So we
       construct this helper to be able to use it to subtract any "unsafe"
       elements.

    1. Set |result|["{{SanitizerConfig/elements}}"] to the
       [=set/difference=] of |result|["{{SanitizerConfig/elements}}"] and
       |elementBlockList|.
  1. If |config|["{{SanitizerConfig/removeElements}}"] [=map/exists=]:
    1. Set |result|["{{SanitizerConfig/elements}}"] to the
       [=set/difference=] of |default|["{{SanitizerConfig/elements}}"]
       and |result|["{{SanitizerConfig/removeElements}}"].
    1. [=set/Remove=] "{{SanitizerConfig/removeElements}}" from |result|.
  1. If neither |config|["{{SanitizerConfig/elements}}"] nor
     |config|["{{SanitizerConfig/removeElements}}"] [=map/exist=]:
    1. Set |result|["{{SanitizerConfig/elements}}"] to
       |default|["{{SanitizerConfig/elements}}"].
  1. If |config|["{{SanitizerConfig/attributes}}"] [=map/exists=]:
    1. Let |attributeBlockList| be the [=set/difference=] between
       [=known attributes=] and |default|["{{SanitizerConfig/attributes}}"].
    1. Set |result|["{{SanitizerConfig/attributes}}"] to the
       [=set/difference=] of |result|["{{SanitizerConfig/attributes}}"] and
       |attributeBlockList|.
  1. If |config|["{{SanitizerConfig/removeAttributes}}"] [=map/exists=]:
    1. Set |result|["{{SanitizerConfig/attributes}}"] to the
       [=set/difference=] of |default|["{{SanitizerConfig/attributes}}"]
       and |result|["{{SanitizerConfig/removeAttributes}}"].
    1. [=set/Remove=] "{{SanitizerConfig/removeAttributes}}" from |result|.
  1. If neither |config|["{{SanitizerConfig/attributes}}"] nor
     |config|["{{SanitizerConfig/removeAttributes}}"] [=map/exist=]:
    1. Set |result|["{{SanitizerConfig/attributes}}"] to
       |default|["{{SanitizerConfig/attributes}}"].
1. Else (if not |safe|):
  1. If neither |config|["{{SanitizerConfig/elements}}"] nor
     |config|["{{SanitizerConfig/removeElements}}"] [=map/exist=]:
    1. Set |result|["{{SanitizerConfig/elements}}"] to
       |default|["{{SanitizerConfig/elements}}"].
  1. If neither |config|["{{SanitizerConfig/attributes}}"] nor
     |config|["{{SanitizerConfig/removeAttributes}}"] [=map/exist=]:
    1. Set |result|["{{SanitizerConfig/attributes}}"] to
       |default|["{{SanitizerConfig/attributes}}"].
1. [=Assert=]: |result| is [=SanitizerConfig/valid=].
1. [=Assert=]: |result| is [=SanitizerConfig/canonical=].
1. Return |result|.
</div>
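
The safe-mode element handling above is easiest to see with concrete values.
In the following non-normative sketch, `knownElements` and `defaultElements`
are tiny stand-ins invented for this example (the real lists are elided in this
document), and the function names are hypothetical. The sketch shows why the
algorithm subtracts a block list instead of intersecting with the defaults: an
unknown element such as `foo` survives, while a known but non-default element
such as `script` does not.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";
const html = (name) => ({ name, namespace: HTML_NS });

// List membership by matching both name and namespace.
function containsName(list, item) {
  return list.some(
    (entry) => entry.name === item.name && entry.namespace === item.namespace
  );
}

// Members of `a` that do not occur in `b`, in the order of `a`.
function difference(a, b) {
  return a.filter((item) => !containsName(b, item));
}

// Illustrative stand-ins only, not the real built-in lists.
const knownElements = [html("div"), html("p"), html("script")];
const defaultElements = [html("div"), html("p")];

// Safe mode, author supplied an "elements" allow-list.
function safeElementsFromAllowList(requested) {
  const elementBlockList = difference(knownElements, defaultElements);
  return difference(requested, elementBlockList);
}

// Safe mode, author supplied a "removeElements" list.
function safeElementsFromRemoveList(removed) {
  return difference(defaultElements, removed);
}
```

The attribute lists follow the same pattern with the known and default
attribute lists in place of the element lists.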
<div algorithm>
In order to <dfn>canonicalize a sanitizer element list</dfn> |list|, with a
default namespace |defaultNamespace|, run the following steps:
1. Let |result| be a new [=ordered set=].
1. [=list/iterate|For each=] |name| in |list|, [=set/append=] the result of
   calling [=canonicalize a sanitizer name=] on |name| with |defaultNamespace|
   to |result|.
1. Return |result|.
</div>
<div algorithm>
In order to <dfn>canonicalize a sanitizer name</dfn> |name|, with a default
namespace |defaultNamespace|, run the following steps:
1. [=Assert=]: |name| is either a {{DOMString}} or a [=dictionary=].
1. If |name| is a {{DOMString}}, then return «[ "`name`" → |name|, "`namespace`" → |defaultNamespace|]».
1. [=Assert=]: |name| is a [=dictionary=] and |name|["name"] [=map/exists=].
1. Return «[ <br>
"`name`" → |name|["name"], <br>
"`namespace`" → |name|["namespace"] if it [=map/exists=], otherwise |defaultNamespace| <br>
]».
</div>
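<em>This example is not normative.</em> The two canonicalization algorithms above can be sketched in JavaScript roughly as follows. The function names are illustrative and not part of the API. The sketch assumes that a dictionary member "exists" when it is not `undefined`, and that appending to an ordered set skips entries whose name and namespace both match an existing entry.

```javascript
// Illustrative sketch only; these helper names are hypothetical.
const HTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

// Canonicalize a sanitizer name: a string gets the default namespace,
// a dictionary keeps its own namespace when that member exists.
function canonicalizeSanitizerName(name, defaultNamespace) {
  if (typeof name === "string") {
    return { name: name, namespace: defaultNamespace };
  }
  if (typeof name !== "object" || name === null || name.name === undefined) {
    throw new TypeError("expected a string or a dictionary with a name");
  }
  return {
    name: name.name,
    // Assumption: `undefined` means "member does not exist"; `null` is a value.
    namespace: name.namespace !== undefined ? name.namespace : defaultNamespace,
  };
}

// Canonicalize a sanitizer element list into an ordered set (array, no duplicates).
function canonicalizeSanitizerElementList(list, defaultNamespace) {
  const result = [];
  for (const name of list) {
    const canonical = canonicalizeSanitizerName(name, defaultNamespace);
    const alreadyPresent = result.some(
      (entry) =>
        entry.name === canonical.name && entry.namespace === canonical.namespace
    );
    if (!alreadyPresent) {
      result.push(canonical);
    }
  }
  return result;
}
```

For instance, calling `canonicalizeSanitizerElementList(["div", { name: "p" }], HTML_NAMESPACE)` would yield two entries that both carry the HTML namespace.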
## Supporting Algorithms ## {#alg-support}
<div algorithm>
For the [=canonicalize a sanitizer name|canonicalized=]
{{SanitizerElementNamespace|element}} and {{SanitizerAttributeNamespace|attribute name}} lists
used in this spec, list membership is based on matching both "`name`" and "`namespace`"
entries:
A Sanitizer name |list| <dfn for="SanitizerConfig">contains</dfn> an |item|
if there exists an |entry| of |list| that is an [=ordered map=], and where
|item|["name"] [=equals=] |entry|["name"] and
|item|["namespace"] [=equals=] |entry|["namespace"].
</div>
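<em>This example is not normative.</em> As a minimal sketch of the membership test above, assuming canonicalized entries are plain objects with `name` and `namespace` keys (the function name is hypothetical):

```javascript
// Illustrative sketch only: a list contains an item if some entry is a map
// whose "name" and "namespace" both equal those of the item.
function sanitizerListContains(list, item) {
  return list.some(
    (entry) =>
      typeof entry === "object" &&
      entry !== null &&
      entry.name === item.name &&
      entry.namespace === item.namespace
  );
}
```

Matching on both keys means that an `href` attribute in the null namespace and an `href` in some other namespace are treated as different entries.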
<div algorithm>
Set difference (or set subtraction) yields a clone of a set A from which every
member that also occurs in a set B has been removed:
To compute the <dfn for="set">difference</dfn> of two [=ordered sets=] |A| and |B|:
1. Let |set| be a new [=ordered set=].
1. [=list/iterate|For each=] |item| of |A|:
1. If |B| does not [=set/contain=] |item|, then [=set/append=] |item|
to |set|.
1. Return |set|.
</div>
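<em>This example is not normative.</em> The difference steps above might be sketched as follows. Arrays stand in for ordered sets, and the default comparison by `name` and `namespace` is an assumption made for sanitizer name lists. Both helper names are hypothetical.

```javascript
// Illustrative sketch only. Assumed equality for canonicalized sanitizer names.
function sameSanitizerName(a, b) {
  return a.name === b.name && a.namespace === b.namespace;
}

// Returns a new array with the items of `a` that do not occur in `b`,
// preserving the order of `a`. Neither input is modified.
function setDifference(a, b, equals = sameSanitizerName) {
  const set = [];
  for (const item of a) {
    if (!b.some((other) => equals(item, other))) {
      set.push(item);
    }
  }
  return set;
}
```

Because the result is built from scratch, the inputs (for example the default attribute list) stay unchanged.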
<div algorithm>
Equality for [=ordered sets=] is equality of their members, but without
regard to order:
[=Ordered sets=] |A| and |B| are <dfn for=set>equal</dfn> if both |A| is a
[=superset=] of |B| and |B| is a [=superset=] of |A|.
</div>
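<em>This example is not normative.</em> Order-insensitive equality can be sketched as a mutual superset check. Arrays stand in for ordered sets, and the helper names and the pluggable `equals` parameter are assumptions made for illustration.

```javascript
// Illustrative sketch only: `a` is a superset of `b` if every item of `b`
// has an equal item in `a`.
function isSuperset(a, b, equals) {
  return b.every((x) => a.some((y) => equals(x, y)));
}

// Two ordered sets are equal if each is a superset of the other.
function orderedSetsEqual(a, b, equals = (x, y) => x === y) {
  return isSuperset(a, b, equals) && isSuperset(b, a, equals);
}
```

Under this sketch, `orderedSetsEqual(["a", "b"], ["b", "a"])` holds, while comparing `["a", "b"]` with `["a"]` does not.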
## Defaults ## {#sanitization-defaults}
Note: The defaults should follow a certain form, which is checked for at the
beginning of [=canonicalize a configuration=].
The <dfn>built-in default config</dfn> is as follows:
```
{
elements: [....],
attributes: [....],
comments: true,
}
```
The <dfn>known elements</dfn> are as follows:
```
[
{ name: "div", namespace: "http://www.w3.org/1999/xhtml" },
...
]
```
The <dfn>known attributes</dfn> are as follows:
```
[
{ name: "class", namespace: null },
...
]
```
Note: The [=known elements=] and [=known attributes=] should be derived from the
HTML specification, rather than being explicitly listed here. Currently,
there is no mechanism to do so.
<div>
The <dfn>navigating URL attributes list</dfn>, for which "`javascript:`"
navigations are unsafe, is as follows:
«[
<br>
[
{ "`name`" → "`a`", "`namespace`" → "[=HTML namespace=]" },
{ "`name`" → "`href`", "`namespace`" → `null` }
],
<br>
[
{ "`name`" → "`area`", "`namespace`" → "[=HTML namespace=]" },
{ "`name`" → "`href`", "`namespace`" → `null` }
],
<br>
[
{ "`name`" → "`form`", "`namespace`" → "[=HTML namespace=]" },
{ "`name`" → "`action`", "`namespace`" → `null` }
],
<br>
[
{ "`name`" → "`input`", "`namespace`" → "[=HTML namespace=]" },
{ "`name`" → "`formaction`", "`namespace`" → `null` }
],
<br>
[
{ "`name`" → "`button`", "`namespace`" → "[=HTML namespace=]" },
{ "`name`" → "`formaction`", "`namespace`" → `null` }
],
<br>
]»
</div>
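<em>This example is not normative.</em> One way such a list of element and attribute pairs could be consulted is sketched below. The constant and function names are hypothetical, and the sketch only covers the lookup, not the check of the attribute value itself.

```javascript
// Illustrative sketch only; names are hypothetical.
const HTML_NS = "http://www.w3.org/1999/xhtml";

// Each entry is an [element, attribute] pair of canonicalized names.
const NAVIGATING_URL_ATTRIBUTES = [
  [{ name: "a", namespace: HTML_NS }, { name: "href", namespace: null }],
  [{ name: "area", namespace: HTML_NS }, { name: "href", namespace: null }],
  [{ name: "form", namespace: HTML_NS }, { name: "action", namespace: null }],
  [{ name: "input", namespace: HTML_NS }, { name: "formaction", namespace: null }],
  [{ name: "button", namespace: HTML_NS }, { name: "formaction", namespace: null }],
];

// True if the given element/attribute combination appears in the list.
function isNavigatingUrlAttribute(element, attribute) {
  return NAVIGATING_URL_ATTRIBUTES.some(
    ([el, attr]) =>
      el.name === element.name &&
      el.namespace === element.namespace &&
      attr.name === attribute.name &&
      attr.namespace === attribute.namespace
  );
}
```

The pairing matters: `href` is listed for `a` and `area`, but the same attribute name on any other element is not in the list.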
# Security Considerations # {#security-considerations}
The Sanitizer API is intended to prevent DOM-based Cross-Site Scripting
by traversing supplied HTML content and removing elements and attributes
according to a configuration. The specified API must not support
the construction of a Sanitizer object that leaves script-capable markup in
place; any way of doing so would be a bug in the threat model.
That being said, there are security issues that even correct usage of the
Sanitizer API cannot protect against. These scenarios are
laid out in the following sections.
## Server-Side Reflected and Stored XSS ## {#server-side-xss}
<em>This section is not normative.</em>
The Sanitizer API operates solely in the DOM and adds a capability to traverse
and filter an existing DocumentFragment. The Sanitizer does not address
server-side reflected or stored XSS.
## DOM clobbering ## {#dom-clobbering}
<em>This section is not normative.</em>
DOM clobbering describes an attack in which malicious HTML confuses an
application by naming elements through `id` or `name` attributes such that
properties like `children` of an HTML element in the DOM are overshadowed by
the malicious content.
The Sanitizer API does not protect against DOM clobbering attacks in its
default state, but can be configured to remove `id` and `name` attributes.
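As a minimal sketch, a configuration along the following lines would remove both attributes. The variable name is illustrative, and how the dictionary is handed to the Sanitizer is left out here.

```javascript
// Illustrative configuration fragment: drop the attributes that enable
// DOM clobbering. The variable name is hypothetical.
const clobberingHardenedConfig = {
  removeAttributes: ["id", "name"],
};
```

Removing `id` and `name` may also break legitimate uses such as fragment links or form field names, so authors would need to weigh this per application.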
## XSS with Script gadgets ## {#script-gadgets}
<em>This section is not normative.</em>
Script gadgets are a technique in which an attacker uses existing application
code from popular JavaScript libraries to cause their own code to execute.
This is often done by injecting innocent-looking code or seemingly inert
DOM nodes that are only parsed and interpreted by a framework, which then
executes JavaScript based on that input.
The Sanitizer API cannot prevent these attacks. It does, however, require page
authors to explicitly allow unknown elements in general. Authors must
additionally explicitly configure unknown attributes and elements, as well as
markup that is known to be widely used for templating and framework-specific code,
like `data-` and `slot` attributes and elements like `<slot>` and `<template>`.
We believe that these restrictions are not exhaustive and encourage page
authors to examine their third party libraries for this behavior.
## Mutated XSS ## {#mutated-xss}
<em>This section is not normative.</em>
Mutated XSS or mXSS describes an attack based on parser context mismatches
when parsing an HTML snippet without the correct context. In particular,
when a parsed HTML fragment has been serialized to a string, the string is
not guaranteed to be parsed and interpreted exactly the same when inserted
into a different parent element. One way to carry out such an attack
is to rely on the change of parsing behavior for foreign content or
mis-nested tags.
The Sanitizer API offers only functions that turn a string into a node tree.
The context is supplied implicitly by all sanitizer functions:
`Element.setHTML()` uses the current element; `Document.parseHTML()` creates a
new document. Therefore, the Sanitizer API is not directly affected by mutated XSS.
If a developer were to retrieve a sanitized node tree as a string, e.g. via
`.innerHTML`, and then parse it again, mutated XSS may occur.
We discourage this practice. If HTML must nevertheless be processed or passed
around as a string, then any such string should be considered
untrusted and should be sanitized (again) when inserting it into the DOM. In
other words, a sanitized and then serialized HTML tree can no
longer be considered sanitized.
A more complete treatment of mXSS can be found in [[MXSS]].
# Acknowledgements # {#ack}
Cure53's [[DOMPURIFY]] is a clear inspiration for the API this document
describes, as is Internet Explorer's {{window.toStaticHTML()}}.