diff --git a/review-drafts/2024-03.bs b/review-drafts/2024-03.bs
new file mode 100644
index 0000000..6eaadeb
--- /dev/null
+++ b/review-drafts/2024-03.bs
@@ -0,0 +1,2104 @@
+
+Group: WHATWG
+Status: RD
+Date: 2024-03-18
+H1: URL Pattern
+Shortname: urlpattern
+Text Macro: TWITTER urlpatterns
+Abstract: The URL Pattern Standard provides a web platform primitive for matching URLs based on a convenient pattern syntax.
+Indent: 2
+Markup Shorthands: markdown yes
+
+ + + +
+spec: ECMASCRIPT; urlPrefix: https://tc39.es/ecma262/
+  type: dfn
+    text: IdentifierPart; url: #prod-IdentifierPart
+    text: IdentifierStart; url: #prod-IdentifierStart
+spec: URL; urlPrefix: https://url.spec.whatwg.org/
+  type: dfn
+    text: serialize an integer; url: #serialize-an-integer
+
+ +

URL patterns

+ +

Introduction

+ +A [=URL pattern=] consists of several [=components=], each of which represents a [=/pattern string|pattern=] which could be matched against the corresponding component of a [=/URL=]. + +It can be constructed using a string for each component, or from a shorthand string. It can optionally be resolved relative to a base URL. + +
+

The shorthand "`https://example.com/:category/*`" corresponds to the following components: + +

+
[=URL pattern/protocol component|protocol=] +
"`https`" + +
[=URL pattern/username component|username=] +
"`*`" + +
[=URL pattern/password component|password=] +
"`*`" + +
[=URL pattern/hostname component|hostname=] +
"`example.com`" + +
[=URL pattern/port component|port=] +
"" + +
[=URL pattern/pathname component|pathname=] +
"`/:category/*`" + +
[=URL pattern/search component|search=] +
"`*`" + +
[=URL pattern/hash component|hash=] +
"`*`" +
+ + It matches the following URLs: + + + + It does not match the following URLs: + + +
+ +
+

The shorthand "`http{s}?://{:subdomain.}?shop.example/products/:id([0-9]+)#reviews`" corresponds to the following components: + +

+
[=URL pattern/protocol component|protocol=] +
"`http{s}?`" + +
[=URL pattern/username component|username=] +
"`*`" + +
[=URL pattern/password component|password=] +
"`*`" + +
[=URL pattern/hostname component|hostname=] +
"`{:subdomain.}?shop.example`" + +
[=URL pattern/port component|port=] +
"" + +
[=URL pattern/pathname component|pathname=] +
"`/products/:id([0-9]+)`" + +
[=URL pattern/search component|search=] +
"" + +
[=URL pattern/hash component|hash=] +
"`reviews`" +
+ + It matches the following URLs: + + + + It does not match the following URLs: + + +
+ +
+

The shorthand "`../admin/*`" with the base URL "`https://discussion.example/forum/?page=2`" corresponds to the following components: + +

+
[=URL pattern/protocol component|protocol=] +
"`https`" + +
[=URL pattern/username component|username=] +
"`*`" + +
[=URL pattern/password component|password=] +
"`*`" + +
[=URL pattern/hostname component|hostname=] +
"`discussion.example`" + +
[=URL pattern/port component|port=] +
"" + +
[=URL pattern/pathname component|pathname=] +
"`/admin/*`" + +
[=URL pattern/search component|search=] +
"`*`" + +
[=URL pattern/hash component|hash=] +
"`*`" +
+ + It matches the following URLs: + + + + It does not match the following URLs: + + +
+ +

The {{URLPattern}} class

+ + +typedef (USVString or URLPatternInit) URLPatternInput; + +[Exposed=(Window,Worker)] +interface URLPattern { + constructor(URLPatternInput input, USVString baseURL, optional URLPatternOptions options = {}); + constructor(optional URLPatternInput input = {}, optional URLPatternOptions options = {}); + + boolean test(optional URLPatternInput input = {}, optional USVString baseURL); + + URLPatternResult? exec(optional URLPatternInput input = {}, optional USVString baseURL); + + readonly attribute USVString protocol; + readonly attribute USVString username; + readonly attribute USVString password; + readonly attribute USVString hostname; + readonly attribute USVString port; + readonly attribute USVString pathname; + readonly attribute USVString search; + readonly attribute USVString hash; + + readonly attribute boolean hasRegExpGroups; +}; + +dictionary URLPatternInit { + USVString protocol; + USVString username; + USVString password; + USVString hostname; + USVString port; + USVString pathname; + USVString search; + USVString hash; + USVString baseURL; +}; + +dictionary URLPatternOptions { + boolean ignoreCase = false; +}; + +dictionary URLPatternResult { + sequence<URLPatternInput> inputs; + + URLPatternComponentResult protocol; + URLPatternComponentResult username; + URLPatternComponentResult password; + URLPatternComponentResult hostname; + URLPatternComponentResult port; + URLPatternComponentResult pathname; + URLPatternComponentResult search; + URLPatternComponentResult hash; +}; + +dictionary URLPatternComponentResult { + USVString input; + record<USVString, (USVString or undefined)> groups; +}; + + +Each {{URLPattern}} has an associated URL pattern, a [=URL pattern=]. + +
+
|urlPattern| = new {{URLPattern/constructor(input, baseURL, options)|URLPattern}}(|input|)
+
+ Constructs a new {{URLPattern}} object. The |input| is an object containing separate patterns for each URL component; e.g. hostname, pathname, etc. Missing components will default to a wildcard pattern. In addition, |input| can contain a {{URLPatternInit/baseURL}} property that provides static text patterns for any missing components. +
+ +
|urlPattern| = new {{URLPattern/constructor(input, baseURL, options)|URLPattern}}(|patternString|, |baseURL|)
+
+ Constructs a new {{URLPattern}} object. |patternString| is a URL string containing pattern syntax for one or more components. If |baseURL| is provided, then |patternString| can be relative. This constructor will always set at least an empty string value and does not default any components to wildcard patterns. +
+ +
|urlPattern| = new {{URLPattern/constructor(input, baseURL, options)|URLPattern}}(|input|, |options|)
+
+ Constructs a new {{URLPattern}} object. The |options| is an object containing the additional configuration options that can affect how the components are matched. Currently it has only one property {{URLPatternOptions/ignoreCase}} which can be set to true to enable case-insensitive matching. + + Note that by default, that is in the absence of the |options| argument, matching is always case-sensitive. +
+ +
|urlPattern| = new {{URLPattern/constructor(input, baseURL, options)|URLPattern}}(|patternString|, |baseURL|, |options|)
+
+ Constructs a new {{URLPattern}} object. This overload supports a {{URLPatternOptions}} object when constructing a pattern from a |patternString|, which describes the patterns for individual components, and a |baseURL|. +
+ +
|matches| = |urlPattern|.{{URLPattern/test(input, baseURL)|test}}(|input|)
+
+ Tests if |urlPattern| matches the given arguments. The |input| is an object containing strings representing each URL component; e.g. hostname, pathname, etc. Missing components are treated as empty strings. In addition, |input| can contain a {{URLPatternInit/baseURL}} property that provides values for any missing components. If |urlPattern| matches the |input| on a component-by-component basis then true is returned. Otherwise, false is returned. +
+ +
|matches| = |urlPattern|.{{URLPattern/test(input, baseURL)|test}}(|url|, |baseURL|)
+
+ Tests if |urlPattern| matches the given arguments. |url| is a URL string. If |baseURL| is provided, then |url| can be relative. + + If |urlPattern| matches the |url| on a component-by-component basis then true is returned. Otherwise, false is returned. +
+ +
|result| = |urlPattern|.{{URLPattern/exec(input, baseURL)|exec}}(|input|)
+
+ Executes the |urlPattern| against the given arguments. The |input| is an object containing strings representing each URL component; e.g. hostname, pathname, etc. Missing components are treated as empty strings. In addition, |input| can contain a {{URLPatternInit/baseURL}} property that provides values for any missing components. + + If |urlPattern| matches the |input| on a component-by-component basis then an object is returned containing the results. Matched group values are contained in per-component group objects within the |result| object; e.g. `result.pathname.groups.id`. If |urlPattern| does not match the |input|, then |result| is null. +
+ +
|result| = |urlPattern|.{{URLPattern/exec(input, baseURL)|exec}}(|url|, |baseURL|)
+
+ Executes the |urlPattern| against the given arguments. |url| is a URL string. If |baseURL| is provided, then |url| can be relative. + + If |urlPattern| matches the |url| on a component-by-component basis then an object is returned containing the results. Matched group values are contained in per-component group objects within the |result| object; e.g. `result.pathname.groups.id`. If |urlPattern| does not match the |url|, then |result| is null. +
+ +
|urlPattern|.{{URLPattern/protocol}}
+
+

Returns |urlPattern|'s normalized protocol pattern string. +

+ +
|urlPattern|.{{URLPattern/username}}
+
+

Returns |urlPattern|'s normalized username pattern string. +

+ +
|urlPattern|.{{URLPattern/password}}
+
+

Returns |urlPattern|'s normalized password pattern string. +

+ +
|urlPattern|.{{URLPattern/hostname}}
+
+

Returns |urlPattern|'s normalized hostname pattern string. +

+ +
|urlPattern|.{{URLPattern/port}}
+
+

Returns |urlPattern|'s normalized port pattern string. +

+ +
|urlPattern|.{{URLPattern/pathname}}
+
+

Returns |urlPattern|'s normalized pathname pattern string. +

+ +
|urlPattern|.{{URLPattern/search}}
+
+

Returns |urlPattern|'s normalized search pattern string. +

+ +
|urlPattern|.{{URLPattern/hash}}
+
+

Returns |urlPattern|'s normalized hash pattern string. +

+ +
|urlPattern|.{{URLPattern/hasRegExpGroups}}
+
+

Returns whether |urlPattern| contains one or more groups which use regular expression matching. +

+
+ +
+ The new URLPattern(|input|, |baseURL|, |options|) constructor steps are: + + 1. Run [=initialize=] given [=this=], |input|, |baseURL|, and |options|. +
+ +
+ The new URLPattern(|input|, |options|) constructor steps are: + + 1. Run [=initialize=] given [=this=], |input|, null, and |options|. +
+ +
+ To initialize a {{URLPattern}} given a {{URLPattern}} |this|, {{URLPatternInput}} |input|, string or null |baseURL|, and {{URLPatternOptions}} |options|: + + 1. Set |this|'s [=URLPattern/associated URL pattern=] to the result of [=create=] given |input|, |baseURL|, and |options|. +
+ +
+ The protocol getter steps are: + + 1. Return [=this=]'s [=URLPattern/associated URL pattern=]'s [=URL pattern/protocol component=]'s [=component/pattern string=]. +
+ +
+ The username getter steps are: + + 1. Return [=this=]'s [=URLPattern/associated URL pattern=]'s [=URL pattern/username component=]'s [=component/pattern string=]. +
+ +
+ The password getter steps are: + + 1. Return [=this=]'s [=URLPattern/associated URL pattern=]'s [=URL pattern/password component=]'s [=component/pattern string=]. +
+ +
+ The hostname getter steps are: + + 1. Return [=this=]'s [=URLPattern/associated URL pattern=]'s [=URL pattern/hostname component=]'s [=component/pattern string=]. +
+ +
+ The port getter steps are: + + 1. Return [=this=]'s [=URLPattern/associated URL pattern=]'s [=URL pattern/port component=]'s [=component/pattern string=]. +
+ +
+ The pathname getter steps are: + + 1. Return [=this=]'s [=URLPattern/associated URL pattern=]'s [=URL pattern/pathname component=]'s [=component/pattern string=]. +
+ +
+ The search getter steps are: + + 1. Return [=this=]'s [=URLPattern/associated URL pattern=]'s [=URL pattern/search component=]'s [=component/pattern string=]. +
+ +
+ The hash getter steps are: + + 1. Return [=this=]'s [=URLPattern/associated URL pattern=]'s [=URL pattern/hash component=]'s [=component/pattern string=]. +
+ +
+ The hasRegExpGroups getter steps are: + + 1. If [=this=]'s [=URLPattern/associated URL pattern=]'s [=URL pattern/has regexp groups=], then return true. + 1. Return false. +
+ +
+ The test(|input|, |baseURL|) method steps are: + + 1. Let |result| be the result of [=URL pattern/match=] given [=this=]'s [=URLPattern/associated URL pattern=], |input|, and |baseURL| if given. + 1. If |result| is null, return false. + 1. Return true. +
+ +
+ The exec(|input|, |baseURL|) method steps are: + + 1. Return the result of [=URL pattern/match=] given [=this=]'s [=URLPattern/associated URL pattern=], |input|, and |baseURL| if given. +
+ +

The URL pattern struct

+ +A URL pattern is a [=struct=] with the following [=struct/items=]: + +* protocol component, a [=component=] +* username component, a [=component=] +* password component, a [=component=] +* hostname component, a [=component=] +* port component, a [=component=] +* pathname component, a [=component=] +* search component, a [=component=] +* hash component, a [=component=] + +A component is a [=struct=] with the following [=struct/items=]: + +* pattern string, a [=pattern string/well formed=] [=/pattern string=] +* regular expression, a {{RegExp}} +* group name list, a [=list=] of strings +* has regexp groups, a [=boolean=] + +

High-level operations

+ +
+ To create a [=URL pattern=] given a {{URLPatternInput}} |input|, string or null |baseURL|, and {{URLPatternOptions}} |options|: + + 1. Let |init| be null. + 1. If |input| is a [=scalar value string=] then: + 1. Set |init| to the result of running [=parse a constructor string=] given |input|. + 1. If |baseURL| is null and |init|["{{URLPatternInit/protocol}}"] does not [=map/exist=], then throw a {{TypeError}}. + 1. If |baseURL| is not null, [=map/set=] |init|["{{URLPatternInit/baseURL}}"] to |baseURL|. + 1. Otherwise: + 1. [=Assert=]: |input| is a {{URLPatternInit}}. + 1. If |baseURL| is not null, then throw a {{TypeError}}. + 1. Set |init| to |input|. + 1. Let |processedInit| be the result of [=process a URLPatternInit=] given |init|, "`pattern`", null, null, null, null, null, null, null, and null. + 1. [=list/For each=] |componentName| of « "{{URLPatternInit/protocol}}", "{{URLPatternInit/username}}", "{{URLPatternInit/password}}", "{{URLPatternInit/hostname}}", "{{URLPatternInit/port}}", "{{URLPatternInit/pathname}}", "{{URLPatternInit/search}}", "{{URLPatternInit/hash}}" »: + 1. If |processedInit|[|componentName|] does not [=map/exist=], then [=map/set=] |processedInit|[|componentName|] to "`*`". + 1. If |processedInit|["{{URLPatternInit/protocol}}"] is a [=special scheme=] and |processedInit|["{{URLPatternInit/port}}"] is a string which represents its corresponding [=default port=] in radix-10 using [=ASCII digits=] then set |processedInit|["{{URLPatternInit/port}}"] to the empty string. + 1. Let |urlPattern| be a new [=URL pattern=]. + 1. Set |urlPattern|'s [=URL pattern/protocol component=] to the result of [=compiling a component=] given |processedInit|["{{URLPatternInit/protocol}}"], [=canonicalize a protocol=], and [=default options=]. + 1. Set |urlPattern|'s [=URL pattern/username component=] to the result of [=compiling a component=] given |processedInit|["{{URLPatternInit/username}}"], [=canonicalize a username=], and [=default options=]. + 1. 
Set |urlPattern|'s [=URL pattern/password component=] to the result of [=compiling a component=] given |processedInit|["{{URLPatternInit/password}}"], [=canonicalize a password=], and [=default options=]. + 1. If the result of running [=hostname pattern is an IPv6 address=] given |processedInit|["{{URLPatternInit/hostname}}"] is true, then set |urlPattern|'s [=URL pattern/hostname component=] to the result of [=compiling a component=] given |processedInit|["{{URLPatternInit/hostname}}"], [=canonicalize an IPv6 hostname=], and [=hostname options=]. + 1. Otherwise, set |urlPattern|'s [=URL pattern/hostname component=] to the result of [=compiling a component=] given |processedInit|["{{URLPatternInit/hostname}}"], [=canonicalize a hostname=], and [=hostname options=]. + 1. Set |urlPattern|'s [=URL pattern/port component=] to the result of [=compiling a component=] given |processedInit|["{{URLPatternInit/port}}"], [=canonicalize a port=], and [=default options=]. + 1. Let |compileOptions| be a copy of the [=default options=] with the [=options/ignore case=] property set to |options|["{{URLPatternOptions/ignoreCase}}"]. + 1. If the result of running [=protocol component matches a special scheme=] given |urlPattern|'s [=URL pattern/protocol component=] is true, then: + 1. Let |pathCompileOptions| be a copy of the [=pathname options=] with the [=options/ignore case=] property set to |options|["{{URLPatternOptions/ignoreCase}}"]. + 1. Set |urlPattern|'s [=URL pattern/pathname component=] to the result of [=compiling a component=] given |processedInit|["{{URLPatternInit/pathname}}"], [=canonicalize a pathname=], and |pathCompileOptions|. + 1. Otherwise set |urlPattern|'s [=URL pattern/pathname component=] to the result of [=compiling a component=] given |processedInit|["{{URLPatternInit/pathname}}"], [=canonicalize an opaque pathname=], and |compileOptions|. + 1. 
Set |urlPattern|'s [=URL pattern/search component=] to the result of [=compiling a component=] given |processedInit|["{{URLPatternInit/search}}"], [=canonicalize a search=], and |compileOptions|. + 1. Set |urlPattern|'s [=URL pattern/hash component=] to the result of [=compiling a component=] given |processedInit|["{{URLPatternInit/hash}}"], [=canonicalize a hash=], and |compileOptions|. + 1. Return |urlPattern|. +
+ +
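The default-port step of the [=create=] algorithm above can be illustrated with a small TypeScript sketch. The names `DEFAULT_PORTS` and `dropDefaultPort` are hypothetical and exist only for this illustration; the port numbers are the ones the URL Standard assigns to its special schemes. The sketch compares the protocol literally and does not interpret pattern syntax, so it is an approximation of the step, not a full implementation.

```typescript
// Default ports of the special schemes, per the URL Standard.
// "file" is a special scheme but has no default port.
// (Hypothetical helper table, introduced only for this sketch.)
const DEFAULT_PORTS = new Map<string, number | null>([
  ["ftp", 21],
  ["file", null],
  ["http", 80],
  ["https", 443],
  ["ws", 80],
  ["wss", 443],
]);

// Returns the port to store in the processed init: the empty string when
// `port` is the default port of the special scheme `protocol`, and the
// unchanged `port` otherwise.
function dropDefaultPort(protocol: string, port: string): string {
  const defaultPort = DEFAULT_PORTS.get(protocol);
  // Not a special scheme (undefined) or no default port (null): keep the port.
  if (defaultPort === undefined || defaultPort === null) {
    return port;
  }
  // The port has to be the radix-10 ASCII-digit form of the default port.
  return port === String(defaultPort) ? "" : port;
}
```

Under these assumptions, a pattern written with "`https`" and port "`443`" ends up with the same empty port component as one written without a port.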
+ To perform a match given a [=URL pattern=] |urlPattern|, a {{URLPatternInput}} or [=/URL=] |input|, and an optional string |baseURLString|: + + 1. Let |protocol| be the empty string. + 1. Let |username| be the empty string. + 1. Let |password| be the empty string. + 1. Let |hostname| be the empty string. + 1. Let |port| be the empty string. + 1. Let |pathname| be the empty string. + 1. Let |search| be the empty string. + 1. Let |hash| be the empty string. + 1. Let |inputs| be an empty [=list=]. + 1. [=list/Append=] |input| to |inputs|. + 1. If |input| is a {{URLPatternInit}} then: + 1. If |baseURLString| was given, throw a {{TypeError}}. + 1. Let |applyResult| be the result of [=process a URLPatternInit=] given |input|, "url", |protocol|, |username|, |password|, |hostname|, |port|, |pathname|, |search|, and |hash|. If this throws an exception, catch it, and return null. + 1. Set |protocol| to |applyResult|["{{URLPatternInit/protocol}}"]. + 1. Set |username| to |applyResult|["{{URLPatternInit/username}}"]. + 1. Set |password| to |applyResult|["{{URLPatternInit/password}}"]. + 1. Set |hostname| to |applyResult|["{{URLPatternInit/hostname}}"]. + 1. Set |port| to |applyResult|["{{URLPatternInit/port}}"]. + 1. Set |pathname| to |applyResult|["{{URLPatternInit/pathname}}"]. + 1. Set |search| to |applyResult|["{{URLPatternInit/search}}"]. + 1. Set |hash| to |applyResult|["{{URLPatternInit/hash}}"]. + 1. Otherwise: + 1. Let |url| be |input|. + 1. If |input| is a {{USVString}}: + 1. Let |baseURL| be null. + 1. If |baseURLString| was given, then: + 1. Set |baseURL| to the result of [=URL parser|parsing=] |baseURLString|. + 1. If |baseURL| is failure, return null. + 1. [=list/Append=] |baseURLString| to |inputs|. + 1. Set |url| to the result of [=URL parser|parsing=] |input| given |baseURL|. + 1. If |url| is failure, return null. + 1. [=Assert=]: |url| is a [=/URL=]. + 1. Set |protocol| to |url|'s [=url/scheme=]. + 1. Set |username| to |url|'s [=url/username=]. + 1. 
Set |password| to |url|'s [=url/password=]. + 1. Set |hostname| to |url|'s [=url/host=] or the empty string if the value is null. + 1. Set |port| to |url|'s [=url/port=] or the empty string if the value is null. + 1. Set |pathname| to the result of [=URL path serializing=] |url|. + 1. Set |search| to |url|'s [=url/query=] or the empty string if the value is null. + 1. Set |hash| to |url|'s [=url/fragment=] or the empty string if the value is null. + 1. Let |protocolExecResult| be [$RegExpBuiltinExec$](|urlPattern|'s [=URL pattern/protocol component=]'s [=component/regular expression=], |protocol|). + 1. Let |usernameExecResult| be [$RegExpBuiltinExec$](|urlPattern|'s [=URL pattern/username component=]'s [=component/regular expression=], |username|). + 1. Let |passwordExecResult| be [$RegExpBuiltinExec$](|urlPattern|'s [=URL pattern/password component=]'s [=component/regular expression=], |password|). + 1. Let |hostnameExecResult| be [$RegExpBuiltinExec$](|urlPattern|'s [=URL pattern/hostname component=]'s [=component/regular expression=], |hostname|). + 1. Let |portExecResult| be [$RegExpBuiltinExec$](|urlPattern|'s [=URL pattern/port component=]'s [=component/regular expression=], |port|). + 1. Let |pathnameExecResult| be [$RegExpBuiltinExec$](|urlPattern|'s [=URL pattern/pathname component=]'s [=component/regular expression=], |pathname|). + 1. Let |searchExecResult| be [$RegExpBuiltinExec$](|urlPattern|'s [=URL pattern/search component=]'s [=component/regular expression=], |search|). + 1. Let |hashExecResult| be [$RegExpBuiltinExec$](|urlPattern|'s [=URL pattern/hash component=]'s [=component/regular expression=], |hash|). + 1. If |protocolExecResult|, |usernameExecResult|, |passwordExecResult|, |hostnameExecResult|, |portExecResult|, |pathnameExecResult|, |searchExecResult|, or |hashExecResult| are null then return null. + 1. Let |result| be a new {{URLPatternResult}}. + 1. Set |result|["{{URLPatternResult/inputs}}"] to |inputs|. + 1. 
Set |result|["{{URLPatternResult/protocol}}"] to the result of [=creating a component match result=] given |urlPattern|'s [=URL pattern/protocol component=], |protocol|, and |protocolExecResult|. + 1. Set |result|["{{URLPatternResult/username}}"] to the result of [=creating a component match result=] given |urlPattern|'s [=URL pattern/username component=], |username|, and |usernameExecResult|. + 1. Set |result|["{{URLPatternResult/password}}"] to the result of [=creating a component match result=] given |urlPattern|'s [=URL pattern/password component=], |password|, and |passwordExecResult|. + 1. Set |result|["{{URLPatternResult/hostname}}"] to the result of [=creating a component match result=] given |urlPattern|'s [=URL pattern/hostname component=], |hostname|, and |hostnameExecResult|. + 1. Set |result|["{{URLPatternResult/port}}"] to the result of [=creating a component match result=] given |urlPattern|'s [=URL pattern/port component=], |port|, and |portExecResult|. + 1. Set |result|["{{URLPatternResult/pathname}}"] to the result of [=creating a component match result=] given |urlPattern|'s [=URL pattern/pathname component=], |pathname|, and |pathnameExecResult|. + 1. Set |result|["{{URLPatternResult/search}}"] to the result of [=creating a component match result=] given |urlPattern|'s [=URL pattern/search component=], |search|, and |searchExecResult|. + 1. Set |result|["{{URLPatternResult/hash}}"] to the result of [=creating a component match result=] given |urlPattern|'s [=URL pattern/hash component=], |hash|, and |hashExecResult|. + 1. Return |result|. +
+ +
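The core of the [=URL pattern/match=] algorithm above is that every one of the eight components has to match for the overall match to succeed. The following TypeScript sketch shows only that part, assuming the per-component regular expressions and input strings have already been computed. `COMPONENT_NAMES`, `ComponentName`, and `execAllComponents` are hypothetical names for this illustration. Unlike the algorithm text, the sketch returns as soon as one component fails; for regular expressions without the global or sticky flag this yields the same result.

```typescript
// The eight URL components, in the order used by the algorithm text.
const COMPONENT_NAMES = [
  "protocol", "username", "password", "hostname",
  "port", "pathname", "search", "hash",
] as const;

type ComponentName = (typeof COMPONENT_NAMES)[number];

// Runs each component's regular expression against the corresponding input
// string. Returns null if any component fails to match, and otherwise the
// exec results keyed by component name.
function execAllComponents(
  regexps: Record<ComponentName, RegExp>,
  values: Record<ComponentName, string>
): Record<ComponentName, RegExpExecArray> | null {
  const results: Partial<Record<ComponentName, RegExpExecArray>> = {};
  for (const name of COMPONENT_NAMES) {
    const execResult = regexps[name].exec(values[name]);
    if (execResult === null) {
      return null; // One failing component fails the whole match.
    }
    results[name] = execResult;
  }
  return results as Record<ComponentName, RegExpExecArray>;
}
```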
+ A [=URL pattern=] |urlPattern| has regexp groups if the following steps return true: + + 1. If |urlPattern|'s [=URL pattern/protocol component=] [=component/has regexp groups=] is true, then return true. + 1. If |urlPattern|'s [=URL pattern/username component=] [=component/has regexp groups=] is true, then return true. + 1. If |urlPattern|'s [=URL pattern/password component=] [=component/has regexp groups=] is true, then return true. + 1. If |urlPattern|'s [=URL pattern/hostname component=] [=component/has regexp groups=] is true, then return true. + 1. If |urlPattern|'s [=URL pattern/port component=] [=component/has regexp groups=] is true, then return true. + 1. If |urlPattern|'s [=URL pattern/pathname component=] [=component/has regexp groups=] is true, then return true. + 1. If |urlPattern|'s [=URL pattern/search component=] [=component/has regexp groups=] is true, then return true. + 1. If |urlPattern|'s [=URL pattern/hash component=] [=component/has regexp groups=] is true, then return true. + 1. Return false. +
+ +

Internals

+ +
+ To compile a component given a string |input|, [=/encoding callback=] |encoding callback|, and [=/options=] |options|: + + 1. Let |part list| be the result of running [=parse a pattern string=] given |input|, |options|, and |encoding callback|. + 1. Let (|regular expression string|, |name list|) be the result of running [=generate a regular expression and name list=] given |part list| and |options|. + 1. Let |flags| be an empty string. + 1. If |options|'s [=options/ignore case=] is true then set |flags| to "`vi`". + 1. Otherwise set |flags| to "`v`". + 1. Let |regular expression| be [$RegExpCreate$](|regular expression string|, |flags|). If this throws an exception, catch it, and throw a {{TypeError}}. +

The specification uses regular expressions to perform all matching, but this is not mandated. Implementations are free to perform matching directly against the [=/part list=] when possible; e.g. when there are no custom regexp matching groups. If there are custom regular expressions, however, it is important that they be immediately evaluated in the [=compile a component=] algorithm so an error can be thrown if they are invalid. + 1. Let |pattern string| be the result of running [=generate a pattern string=] given |part list| and |options|. + 1. Let |has regexp groups| be false. + 1. [=list/For each=] |part| of |part list|: + 1. If |part|'s [=part/type=] is "`regexp`", then set |has regexp groups| to true. + 1. Return a new [=component=] whose [=component/pattern string=] is |pattern string|, [=component/regular expression=] is |regular expression|, [=component/group name list=] is |name list|, and [=component/has regexp groups=] is |has regexp groups|. +

+ +
+ To create a component match result given a [=component=] |component|, a string |input|, and an array representing the output of [$RegExpBuiltinExec$] |execResult|: + + 1. Let |result| be a new {{URLPatternComponentResult}}. + 1. Set |result|["{{URLPatternComponentResult/input}}"] to |input|. + 1. Let |groups| be a [=record=]<{{USVString}}, ({{USVString}} or {{undefined}})>. + 1. Let |index| be 1. + 1. While |index| is less than [$Get$](|execResult|, "`length`"): + 1. Let |name| be |component|'s [=component/group name list=][|index| − 1]. + 1. Let |value| be [$Get$](|execResult|, [$ToString$](|index|)). + 1. Set |groups|[|name|] to |value|. + 1. Increment |index| by 1. + 1. Set |result|["{{URLPatternComponentResult/groups}}"] to |groups|. + 1. Return |result|. +
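The steps above pair each capture group in the exec result with the name at the same position in the component's group name list. A minimal TypeScript sketch of that pairing follows. `ComponentResult` and `createComponentMatchResult` are hypothetical names for this illustration, and the sketch takes the group name list directly rather than a full [=component=].

```typescript
// Shape of the per-component result, mirroring URLPatternComponentResult.
interface ComponentResult {
  input: string;
  groups: Record<string, string | undefined>;
}

// Builds the result for one component. Index 0 of the exec result is the
// whole match, so capture group i corresponds to groupNameList[i - 1].
function createComponentMatchResult(
  groupNameList: string[],
  input: string,
  execResult: RegExpExecArray
): ComponentResult {
  const groups: Record<string, string | undefined> = {};
  for (let index = 1; index < execResult.length; index++) {
    const name = groupNameList[index - 1];
    // A group that did not participate in the match stays undefined.
    groups[name] = execResult[index];
  }
  return { input, groups };
}
```

A group that did not take part in the match keeps its key with an undefined value, which is why the record's value type allows {{undefined}}.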
+ +The default options is an [=options=] [=struct=] with [=options/delimiter code point=] set to the empty string and [=options/prefix code point=] set to the empty string. + +The hostname options is an [=options=] [=struct=] with [=options/delimiter code point=] set to "`.`" and [=options/prefix code point=] set to the empty string. + +The pathname options is an [=options=] [=struct=] with [=options/delimiter code point=] set to "`/`" and [=options/prefix code point=] set to "`/`". + +
+ To determine if a protocol component matches a special scheme given a [=component=] |protocol component|: + + 1. Let |special scheme list| be a [=list=] populated with all of the [=special schemes=]. + 1. [=list/For each=] |scheme| of |special scheme list|: + 1. Let |test result| be [$RegExpBuiltinExec$](|protocol component|'s [=component/regular expression=], |scheme|). + 1. If |test result| is not null, then return true. + 1. Return false. +
+ +
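The [=protocol component matches a special scheme=] check above can be sketched in TypeScript as follows. `SPECIAL_SCHEMES` and `protocolMatchesSpecialScheme` are hypothetical names for this illustration; the scheme list is the set of special schemes defined by the URL Standard. The sketch takes the component's regular expression directly and assumes it has neither the global nor the sticky flag.

```typescript
// The special schemes defined by the URL Standard.
const SPECIAL_SCHEMES = ["ftp", "file", "http", "https", "ws", "wss"];

// Returns true if the protocol component's regular expression matches at
// least one special scheme.
function protocolMatchesSpecialScheme(protocolRegExp: RegExp): boolean {
  for (const scheme of SPECIAL_SCHEMES) {
    if (protocolRegExp.exec(scheme) !== null) {
      return true;
    }
  }
  return false;
}
```

Because the check runs the pattern against each scheme, a wildcard protocol pattern also counts as matching a special scheme.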
+ To determine if a hostname pattern is an IPv6 address given a [=/pattern string=] |input|: + + 1. If |input|'s [=string/code point length=] is less than 2, then return false. + 1. Let |input code points| be |input| interpreted as a [=list=] of [=/code points=]. + 1. If |input code points|[0] is U+005B (`[`), then return true. + 1. If |input code points|[0] is U+007B (`{`) and |input code points|[1] is U+005B (`[`), then return true. + 1. If |input code points|[0] is U+005C (\) and |input code points|[1] is U+005B (`[`), then return true. + 1. Return false. +
+ +
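The [=hostname pattern is an IPv6 address=] check above only inspects the first two code points of the pattern string. A minimal TypeScript sketch, with the hypothetical function name `hostnamePatternIsIPv6Address`, could look like this:

```typescript
// Returns true if the hostname pattern string looks like it starts an IPv6
// address: "[", or "[" preceded by "{" or by a backslash escape.
function hostnamePatternIsIPv6Address(input: string): boolean {
  // Array.from iterates a string by code point, not by UTF-16 code unit.
  const codePoints = Array.from(input);
  if (codePoints.length < 2) {
    return false;
  }
  if (codePoints[0] === "[") {
    return true;
  }
  if (codePoints[0] === "{" && codePoints[1] === "[") {
    return true;
  }
  if (codePoints[0] === "\\" && codePoints[1] === "[") {
    return true;
  }
  return false;
}
```

The check is deliberately shallow: it does not validate the address, it only selects which hostname canonicalization the [=create=] algorithm uses.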

Constructor string parsing

+ +A constructor string parser is a [=struct=]. + +A [=constructor string parser=] has an associated input, a string, which must be set upon creation. + +A [=constructor string parser=] has an associated token list, a [=/token list=], which must be set upon creation. + +A [=constructor string parser=] has an associated result, a {{URLPatternInit}}, initially set to a new {{URLPatternInit}}. + +A [=constructor string parser=] has an associated component start, a number, initially set to 0. + +A [=constructor string parser=] has an associated token index, a number, initially set to 0. + +A [=constructor string parser=] has an associated token increment, a number, initially set to 1. + +A [=constructor string parser=] has an associated group depth, a number, initially set to 0. + +A [=constructor string parser=] has an associated hostname IPv6 bracket depth, a number, initially set to 0. + +A [=constructor string parser=] has an associated protocol matches a special scheme flag, a boolean, initially set to false. + +A [=constructor string parser=] has an associated state, a string, initially set to "`init`". It must be one of the following: + + + +
+

The URLPattern constructor string algorithm is very similar to the [=basic URL parser=] algorithm, but some differences prevent us from using that algorithm directly. +

First, the URLPattern constructor string parser operates on [=tokens=] generated using the "`lenient`" [=tokenize policy=]. In contrast, the [=basic URL parser=] operates on code points. Operating on [=tokens=] allows the URLPattern constructor string parser to more easily distinguish between code points that are significant pattern syntax and code points that might be a URL component separator. For example, it makes it trivial to handle named groups like "`:hmm`" in "`https://a.c:hmm.example.com:8080`" without confusing them with the port number. +

Second, the URLPattern constructor string parser needs to avoid applying URL canonicalization to all code points like the [=basic URL parser=] does. Instead, canonicalization is performed later, when compiling each component pattern string, and only on the parts of the pattern string that are known to be safe. +

Finally, the URLPattern constructor string parser does not handle some parts of the [=basic URL parser=] state machine. For example, it does not treat backslashes specially as they would all be treated as pattern characters and would require excessive escaping. In addition, this parser might not handle some more esoteric parts of the URL parsing algorithm like file URLs with a hostname. The goal with this parser was to handle the most common URLs while allowing any niche case to be handled instead via the {{URLPatternInit}} constructor. +

+ +
+

In the constructor string algorithm, the pathname, search, and hash are wildcarded if earlier components are specified but later ones are not. For example, "`https://example.com/foo`" matches any search and any hash. Similarly, "`https://example.com`" matches any URL on that origin. This is analogous to the notion of a more specific component in the notes about [=process a URLPatternInit=] (e.g., a search is more specific than a pathname), but the constructor syntax only has a few cases where it is possible to specify a more specific component without also specifying the less specific components. +

The username and password components are always wildcard unless they are explicitly specified. +

If a hostname is specified and the port is not, the port is assumed to be the default port. If authors want to match any port, they have to write `:*` explicitly. For example, "`https://*`" is any HTTPS origin on port 443, and "`https://*:*`" is any HTTPS origin on any port. +

+ +
+To parse a constructor string given a string |input|: + + 1. Let |parser| be a new [=constructor string parser=] whose [=constructor string parser/input=] is |input| and [=constructor string parser/token list=] is the result of running [=tokenize=] given |input| and "`lenient`". + 1. [=While=] |parser|'s [=constructor string parser/token index=] is less than |parser|'s [=constructor string parser/token list=] [=list/size=]: + 1. Set |parser|'s [=constructor string parser/token increment=] to 1. +

On every iteration of the parse loop the |parser|'s [=constructor string parser/token index=] will be incremented by its [=constructor string parser/token increment=] value. Typically this means incrementing by 1, but at certain times it is set to zero. The [=constructor string parser/token increment=] is then always reset back to 1 at the top of the loop. + 1. If |parser|'s [=constructor string parser/token list=][|parser|'s [=constructor string parser/token index=]]'s [=token/type=] is "`end`" then: + 1. If |parser|'s [=constructor string parser/state=] is "`init`": +

If we reached the end of the string in the "`init`" [=constructor string parser/state=], then we failed to find a protocol terminator and this has to be a relative URLPattern constructor string. + 1. Run [=rewind=] given |parser|. +

We next determine at which component the relative pattern begins. Relative pathnames are most common, but URLs and URLPattern constructor strings can begin with the search or hash components as well. + 1. If the result of running [=is a hash prefix=] given |parser| is true, then run [=change state=] given |parser|, "`hash`" and 1. + 1. Otherwise if the result of running [=is a search prefix=] given |parser| is true: + 1. Run [=change state=] given |parser|, "`search`" and 1. + 1. Otherwise: + 1. Run [=change state=] given |parser|, "`pathname`" and 0. + 1. Increment |parser|'s [=constructor string parser/token index=] by |parser|'s [=constructor string parser/token increment=]. + 1. [=Continue=]. + 1. If |parser|'s [=constructor string parser/state=] is "`authority`": +

If we reached the end of the string in the "`authority`" [=constructor string parser/state=], then we failed to find an "`@`". Therefore there is no username or password. + 1. Run [=rewind and set state=] given |parser|, and "`hostname`". + 1. Increment |parser|'s [=constructor string parser/token index=] by |parser|'s [=constructor string parser/token increment=]. + 1. [=Continue=]. + 1. Run [=change state=] given |parser|, "`done`" and 0. + 1. [=Break=]. + 1. If the result of running [=is a group open=] given |parser| is true: +

+

We ignore all code points within "`{ ... }`" pattern groupings. It would not make sense to allow a URL component boundary to lie within a grouping; e.g. "`https://example.c{om/fo}o`". While not supported within [=well formed=] [=/pattern strings=], we handle nested groupings here to avoid parser confusion. +

It is not necessary to perform this logic for regexp or named groups since those values are collapsed into individual [=tokens=] by the [=tokenize=] algorithm. +

+ 1. Increment |parser|'s [=constructor string parser/group depth=] by 1. + 1. Increment |parser|'s [=constructor string parser/token index=] by |parser|'s [=constructor string parser/token increment=]. + 1. [=Continue=]. + 1. If |parser|'s [=constructor string parser/group depth=] is greater than 0: + 1. If the result of running [=is a group close=] given |parser| is true, then decrement |parser|'s [=constructor string parser/group depth=] by 1. + 1. Otherwise: + 1. Increment |parser|'s [=constructor string parser/token index=] by |parser|'s [=constructor string parser/token increment=]. + 1. [=Continue=]. + 1. Switch on |parser|'s [=constructor string parser/state=] and run the associated steps: +
+
"`init`"
+
+ 1. If the result of running [=is a protocol suffix=] given |parser| is true: + 1. Run [=rewind and set state=] given |parser| and "`protocol`". +
+
"`protocol`"
+
+ 1. If the result of running [=is a protocol suffix=] given |parser| is true: + 1. Run [=compute protocol matches a special scheme flag=] given |parser|. +

We need to eagerly compile the protocol component to determine if it matches any [=special schemes=]. If it does then certain special rules apply. It determines if the pathname defaults to a "`/`" and also whether we will look for the username, password, hostname, and port components. Authority slashes can also cause us to look for these components as well. Otherwise we treat this as an "opaque path URL" and go straight to the pathname component. + 1. Let |next state| be "`pathname`". + 1. Let |skip| be 1. + 1. If the result of running [=next is authority slashes=] given |parser| is true: + 1. Set |next state| to "`authority`". + 1. Set |skip| to 3. + 1. Otherwise if |parser|'s [=constructor string parser/protocol matches a special scheme flag=] is true, then set |next state| to "`authority`". + 1. Run [=change state=] given |parser|, |next state|, and |skip|. +

+
"`authority`"
+
+ 1. If the result of running [=is an identity terminator=] given |parser| is true, then run [=rewind and set state=] given |parser| and "`username`". + 1. Otherwise if any of the following are true: +
    +
  • the result of running [=is a pathname start=] given |parser|;
  • +
  • the result of running [=is a search prefix=] given |parser|; or
  • +
  • the result of running [=is a hash prefix=] given |parser|,
  • +
+

then run [=rewind and set state=] given |parser| and "`hostname`". +

+
"`username`"
+
+ 1. If the result of running [=is a password prefix=] given |parser| is true, then run [=change state=] given |parser|, "`password`", and 1. + 1. Otherwise if the result of running [=is an identity terminator=] given |parser| is true, then run [=change state=] given |parser|, "`hostname`", and 1. +
+
"`password`"
+
+ 1. If the result of running [=is an identity terminator=] given |parser| is true, then run [=change state=] given |parser|, "`hostname`", and 1. +
+
"`hostname`"
+
+ 1. If the result of running [=is an IPv6 open=] given |parser| is true, then increment |parser|'s [=constructor string parser/hostname IPv6 bracket depth=] by 1. + 1. Otherwise if the result of running [=is an IPv6 close=] given |parser| is true, then decrement |parser|'s [=constructor string parser/hostname IPv6 bracket depth=] by 1. + 1. Otherwise if the result of running [=is a port prefix=] given |parser| is true and |parser|'s [=constructor string parser/hostname IPv6 bracket depth=] is zero, then run [=change state=] given |parser|, "`port`", and 1. + 1. Otherwise if the result of running [=is a pathname start=] given |parser| is true, then run [=change state=] given |parser|, "`pathname`", and 0. + 1. Otherwise if the result of running [=is a search prefix=] given |parser| is true, then run [=change state=] given |parser|, "`search`", and 1. + 1. Otherwise if the result of running [=is a hash prefix=] given |parser| is true, then run [=change state=] given |parser|, "`hash`", and 1. +
+
"`port`"
+
+ 1. If the result of running [=is a pathname start=] given |parser| is true, then run [=change state=] given |parser|, "`pathname`", and 0. + 1. Otherwise if the result of running [=is a search prefix=] given |parser| is true, then run [=change state=] given |parser|, "`search`", and 1. + 1. Otherwise if the result of running [=is a hash prefix=] given |parser| is true, then run [=change state=] given |parser|, "`hash`", and 1. +
+
"`pathname`"
+
+ 1. If the result of running [=is a search prefix=] given |parser| is true, then run [=change state=] given |parser|, "`search`", and 1. + 1. Otherwise if the result of running [=is a hash prefix=] given |parser| is true, then run [=change state=] given |parser|, "`hash`", and 1. +
+
"`search`"
+
+ 1. If the result of running [=is a hash prefix=] given |parser| is true, then run [=change state=] given |parser|, "`hash`", and 1. +
+
"`hash`"
+
+ 1. Do nothing. +
+
"`done`"
+
+ 1. [=Assert=]: This step is never reached. +
+
+ 1. Increment |parser|'s [=constructor string parser/token index=] by |parser|'s [=constructor string parser/token increment=]. + 1. If |parser|'s [=constructor string parser/result=] [=map/contains=] "{{URLPatternInit/hostname}}" and not "{{URLPatternInit/port}}", then set |parser|'s [=constructor string parser/result=]["{{URLPatternInit/port}}"] to the empty string. + +
This is special-cased because when an author does not specify a port, they usually intend the default port. If any port is acceptable, the author can specify it as a wildcard explicitly. For example, "`https://example.com/*`" does not match URLs beginning with "`https://example.com:8443/`", which is a different origin.
+ 1. Return |parser|'s [=constructor string parser/result=]. +
+ +
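The group depth handling in the loop above can be sketched in isolation. This is a simplified model, not the full algorithm: it ignores state changes and rewinds, and only reports which token indices would reach the state switch. `tokensSeenByStateSwitch` is a hypothetical name, and tokens are assumed to be plain objects with a `type` property.

```javascript
// Returns the indices of tokens that are not skipped by the
// "{ ... }" group depth logic. Rewinds and state changes are ignored.
function tokensSeenByStateSwitch(tokens) {
  const seen = [];
  let groupDepth = 0;
  for (let index = 0; index < tokens.length; index++) {
    const type = tokens[index].type;
    if (type === "end") break;
    if (type === "open") {
      groupDepth++;
      continue;
    }
    if (groupDepth > 0) {
      if (type === "close") {
        // The closing token itself falls through to the state switch.
        groupDepth--;
      } else {
        continue;
      }
    }
    seen.push(index);
  }
  return seen;
}

// Token types for the input "a{b/c}d".
const sampleTypes = ["char", "open", "char", "char", "char", "close", "char", "end"];
const seenIndices = tokensSeenByStateSwitch(sampleTypes.map((type) => ({ type })));
console.log(seenIndices);
```

In this model the "`/`" inside the braces never reaches the state switch, so it cannot be mistaken for a pathname start.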
+To change state given a [=constructor string parser=] |parser|, a [=constructor string parser/state=] |new state|, and a number |skip|: + + 1. If |parser|'s [=constructor string parser/state=] is not "`init`", not "`authority`", and not "`done`", then set |parser|'s [=constructor string parser/result=][|parser|'s [=constructor string parser/state=]] to the result of running [=make a component string=] given |parser|. + 1. If |parser|'s [=constructor string parser/state=] is not "`init`" and |new state| is not "`done`", then: + 1. If |parser|'s [=constructor string parser/state=] is "`protocol`", "`authority`", "`username`", or "`password`"; |new state| is "`port`", "`pathname`", "`search`", or "`hash`"; and |parser|'s [=constructor string parser/result=]["{{URLPatternInit/hostname}}"] does not [=map/exist=], then set |parser|'s [=constructor string parser/result=]["{{URLPatternInit/hostname}}"] to the empty string. + 1. If |parser|'s [=constructor string parser/state=] is "`protocol`", "`authority`", "`username`", "`password`", "`hostname`", or "`port`"; |new state| is "`search`" or "`hash`"; and |parser|'s [=constructor string parser/result=]["{{URLPatternInit/pathname}}"] does not [=map/exist=], then: + 1. If |parser|'s [=constructor string parser/protocol matches a special scheme flag=] is true, then set |parser|'s [=constructor string parser/result=]["{{URLPatternInit/pathname}}"] to "`/`". + 1. Otherwise, set |parser|'s [=constructor string parser/result=]["{{URLPatternInit/pathname}}"] to the empty string. + 1. If |parser|'s [=constructor string parser/state=] is "`protocol`", "`authority`", "`username`", "`password`", "`hostname`", "`port`", or "`pathname`"; |new state| is "`hash`"; and |parser|'s [=constructor string parser/result=]["{{URLPatternInit/search}}"] does not [=map/exist=], then set |parser|'s [=constructor string parser/result=]["{{URLPatternInit/search}}"] to the empty string. + 1. 
Set |parser|'s [=constructor string parser/state=] to |new state|. + 1. Increment |parser|'s [=constructor string parser/token index=] by |skip|. + 1. Set |parser|'s [=constructor string parser/component start=] to |parser|'s [=constructor string parser/token index=]. + 1. Set |parser|'s [=constructor string parser/token increment=] to 0. +
+ +
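The steps of [=change state=] can be transcribed fairly directly. The sketch below assumes a parser modeled as a plain object with hypothetical property names (`state`, `result`, `tokenIndex`, `componentStart`, `tokenIncrement`, `protocolMatchesSpecialScheme`) and takes the [=make a component string=] operation as a callback so that the block stays self-contained.

```javascript
function changeState(parser, newState, skip, makeComponentString) {
  const state = parser.state;
  const result = parser.result;
  // Store the component that just ended, if the state maps to one.
  if (!["init", "authority", "done"].includes(state)) {
    result[state] = makeComponentString(parser);
  }
  if (state !== "init" && newState !== "done") {
    const beforeHostname = ["protocol", "authority", "username", "password"];
    const beforePathname = [...beforeHostname, "hostname", "port"];
    const beforeSearch = [...beforePathname, "pathname"];
    // Skipped components become empty rather than wildcards.
    if (beforeHostname.includes(state) &&
        ["port", "pathname", "search", "hash"].includes(newState) &&
        !("hostname" in result)) {
      result.hostname = "";
    }
    if (beforePathname.includes(state) &&
        ["search", "hash"].includes(newState) &&
        !("pathname" in result)) {
      result.pathname = parser.protocolMatchesSpecialScheme ? "/" : "";
    }
    if (beforeSearch.includes(state) && newState === "hash" &&
        !("search" in result)) {
      result.search = "";
    }
  }
  parser.state = newState;
  parser.tokenIndex += skip;
  parser.componentStart = parser.tokenIndex;
  parser.tokenIncrement = 0;
}

// Moving from "hostname" straight to "hash", as in "https://example.com#x".
const demoParser = {
  state: "hostname", result: { protocol: "https" },
  tokenIndex: 19, componentStart: 8, tokenIncrement: 1,
  protocolMatchesSpecialScheme: true,
};
changeState(demoParser, "hash", 1, () => "example.com");
console.log(demoParser.result);
```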
+To rewind given a [=constructor string parser=] |parser|: + + 1. Set |parser|'s [=constructor string parser/token index=] to |parser|'s [=constructor string parser/component start=]. + 1. Set |parser|'s [=constructor string parser/token increment=] to 0. +
+ +
+To rewind and set state given a [=constructor string parser=] |parser| and a [=constructor string parser/state=] |state|: + + 1. Run [=rewind=] given |parser|. + 1. Set |parser|'s [=constructor string parser/state=] to |state|. +
+ +
+To get a safe token given a [=constructor string parser=] |parser| and a number |index|: + + 1. If |index| is less than |parser|'s [=constructor string parser/token list=]'s [=list/size=], then return |parser|'s [=constructor string parser/token list=][|index|]. + 1. [=Assert=]: |parser|'s [=constructor string parser/token list=]'s [=list/size=] is greater than or equal to 1. + 1. Let |last index| be |parser|'s [=constructor string parser/token list=]'s [=list/size=] − 1. + 1. Let |token| be |parser|'s [=constructor string parser/token list=][|last index|]. + 1. [=Assert=]: |token|'s [=token/type=] is "`end`". + 1. Return |token|. +
+ +
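A minimal sketch of [=get a safe token=], assuming the parser is a plain object with a hypothetical `tokenList` array whose entries have a `type` property. The assertions from the algorithm are modeled as thrown errors.

```javascript
function getSafeToken(parser, index) {
  const list = parser.tokenList;
  if (index < list.length) {
    return list[index];
  }
  // Out-of-range lookups clamp to the trailing "end" token.
  if (list.length < 1) {
    throw new Error("assertion failed: token list is empty");
  }
  const token = list[list.length - 1];
  if (token.type !== "end") {
    throw new Error("assertion failed: last token is not an end token");
  }
  return token;
}

const safeTokenParser = {
  tokenList: [
    { type: "char", index: 0, value: "a" },
    { type: "end", index: 1, value: "" },
  ],
};
console.log(getSafeToken(safeTokenParser, 0).type);
console.log(getSafeToken(safeTokenParser, 5).type);
```

This is what lets lookahead checks such as [=next is authority slashes=] read past the end of the list without a separate bounds check.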
+To run is a non-special pattern char given a [=constructor string parser=] |parser|, a number |index|, and a string |value|: + + 1. Let |token| be the result of running [=get a safe token=] given |parser| and |index|. + 1. If |token|'s [=token/value=] is not |value|, then return false. + 1. If any of the following are true: + +

then return true. + 1. Return false. +

+ +
+To run is a protocol suffix given a [=constructor string parser=] |parser|: + + 1. Return the result of running [=is a non-special pattern char=] given |parser|, |parser|'s [=constructor string parser/token index=], and "`:`". +
+ +
+To run next is authority slashes given a [=constructor string parser=] |parser|: + + 1. If the result of running [=is a non-special pattern char=] given |parser|, |parser|'s [=constructor string parser/token index=] + 1, and "`/`" is false, then return false. + 1. If the result of running [=is a non-special pattern char=] given |parser|, |parser|'s [=constructor string parser/token index=] + 2, and "`/`" is false, then return false. + 1. Return true. +
+ +
+To run is an identity terminator given a [=constructor string parser=] |parser|: + + 1. Return the result of running [=is a non-special pattern char=] given |parser|, |parser|'s [=constructor string parser/token index=], and "`@`". +
+ +
+To run is a password prefix given a [=constructor string parser=] |parser|: + + 1. Return the result of running [=is a non-special pattern char=] given |parser|, |parser|'s [=constructor string parser/token index=], and "`:`". +
+ +
+To run is a port prefix given a [=constructor string parser=] |parser|: + + 1. Return the result of running [=is a non-special pattern char=] given |parser|, |parser|'s [=constructor string parser/token index=], and "`:`". +
+ +
+To run is a pathname start given a [=constructor string parser=] |parser|: + + 1. Return the result of running [=is a non-special pattern char=] given |parser|, |parser|'s [=constructor string parser/token index=], and "`/`". +
+ +
+To run is a search prefix given a [=constructor string parser=] |parser|: + + 1. If the result of running [=is a non-special pattern char=] given |parser|, |parser|'s [=constructor string parser/token index=] and "`?`" is true, then return true. + 1. If |parser|'s [=constructor string parser/token list=][|parser|'s [=constructor string parser/token index=]]'s [=token/value=] is not "`?`", then return false. + 1. Let |previous index| be |parser|'s [=constructor string parser/token index=] − 1. + 1. If |previous index| is less than 0, then return true. + 1. Let |previous token| be the result of running [=get a safe token=] given |parser| and |previous index|. + 1. If any of the following are true, then return false: + + 1. Return true. +
+ +
+To run is a hash prefix given a [=constructor string parser=] |parser|: + + 1. Return the result of running [=is a non-special pattern char=] given |parser|, |parser|'s [=constructor string parser/token index=] and "`#`". +
+ +
+To run is a group open given a [=constructor string parser=] |parser|: + 1. If |parser|'s [=constructor string parser/token list=][|parser|'s [=constructor string parser/token index=]]'s [=token/type=] is "`open`", then return true. + 1. Otherwise return false. +
+ +
+To run is a group close given a [=constructor string parser=] |parser|: + 1. If |parser|'s [=constructor string parser/token list=][|parser|'s [=constructor string parser/token index=]]'s [=token/type=] is "`close`", then return true. + 1. Otherwise return false. +
+ +
+To run is an IPv6 open given a [=constructor string parser=] |parser|: + + 1. Return the result of running [=is a non-special pattern char=] given |parser|, |parser|'s [=constructor string parser/token index=], and "`[`". +
+ +
+To run is an IPv6 close given a [=constructor string parser=] |parser|: + + 1. Return the result of running [=is a non-special pattern char=] given |parser|, |parser|'s [=constructor string parser/token index=], and "`]`". +
+ +
+To run make a component string given a [=constructor string parser=] |parser|: + + 1. [=Assert=]: |parser|'s [=constructor string parser/token index=] is less than |parser|'s [=constructor string parser/token list=]'s [=list/size=]. + 1. Let |token| be |parser|'s [=constructor string parser/token list=][|parser|'s [=constructor string parser/token index=]]. + 1. Let |component start token| be the result of running [=get a safe token=] given |parser| and |parser|'s [=constructor string parser/component start=]. + 1. Let |component start input index| be |component start token|'s [=token/index=]. + 1. Let |end index| be |token|'s [=token/index=]. + 1. Return the [=code point substring by positions|code point substring=] from |component start input index| to |end index| within |parser|'s [=constructor string parser/input=]. +
+ +
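The following sketch shows one way [=make a component string=] could look, assuming a parser object with hypothetical `input`, `tokenList`, `tokenIndex`, and `componentStart` properties. Positions are code point positions, so the input is first split into code points.

```javascript
function makeComponentString(parser) {
  const { tokenList, tokenIndex, componentStart, input } = parser;
  if (tokenIndex >= tokenList.length) {
    throw new Error("assertion failed: token index out of range");
  }
  const token = tokenList[tokenIndex];
  // Same clamping behavior as "get a safe token".
  const startToken = tokenList[Math.min(componentStart, tokenList.length - 1)];
  // Slice by code point position, end exclusive.
  return Array.from(input).slice(startToken.index, token.index).join("");
}

// Build a trivial token list: one "char" token per code point plus "end".
const componentInput = "https://example.com";
const componentTokens = Array.from(componentInput).map((value, index) => ({
  type: "char", index, value,
}));
componentTokens.push({ type: "end", index: componentTokens.length, value: "" });

const protocolString = makeComponentString({
  input: componentInput,
  tokenList: componentTokens,
  componentStart: 0,
  tokenIndex: 5, // the ":" that ends the protocol
});
console.log(protocolString);
```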
+To compute protocol matches a special scheme flag given a [=constructor string parser=] |parser|: + + 1. Let |protocol string| be the result of running [=make a component string=] given |parser|. + 1. Let |protocol component| be the result of [=compiling a component=] given |protocol string|, [=canonicalize a protocol=], and [=default options=]. + 1. If the result of running [=protocol component matches a special scheme=] given |protocol component| is true, then set |parser|'s [=constructor string parser/protocol matches a special scheme flag=] to true. +
+ +

Pattern strings

+ +A pattern string is a string that is written to match a set of target strings. A well formed pattern string conforms to a particular pattern syntax. This pattern syntax is directly based on the syntax used by the popular [path-to-regexp](https://github.com/pillarjs/path-to-regexp) JavaScript library. + +It can be [=parse a pattern string|parsed=] to produce a [=/part list=] which describes, in order, what must appear in a component string for the pattern string to match. + +
+ Pattern strings can contain capture groups, which by default match the shortest possible string, up to a component-specific separator (`/` in the pathname, `.` in the hostname). For example, the pathname pattern "`/blog/:title`" will match "`/blog/hello-world`" but not "`/blog/2012/02`". + + A regular expression can also be used instead, so the pathname pattern "`/blog/:year(\\d+)/:month(\\d+)`" will match "`/blog/2012/02`". + + A group can also be made optional, or repeated, by using a modifier. For example, the pathname pattern "`/products/:id?`" will match both "`/products`" and "`/products/2`" (but not "`/products/`"). In the pathname specifically, groups automatically require a leading `/`; to avoid this, the group can be explicitly delimited, as in the pathname pattern "`/products/{:id}?`". + + A full wildcard `*` can also be used to match as much as possible, as in the pathname pattern "`/products/*`". +
+ +
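To make the matching behavior in the note above concrete, the following hand-written regular expressions approximate each pathname pattern. They are illustrative equivalents chosen for this example, not the output of any algorithm in this standard.

```javascript
// Hand-written approximations for the pathname component (separator "/").
const blogTitle = /^\/blog\/([^\/]+?)$/;                  // "/blog/:title"
const blogYearMonth = /^\/blog\/(\d+)\/(\d+)$/;           // "/blog/:year(\\d+)/:month(\\d+)"
const optionalId = /^\/products(?:\/([^\/]+?))?$/;        // "/products/:id?"
const optionalIdBraces = /^\/products\/(?:([^\/]+?))?$/;  // "/products/{:id}?"
const productsWildcard = /^\/products\/(.*)$/;            // "/products/*"

console.log(blogTitle.test("/blog/hello-world"));   // true
console.log(blogTitle.test("/blog/2012/02"));       // false
console.log(blogYearMonth.test("/blog/2012/02"));   // true
console.log(optionalId.test("/products"));          // true
console.log(optionalId.test("/products/"));         // false
console.log(optionalIdBraces.test("/products/"));   // true
console.log(productsWildcard.test("/products/a/b")); // true
```

The difference between the two optional forms is where the "`/`" sits relative to the optional group: inside it for "`:id?`", outside it for "`{:id}?`".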

Parsing pattern strings

+ +

Tokens

+ +A token list is a [=list=] containing zero or more [=token=] [=structs=]. + +A token is a [=struct=] representing a single lexical token within a [=/pattern string=]. + +A [=token=] has an associated type, a string, initially "`invalid-char`". It must be one of the following: + +
+
"`open`"
+
The [=token=] represents a U+007B (`{`) code point. +
"`close`"
+
The [=token=] represents a U+007D (`}`) code point. +
"`regexp`"
+
The [=token=] represents a string of the form "`(<regular expression>)`". The regular expression is required to consist of only ASCII code points. +
"`name`"
+
The [=token=] represents a string of the form "`:<name>`". The name value is restricted to code points that are consistent with JavaScript identifiers. +
"`char`"
+
The [=token=] represents a valid pattern code point without any special syntactical meaning. +
"`escaped-char`"
+
The [=token=] represents a code point escaped using a backslash like "`\<char>`". +
"`other-modifier`"
+
The [=token=] represents a matching group modifier that is either the U+003F (`?`) or U+002B (`+`) code points. +
"`asterisk`"
+
The [=token=] represents a U+002A (`*`) code point that can be either a wildcard matching group or a matching group modifier. +
"`end`"
+
The [=token=] represents the end of the [=/pattern string=]. +
"`invalid-char`"
+
The [=token=] represents a code point that is invalid in the pattern. This could be because of the code point value itself or due to its location within the pattern relative to other syntactic elements. +
+ +A [=token=] has an associated index, a number, initially 0. It is the position of the first code point in the [=/pattern string=] represented by the [=token=]. + +A [=token=] has an associated value, a string, initially the empty string. It contains the code points from the [=/pattern string=] represented by the [=token=]. + +

Tokenizing

+ +A tokenize policy is a string that must be either "`strict`" or "`lenient`". + +A tokenizer is a [=struct=]. + +A [=tokenizer=] has an associated input, a [=/pattern string=], initially the empty string. + +A [=tokenizer=] has an associated policy, a [=tokenize policy=], initially "`strict`". + +A [=tokenizer=] has an associated token list, a [=/token list=], initially an empty [=list=]. + +A [=tokenizer=] has an associated index, a number, initially 0. + +A [=tokenizer=] has an associated next index, a number, initially 0. + +A [=tokenizer=] has an associated code point, a Unicode code point, initially null. + +
+ To tokenize a given string |input| and [=tokenize policy=] |policy|: + + 1. Let |tokenizer| be a new [=tokenizer=]. + 1. Set |tokenizer|'s [=tokenizer/input=] to |input|. + 1. Set |tokenizer|'s [=tokenizer/policy=] to |policy|. + 1. While |tokenizer|'s [=tokenizer/index=] is less than |tokenizer|'s [=tokenizer/input=]'s [=string/code point length=]: + 1. Run [=seek and get the next code point=] given |tokenizer| and |tokenizer|'s [=tokenizer/index=]. + 1. If |tokenizer|'s [=tokenizer/code point=] is U+002A (`*`): + 1. Run [=add a token with default position and length=] given |tokenizer| and "`asterisk`". + 1. [=Continue=]. + 1. If |tokenizer|'s [=tokenizer/code point=] is U+002B (`+`) or U+003F (`?`): + 1. Run [=add a token with default position and length=] given |tokenizer| and "`other-modifier`". + 1. [=Continue=]. + 1. If |tokenizer|'s [=tokenizer/code point=] is U+005C (\): + 1. If |tokenizer|'s [=tokenizer/index=] is equal to |tokenizer|'s [=tokenizer/input=]'s [=string/code point length=] − 1: + 1. Run [=process a tokenizing error=] given |tokenizer|, |tokenizer|'s [=tokenizer/next index=], and |tokenizer|'s [=tokenizer/index=]. + 1. [=Continue=]. + 1. Let |escaped index| be |tokenizer|'s [=tokenizer/next index=]. + 1. Run [=get the next code point=] given |tokenizer|. + 1. Run [=add a token with default length=] given |tokenizer|, "`escaped-char`", |tokenizer|'s [=tokenizer/next index=], and |escaped index|. + 1. [=Continue=]. + 1. If |tokenizer|'s [=tokenizer/code point=] is U+007B (`{`): + 1. Run [=add a token with default position and length=] given |tokenizer| and "`open`". + 1. [=Continue=]. + 1. If |tokenizer|'s [=tokenizer/code point=] is U+007D (`}`): + 1. Run [=add a token with default position and length=] given |tokenizer| and "`close`". + 1. [=Continue=]. + 1. If |tokenizer|'s [=tokenizer/code point=] is U+003A (`:`): + 1. Let |name position| be |tokenizer|'s [=tokenizer/next index=]. + 1. Let |name start| be |name position|. + 1. 
While |name position| is less than |tokenizer|'s [=tokenizer/input=]'s [=string/code point length=]: + 1. Run [=seek and get the next code point=] given |tokenizer| and |name position|. + 1. Let |first code point| be true if |name position| equals |name start| and false otherwise. + 1. Let |valid code point| be the result of running [=is a valid name code point=] given |tokenizer|'s [=tokenizer/code point=] and |first code point|. + 1. If |valid code point| is false [=break=]. + 1. Set |name position| to |tokenizer|'s [=tokenizer/next index=]. + 1. If |name position| is less than or equal to |name start|: + 1. Run [=process a tokenizing error=] given |tokenizer|, |name start|, and |tokenizer|'s [=tokenizer/index=]. + 1. [=Continue=]. + 1. Run [=add a token with default length=] given |tokenizer|, "`name`", |name position|, and |name start|. + 1. [=Continue=]. + 1. If |tokenizer|'s [=tokenizer/code point=] is U+0028 (`(`): + 1. Let |depth| be 1. + 1. Let |regexp position| be |tokenizer|'s [=tokenizer/next index=]. + 1. Let |regexp start| be |regexp position|. + 1. Let |error| be false. + 1. While |regexp position| is less than |tokenizer|'s [=tokenizer/input=]'s [=string/code point length=]: + 1. Run [=seek and get the next code point=] given |tokenizer| and |regexp position|. + 1. If the result of running [=is ASCII=] given |tokenizer|'s [=tokenizer/code point=] is false: + 1. Run [=process a tokenizing error=] given |tokenizer|, |regexp start|, and |tokenizer|'s [=tokenizer/index=]. + 1. Set |error| to true. + 1. [=Break=]. + 1. If |regexp position| equals |regexp start| and |tokenizer|'s [=tokenizer/code point=] is U+003F (`?`): + 1. Run [=process a tokenizing error=] given |tokenizer|, |regexp start|, and |tokenizer|'s [=tokenizer/index=]. + 1. Set |error| to true. + 1. [=Break=]. + 1. If |tokenizer|'s [=tokenizer/code point=] is U+005C (\): + 1. If |regexp position| equals |tokenizer|'s [=tokenizer/input=]'s [=string/code point length=] − 1: + 1. 
Run [=process a tokenizing error=] given |tokenizer|, |regexp start|, and |tokenizer|'s [=tokenizer/index=]. + 1. Set |error| to true. + 1. [=Break=] + 1. Run [=get the next code point=] given |tokenizer|. + 1. If the result of running [=is ASCII=] given |tokenizer|'s [=tokenizer/code point=] is false: + 1. Run [=process a tokenizing error=] given |tokenizer|, |regexp start|, and |tokenizer|'s [=tokenizer/index=]. + 1. Set |error| to true. + 1. [=Break=]. + 1. Set |regexp position| to |tokenizer|'s [=tokenizer/next index=]. + 1. [=Continue=]. + 1. If |tokenizer|'s [=tokenizer/code point=] is U+0029 (`)`): + 1. Decrement |depth| by 1. + 1. If |depth| is 0: + 1. Set |regexp position| to |tokenizer|'s [=tokenizer/next index=]. + 1. [=Break=]. + 1. Otherwise if |tokenizer|'s [=tokenizer/code point=] is U+0028 (`(`): + 1. Increment |depth| by 1. + 1. If |regexp position| equals |tokenizer|'s [=tokenizer/input=]'s [=string/code point length=] − 1: + 1. Run [=process a tokenizing error=] given |tokenizer|, |regexp start|, and |tokenizer|'s [=tokenizer/index=]. + 1. Set |error| to true. + 1. [=Break=] + 1. Let |temporary position| be |tokenizer|'s [=tokenizer/next index=]. + 1. Run [=get the next code point=] given |tokenizer|. + 1. If |tokenizer|'s [=tokenizer/code point=] is not U+003F (`?`): + 1. Run [=process a tokenizing error=] given |tokenizer|, |regexp start|, and |tokenizer|'s [=tokenizer/index=]. + 1. Set |error| to true. + 1. [=Break=]. + 1. Set |tokenizer|'s [=tokenizer/next index=] to |temporary position|. + 1. Set |regexp position| to |tokenizer|'s [=tokenizer/next index=]. + 1. If |error| is true [=continue=]. + 1. If |depth| is not zero: + 1. Run [=process a tokenizing error=] given |tokenizer|, |regexp start|, and |tokenizer|'s [=tokenizer/index=]. + 1. [=Continue=]. + 1. Let |regexp length| be |regexp position| − |regexp start| − 1. + 1. If |regexp length| is zero: + 1. 
Run [=process a tokenizing error=] given |tokenizer|, |regexp start|, and |tokenizer|'s [=tokenizer/index=]. + 1. [=Continue=]. + 1. Run [=add a token=] given |tokenizer|, "`regexp`", |regexp position|, |regexp start|, and |regexp length|. + 1. [=Continue=]. + 1. Run [=add a token with default position and length=] given |tokenizer| and "`char`". + 1. Run [=add a token with default length=] given |tokenizer|, "`end`", |tokenizer|'s [=tokenizer/index=], and |tokenizer|'s [=tokenizer/index=]. + 1. Return |tokenizer|'s [=tokenizer/token list=]. +
+ +
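The tokenizing steps above can be sketched in reduced form. This sketch is an assumption-laden simplification: it deliberately leaves out regexp groups (it throws on "`(`"), folds the helper algorithms into local closures, and approximates the name code point check with Unicode property escapes. `tokenizeSketch` is a hypothetical name.

```javascript
function tokenizeSketch(input, policy = "strict") {
  const codePoints = Array.from(input);
  const tokens = [];
  let index = 0;
  // "add a token": records the token at the current index, then advances.
  const add = (type, nextPosition, valuePosition) => {
    const value = codePoints.slice(valuePosition, nextPosition).join("");
    tokens.push({ type, index, value });
    index = nextPosition;
  };
  // "process a tokenizing error": throw when strict, mark when lenient.
  const error = (nextPosition, valuePosition) => {
    if (policy === "strict") throw new TypeError("Invalid pattern string");
    add("invalid-char", nextPosition, valuePosition);
  };
  // Approximation of IdentifierStart / IdentifierPart.
  const isNameCodePoint = (cp, first) =>
    (first ? /^[$_\p{ID_Start}]$/u : /^[$\u200C\u200D\p{ID_Continue}]$/u).test(cp);

  while (index < codePoints.length) {
    const cp = codePoints[index];
    const next = index + 1;
    if (cp === "*") { add("asterisk", next, index); continue; }
    if (cp === "+" || cp === "?") { add("other-modifier", next, index); continue; }
    if (cp === "\\") {
      if (index === codePoints.length - 1) { error(next, index); continue; }
      add("escaped-char", next + 1, next); // value is the escaped code point
      continue;
    }
    if (cp === "{") { add("open", next, index); continue; }
    if (cp === "}") { add("close", next, index); continue; }
    if (cp === ":") {
      const nameStart = next;
      let namePosition = nameStart;
      while (namePosition < codePoints.length &&
             isNameCodePoint(codePoints[namePosition], namePosition === nameStart)) {
        namePosition++;
      }
      if (namePosition <= nameStart) { error(nameStart, index); continue; }
      add("name", namePosition, nameStart);
      continue;
    }
    if (cp === "(") throw new Error("regexp groups are outside the scope of this sketch");
    add("char", next, index);
  }
  add("end", index, index);
  return tokens;
}

const sketchTokens = tokenizeSketch("/:id?");
console.log(sketchTokens.map((token) => token.type));
```

With the "`lenient`" policy, as used by the constructor string parser, a bare "`:`" yields an "`invalid-char`" token instead of throwing.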
+ To get the next code point for a given [=tokenizer=] |tokenizer|: + + 1. Set |tokenizer|'s [=tokenizer/code point=] to the Unicode code point in |tokenizer|'s [=tokenizer/input=] at the position indicated by |tokenizer|'s [=tokenizer/next index=]. + 1. Increment |tokenizer|'s [=tokenizer/next index=] by 1. +
+ +
+ To seek and get the next code point for a given [=tokenizer=] |tokenizer| and number |index|: + + 1. Set |tokenizer|'s [=tokenizer/next index=] to |index|. + 1. Run [=get the next code point=] given |tokenizer|. +
+ +
+ To add a token for a given [=tokenizer=] |tokenizer|, [=token/type=] |type|, number |next position|, number |value position|, and number |value length|: + + 1. Let |token| be a new [=token=]. + 1. Set |token|'s [=token/type=] to |type|. + 1. Set |token|'s [=token/index=] to |tokenizer|'s [=tokenizer/index=]. + 1. Set |token|'s [=token/value=] to the [=code point substring=] from |value position| with length |value length| within |tokenizer|'s [=tokenizer/input=]. + 1. [=list/Append=] |token| to the back of |tokenizer|'s [=tokenizer/token list=]. + 1. Set |tokenizer|'s [=tokenizer/index=] to |next position|. +
+ +
+ To add a token with default length for a given [=tokenizer=] |tokenizer|, [=token/type=] |type|, number |next position|, and number |value position|: + + 1. Let |computed length| be |next position| − |value position|. + 1. Run [=add a token=] given |tokenizer|, |type|, |next position|, |value position|, and |computed length|. +
+ +
+ To add a token with default position and length for a given [=tokenizer=] |tokenizer| and [=token/type=] |type|: + + 1. Run [=add a token with default length=] given |tokenizer|, |type|, |tokenizer|'s [=tokenizer/next index=], and |tokenizer|'s [=tokenizer/index=]. +
+ +
+ To process a tokenizing error for a given [=tokenizer=] |tokenizer|, a number |next position|, and a number |value position|: + + 1. If |tokenizer|'s [=tokenizer/policy=] is "`strict`", then throw a {{TypeError}}. + 1. [=Assert=]: |tokenizer|'s [=tokenizer/policy=] is "`lenient`". + 1. Run [=add a token with default length=] given |tokenizer|, "`invalid-char`", |next position|, and |value position|. +
+ +
+ To perform is a valid name code point given a Unicode |code point| and a boolean |first|: + + 1. If |first| is true return the result of checking if |code point| is contained in the [=IdentifierStart=] set of code points. + 1. Otherwise return the result of checking if |code point| is contained in the [=IdentifierPart=] set of code points. +
+ +
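In JavaScript, the check above can be approximated with Unicode property escapes. This sketch assumes that [=IdentifierStart=] corresponds to `ID_Start` plus "`$`" and "`_`", and that [=IdentifierPart=] corresponds to `ID_Continue` plus "`$`", U+200C, and U+200D, which is how recent ECMAScript editions define these productions for literal code points. `isValidNameCodePoint` is a hypothetical name.

```javascript
// codePoint is a number; first indicates the first code point of a name.
function isValidNameCodePoint(codePoint, first) {
  const ch = String.fromCodePoint(codePoint);
  if (first) {
    return /^[$_\p{ID_Start}]$/u.test(ch);
  }
  return /^[$\u200C\u200D\p{ID_Continue}]$/u.test(ch);
}

console.log(isValidNameCodePoint("a".codePointAt(0), true));  // true
console.log(isValidNameCodePoint("1".codePointAt(0), true));  // false
console.log(isValidNameCodePoint("1".codePointAt(0), false)); // true
console.log(isValidNameCodePoint("-".codePointAt(0), false)); // false
```

So a name such as "`:id1`" is accepted in full, while "`:1id`" fails at its first code point.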
+ To determine if a Unicode |code point| is ASCII: + + 1. If |code point| is between U+0000 and U+007F inclusive, then return true. + 1. Otherwise return false. +
+ +

Parts

+ +A part list is a [=list=] of zero or more [=parts=]. + +A part is a [=struct=] representing one piece of a parsed [=/pattern string=]. It can contain at most one matching group, a fixed text prefix, a fixed text suffix, and a modifier. It can contain as little as a single fixed text string or a single matching group. + +A [=part=] has an associated type, a string, which must be set upon creation. It must be one of the following: + +
+
"`fixed-text`"
+
The [=part=] represents a simple fixed text string.
+
"`regexp`"
+
The [=part=] represents a matching group with a custom regular expression.
+
"`segment-wildcard`"
+
The [=part=] represents a matching group that matches code points up to the next separator code point. This is typically used for a named group like "`:foo`" that does not have a custom regular expression.
+
"`full-wildcard`"
+
The [=part=] represents a matching group that greedily matches all code points. This is typically used for the "`*`" wildcard matching group.
+
+ +A [=part=] has an associated value, a string, which must be set upon creation. + +A [=part=] has an associated modifier, a string, which must be set upon creation. It must be one of the following: + +
+
"`none`"
+
The [=part=] does not have a [=part/modifier=].
+
"`optional`"
+
The [=part=] has an optional [=part/modifier=] indicated by the U+003F (`?`) code point.
+
"`zero-or-more`"
+
The [=part=] has a "zero or more" [=part/modifier=] indicated by the U+002A (`*`) code point.
+
"`one-or-more`"
+
The [=part=] has a "one or more" [=part/modifier=] indicated by the U+002B (`+`) code point.
+
+ +A [=part=] has an associated name, a string, initially the empty string. + +A [=part=] has an associated prefix, a string, initially the empty string. + +A [=part=] has an associated suffix, a string, initially the empty string. + +
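As an informal model, a [=part=] can be represented as a plain object with the six items described above. The factory `makePart` and the constant names below are hypothetical. The example part list for the pathname pattern "`/products/:id?`" is an illustrative reading of how such a pattern could be split, assuming "`/`" acts as the automatic prefix; it is not normative.

```javascript
const PART_TYPES = ["fixed-text", "regexp", "segment-wildcard", "full-wildcard"];
const PART_MODIFIERS = ["none", "optional", "zero-or-more", "one-or-more"];

// type, value, and modifier must be supplied; the rest default to "".
function makePart({ type, value, modifier, name = "", prefix = "", suffix = "" }) {
  if (!PART_TYPES.includes(type)) {
    throw new TypeError(`Unknown part type: ${type}`);
  }
  if (typeof value !== "string") {
    throw new TypeError("A part's value must be set upon creation");
  }
  if (!PART_MODIFIERS.includes(modifier)) {
    throw new TypeError(`Unknown part modifier: ${modifier}`);
  }
  return { type, value, modifier, name, prefix, suffix };
}

// Illustrative part list for the pathname pattern "/products/:id?".
const examplePartList = [
  makePart({ type: "fixed-text", value: "/products", modifier: "none" }),
  makePart({
    type: "segment-wildcard", value: "", modifier: "optional",
    name: "id", prefix: "/",
  }),
];
console.log(examplePartList);
```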

Options

+ +An options [=struct=] contains different settings that control how a [=/pattern string=] behaves. These options originally come from [path-to-regexp](https://github.com/pillarjs/path-to-regexp). We only include the options that are modified within the URLPattern specification and exclude the other options. For the purposes of comparison, this specification acts like [path-to-regexp](https://github.com/pillarjs/path-to-regexp) where `strict`, `start`, and `end` are always set to false. + +An [=/options=] has an associated delimiter code point, a string, which must be set upon creation. It must be either a single [=ASCII code point=] or the empty string. This code point is treated as a segment separator and is used for determining how far a `:foo` named group should match by default. For example, if the [=options/delimiter code point=] is "`/`" then "`/:foo`" will match "`/bar`", but not "`/bar/baz`". If the [=options/delimiter code point=] is the empty string then the example pattern would match both strings. + +An [=/options=] has an associated prefix code point, a string, which must be set upon creation. It must be either a single [=ASCII code point=] or the empty string. The code point is treated as an automatic prefix if found immediately preceding a match group. This matters when a match group is modified to be optional or repeating. For example, if [=options/prefix code point=] is "`/`" then "`/foo/:bar?/baz`" will treat the "`/`" before "`:bar`" as a prefix that becomes optional along with the named group. So in this example the pattern would match "`/foo/baz`". + +An [=/options=] has an associated ignore case, a boolean, which must be set upon creation. It defaults to false. When set to true, matching is case-insensitive; when set to false, matching is case-sensitive. For the purpose of comparison, this can be thought of as the negated `sensitive` option in [path-to-regexp](https://github.com/pillarjs/path-to-regexp). + +

Parsing

+ +
+An encoding callback is an abstract algorithm that takes a given string |input|. The |input| will be a simple text piece of a [=/pattern string=]. An implementing algorithm will validate and encode the |input|. It must return the encoded string or throw an exception. +
+ +A pattern parser is a [=struct=]. + +A [=pattern parser=] has an associated token list, a [=/token list=], initially an empty [=list=]. + +A [=pattern parser=] has an associated encoding callback, a [=/encoding callback=], that must be set upon creation. + +A [=pattern parser=] has an associated segment wildcard regexp, a string, that must be set upon creation. + +A [=pattern parser=] has an associated part list, a [=/part list=], initially an empty [=list=]. + +A [=pattern parser=] has an associated pending fixed value, a string, initially the empty string. + +A [=pattern parser=] has an associated index, a number, initially 0. + +A [=pattern parser=] has an associated next numeric name, a number, initially 0. + +
+To parse a pattern string given a [=/pattern string=] |input|, [=/options=] |options|, and [=/encoding callback=] |encoding callback|: + + 1. Let |parser| be a new [=pattern parser=] whose [=pattern parser/encoding callback=] is |encoding callback| and [=pattern parser/segment wildcard regexp=] is the result of running [=generate a segment wildcard regexp=] given |options|. + 1. Set |parser|'s [=pattern parser/token list=] to the result of running [=tokenize=] given |input| and "`strict`". + 1. While |parser|'s [=pattern parser/index=] is less than |parser|'s [=pattern parser/token list=]'s [=list/size=]: +
+

This first section is looking for the sequence: `<prefix char><name><regexp><modifier>`. There could be zero to all of these tokens. +

+
"`/:foo(bar)?`"
+
All four [=tokens=].
+
"`/`"
+
One "`char`" [=token=]. +
"`:foo`"
+
One "`name`" [=token=]. +
"`(bar)`"
+
One "`regexp`" [=token=]. +
"`/:foo`"
+
"`char`" and "`name`" [=tokens=]. +
"`/(bar)`"
+
"`char`" and "`regexp`" [=tokens=]. +
"`/:foo?`"
+
"`char`", "`name`", and "`other-modifier`" [=tokens=]. +
"`/(bar)?`"
+
"`char`", "`regexp`", and "`other-modifier`" [=tokens=]. +
+
+ 1. Let |char token| be the result of running [=try to consume a token=] given |parser| and "`char`". + 1. Let |name token| be the result of running [=try to consume a token=] given |parser| and "`name`". + 1. Let |regexp or wildcard token| be the result of running [=try to consume a regexp or wildcard token=] given |parser| and |name token|. + 1. If |name token| is not null or |regexp or wildcard token| is not null: +

If there is a matching group, we need to add the [=part=] immediately. + 1. Let |prefix| be the empty string. + 1. If |char token| is not null then set |prefix| to |char token|'s [=token/value=]. + 1. If |prefix| is not the empty string and not |options|'s [=options/prefix code point=]: + 1. Append |prefix| to the end of |parser|'s [=pattern parser/pending fixed value=]. + 1. Set |prefix| to the empty string. + 1. Run [=maybe add a part from the pending fixed value=] given |parser|. + 1. Let |modifier token| be the result of running [=try to consume a modifier token=] given |parser|. + 1. Run [=add a part=] given |parser|, |prefix|, |name token|, |regexp or wildcard token|, the empty string, and |modifier token|. + 1. [=Continue=]. + 1. Let |fixed token| be |char token|. +

If there was no matching group, then we need to buffer any fixed text. We want to collect as much text as possible before adding it as a "`fixed-text`" [=part=]. + 1. If |fixed token| is null, then set |fixed token| to the result of running [=try to consume a token=] given |parser| and "`escaped-char`". + 1. If |fixed token| is not null: + 1. Append |fixed token|'s [=token/value=] to |parser|'s [=pattern parser/pending fixed value=]. + 1. [=Continue=]. + 1. Let |open token| be the result of running [=try to consume a token=] given |parser| and "`open`". +

+

Next we look for the sequence `<open><char prefix><name><regexp><char suffix><close><modifier>`. The open and close are necessary, but the other tokens are not. +

+
"`{a:foo(bar)b}?`"
+
All [=tokens=] are present. +
"`{:foo}?`"
+
"`open`", "`name`", "`close`", and "`other-modifier`" [=tokens=].
+
"`{(bar)}?`"
+
"`open`", "`regexp`", "`close`", and "`other-modifier`" [=tokens=].
+
"`{ab}?`"
+
"`open`", "`char`", "`close`", and "`other-modifier`" [=tokens=].
+
+
+ 1. If |open token| is not null: + 1. Set |prefix| to the result of running [=consume text=] given |parser|. + 1. Set |name token| to the result of running [=try to consume a token=] given |parser| and "`name`". + 1. Set |regexp or wildcard token| to the result of running [=try to consume a regexp or wildcard token=] given |parser| and |name token|. + 1. Let |suffix| be the result of running [=consume text=] given |parser|. + 1. Run [=consume a required token=] given |parser| and "`close`". + 1. Set |modifier token| to the result of running [=try to consume a modifier token=] given |parser|. + 1. Run [=add a part=] given |parser|, |prefix|, |name token|, |regexp or wildcard token|, |suffix|, and |modifier token|. + 1. [=Continue=]. + 1. Run [=maybe add a part from the pending fixed value=] given |parser|. + 1. Run [=consume a required token=] given |parser| and "`end`". + 1. Return |parser|'s [=pattern parser/part list=]. +
+ +The full wildcard regexp value is the string "`.*`". + +
+To generate a segment wildcard regexp given an [=/options=] |options|: + + 1. Let |result| be "`[^`". + 1. Append the result of running [=escape a regexp string=] given |options|'s [=options/delimiter code point=] to the end of |result|. + 1. Append "`]+?`" to the end of |result|. + 1. Return |result|. +
+ +
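The effect of the [=options/delimiter code point=] on the generated expression can be sketched as follows. This is an illustration only: the `Options` interface and function names are hypothetical, and the set of code points escaped by `escapeRegexpString` is an assumption based on common regular expression syntax characters.

```typescript
// Hypothetical model of the options struct; only the delimiter is needed here.
interface Options {
  delimiterCodePoint: string; // one ASCII code point or the empty string
}

// Assumption: the escaped set below covers common regexp syntax characters.
function escapeRegexpString(input: string): string {
  return input.replace(/[.+*?^${}()[\]|\/\\]/g, "\\$&");
}

// Mirrors the steps above: "[^" + escaped delimiter + "]+?".
function generateSegmentWildcardRegexp(options: Options): string {
  return "[^" + escapeRegexpString(options.delimiterCodePoint) + "]+?";
}

const slashDelimited = generateSegmentWildcardRegexp({ delimiterCodePoint: "/" });
const undelimited = generateSegmentWildcardRegexp({ delimiterCodePoint: "" });

// A pattern such as "/:foo" behaves like "/" followed by one capturing group.
const withDelimiter = new RegExp("^/(" + slashDelimited + ")$");
const withoutDelimiter = new RegExp("^/(" + undelimited + ")$");

console.log(withDelimiter.test("/bar"));        // single segment
console.log(withDelimiter.test("/bar/baz"));    // crosses a delimiter
console.log(withoutDelimiter.test("/bar/baz")); // no delimiter configured
```

With an empty delimiter the character class becomes `[^]`, which in ECMAScript regular expressions matches any code point, so the group is no longer confined to one segment.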
+To try to consume a token given a [=pattern parser=] |parser| and [=token/type=] |type|: + + 1. [=Assert=]: |parser|'s [=pattern parser/index=] is less than |parser|'s [=pattern parser/token list=] [=list/size=]. + 1. Let |next token| be |parser|'s [=pattern parser/token list=][|parser|'s [=pattern parser/index=]]. + 1. If |next token|'s [=token/type=] is not |type| return null. + 1. Increment |parser|'s [=pattern parser/index=] by 1. + 1. Return |next token|. +
+ +
+To try to consume a modifier token given a [=pattern parser=] |parser|: + + 1. Let |token| be the result of running [=try to consume a token=] given |parser| and "`other-modifier`". + 1. If |token| is not null, then return |token|. + 1. Set |token| to the result of running [=try to consume a token=] given |parser| and "`asterisk`". + 1. Return |token|. +
+ +
+To try to consume a regexp or wildcard token given a [=pattern parser=] |parser| and [=token=] |name token|: + + 1. Let |token| be the result of running [=try to consume a token=] given |parser| and "`regexp`". + 1. If |name token| is null and |token| is null, then set |token| to the result of running [=try to consume a token=] given |parser| and "`asterisk`". + 1. Return |token|. +
+ +
+To consume a required token given a [=pattern parser=] |parser| and [=token/type=] |type|: + + 1. Let |result| be the result of running [=try to consume a token=] given |parser| and |type|. + 1. If |result| is null, then throw a {{TypeError}}. + 1. Return |result|. +
+ +
+To consume text given a [=pattern parser=] |parser|: + + 1. Let |result| be the empty string. + 1. While true: + 1. Let |token| be the result of running [=try to consume a token=] given |parser| and "`char`". + 1. If |token| is null, then set |token| to the result of running [=try to consume a token=] given |parser| and "`escaped-char`". + 1. If |token| is null, then [=break=]. + 1. Append |token|'s [=token/value=] to the end of |result|. + 1. Return |result|. +
+ +
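The token-consuming helpers above can be sketched together as shown below. This is a non-normative illustration: the `Token` and `ParserState` shapes and the function names are hypothetical, the token type union lists only the types mentioned in this section, and the token list is written by hand because the tokenizer is defined elsewhere.

```typescript
// Token types mentioned in this section; the tokenizer may define others.
type TokenType =
  | "open" | "close" | "regexp" | "name" | "char"
  | "escaped-char" | "other-modifier" | "asterisk" | "end";

interface Token { type: TokenType; value: string; }
interface ParserState { tokenList: Token[]; index: number; }

function tryConsumeToken(parser: ParserState, type: TokenType): Token | null {
  // Assert: index is less than the token list's size.
  if (parser.index >= parser.tokenList.length) throw new Error("assertion failed");
  const nextToken = parser.tokenList[parser.index];
  if (nextToken.type !== type) return null;
  parser.index += 1;
  return nextToken;
}

function tryConsumeModifierToken(parser: ParserState): Token | null {
  return tryConsumeToken(parser, "other-modifier") ?? tryConsumeToken(parser, "asterisk");
}

function tryConsumeRegexpOrWildcardToken(parser: ParserState, nameToken: Token | null): Token | null {
  let token = tryConsumeToken(parser, "regexp");
  // An asterisk only acts as a wildcard group when no name precedes it.
  if (nameToken === null && token === null) token = tryConsumeToken(parser, "asterisk");
  return token;
}

function consumeRequiredToken(parser: ParserState, type: TokenType): Token {
  const result = tryConsumeToken(parser, type);
  if (result === null) throw new TypeError(`expected a "${type}" token`);
  return result;
}

function consumeText(parser: ParserState): string {
  let result = "";
  while (true) {
    const token = tryConsumeToken(parser, "char") ?? tryConsumeToken(parser, "escaped-char");
    if (token === null) break;
    result += token.value;
  }
  return result;
}

// Hand-written tokens for the pattern "{a:foo}?".
const parser: ParserState = {
  index: 0,
  tokenList: [
    { type: "open", value: "{" },
    { type: "char", value: "a" },
    { type: "name", value: "foo" },
    { type: "close", value: "}" },
    { type: "other-modifier", value: "?" },
    { type: "end", value: "" },
  ],
};

consumeRequiredToken(parser, "open");
const prefix = consumeText(parser);
const nameToken = tryConsumeToken(parser, "name");
const regexpToken = tryConsumeRegexpOrWildcardToken(parser, nameToken);
const suffix = consumeText(parser);
consumeRequiredToken(parser, "close");
const modifierToken = tryConsumeModifierToken(parser);
consumeRequiredToken(parser, "end");
```

Every helper either advances the index past a matching token or leaves the index untouched, which is what lets the main loop probe for optional tokens in a fixed order.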
+To maybe add a part from the pending fixed value given a [=pattern parser=] |parser|: + + 1. If |parser|'s [=pattern parser/pending fixed value=] is the empty string, then return. + 1. Let |encoded value| be the result of running |parser|'s [=pattern parser/encoding callback=] given |parser|'s [=pattern parser/pending fixed value=]. + 1. Set |parser|'s [=pattern parser/pending fixed value=] to the empty string. + 1. Let |part| be a new [=part=] whose [=part/type=] is "`fixed-text`", [=part/value=] is |encoded value|, and [=part/modifier=] is "`none`". + 1. [=list/Append=] |part| to |parser|'s [=pattern parser/part list=]. +
+ +
+To add a part given a [=pattern parser=] |parser|, a string |prefix|, a [=token=] |name token|, a [=token=] |regexp or wildcard token|, a string |suffix|, and a [=token=] |modifier token|: + + 1. Let |modifier| be "`none`". + 1. If |modifier token| is not null: + 1. If |modifier token|'s [=token/value=] is "`?`" then set |modifier| to "`optional`". + 1. Otherwise if |modifier token|'s [=token/value=] is "`*`" then set |modifier| to "`zero-or-more`". + 1. Otherwise if |modifier token|'s [=token/value=] is "`+`" then set |modifier| to "`one-or-more`". + 1. If |name token| is null and |regexp or wildcard token| is null and |modifier| is "`none`": +

This was a "`{foo}`" grouping. We add this to the [=pattern parser/pending fixed value=] so that it will be combined with any previous or subsequent text.

+ 1. Append |prefix| to the end of |parser|'s [=pattern parser/pending fixed value=]. + 1. Return. + 1. Run [=maybe add a part from the pending fixed value=] given |parser|. + 1. If |name token| is null and |regexp or wildcard token| is null: +

This was a "`{foo}?`" grouping. The modifier means we cannot combine it with other text. Therefore we add it as a [=part=] immediately.

+ 1. [=Assert=]: |suffix| is the empty string. + 1. If |prefix| is the empty string, then return. + 1. Let |encoded value| be the result of running |parser|'s [=pattern parser/encoding callback=] given |prefix|. + 1. Let |part| be a new [=part=] whose [=part/type=] is "`fixed-text`", [=part/value=] is |encoded value|, and [=part/modifier=] is |modifier|. + 1. [=list/Append=] |part| to |parser|'s [=pattern parser/part list=]. + 1. Return. + 1. Let |regexp value| be the empty string. +

Next, we convert the |regexp or wildcard token| into a regular expression. + 1. If |regexp or wildcard token| is null, then set |regexp value| to |parser|'s [=pattern parser/segment wildcard regexp=]. + 1. Otherwise if |regexp or wildcard token|'s [=token/type=] is "`asterisk`", then set |regexp value| to the [=full wildcard regexp value=]. + 1. Otherwise set |regexp value| to |regexp or wildcard token|'s [=token/value=]. + 1. Let |type| be "`regexp`". +

Next, we convert |regexp value| into a [=part=] [=part/type=]. We make sure to go to a regular expression first so that an equivalent "`regexp`" [=token=] will be treated the same as a "`name`" or "`asterisk`" [=token=].

+ 1. If |regexp value| is |parser|'s [=pattern parser/segment wildcard regexp=]: + 1. Set |type| to "`segment-wildcard`". + 1. Set |regexp value| to the empty string. + 1. Otherwise if |regexp value| is the [=full wildcard regexp value=]: + 1. Set |type| to "`full-wildcard`". + 1. Set |regexp value| to the empty string. + 1. Let |name| be the empty string. +

Next, we determine the [=part=] [=part/name=]. This can be explicitly provided by a "`name`" [=token=] or be automatically assigned. + 1. If |name token| is not null, then set |name| to |name token|'s [=token/value=]. + 1. Otherwise if |regexp or wildcard token| is not null: + 1. Set |name| to |parser|'s [=pattern parser/next numeric name=], [=serialize an integer|serialized=]. + 1. Increment |parser|'s [=pattern parser/next numeric name=] by 1. + 1. If the result of running [=is a duplicate name=] given |parser| and |name| is true, then throw a {{TypeError}}. + 1. Let |encoded prefix| be the result of running |parser|'s [=pattern parser/encoding callback=] given |prefix|. +

Finally, we encode the fixed text values and create the [=part=]. + 1. Let |encoded suffix| be the result of running |parser|'s [=pattern parser/encoding callback=] given |suffix|. + 1. Let |part| be a new [=part=] whose [=part/type=] is |type|, [=part/value=] is |regexp value|, [=part/modifier=] is |modifier|, [=part/name=] is |name|, [=part/prefix=] is |encoded prefix|, and [=part/suffix=] is |encoded suffix|. + 1. [=list/Append=] |part| to |parser|'s [=pattern parser/part list=]. +

+ +
+To determine if a value is a duplicate name given a [=pattern parser=] |parser| and a string |name|: + + 1. [=list/For each=] |part| of |parser|'s [=pattern parser/part list=]: + 1. If |part|'s [=part/name=] is |name|, then return true. + 1. Return false. +
+ +
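The [=add a part=], [=maybe add a part from the pending fixed value=], and [=is a duplicate name=] steps above can be sketched as follows. This is an illustrative sketch under stated assumptions: all TypeScript names are hypothetical, the encoding callback in the example is a pass-through, and tokens are supplied by hand rather than produced by a tokenizer.

```typescript
type Modifier = "none" | "optional" | "zero-or-more" | "one-or-more";
type PartType = "fixed-text" | "regexp" | "segment-wildcard" | "full-wildcard";
interface Token { type: string; value: string; }
interface Part {
  type: PartType; value: string; modifier: Modifier;
  name: string; prefix: string; suffix: string;
}
interface Parser {
  encodingCallback: (input: string) => string;
  segmentWildcardRegexp: string;
  partList: Part[];
  pendingFixedValue: string;
  nextNumericName: number;
}
const FULL_WILDCARD_REGEXP_VALUE = ".*";

function maybeAddPartFromPendingFixedValue(p: Parser): void {
  if (p.pendingFixedValue === "") return;
  const value = p.encodingCallback(p.pendingFixedValue);
  p.pendingFixedValue = "";
  p.partList.push({ type: "fixed-text", value, modifier: "none", name: "", prefix: "", suffix: "" });
}

function addPart(
  p: Parser, prefix: string, nameToken: Token | null,
  regexpOrWildcardToken: Token | null, suffix: string, modifierToken: Token | null,
): void {
  let modifier: Modifier = "none";
  if (modifierToken !== null) {
    if (modifierToken.value === "?") modifier = "optional";
    else if (modifierToken.value === "*") modifier = "zero-or-more";
    else if (modifierToken.value === "+") modifier = "one-or-more";
  }
  const noGroup = nameToken === null && regexpOrWildcardToken === null;
  if (noGroup && modifier === "none") {
    // A "{foo}" grouping: buffer it with neighbouring fixed text.
    p.pendingFixedValue += prefix;
    return;
  }
  maybeAddPartFromPendingFixedValue(p);
  if (noGroup) {
    // A "{foo}?" grouping: fixed text that carries a modifier.
    if (suffix !== "") throw new Error("assertion failed: suffix must be empty");
    if (prefix === "") return;
    p.partList.push({ type: "fixed-text", value: p.encodingCallback(prefix), modifier, name: "", prefix: "", suffix: "" });
    return;
  }
  // Go through a regular expression first so equivalent spellings agree.
  let regexpValue: string;
  if (regexpOrWildcardToken === null) regexpValue = p.segmentWildcardRegexp;
  else if (regexpOrWildcardToken.type === "asterisk") regexpValue = FULL_WILDCARD_REGEXP_VALUE;
  else regexpValue = regexpOrWildcardToken.value;
  let type: PartType = "regexp";
  if (regexpValue === p.segmentWildcardRegexp) { type = "segment-wildcard"; regexpValue = ""; }
  else if (regexpValue === FULL_WILDCARD_REGEXP_VALUE) { type = "full-wildcard"; regexpValue = ""; }
  let name = "";
  if (nameToken !== null) name = nameToken.value;
  else if (regexpOrWildcardToken !== null) { name = String(p.nextNumericName); p.nextNumericName += 1; }
  // Is a duplicate name: compare against every part already in the list.
  if (p.partList.some((part) => part.name === name)) throw new TypeError(`duplicate name "${name}"`);
  p.partList.push({
    type, value: regexpValue, modifier, name,
    prefix: p.encodingCallback(prefix), suffix: p.encodingCallback(suffix),
  });
}

// Example: "/books" is pending, then "/:id([^\/]+?)?" and "/*" are added.
const parser: Parser = {
  encodingCallback: (input) => input, // pass-through, for illustration only
  segmentWildcardRegexp: "[^\\/]+?",
  partList: [], pendingFixedValue: "/books", nextNumericName: 0,
};
addPart(parser, "/", { type: "name", value: "id" }, { type: "regexp", value: "[^\\/]+?" }, "", { type: "other-modifier", value: "?" });
addPart(parser, "/", null, { type: "asterisk", value: "*" }, "", null);
```

In the example, the custom regular expression equals the segment wildcard regexp, so the part is classified as "`segment-wildcard`" exactly as if no regular expression had been written.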

Converting part lists to regular expressions

+ +
+To generate a regular expression and name list from a given [=/part list=] |part list| and [=/options=] |options|: + + 1. Let |result| be "`^`". + 1. Let |name list| be a new [=list=]. + 1. [=list/For each=] |part| of |part list|: + 1. If |part|'s [=part/type=] is "`fixed-text`": + 1. If |part|'s [=part/modifier=] is "`none`", then append the result of running [=escape a regexp string=] given |part|'s [=part/value=] to the end of |result|. + 1. Otherwise: +
+

A "`fixed-text`" |part| with a modifier uses a non capturing group. It uses the following form. +

`(?:<fixed text>)<modifier>` +

+ 1. Append "`(?:`" to the end of |result|. + 1. Append the result of running [=escape a regexp string=] given |part|'s [=part/value=] to the end of |result|. + 1. Append "`)`" to the end of |result|. + 1. Append the result of running [=convert a modifier to a string=] given |part|'s [=part/modifier=] to the end of |result|. + 1. [=Continue=]. + 1. [=Assert=]: |part|'s [=part/name=] is not the empty string. + 1. [=list/Append=] |part|'s [=part/name=] to |name list|. +

We collect the list of matching group names in a parallel list. This is largely done for legacy reasons to match [path-to-regexp](https://github.com/pillarjs/path-to-regexp). We could attempt to convert this to use regular expression named capture groups, but given the complexity of this algorithm there is a real risk of introducing unintended bugs. In addition, if we ever end up exposing the generated regular expressions to the web we would like to maintain compatibility with [path-to-regexp](https://github.com/pillarjs/path-to-regexp) which has indicated it is unlikely to switch to using named capture groups. + 1. Let |regexp value| be |part|'s [=part/value=]. + 1. If |part|'s [=part/type=] is "`segment-wildcard`", then set |regexp value| to the result of running [=generate a segment wildcard regexp=] given |options|. + 1. Otherwise if |part|'s [=part/type=] is "`full-wildcard`", then set |regexp value| to [=full wildcard regexp value=]. + 1. If |part|'s [=part/prefix=] is the empty string and |part|'s [=part/suffix=] is the empty string: +

+

If there is no [=part/prefix=] or [=part/suffix=] then generation depends on the modifier. If there is no modifier or just the optional modifier, it uses the following simple form: +

`(<regexp value>)<modifier>` +

If there is a repeating modifier, however, we will use the more complex form: +

`((?:<regexp value>)<modifier>)` +

+ 1. If |part|'s [=part/modifier=] is "`none`" or "`optional`", then: + 1. Append "`(`" to the end of |result|. + 1. Append |regexp value| to the end of |result|. + 1. Append "`)`" to the end of |result|. + 1. Append the result of running [=convert a modifier to a string=] given |part|'s [=part/modifier=] to the end of |result|. + 1. Otherwise: + 1. Append "`((?:`" to the end of |result|. + 1. Append |regexp value| to the end of |result|. + 1. Append "`)`" to the end of |result|. + 1. Append the result of running [=convert a modifier to a string=] given |part|'s [=part/modifier=] to the end of |result|. + 1. Append "`)`" to the end of |result|. + 1. [=Continue=]. + 1. If |part|'s [=part/modifier=] is "`none`" or "`optional`": +
+

This section handles non-repeating parts with a [=part/prefix=] or [=part/suffix=]. There is an inner capturing group that contains the primary |regexp value|. The inner group is then combined with the [=part/prefix=] or [=part/suffix=] in an outer non-capturing group. Finally the modifier is applied. The resulting form is as follows. +

`(?:<prefix>(<regexp value>)<suffix>)<modifier>` +

+ 1. Append "`(?:`" to the end of |result|. + 1. Append the result of running [=escape a regexp string=] given |part|'s [=part/prefix=] to the end of |result|. + 1. Append "`(`" to the end of |result|. + 1. Append |regexp value| to the end of |result|. + 1. Append "`)`" to the end of |result|. + 1. Append the result of running [=escape a regexp string=] given |part|'s [=part/suffix=] to the end of |result|. + 1. Append "`)`" to the end of |result|. + 1. Append the result of running [=convert a modifier to a string=] given |part|'s [=part/modifier=] to the end of |result|. + 1. [=Continue=]. + 1. [=Assert=]: |part|'s [=part/modifier=] is "`zero-or-more`" or "`one-or-more`". + 1. [=Assert=]: |part|'s [=part/prefix=] is not the empty string or |part|'s [=part/suffix=] is not the empty string. +
+

Repeating parts with a [=part/prefix=] or [=part/suffix=] are dramatically more complicated. We want to exclude the initial [=part/prefix=] and the final [=part/suffix=], but include them between any repeated elements. To achieve this we provide a separate initial expression that excludes the [=part/prefix=]. Then the expression is duplicated with the [=part/prefix=]/[=part/suffix=] values included in an optional repeating element. If zero values are permitted then a final optional modifier can be appended. The resulting form is as follows. +

`(?:<prefix>((?:<regexp value>)(?:<suffix><prefix>(?:<regexp value>))*)<suffix>)?` +

+ 1. Append "`(?:`" to the end of |result|. + 1. Append the result of running [=escape a regexp string=] given |part|'s [=part/prefix=] to the end of |result|. + 1. Append "`((?:`" to the end of |result|. + 1. Append |regexp value| to the end of |result|. + 1. Append "`)(?:`" to the end of |result|. + 1. Append the result of running [=escape a regexp string=] given |part|'s [=part/suffix=] to the end of |result|. + 1. Append the result of running [=escape a regexp string=] given |part|'s [=part/prefix=] to the end of |result|. + 1. Append "`(?:`" to the end of |result|. + 1. Append |regexp value| to the end of |result|. + 1. Append "`))*)`" to the end of |result|. + 1. Append the result of running [=escape a regexp string=] given |part|'s [=part/suffix=] to the end of |result|. + 1. Append "`)`" to the end of |result|. + 1. If |part|'s [=part/modifier=] is "`zero-or-more`" then append "`?`" to the end of |result|. + 1. Append "`$`" to the end of |result|. + 1. Return (|result|, |name list|). +
+ +
+To escape a regexp string given a string |input|: + + 1. [=Assert=]: |input| is an [=ASCII string=]. + 1. Let |result| be the empty string. + 1. Let |index| be 0. + 1. While |index| is less than |input|'s [=string/length=]: + 1. Let |c| be |input|[|index|]. + 1. Increment |index| by 1. + 1. If |c| is one of: + +

then append U+005C (\) to the end of |result|. + 1. Append |c| to the end of |result|. + 1. Return |result|. +

+ +
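The two algorithms above can be sketched together as follows. This is a minimal, non-normative illustration: the TypeScript names are hypothetical, the set of code points escaped by `escapeRegexpString` is an assumption based on common regular expression syntax characters, and the part lists are constructed by hand.

```typescript
type Modifier = "none" | "optional" | "zero-or-more" | "one-or-more";
interface Part {
  type: "fixed-text" | "regexp" | "segment-wildcard" | "full-wildcard";
  value: string; modifier: Modifier; name: string; prefix: string; suffix: string;
}
interface Options { delimiterCodePoint: string; }

// Assumption: the escaped set covers common regexp syntax characters.
function escapeRegexpString(input: string): string {
  return input.replace(/[.+*?^${}()[\]|\/\\]/g, "\\$&");
}

function convertModifierToString(modifier: Modifier): string {
  if (modifier === "zero-or-more") return "*";
  if (modifier === "optional") return "?";
  if (modifier === "one-or-more") return "+";
  return "";
}

function generateRegexpAndNameList(partList: Part[], options: Options): [string, string[]] {
  let result = "^";
  const nameList: string[] = [];
  for (const part of partList) {
    const mod = convertModifierToString(part.modifier);
    if (part.type === "fixed-text") {
      const text = escapeRegexpString(part.value);
      result += part.modifier === "none" ? text : `(?:${text})${mod}`;
      continue;
    }
    if (part.name === "") throw new Error("assertion failed: name must not be empty");
    nameList.push(part.name);
    let regexpValue = part.value;
    if (part.type === "segment-wildcard") {
      regexpValue = "[^" + escapeRegexpString(options.delimiterCodePoint) + "]+?";
    } else if (part.type === "full-wildcard") {
      regexpValue = ".*";
    }
    const prefix = escapeRegexpString(part.prefix);
    const suffix = escapeRegexpString(part.suffix);
    const nonRepeating = part.modifier === "none" || part.modifier === "optional";
    if (part.prefix === "" && part.suffix === "") {
      result += nonRepeating ? `(${regexpValue})${mod}` : `((?:${regexpValue})${mod})`;
      continue;
    }
    if (nonRepeating) {
      result += `(?:${prefix}(${regexpValue})${suffix})${mod}`;
      continue;
    }
    // Repeating part: prefix and suffix appear between repetitions only.
    result += `(?:${prefix}((?:${regexpValue})(?:${suffix}${prefix}(?:${regexpValue}))*)${suffix})`;
    if (part.modifier === "zero-or-more") result += "?";
  }
  return [result + "$", nameList];
}

const options: Options = { delimiterCodePoint: "/" };

// Parts corresponding to "/foo/:bar?/baz" with "/" as the prefix code point.
const [optionalSource, optionalNames] = generateRegexpAndNameList([
  { type: "fixed-text", value: "/foo", modifier: "none", name: "", prefix: "", suffix: "" },
  { type: "segment-wildcard", value: "", modifier: "optional", name: "bar", prefix: "/", suffix: "" },
  { type: "fixed-text", value: "/baz", modifier: "none", name: "", prefix: "", suffix: "" },
], options);

// Parts corresponding to "/:path+".
const [repeatingSource] = generateRegexpAndNameList([
  { type: "segment-wildcard", value: "", modifier: "one-or-more", name: "path", prefix: "/", suffix: "" },
], options);
const repeatedMatch = new RegExp(repeatingSource).exec("/a/b/c");
```

In the repeating case, the single capturing group wraps every repetition, so the captured value for "`/a/b/c`" includes the interior "`/`" separators but not the leading prefix.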

Converting part lists to pattern strings

+ +
+To generate a [=/pattern string=] from a given [=/part list=] |part list| and [=/options=] |options|: + + 1. Let |result| be the empty string. + 1. Let |index list| be the result of [=list/getting the indices=] for |part list|. + 1. [=list/For each=] |index| of |index list|: + 1. Let |part| be |part list|[|index|]. + 1. Let |previous part| be |part list|[|index| - 1] if |index| is greater than 0, otherwise let it be null. + 1. Let |next part| be |part list|[|index| + 1] if |index| is less than |index list|'s [=list/size=] - 1, otherwise let it be null. + 1. If |part|'s [=part/type=] is "`fixed-text`" then: + 1. If |part|'s [=part/modifier=] is "`none`" then: + 1. Append the result of running [=escape a pattern string=] given |part|'s [=part/value=] to the end of |result|. + 1. [=Continue=]. + 1. Append "`{`" to the end of |result|. + 1. Append the result of running [=escape a pattern string=] given |part|'s [=part/value=] to the end of |result|. + 1. Append "`}`" to the end of |result|. + 1. Append the result of running [=convert a modifier to a string=] given |part|'s [=part/modifier=] to the end of |result|. + 1. [=Continue=]. + 1. Let |custom name| be true if |part|'s [=part/name=][0] is not an [=ASCII digit=]; otherwise false. + 1. Let |needs grouping| be true if at least one of the following is true, otherwise let it be false: + + 1. If all of the following are true: + + then: + 1. If |next part|'s [=part/type=] is "`fixed-text`": + 1. Set |needs grouping| to true if the result of running [=is a valid name code point=] given |next part|'s [=part/value=]'s first [=/code point=] and the boolean false is true. + 1. Otherwise: + 1. Set |needs grouping| to true if |next part|'s [=part/name=][0] is an [=ASCII digit=]. + 1. If all of the following are true: + + then set |needs grouping| to true. + 1. [=Assert=]: |part|'s [=part/name=] is not the empty string or null. + 1. If |needs grouping| is true, then append "`{`" to the end of |result|. + 1. 
Append the result of running [=escape a pattern string=] given |part|'s [=part/prefix=] to the end of |result|. + 1. If |custom name| is true: + 1. Append "`:`" to the end of |result|. + 1. Append |part|'s [=part/name=] to the end of |result|. + 1. If |part|'s [=part/type=] is "`regexp`" then: + 1. Append "`(`" to the end of |result|. + 1. Append |part|'s [=part/value=] to the end of |result|. + 1. Append "`)`" to the end of |result|. + 1. Otherwise if |part|'s [=part/type=] is "`segment-wildcard`" and |custom name| is false: + 1. Append "`(`" to the end of |result|. + 1. Append the result of running [=generate a segment wildcard regexp=] given |options| to the end of |result|. + 1. Append "`)`" to the end of |result|. + 1. Otherwise if |part|'s [=part/type=] is "`full-wildcard`": + 1. If |custom name| is false and one of the following is true: + + then append "`*`" to the end of |result|. + 1. Otherwise: + 1. Append "`(`" to the end of |result|. + 1. Append [=full wildcard regexp value=] to the end of |result|. + 1. Append "`)`" to the end of |result|. + 1. If all of the following are true: + + then append U+005C (\) to the end of |result|. + 1. Append the result of running [=escape a pattern string=] given |part|'s [=part/suffix=] to the end of |result|. + 1. If |needs grouping| is true, then append "`}`" to the end of |result|. + 1. Append the result of running [=convert a modifier to a string=] given |part|'s [=part/modifier=] to the end of |result|. + 1. Return |result|. +
+ +
+To escape a pattern string given a string |input|: + + 1. [=Assert=]: |input| is an [=ASCII string=]. + 1. Let |result| be the empty string. + 1. Let |index| be 0. + 1. While |index| is less than |input|'s [=string/length=]: + 1. Let |c| be |input|[|index|]. + 1. Increment |index| by 1. + 1. If |c| is one of: + +

then append U+005C (\) to the end of |result|. + 1. Append |c| to the end of |result|. + 1. Return |result|. +

+ +
+To convert a modifier to a string given a [=part/modifier=] |modifier|: + + 1. If |modifier| is "`zero-or-more`", then return "`*`". + 1. If |modifier| is "`optional`", then return "`?`". + 1. If |modifier| is "`one-or-more`", then return "`+`". + 1. Return the empty string. +
+ +

Canonicalization

+ +

Encoding callbacks

+ +
+ To canonicalize a protocol given a string |value|: + + 1. If |value| is the empty string, return |value|. + 1. Let |dummyURL| be a new [=URL record=]. + 1. Let |parseResult| be the result of running the [=basic URL parser=] given |value| followed by "`://dummy.test`", with |dummyURL| as [=basic URL parser/url=]. +

Note, [=basic URL parser/state override=] is not used here because it enforces restrictions that are only appropriate for the {{URL/protocol}} setter. Instead we use the protocol to parse a dummy URL using the normal parsing entry point.

+ 1. If |parseResult| is failure, then throw a {{TypeError}}. + 1. Return |dummyURL|'s [=url/scheme=]. +
+ +
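The dummy URL technique used above can be approximated with the `URL` class, as in the following sketch. The function name `canonicalizeProtocol` is hypothetical, and the sketch relies on the `URL` constructor throwing a `TypeError` on parse failure; it is an approximation of the algorithm, not a replacement for it.

```typescript
// Hypothetical helper approximating "canonicalize a protocol".
function canonicalizeProtocol(value: string): string {
  if (value === "") return value;
  // Parse a dummy URL through the normal entry point; throws TypeError on failure.
  const dummyURL = new URL(`${value}://dummy.test`);
  // The protocol getter returns the scheme followed by ":", so drop the colon.
  return dummyURL.protocol.slice(0, -1);
}

console.log(canonicalizeProtocol("HTTPS")); // the scheme is lowercased by the parser
```

Input that cannot form a scheme, such as a string containing a space, causes the constructor to throw, which corresponds to the failure step above.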
+ To canonicalize a username given a string |value|: + + 1. If |value| is the empty string, return |value|. + 1. Let |dummyURL| be a new [=URL record=]. + 1. [=Set the username=] given |dummyURL| and |value|. + 1. Return |dummyURL|'s [=url/username=]. +
+ +
+ To canonicalize a password given a string |value|: + + 1. If |value| is the empty string, return |value|. + 1. Let |dummyURL| be a new [=URL record=]. + 1. [=Set the password=] given |dummyURL| and |value|. + 1. Return |dummyURL|'s [=url/password=]. +
+ +
+ To canonicalize a hostname given a string |value|: + + 1. If |value| is the empty string, return |value|. + 1. Let |dummyURL| be a new [=URL record=]. + 1. Let |parseResult| be the result of running the [=basic URL parser=] given |value| with |dummyURL| as [=basic URL parser/url=] and [=hostname state=] as [=basic URL parser/state override=]. + 1. If |parseResult| is failure, then throw a {{TypeError}}. + 1. Return |dummyURL|'s [=url/host=]. +
+ +
+ To canonicalize an IPv6 hostname given a string |value|: + + 1. Let |result| be the empty string. + 1. [=list/For each=] |code point| in |value| interpreted as a [=list=] of [=/code points=]: + 1. If all of the following are true: + +

then throw a {{TypeError}}. + 1. Append the result of running [=ASCII lowercase=] given |code point| to the end of |result|. + 1. Return |result|. +

+ +
+ To canonicalize a port given a string |portValue| and optionally a string |protocolValue|: + + 1. If |portValue| is the empty string, return |portValue|. + 1. Let |dummyURL| be a new [=URL record=]. + 1. If |protocolValue| was given, then set |dummyURL|'s [=url/scheme=] to |protocolValue|. +

Note, we set the [=URL record=]'s [=url/scheme=] in order for the [=basic URL parser=] to recognize and normalize default port values.

+ 1. Let |parseResult| be the result of running [=basic URL parser=] given |portValue| with |dummyURL| as [=basic URL parser/url=] and [=port state=] as [=basic URL parser/state override=]. + 1. If |parseResult| is failure, then throw a {{TypeError}}. + 1. Return |dummyURL|'s [=url/port=], [=serialize an integer|serialized=], or empty string if it is null. +
+ +
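The way the scheme influences default port normalization can be illustrated with the `URL` class. This is a rough sketch, not the algorithm itself: the function name `canonicalizePort` and the placeholder scheme "`dummy`" are hypothetical, and because the sketch parses a complete URL rather than using a [=basic URL parser/state override=], it may differ from the steps above on edge cases such as trailing non-digit characters.

```typescript
// Hypothetical helper approximating "canonicalize a port".
function canonicalizePort(portValue: string, protocolValue?: string): string {
  if (portValue === "") return portValue;
  // Assumption: a made-up non-special scheme stands in when no protocol is given.
  const scheme = protocolValue ?? "dummy";
  // Throws TypeError when the port cannot be parsed.
  const dummyURL = new URL(`${scheme}://dummy.test:${portValue}`);
  // The port getter returns the empty string when the port is the scheme's default.
  return dummyURL.port;
}

console.log(canonicalizePort("443", "https"));  // default port for https
console.log(canonicalizePort("8080", "https")); // non-default port is kept
console.log(canonicalizePort("80"));            // no scheme, so no default applies
```

Supplying the protocol is what allows a default port such as 443 for "`https`" to collapse to the empty string.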
+ To canonicalize a pathname given a string |value|: + + 1. If |value| is the empty string, then return |value|. + 1. Let |leading slash| be true if the first [=/code point=] in |value| is U+002F (`/`) and otherwise false. + 1. Let |modified value| be "`/-`" if |leading slash| is false and otherwise the empty string. +
+

The URL parser will automatically prepend a leading slash to the canonicalized pathname. Unfortunately, this does not work here: when used as an encoding callback, this algorithm is called for pieces of the pathname instead of the entire pathname. Therefore we disable the prepending of the slash by inserting our own. An additional character is also inserted here in order to avoid inadvertently collapsing a leading dot due to the fake leading slash being interpreted as a "`/.`" sequence. These inserted characters are then removed from the result below. +

Note, implementations are free to simply disable slash prepending in their URL parsing code instead of paying the performance penalty of inserting and removing characters in this algorithm. +

+ 1. Append |value| to the end of |modified value|. + 1. Let |dummyURL| be a new [=URL record=]. + 1. Let |parseResult| be the result of running [=basic URL parser=] given |modified value| with |dummyURL| as [=basic URL parser/url=] and [=path start state=] as [=basic URL parser/state override=]. + 1. If |parseResult| is failure, then throw a {{TypeError}}. + 1. Let |result| be the result of [=URL path serializing=] |dummyURL|. + 1. If |leading slash| is false, then set |result| to the [=code point substring to the end of the string|code point substring=] from 2 to the end of the string within |result|. + 1. Return |result|. +
+ +
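The "`/-`" insertion trick above can be sketched with the `URL` class's `pathname` setter. This is an approximation under stated assumptions: the function name `canonicalizePathname` is hypothetical, and the dummy URL here uses the "`https`" scheme, whereas the algorithm starts from an empty [=URL record=], so handling of some code points (for example backslashes) may differ.

```typescript
// Hypothetical helper approximating "canonicalize a pathname".
function canonicalizePathname(value: string): string {
  if (value === "") return value;
  const leadingSlash = value.startsWith("/");
  // Insert our own slash plus an extra character so a leading dot is not collapsed.
  const modifiedValue = (leadingSlash ? "" : "/-") + value;
  // Assumption: an https dummy URL stands in for the empty URL record.
  const dummyURL = new URL("https://dummy.test");
  dummyURL.pathname = modifiedValue;
  let result = dummyURL.pathname;
  // Remove the two inserted characters again.
  if (!leadingSlash) result = result.slice(2);
  return result;
}

console.log(canonicalizePathname("/a/../b")); // dot segments are resolved
console.log(canonicalizePathname("./foo"));   // leading dot survives thanks to "-"
console.log(canonicalizePathname("foo bar")); // the space is percent-encoded
```

Without the extra "`-`", the piece "`./foo`" would be parsed as "`/./foo`" and its leading dot segment would be removed, changing the meaning of the pattern piece.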
+ To canonicalize an opaque pathname given a string |value|: + + 1. If |value| is the empty string, return |value|. + 1. Let |dummyURL| be a new [=URL record=]. + 1. Set |dummyURL|'s [=url/path=] to the empty string. + 1. Let |parseResult| be the result of running [=basic URL parser|URL parsing=] given |value| with |dummyURL| as [=basic URL parser/url=] and [=basic URL parser/opaque path state=] as [=basic URL parser/state override=]. + 1. If |parseResult| is failure, then throw a {{TypeError}}. + 1. Return the result of [=URL path serializing=] |dummyURL|. +
+ +
+ To canonicalize a search given a string |value|: + + 1. If |value| is the empty string, return |value|. + 1. Let |dummyURL| be a new [=URL record=]. + 1. Set |dummyURL|'s [=url/query=] to the empty string. + 1. Let |parseResult| be the result of running [=basic URL parser=] given |value| with |dummyURL| as [=basic URL parser/url=] and [=query state=] as [=basic URL parser/state override=]. + 1. If |parseResult| is failure, then throw a {{TypeError}}. + 1. Return |dummyURL|'s [=url/query=]. +
+ +
+ To canonicalize a hash given a string |value|: + + 1. If |value| is the empty string, return |value|. + 1. Let |dummyURL| be a new [=URL record=]. + 1. Set |dummyURL|'s [=url/fragment=] to the empty string. + 1. Let |parseResult| be the result of running [=basic URL parser=] given |value| with |dummyURL| as [=basic URL parser/url=] and [=fragment state=] as [=basic URL parser/state override=]. + 1. If |parseResult| is failure, then throw a {{TypeError}}. + 1. Return |dummyURL|'s [=url/fragment=]. +
+ +

{{URLPatternInit}} processing

+ +
+ To process a URLPatternInit given a {{URLPatternInit}} |init|, a string |type|, a string or null |protocol|, a string or null |username|, a string or null |password|, a string or null |hostname|, a string or null |port|, a string or null |pathname|, a string or null |search|, and a string or null |hash|: + + 1. Let |result| be the result of creating a new {{URLPatternInit}}. + 1. If |protocol| is not null, [=map/set=] |result|["{{URLPatternInit/protocol}}"] to |protocol|. + 1. If |username| is not null, [=map/set=] |result|["{{URLPatternInit/username}}"] to |username|. + 1. If |password| is not null, [=map/set=] |result|["{{URLPatternInit/password}}"] to |password|. + 1. If |hostname| is not null, [=map/set=] |result|["{{URLPatternInit/hostname}}"] to |hostname|. + 1. If |port| is not null, [=map/set=] |result|["{{URLPatternInit/port}}"] to |port|. + 1. If |pathname| is not null, [=map/set=] |result|["{{URLPatternInit/pathname}}"] to |pathname|. + 1. If |search| is not null, [=map/set=] |result|["{{URLPatternInit/search}}"] to |search|. + 1. If |hash| is not null, [=map/set=] |result|["{{URLPatternInit/hash}}"] to |hash|. + 1. Let |baseURL| be null. + 1. If |init|["{{URLPatternInit/baseURL}}"] [=map/exists=]: +
+ The base URL can be used to supply additional context, but a component is not inherited from it if |init| includes a component which is at least as specific. + + A component is more specific if it appears later in one of the following two lists (which are very similar to the order in which components appear in the URL syntax): + + * protocol, hostname, port, pathname, search, hash + * protocol, hostname, port, username, password + + Username and password are also never inherited from a base URL when constructing a {{URLPattern}}. (They are, however, inherited from the base URL when parsing a URL supplied as an argument to {{URLPattern/test()}} or {{URLPattern/exec()}}.) +
+ 1. Set |baseURL| to the result of [=URL parser|parsing=] |init|["{{URLPatternInit/baseURL}}"]. + 1. If |baseURL| is failure, then throw a {{TypeError}}. + 1. If |init|["{{URLPatternInit/protocol}}"] does not [=map/exist=], then set |result|["{{URLPatternInit/protocol}}"] to the result of [=processing a base URL string=] given |baseURL|'s [=url/scheme=] and |type|. + 1. If |type| is not "`pattern`" and |init| [=map/contains=] none of "{{URLPatternInit/protocol}}", "{{URLPatternInit/hostname}}", "{{URLPatternInit/port}}" and "{{URLPatternInit/username}}", then set |result|["{{URLPatternInit/username}}"] to the result of [=processing a base URL string=] given |baseURL|'s [=url/username=] and |type|. + 1. If |type| is not "`pattern`" and |init| [=map/contains=] none of "{{URLPatternInit/protocol}}", "{{URLPatternInit/hostname}}", "{{URLPatternInit/port}}", "{{URLPatternInit/username}}" and "{{URLPatternInit/password}}", then set |result|["{{URLPatternInit/password}}"] to the result of [=processing a base URL string=] given |baseURL|'s [=url/password=] and |type|. + 1. If |init| [=map/contains=] neither "{{URLPatternInit/protocol}}" nor "{{URLPatternInit/hostname}}", then: + 1. Let |baseHost| be |baseURL|'s [=url/host=]. + 1. If |baseHost| is null, then set |baseHost| to the empty string. + 1. Set |result|["{{URLPatternInit/hostname}}"] to the result of [=processing a base URL string=] given |baseHost| and |type|. + 1. If |init| [=map/contains=] none of "{{URLPatternInit/protocol}}", "{{URLPatternInit/hostname}}", and "{{URLPatternInit/port}}", then: + 1. If |baseURL|'s [=url/port=] is null, then set |result|["{{URLPatternInit/port}}"] to the empty string. + 1. Otherwise, set |result|["{{URLPatternInit/port}}"] to |baseURL|'s [=url/port=], [=serialize an integer|serialized=]. + 1. 
If |init| [=map/contains=] none of "{{URLPatternInit/protocol}}", "{{URLPatternInit/hostname}}", "{{URLPatternInit/port}}", and "{{URLPatternInit/pathname}}", then set |result|["{{URLPatternInit/pathname}}"] to the result of [=processing a base URL string=] given the result of [=URL path serializing=] |baseURL| and |type|. + 1. If |init| [=map/contains=] none of "{{URLPatternInit/protocol}}", "{{URLPatternInit/hostname}}", "{{URLPatternInit/port}}", "{{URLPatternInit/pathname}}", and "{{URLPatternInit/search}}", then: + 1. Let |baseQuery| be |baseURL|'s [=url/query=]. + 1. If |baseQuery| is null, then set |baseQuery| to the empty string. + 1. Set |result|["{{URLPatternInit/search}}"] to the result of [=processing a base URL string=] given |baseQuery| and |type|. + 1. If |init| [=map/contains=] none of "{{URLPatternInit/protocol}}", "{{URLPatternInit/hostname}}", "{{URLPatternInit/port}}", "{{URLPatternInit/pathname}}", "{{URLPatternInit/search}}", and "{{URLPatternInit/hash}}", then: + 1. Let |baseFragment| be |baseURL|'s [=url/fragment=]. + 1. If |baseFragment| is null, then set |baseFragment| to the empty string. + 1. Set |result|["{{URLPatternInit/hash}}"] to the result of [=processing a base URL string=] given |baseFragment| and |type|. + 1. If |init|["{{URLPatternInit/protocol}}"] [=map/exists=], then set |result|["{{URLPatternInit/protocol}}"] to the result of [=process protocol for init=] given |init|["{{URLPatternInit/protocol}}"] and |type|. + 1. If |init|["{{URLPatternInit/username}}"] [=map/exists=], then set |result|["{{URLPatternInit/username}}"] to the result of [=process username for init=] given |init|["{{URLPatternInit/username}}"] and |type|. + 1. If |init|["{{URLPatternInit/password}}"] [=map/exists=], then set |result|["{{URLPatternInit/password}}"] to the result of [=process password for init=] given |init|["{{URLPatternInit/password}}"] and |type|. + 1. 
If |init|["{{URLPatternInit/hostname}}"] [=map/exists=], then set |result|["{{URLPatternInit/hostname}}"] to the result of [=process hostname for init=] given |init|["{{URLPatternInit/hostname}}"] and |type|. + 1. If |init|["{{URLPatternInit/port}}"] [=map/exists=], then set |result|["{{URLPatternInit/port}}"] to the result of [=process port for init=] given |init|["{{URLPatternInit/port}}"], |result|["{{URLPatternInit/protocol}}"], and |type|. + 1. If |init|["{{URLPatternInit/pathname}}"] [=map/exists=]: + 1. Set |result|["{{URLPatternInit/pathname}}"] to |init|["{{URLPatternInit/pathname}}"]. + 1. If the following are all true: + +
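* |baseURL| is not null; + * |baseURL| does not have an [=url/opaque path=]; and + * the result of running [=is an absolute pathname=] given |result|["{{URLPatternInit/pathname}}"] and |type| is false, + +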
then: + 1. Let |baseURLPath| be the result of running [=process a base URL string=] given the result of [=URL path serializing=] |baseURL| and |type|. + 1. Let |slash index| be the index of the last U+002F (`/`) code point found in |baseURLPath|, interpreted as a sequence of [=/code points=], or null if there are no instances of the code point. + 1. If |slash index| is not null: + 1. Let |new pathname| be the [=code point substring by positions|code point substring=] from 0 to |slash index| + 1 within |baseURLPath|. + 1. Append |result|["{{URLPatternInit/pathname}}"] to the end of |new pathname|. + 1. Set |result|["{{URLPatternInit/pathname}}"] to |new pathname|. + 1. Set |result|["{{URLPatternInit/pathname}}"] to the result of [=process pathname for init=] given |result|["{{URLPatternInit/pathname}}"], |result|["{{URLPatternInit/protocol}}"], and |type|. + 1. If |init|["{{URLPatternInit/search}}"] [=map/exists=] then set |result|["{{URLPatternInit/search}}"] to the result of [=process search for init=] given |init|["{{URLPatternInit/search}}"] and |type|. + 1. If |init|["{{URLPatternInit/hash}}"] [=map/exists=] then set |result|["{{URLPatternInit/hash}}"] to the result of [=process hash for init=] given |init|["{{URLPatternInit/hash}}"] and |type|. + 1. Return |result|. +
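The inheritance rules in the steps above can be hard to hold in one's head, so the following is a minimal TypeScript sketch of them, not a normative implementation. The names `Component`, `BLOCKERS`, `inheritedComponents`, and `joinWithBasePath` are hypothetical and do not appear in this specification. The sketch only reports which components would be taken from the base URL; it assumes a base URL is present and skips escaping and canonicalization entirely. `joinWithBasePath` assumes the caller has already decided that the pathname is to be resolved against the base URL's path.

```typescript
type Component =
  | "protocol" | "username" | "password" | "hostname"
  | "port" | "pathname" | "search" | "hash";
type InitType = "pattern" | "url";

// For each component, the init members whose presence prevents
// inheriting that component from the base URL (taken from the steps above).
const BLOCKERS: Record<Component, Component[]> = {
  protocol: ["protocol"],
  username: ["protocol", "hostname", "port", "username"],
  password: ["protocol", "hostname", "port", "username", "password"],
  hostname: ["protocol", "hostname"],
  port: ["protocol", "hostname", "port"],
  pathname: ["protocol", "hostname", "port", "pathname"],
  search: ["protocol", "hostname", "port", "pathname", "search"],
  hash: ["protocol", "hostname", "port", "pathname", "search", "hash"],
};

const ORDER: Component[] = [
  "protocol", "username", "password", "hostname",
  "port", "pathname", "search", "hash",
];

// Returns the components that would be inherited from the base URL,
// given the list of members present in the init dictionary.
function inheritedComponents(present: Component[], type: InitType): Component[] {
  const inherited: Component[] = [];
  for (const component of ORDER) {
    // Username and password are never inherited when building a pattern.
    if (type === "pattern" && (component === "username" || component === "password")) {
      continue;
    }
    const blocked = BLOCKERS[component].some((key) => present.indexOf(key) !== -1);
    if (!blocked) inherited.push(component);
  }
  return inherited;
}

// Keeps the base path up to and including its last "/" and appends the pathname.
// If the base path contains no "/", the pathname is returned unchanged.
function joinWithBasePath(baseURLPath: string, pathname: string): string {
  const slashIndex = baseURLPath.lastIndexOf("/");
  if (slashIndex === -1) return pathname;
  return baseURLPath.slice(0, slashIndex + 1) + pathname;
}
```

For instance, an init dictionary containing only a pathname, processed as a pattern, would inherit the protocol, hostname, and port from the base URL, while search and hash would not be inherited because pathname is more specific than neither of them.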

+ +
+ To process a base URL string given a string |input| and a string |type|: + + 1. [=Assert=]: |input| is not null. + 1. If |type| is not "`pattern`" then return |input|. + 1. Return the result of [=escaping a pattern string=] given |input|. +
+ +
+ To run is an absolute pathname given a [=/pattern string=] |input| and a string |type|: + + 1. If |input| is the empty string, then return false. + 1. If |input|[0] is U+002F (`/`), then return true. + 1. If |type| is "`url`", then return false. + 1. If |input|'s [=string/code point length=] is less than 2, then return false. + 1. If |input|[0] is U+005C (\) and |input|[1] is U+002F (`/`), then return true. + 1. If |input|[0] is U+007B (`{`) and |input|[1] is U+002F (`/`), then return true. + 1. Return false. +
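As an illustration, the steps above translate almost line for line into code. The following TypeScript sketch uses the hypothetical name `isAbsolutePathname`; it indexes by UTF-16 code unit rather than by code point, which gives the same answer here because only the first two positions are inspected and all three characters of interest are in the Basic Multilingual Plane.

```typescript
type InitType = "pattern" | "url";

// Sketch of the "is an absolute pathname" steps; the function name is illustrative.
function isAbsolutePathname(input: string, type: InitType): boolean {
  if (input.length === 0) return false;
  if (input.charAt(0) === "/") return true;
  // A plain URL pathname has no pattern syntax, so only a leading "/" counts.
  if (type === "url") return false;
  if (input.length < 2) return false;
  // An escaped slash ("\/") at the start of a pattern is still a leading slash.
  if (input.charAt(0) === "\\" && input.charAt(1) === "/") return true;
  // A group that opens with a slash ("{/") also makes the pattern absolute.
  if (input.charAt(0) === "{" && input.charAt(1) === "/") return true;
  return false;
}
```

Under this sketch, `{/foo}` is absolute when treated as a pattern but not when treated as a URL, since `{` has no special meaning in a URL pathname.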
+ +
+ To process protocol for init given a string |value| and a string |type|: + + 1. Let |strippedValue| be the given |value| with a single trailing U+003A (`:`) removed, if any. + 1. If |type| is "`pattern`" then return |strippedValue|. + 1. Return the result of running [=canonicalize a protocol=] given |strippedValue|. +
+ +
+ To process username for init given a string |value| and a string |type|: + + 1. If |type| is "`pattern`" then return |value|. + 1. Return the result of running [=canonicalize a username=] given |value|. +
+ +
+ To process password for init given a string |value| and a string |type|: + + 1. If |type| is "`pattern`" then return |value|. + 1. Return the result of running [=canonicalize a password=] given |value|. +
+ +
+ To process hostname for init given a string |value| and a string |type|: + + 1. If |type| is "`pattern`" then return |value|. + 1. Return the result of running [=canonicalize a hostname=] given |value|. +
+ +
+ To process port for init given a string |portValue|, a string |protocolValue|, and a string |type|: + + 1. If |type| is "`pattern`" then return |portValue|. + 1. Return the result of running [=canonicalize a port=] given |portValue| and |protocolValue|. +
+ +
+ To process pathname for init given a string |pathnameValue|, a string |protocolValue|, and a string |type|: + + 1. If |type| is "`pattern`" then return |pathnameValue|. + 1. If |protocolValue| is a [=special scheme=] or the empty string, then return the result of running [=canonicalize a pathname=] given |pathnameValue|. +

If |protocolValue| is the empty string, then no value was provided for {{URLPatternInit/protocol}} in the constructor dictionary. Normally empty string dictionary values are not special-cased, but here the empty string is treated as a [=special scheme=] in order to default to the most common pathname canonicalization. + 1. Return the result of running [=canonicalize an opaque pathname=] given |pathnameValue|. +

+ +
+ To process search for init given a string |value| and a string |type|: + + 1. Let |strippedValue| be the given |value| with a single leading U+003F (`?`) removed, if any. + 1. If |type| is "`pattern`" then return |strippedValue|. + 1. Return the result of running [=canonicalize a search=] given |strippedValue|. +
+ +
+ To process hash for init given a string |value| and a string |type|: + + 1. Let |strippedValue| be the given |value| with a single leading U+0023 (`#`) removed, if any. + 1. If |type| is "`pattern`" then return |strippedValue|. + 1. Return the result of running [=canonicalize a hash=] given |strippedValue|. +
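The protocol, search, and hash algorithms above each remove at most one delimiter before doing anything else. A minimal TypeScript sketch of just that stripping step follows; the helper names `stripTrailingColon` and `stripLeading` are hypothetical, and the canonicalization that follows for non-pattern input is deliberately left out.

```typescript
// Removes a single trailing ":" from a protocol value, if present.
function stripTrailingColon(value: string): string {
  return value.charAt(value.length - 1) === ":" ? value.slice(0, -1) : value;
}

// Removes a single leading "?" (search) or "#" (hash), if present.
function stripLeading(value: string, prefix: "?" | "#"): string {
  return value.charAt(0) === prefix ? value.slice(1) : value;
}
```

Only one delimiter is removed, so under this sketch a value such as `??q` keeps its second `?` as part of the search string.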
+ +

Using URL patterns in other specifications

+ +To promote consistency on the web platform, other documents integrating with this specification should adhere to the following guidelines, unless there is good reason to diverge. + +1. **Accept shorthands**. Most author patterns will be simple and straightforward. Accordingly, APIs should accept shorthands for those common cases and avoid the need for authors to take additional steps to transform these into complete {{URLPattern}} objects. +1. **Respect the base URL**. Just as URLs are generally parsed relative to a base URL for their environment (most commonly, a [=document base URL=]), URL patterns should respect this as well. The {{URLPattern}} constructor is an exception because it directly exposes the concept, similar to how the URL constructor does not respect the base URL even though the rest of the platform does. +1. **Be clear about regexp groups**. Some APIs may benefit from only allowing URL patterns which do not [=URL pattern/has regexp groups|have regexp groups=]. For example, user agents are likely to implement such APIs in a different thread or process from those executing author script, where, because of security or performance concerns, a JavaScript engine would not ordinarily run. If so, this should be clearly documented (with reference to [=URL pattern/has regexp groups=]) and the operation should report an error as soon as possible (e.g., by throwing a JavaScript exception). If possible, this should be feature-detectable to allow for the possibility of this constraint being lifted in the future. Avoid creating different subsets of URL patterns without consulting the editors of this specification. +1. **Be clear about what URLs will be matched**. For instance, algorithms during fetching are likely to operate on URLs with no [=url/fragment=]. 
If so, the specification should be clear that this is the case, and may advise showing a developer warning if a pattern which cannot match (e.g., because it requires a non-empty fragment) is used. + +

Integrating with JavaScript APIs

+ + +typedef (USVString or URLPatternInit or URLPattern) URLPatternCompatible; + + +JavaScript APIs should accept all of: +* a {{URLPattern}} object +* a dictionary-like object which specifies the components required to construct a pattern +* a string (in the constructor string syntax) + +To accomplish this, specifications should accept {{URLPatternCompatible}} as an argument to an [=operation=] or [=dictionary member=], and process it using the following algorithm, using the appropriate [=environment settings object=]'s [=environment settings object/API base URL=] or equivalent. + +
+ To build a {{URLPattern}} object from a Web IDL value {{URLPatternCompatible}} |input| given [=/URL=] |baseURL| and [=ECMAScript/realm=] |realm|, perform the following steps: + + 1. If the [=specific type=] of |input| is {{URLPattern}}: + 1. Return |input|. + 1. Otherwise: + 1. Let |pattern| be a [=new=] {{URLPattern}} with |realm|. + 1. Set |pattern|'s [=URLPattern/associated URL pattern=] to the result of [=building a URL pattern from a Web IDL value=] given |input| and |baseURL|. + 1. Return |pattern|. +
+ +
+ To build a [=URL pattern=] from a Web IDL value {{URLPatternCompatible}} |input| given [=/URL=] |baseURL|, perform the following steps: + + 1. If the [=specific type=] of |input| is {{URLPattern}}: + 1. Return |input|'s [=URLPattern/associated URL pattern=]. + 1. Otherwise, if the [=specific type=] of |input| is {{URLPatternInit}}: + 1. Let |init| be a [=map/clone=] of |input|. + 1. If |init|["{{URLPatternInit/baseURL}}"] does not [=map/exist=], set it to the [=URL serializer|serialization=] of |baseURL|. + 1. Return the result of [=creating=] a URL pattern given |init|, null, and an empty [=map=]. + 1. Otherwise: + 1. [=Assert=]: The [=specific type=] of |input| is {{USVString}}. + 1. Return the result of [=creating=] a URL pattern given |input|, the [=URL serializer|serialization=] of |baseURL|, and an empty [=map=]. +
+ +This allows authors to concisely specify most patterns, and use the constructor to access uncommon options if necessary. The implicit use of the base URL is similar to, and consistent with, HTML's [=parse a URL=] algorithm. [[HTML]] + +

Integrating with JSON data formats

+ +JSON data formats which include URL patterns should mirror the behavior of JavaScript APIs and accept both: +* an object which specifies the components required to construct a pattern +* a string (in the constructor string syntax) + +If a specification has an Infra value (e.g., after using [=parse a JSON string to an Infra value=]), use the following algorithm, using the appropriate base URL (by default, the URL of the JSON resource). [[INFRA]] + +
+ To build a [=URL pattern=] from an Infra value |rawPattern| given [=/URL=] |baseURL|, perform the following steps. + + 1. Let |serializedBaseURL| be the [=URL serializer|serialization=] of |baseURL|. + 1. If |rawPattern| is a [=string=], then: + 1. Return the result of [=creating=] a URL pattern given |rawPattern|, |serializedBaseURL|, and an empty [=map=]. + +
It might become necessary in the future to plumb non-empty options here.
+ + 1. Otherwise, if |rawPattern| is a [=map=], then: + 1. Let |init| be «[ "{{URLPatternInit/baseURL}}" → |serializedBaseURL| ]», representing a dictionary of type {{URLPatternInit}}. + 1. [=map/For each=] |key| → |value| of |rawPattern|: + 1. If |key| is not the identifier of a dictionary member of {{URLPatternInit}} or one of its inherited dictionaries, |value| is not a [=string=], or the member's type is not declared to be {{USVString}}, then return null. + +
This will need to be updated if {{URLPatternInit}} gains members of other types.
+
A future version of this specification might also have a less strict mode, if that proves useful to other specifications.
+ 1. Set |init|[|key|] to |value|. + 1. Return the result of [=creating=] a URL pattern given |init|, null, and an empty [=map=]. + +
It might become necessary in the future to plumb non-empty options here.
+ + 1. Otherwise, return null. + +
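The validation performed by the steps above can be sketched as follows. This is a hedged illustration in TypeScript rather than a definitive implementation: the names `STRING_MEMBERS`, `ConstructorArgs`, and `constructorArgsFromJSON` are hypothetical, and instead of creating a URL pattern the function returns the arguments that would be handed to pattern creation, or null when the input is rejected.

```typescript
// The string-typed URLPatternInit members referenced in this specification.
const STRING_MEMBERS = [
  "protocol", "username", "password", "hostname",
  "port", "pathname", "search", "hash", "baseURL",
];

type ConstructorArgs =
  | { input: string; baseURL: string }
  | { input: { [key: string]: string } };

// Returns what would be passed on to pattern creation, or null if the
// parsed JSON value is not an acceptable pattern description.
function constructorArgsFromJSON(
  rawPattern: unknown,
  serializedBaseURL: string
): ConstructorArgs | null {
  if (typeof rawPattern === "string") {
    return { input: rawPattern, baseURL: serializedBaseURL };
  }
  if (typeof rawPattern !== "object" || rawPattern === null || Array.isArray(rawPattern)) {
    return null;
  }
  // Start from the default base URL; an explicit "baseURL" key overrides it.
  const init: { [key: string]: string } = { baseURL: serializedBaseURL };
  const source = rawPattern as { [key: string]: unknown };
  for (const key of Object.keys(source)) {
    const value = source[key];
    // Unknown keys and non-string values reject the whole pattern.
    if (STRING_MEMBERS.indexOf(key) === -1 || typeof value !== "string") {
      return null;
    }
    init[key] = value;
  }
  return { input: init };
}
```

The strictness is the point of the sketch: a numeric port such as `{ "port": 8080 }` is rejected outright rather than coerced to a string.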
+ +Specifications may wish to leave room in their formats to accept options for {{URLPatternOptions}}, override the base URL, or similar, since it is not possible to construct a {{URLPattern}} object directly in this case, unlike in a JavaScript API. For example, Speculation Rules accepts a "`relative_to`" key which can be used to switch to using the [=document base URL=] instead of the JSON resource's URL. [[SPECULATION-RULES]] + +

Acknowledgments

+ +The editors would like to thank +Alex Russell, +Anne van Kesteren, +Asa Kusuma, +Blake Embrey, +Cyrus Kasaaian, +Daniel Murphy, +Darwin Huang, +Devlin Cronin, +Domenic Denicola, +Dominick Ng, +Jake Archibald, +Jeffrey Posnick, +Jeremy Roman, +Jimmy Shen, +Joe Gregorio, +Joshua Bell, +Kenichi Ishibashi, +Kenji Baheux, +Kenneth Rohde Christiansen, +Kingsley Ngan, +Kinuko Yasuda, +L. David Baron, +Luca Casonato, +Łukasz Anforowicz, +Makoto Shimazu, +Marijn Kruisselbrink, +Matt Falkenhagen, +Matt Giuca, +Michael Landry, +R. Samuel Klatchko, +Rajesh Jagannathan, +Ralph Chelala, +Sangwhan Moon, +Sayan Pal, +Victor Costan, +Yoshisato Yanagisawa, and +Youenn Fablet +for their contributions to this specification. + +Special thanks to Blake Embrey and the other [pillarjs/path-to-regexp](https://github.com/pillarjs/path-to-regexp) [contributors](https://github.com/pillarjs/path-to-regexp/graphs/contributors) for building an excellent open source library that so many have found useful. + +Also, special thanks to Kenneth Rohde Christiansen for his work on the polyfill. He put in extensive work to adapt to the changing {{URLPattern}} API. + +This standard is written by +Ben Kelly ([Google](https://www.google.com/), [wanderview@chromium.org](mailto:wanderview@chromium.org)), +Jeremy Roman ([Google](https://www.google.com/), [jbroman@chromium.org](mailto:jbroman@chromium.org)), and +宍戸俊哉 (Shunya Shishido, [Google](https://www.google.com/), [sisidovski@chromium.org](mailto:sisidovski@chromium.org)). + + +

This Living Standard was originally developed in the W3C WICG, where it was available under the [W3C Software and Document License](https://www.w3.org/Consortium/Legal/2015/copyright-software-and-document). diff --git a/spec.bs b/spec.bs index ad3954a..5d0ca8a 100644 --- a/spec.bs +++ b/spec.bs @@ -3,6 +3,7 @@ Group: WHATWG H1: URL Pattern Shortname: urlpattern Text Macro: TWITTER urlpatterns +Text Macro: LATESTRD 2024-03 Abstract: The URL Pattern Standard provides a web platform primitive for matching URLs based on a convenient pattern syntax. Indent: 2 Markup Shorthands: markdown yes