Skip to content

Latest commit

 

History

History
1047 lines (819 loc) · 26.7 KB

P1260R0.md

File metadata and controls

1047 lines (819 loc) · 26.7 KB
title document date audience author toc
Pattern Matching
P1260R0
2018-05-22
Evolution
name email
Michael Park
true

Introduction

As algebraic data types gain better support in C++ with facilities such as tuple and variant, the importance of mechanisms to interact with them have increased. While mechanisms such as apply and visit have been added, their usage is quite complex and limited even for simple cases. Pattern matching is a widely adopted mechanism across many programming languages to interact with algebraic data types that can help greatly simplify C++. Examples of programming languages include text-based languages such as SNOBOL back in the 1960s, functional languages such as Haskell and OCaml, and "mainstream" languages such as Scala, Swift, and Rust.

Inspired by [@P0095R1] --- which proposed pattern matching and language-level variant simultaneously --- this paper explores a possible direction for pattern matching only, and does not address language-level variant design. This is in correspondence with a straw poll from Kona 2015, which encouraged exploration of a full solution for pattern matching. SF: 16, WF: 6, N: 5, WA: 1, SA: 0.

Motivation and Scope

Virtually every program involves branching on some predicates applied to a value and conditionally binding names to some of its components for use in subsequent logic. Today, C++ provides two types of selection statements: the if statement and the switch statement.

Since switch statements can only operate on a single integral value and if statements operate on an arbitrarily complex boolean expression, there is a significant gap between the two constructs even in inspection of the "vocabulary types" provided by the standard library.

In C++17, structured binding declarations [@P0144R2] introduced the ability to concisely bind names to components of tuple-like values. The proposed direction of this paper aims to naturally extend this notion by performing structured inspection prior to forming the structured bindings with a third selection statement: the inspect statement. The goal of the inspect statement is to bridge the gap between switch and if statements with a declarative, structured, cohesive, and composable mechanism.

\pagebreak

Before/After Comparisons

Matching Integrals

::: tonytable

Before

switch (x) {
  case 0: std::cout << "got zero";
  case 1: std::cout << "got one";
  default: std::cout << "don't care";
}

After

inspect (x) {
  0: std::cout << "got zero";
  1: std::cout << "got one";
  _: std::cout << "don't care";
}

:::

Matching Strings

::: tonytable

Before

if (s == "foo") {
  std::cout << "got foo";
} else if (s == "bar") {
  std::cout << "got bar";
} else {
  std::cout << "don't care";
}

After

inspect (s) {
  "foo": std::cout << "got foo";
  "bar": std::cout << "got bar";
  _: std::cout << "don't care";
}

:::

Matching Tuples

::: tonytable

Before

auto&& [x, y] = p;
if (x == 0 && y == 0) {
  std::cout << "on origin";
} else if (x == 0) {
  std::cout << "on y-axis";
} else if (y == 0) {
  std::cout << "on x-axis";
} else {
  std::cout << x << ',' << y;
}

After

inspect (p) {
  [0, 0]: std::cout << "on origin";
  [0, y]: std::cout << "on y-axis";
  [x, 0]: std::cout << "on x-axis";
  [x, y]: std::cout << x << ',' << y;
}

:::

\pagebreak

Matching Variants

::: tonytable

Before

struct visitor {
  void operator()(int i) const {
    os << "got int: " << i;
  }
  void operator()(float f) const {
    os << "got float: " << f;
  }
  std::ostream& os;
};
std::visit(visitor{strm}, v);

After

inspect (v) {
  <int> i: strm << "got int: " << i;
  <float> f: strm << "got float: " << f;
}

:::

Matching Polymorphic Types

struct Shape { virtual ~Shape() = default; };
struct Circle : Shape { int radius; };
struct Rectangle : Shape { int width, height; };

::: tonytable

Before

virtual int Shape::get_area() const = 0;

int Circle::get_area() const override {
  return 3.14 * radius * radius;
}
int Rectangle::get_area() const override {
  return width * height;
}

After

int get_area(const Shape& shape) {
  inspect (shape) {
    (as<Circle>? [r]): return 3.14 * r * r;
    (as<Rectangle>? [w, h]): return w * h;
  }
}

:::

Evaluating Expressions

struct Expr;
struct Neg { std::shared_ptr<Expr> expr; };
struct Add { std::shared_ptr<Expr> lhs, rhs; };
struct Mul { std::shared_ptr<Expr> lhs, rhs; };
struct Expr : std::variant<int, Neg, Add, Mul> { using variant::variant; };

namespace std {
  template <>
  struct variant_size<Expr> : variant_size<Expr::variant> {};

  template <std::size_t I>
  struct variant_alternative<I, Expr> : variant_alternative<I, Expr::variant> {};
}

::: tonytable

Before {width=.48}

int eval(const Expr& expr) {
  struct visitor {
    int operator()(int i) const {
      return i;
    }
    int operator()(const Neg& n) const {
      return -eval(*n.expr);
    int operator()(const Add& a) const {
      return eval(*a.lhs) + eval(*a.rhs);
    }
    int operator()(const Mul& m) const {
      return eval(*m.lhs) * eval(*m.rhs);
    }
  };
  return std::visit(visitor{}, expr);
}

After {width=.53}

int eval(const Expr& expr) {
  inspect (expr) {
    <int> i: return i;
    <Neg> [e]: return -eval(*e);
    <Add> [l, r]: return eval(*l) + eval(*r);
    <Mul> [l, r]: return eval(*l) * eval(*r);
  }
}

:::

Design Overview

Basic Syntax

| inspect constexpropt ( init-statementopt condition ) { | pattern guardopt : statement | pattern guardopt : statement | ... | }

| guard: | if ( expression )

Basic Model

Within the parentheses, the inspect statement is equivalent to switch and if statements except that no conversion nor promotion takes place in evaluating the value of its condition.

When the inspect statement is executed, its condition is evaluated and matched in order (first match semantics) against each pattern. If a pattern successfully matches the value of the condition and the boolean expression in the guard evaluates to true (or if there is no guard at all), control is passed to the statement following the matched pattern label. If the guard expression evaluates to false, control flows to the subsequent pattern. If no pattern matches, none of the statements are executed.

Types of Patterns

Primary Patterns

Wildcard Pattern

The wildcard pattern has the form:

| _

and matches any value v.

int v = /* ... */;

inspect (v) {
    _: std::cout << "ignored";
//  ^ wildcard pattern
}

[Even though _ is a valid identifier, it does not introduce a name.]{.note}

Identifier Pattern

The identifier pattern has the form:

| identifier

and matches any value v. The introduced name behaves as an lvalue referring to v, and is in scope from its point of declaration until the end of the statement following the pattern label.

int v = /* ... */;

inspect (v) {
    x: std::cout << x;
//  ^ identifier pattern
}

[If the identifier pattern is used as a top-level pattern, it has the same syntax as a goto label.]{.note}

Constant Pattern

The constant pattern has the form:

| constant expression

and matches value v if a call to member c.match(v) or else a non-member ADL-only match(c, v) is contextually convertible to bool and evaluates to true where c is the constant expression.

The following is the default definition of match(x, y).

template <typename T, typename U>
constexpr auto match(T&& lhs, U&& rhs)
    -> decltype(std::forward<T>(lhs) == std::forward<U>(rhs)) {
  return std::forward<T>(lhs) == std::forward<U>(rhs);
}
int v = /* ... */;

inspect (v) {
    0: std::cout << "got zero";
    1: std::cout << "got one";
//  ^ constant pattern
}

[+id or (id) is needed to disambiguate with the identifier pattern.]{.note}

static constexpr int zero = 0, one = 1;
int v = /* ... */;

inspect (v) {
    +zero: std::cout << "got zero";
    (one): std::cout << "got one";
//  ^^^^^ constant pattern
}

Compound Patterns

Structured Binding Pattern

The structured binding pattern has the form:

| [ _pattern_0, _pattern_1, ..., _pattern_N ]

and matches value v if each patterni matches the i^th^ component of v. The components of v are given by the structured binding declaration: auto&& [__e0, __e1, ..., __eN] = v; where each __ei are unique exposition-only identifiers.

std::pair<int, int> p = /* ... */;

inspect (p) {
    [0, 0]: std::cout << "on origin";
    [0, y]: std::cout << "on y-axis";
//      ^ identifier pattern
    [x, 0]: std::cout << "on x-axis";
//      ^ constant pattern
    [x, y]: std::cout << x << ',' << y;
//  ^^^^^^ structured binding pattern
}

Alternative Pattern

The alternative pattern has the form:

| < auto > pattern | < concept > pattern | < type > pattern | < constant expression > pattern

Let v be the value being matched and V be std::remove_cvref_t<decltype(v)>.\newline Let Alt be the entity inside the angle brackets.

If std::variant_size_v<V> is well-formed and evaluates to an integral, the alternative pattern matches v if Alt is compatible with the current index of v and pattern matches the active alternative of v.

Let I be the current index of v given by a member v.index() or else a non-member ADL-only index(v). The active alternative of v is given by std::variant_alternative_t<I, V>& initialized by a member v.get<I>() or else a non-member ADL-only get<I>(v).

Alt is compatible with I if one of the following four cases is true:

  • Alt is auto
  • Alt is a concept and std::variant_alternative_t<I, V> satisfies the concept.
  • Alt is a type and std::is_same_v<Alt, std::variant_alternative_t<I, V>> is true
  • Alt is a constant expression that can be used in a switch and is the same value as I.

::: tonytable

Before {width=.53}

std::visit([&](auto&& x) {
  strm << "got auto: " << x;
}, v);

After {width=.47}

inspect (v) {
  <auto> x: strm << "got auto: " << x;
}

std::visit([&](auto&& x) {
  using X = std::remove_cvref_t<decltype(x)>;
  if constexpr (C1<X>()) {
    strm << "got C1: " << x;
  } else if constexpr (C2<X>()) {
    strm << "got C2: " << x;
  }
}, v);
inspect (v) {
  <C1> c1: strm << "got C1: " << c1;
  <C2> c2: strm << "got C2: " << c2;
}

std::visit([&](auto&& x) {
  using X = std::remove_cvref_t<decltype(x)>;
  if constexpr (std::is_same_v<int, X>) {
    strm << "got int: " << x;
  } else if constexpr (
      std::is_same_v<float, X>) {
    strm << "got float: " << x;
  }
}, v);
inspect (v) {
  <int> i: strm << "got int: " << i;
  <float> f: strm << "got float: " << f;
}

std::variant<int, int> v = /* ... */;

std::visit([&](int x) {
  strm << "got int: " << x;
}, v);
std::variant<int, int> v = /* ... */;

inspect (v) {
  <int> x: strm << "got int: " << x;
}

std::variant<int, int> v = /* ... */;

std::visit([&](auto&& x) {
  switch (v.index()) {
    case 0: {
      strm << "got first: " << x;
      break;
    }
    case 1: {
      strm << "got second: " << x;
      break;
    }
  }
}, v);
std::variant<int, int> v = /* ... */;

inspect (v) {
  <0> x: strm << "got first: " << x;
  <1> x: strm << "got second: " << x;
}

:::

Binding Pattern

The binding pattern has the form:

| identifier @ pattern

and matches value v if pattern matches it. The introduced name behaves as an lvalue referring to v, and is in scope from its point of declaration until the end of the statement following the pattern label.

std::variant<Point, /* ... */> v = /* ... */;

inspect (v) {
    <Point> p @ [x, y]: // ...
//          ^^^^^^^^^^ binding pattern
}

Extractor Pattern

The extractor pattern has the form:

| ( constant expression ? pattern )

Let e be the result of a call to member c.extract(v) or else a non-member ADL-only extract(c, v) where c is the constant expression.

The extractor pattern matches value v if e is contextually convertible to bool and evaluates to true and pattern matches *e.

struct {
    std::optional<std::array<std::string_view, 2>> extract(std::string_view sv) const;
} email;

struct {
    std::optional<std::array<std::string_view, 3>> extract(std::string_view sv) const;
} phone_number;

inspect (s) {
    (email ? [address, domain]): std::cout << "got an email";
    (phone_number ? ["415", _, _]): std::cout << "got a phone number";
//  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ extractor pattern
}

As Pattern

The as pattern is a special instance of the extractor pattern, and behaves as:

template <typename Derived>
struct As {
    template <typename Base>
    auto* extract(Base& base) const {
        static_assert(std::is_polymophic_v<Base>);
        static_assert(std::is_convertible_v<Derived*, Base*>,
                      "cross-casts are not allowed.");
        using R = /* `Derived` with the same _cv_-qualification as `Base` */;
        return dynamic_cast<R*>(&base);
    }
};

template <typename Derived>
inline constexpr As<Derived> as;

While this is a possible library implementation, it will likely benefit from being implemented as a compiler intrinsic for optimization opportunities.

[@N3449] describes techniques involving vtable pointer caching and hash conflict minimization that are implemented in the [@Mach7] library, but also mentions further opportunities available for a compiler solution.

Given the following definition of a Shape class hierarchy:

struct Shape { virtual ~Shape() = default; };

struct Circle : Shape { int radius; };
struct Rectangle : Shape { int width, height; };

::: tonytable

Before

virtual int Shape::get_area() const = 0;

int Circle::get_area() const override {
  return 3.14 * radius * radius;
}

int Rectangle::get_area() const override {
  return width * height;
}

After

int get_area(const Shape& shape) {
  inspect (shape) {
    (as<Circle>? [r]): return 3.14 * r * r;
    (as<Rectangle>? [w, h]): return w * h;
//  ^^^^^^^^^^^^^^^^^^^^^^^ as pattern
  }
}

:::

Pattern Guard

The pattern guard has the form:

| if ( expression )

Let e be the result of expression contextually converted to bool. If e is true, control is passed to the corresponding statement. Otherwise, control flows to the subsequent pattern.

The pattern guard allows to perform complex tests that cannot be performed within the pattern. For example, performing tests across multiple bindings:

inspect (p) {
    [x, y] if test(x, y): std::cout << x << ',' << y << " passed";
//         ^^^^^^^^^^^^^ pattern guard
}

This also diminishes the desire for fall-through semantics within the statements, an unpopular feature even in switch statements. For the reified semantics of the pattern guard, consider the following snippet:

switch (x) {
    case c1: if (cond1) { stmt1; break; } [[fallthrough]]
    case c2: if (cond2) { stmt2; break; } [[fallthrough]]
}

inspect constexpr

Note that every pattern is able to determine whether it matches value v as a boolean expression in isolation.

Let matches be the condition for which a pattern matches a value v. Ignoring any potential optimization opportunities, we're able to perform the following transformation:

::: tonytable

inspect {width=.4}

inspect (v) {
  pattern1 if (cond1): stmt1
  pattern2: stmt2
  // ...
}

if {width=.6}

if (pattern1 matches v && cond1) stmt1
else if (pattern2 matches v) stmt2
// ...

:::

inspect constexpr is then formulated by applying constexpr to every if branch.

::: tonytable

inspect constexpr {width=.4}

inspect constexpr (v) {
  pattern1 if (cond1): stmt1
  pattern2: stmt2
  // ...
}

if constexpr {width=.6}

if constexpr (pattern1 matches v && cond1) stmt1
else if constexpr (pattern2 matches v) stmt2
// ...

:::

Exhaustiveness Checking

The inspect statement can be declared with the [[exhaustive]] attribute to request for implementation-defined exhaustiveness checking.

Proposed Wording

The following is the beginning of an attempt at a syntactic structure.

Add to §8.4 [stmt.select] of ...

[1]{.pnum} Selection statements choose one of several flows of control.

| selection-statement: | if constexpropt ( init-statementopt condition ) statement | if constexpropt ( init-statementopt condition ) statement else statement | switch ( init-statementopt condition ) statement | [inspect constexpropt ( init-statementopt condition ) { inspect-case-seq }]{.add}

::: add | inspect-case-seq: | inspect-case | inspect-case-seq inspect-case

| inspect-case: | attribute-specifier-seqopt inspect-pattern inspect-guardopt : statement

| inspect-pattern: | wildcard-pattern | identifier-pattern | constant-pattern | structured-binding-pattern | alternative-pattern | binding-pattern | extractor-pattern

| inspect-guard: | if ( expression ) :::

Design Decisions

Extending Structured Bindings Declaration

The design is intended to be consistent and to naturally extend the notions introduced by structured bindings. That is, The subobjects are referred to rather than being assigned into new variables.

inspect rather than switch

This proposal introduces a new inspect statement rather than trying to extend the switch statement. [@P0095R0] had proposed extending switch and received feedback to "leave switch alone" in Kona 2015.

The following are some of the reasons considered:

  • switch allows the case labels to appear anywhere, which hinders the goal of pattern matching in providing structured inspection.
  • The fall-through semantics of switch generally results in break being attached to every case, and is known to be error-prone.
  • switch is purposely restricted to integrals for guaranteed efficiency. The primary goal of pattern matching in this paper is expressiveness while being at least as efficient as the naively hand-written code.

First Match rather than Best Match

The proposed matching algorithm has first match semantics. The choice of first match is mainly due to complexity. Our overload resolution rules for function declarations are extremely complex and is often a mystery.

Best match via overload resolution for function declarations are absolutely necessary due to the non-local and unordered nature of declarations. That is, function declarations live in different files and get pulled in via mechanisms such as #include and using declarations, and there is no defined order of declarations like Haskell does, for example. If function dispatching depended on the order of #include and/or using declarations being pulled in from hundreds of files, it would be a complete disaster.

Pattern matching on the other hand do not have this problem because the construct is local and ordered in nature. That is, all of the candidate patterns appear locally within inspect (x) { /* ... */ } which cannot span across multiple files, and appear in a specified order. Note that this is consistent with try/catch for the same reasons: locality and order.

Consider also the amount of limitations we face in overload resolution due to the opacity of user-defined types. T* is related to unique_ptr<T> as it is to vector<T> as far as the type system is concerned. This limitation will likely be even bigger in a pattern matching context with the amount of customization points available for user-defined behavior.

Statement rather than Expression

This paper diverges from [@P0095R1] in that it proposes to add inspect as a statement only rather than trying to double as a statement and an expression. The main reason here is that the semantic differences between the statement and expression forms are not trivial.

  • In the situation where none of the cases match, the statement form simply skips over the entire statement à la switch, whereas the expression form throws an exception since it is required to yield a value.
  • Resulting type of the statement form of inspect within an "immediately-invoked-lambda" is required to be explicitly specified, or is determined by the first return statement. In contrast, the expression form will probably need to use std::common_type_t<Ts...> where Ts... are types of N expressions to be consistent with the ternary operator.

While an expression form of inspect would be useful, the author believes that it can and should be introduced later, with different enough syntax such as x inspect { p1 => e1, p2 => e2 }. The proposed syntax of the inspect statement in this paper consistent with every other statement in C++ today.

Language rather than Library

There are three popular pattern matching libraries for C++ today: [@Mach7], [@Patterns], and [@SimpleMatch].

While the libraries have been useful for gaining experience with implementation and cleaner interfaces, the issue of introducing identifiers, syntactic overhead of the patterns, and the reduced optimization opportunities justify support as a language feature from a usability standpoint.

Optimizations

The following are few of the optimizations that are worth noting.

Structured Binding Pattern

Structured binding patterns can be optimized by performing switch over the columns with the duplicates removed, rather than the naive approach of performing a comparison per element. This removes unnecessary duplicate comparisons that would be performed otherwise. This would likely require some wording around "comparison elision" in order to enable such optimizations.

Alternative Pattern

The sequence of alternative patterns can be executed in a switch.

Open Class Hierarchy

[@N3449] describes techniques involving vtable pointer caching and hash conflict minimization that are implemented in the [@Mach7] library, but also mentions further opportunities available for a compiler solution.

Future Work

Language Support for Variant

The design of this proposal also accounts for a potential language support for variant. It achieves this by keeping the alternative pattern flexible for new extensions via < new_entity > pattern.

Consider an extension to union that allows it to be tagged by an integral, and has proper lifetime management such that the active alternative need not be destroyed manually.

// `: type` specifies the type of the underlying tag value.
union U : int { char small[32]; std::vector<char> big; };

We could then allow < qualified-id > that refers to a union alternative to support pattern matching.

U u = /* ... */;

inspect (u) {
  <U::small> s: std::cout << s;
  <U::big> b: std::cout << b;
}

The main point is that whatever entity is introduced as the discriminator, the presented form of alternative pattern should be extendable to support it.

Patterns in range-based for loop

for (auto&& [0, y] : points) {
  // only operate on points on the y-axis.
}

Structured binding declaration is allowed in range-based for loop:

for (auto&& [x, y] : points) { /* ... */ }

The [x, y] part can also be a pattern of an inspect statement rather than a structured binding declaration.

::: tonytable

Before

for (auto&& p : points) {
  auto&& [x, y] = p;
  // ...
}

After

for (auto&& p : points) {
  inspect (p) {
    [x, y]: // ...
  }
}

:::

With this model, allowing patterns directly in range-based for loop becomes natural.

::: tonytable

Code

for (auto&& [0, y] : points) {
  // only points on the y-axis.
}

Expanded

for (auto&& p : points) {
  inspect (p) {
    [0, y]: // ...
  }
  // falls through if no match
}

:::

Note on Ranges

The benefit of pattern matching for ranges is unclear. While it's possible to come up with a ranges pattern, e.g., {x, y, z} to match against a fixed-size range, it's not clear whether there is a worthwhile benefit.

The typical pattern found in functional languages of matching a range on head and tail doesn't seem to be all that common or useful in C++ since ranges are generally handled via loops rather than recursion.

Ranges likely will be best served by the range adaptors / algorithms, but further investigation is needed.

Acknowledgements

Thank you to Agustín Bergé, Ori Bernstein, Alexander Chow, Louis Dionne, Matt Calabrese, Michał Dominiak, Eric Fiselier, Zach Laine, Jason Lucas, David Sankel, Tony Van Eerd, and everyone else who contributed to the discussions, and encouraged me to write this paper.


references: