Merge pull request codeplaysoftware#73 from AerialMantis/issue-50
Issue codeplaysoftware#50: Make execution_resource iterable.
AerialMantis authored Oct 4, 2018
2 parents 392bcd8 + 92841d8 commit 1fc49ea
Showing 1 changed file with 104 additions and 43 deletions.
147 changes: 104 additions & 43 deletions affinity/cpp-20/d0796r3.md
@@ -19,6 +19,9 @@
* Remove reference counting requirement from `execution_resource`.
* Change lifetime model of `execution_resource`: it now either consistently identifies some underlying resource, or is invalid; context creation rejects an invalid resource.
* Remove `this_thread::bind` & `this_thread::unbind` interfaces.
* Make `execution_resource`s iterable by replacing `execution_resource::resources` with `execution_resource::begin` and `execution_resource::end`.
* Add `size` and `operator[]` for `execution_resource`.
* Rename `this_system::get_resources` to `this_system::discover_topology`.

### P0796r2 (RAP 2018)

@@ -162,7 +165,13 @@ From a historic perspective, programming models for traditional high-performance
Some of these programming models also address *fault tolerance*. In particular, PVM has native support for this, providing a mechanism [[27]][pvm-callback] which can notify a program when a resource is added or removed from a system. MPI lacks a native *fault tolerance* mechanism, but there have been efforts to implement fault tolerance on top of MPI [[28]][mpi-post-failure-recovery] or by extensions[[29]][mpi-fault-tolerance].
Due to the complexity involved in standardizing *dynamic resource discovery* and *fault tolerance*, these are currently out of the scope of this paper. However, we leave open the possibility of accommodating both in the future, by not overconstraining *resources*' lifetimes (see next section).
Due to the complexity involved in standardizing *dynamic resource discovery* and *fault tolerance*, these are currently out of the scope of this paper. However, we leave open the possibility of accommodating both in the future, by not over constraining *resources*' lifetimes (see next section).
### Reporting errors in topology discovery
As querying the topology of a system can invoke a number of different system and third-party libraries, we have to consider what will happen when a call to one of these fails. Firstly, we want to be able to surface this failure so that it can be reported or handled in user code. Secondly, as there will often be more than one source of topology discovery, we have to avoid short-circuiting the discovery on an error and preventing potentially valid topology information from being reported to users. For example, if a system were to report both Hwloc and OpenCL execution resources and one of these failed, we want the other to still be able to return its resources.
A potential solution to this could be to support partial errors in topology discovery, where querying the system for its topology could be permitted to fail but still return a valid topology structure representing the topology that was discovered successfully. The way in which these errors are reported (i.e. exceptions or error values) would have to be decided. Exceptions could be problematic as they could unwind the stack before important topology information is captured, so perhaps an error value based approach would be preferable.
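One way to picture the error value based approach is sketched below. This is only an illustration under stated assumptions: `discovered_resource`, `discovery_error`, `discovery_result`, `backend_report` and `merge_reports` are hypothetical names invented for this sketch and are not part of the proposed interface. The sketch merges the reports of several discovery backends and records failures alongside whatever was discovered successfully, rather than stopping at the first failure.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a discovered resource; this is NOT the paper's
// execution_resource, only a name holder for the purpose of the sketch.
struct discovered_resource {
  std::string name;
};

// Hypothetical record describing the failure of one discovery backend.
struct discovery_error {
  std::string backend;
  std::string message;
};

// Hypothetical result type: everything that was discovered, plus every failure.
struct discovery_result {
  std::vector<discovered_resource> resources;
  std::vector<discovery_error> errors;
};

// Hypothetical report produced by a single backend (e.g. Hwloc or OpenCL).
struct backend_report {
  std::string backend;
  bool ok;
  std::vector<discovered_resource> resources;
  std::string message;
};

// Merge all backend reports without short-circuiting: a failed backend adds
// an error entry, while the resources of the other backends are still kept.
inline discovery_result merge_reports(const std::vector<backend_report> &reports) {
  discovery_result result;
  for (const backend_report &report : reports) {
    if (report.ok) {
      result.resources.insert(result.resources.end(),
                              report.resources.begin(), report.resources.end());
    } else {
      result.errors.push_back({report.backend, report.message});
    }
  }
  return result;
}
```

With this shape, user code can inspect `errors` to decide whether a partial topology is acceptable, which is the trade-off the paragraph above describes; an exception based design would need another way to carry the partial result.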
### Resource lifetime
@@ -267,26 +276,39 @@ Below *(Listing 2)* is an example of executing a parallel task over 8 threads us

## Execution resource topology

### Execution resources
### System topology

An `execution_resource` is a lightweight structure which acts as an identifier to particular piece of hardware within a system. It can be queried for whether it can allocate memory via `can_place_memory`, whether it can execute work via `can_place_agents`, and for its name via `name`. An `execution_resource` can also represent other `execution_resource`s. We call these *members of* that `execution_resource`, and can be queried via `resources`. Additionally the `execution_resource` which another is a *member of* can be queried via `member_of`. An `execution_resource` can also be queried for the concurrency it can provide, the total number of *threads of execution* supported by that *execution_resource*, and all resources it represents.
The **system topology** is comprised of a directed acyclic graph (DAG) of **execution resources**, representing all unique hardware and software components available within the system capable of executing work. The root node of the DAG is the **system execution resource** and represents the entire system. Each **execution resource** within the DAG may have any number of child **execution resources** representing a finer granularity of the parent **execution resource**. Every **execution resource** within the **system topology** is exposed via an `execution_resource` object.

> [*Note:* Note that an execution resource is not limited to resources which execute work, but also a general resource where no execution can take place but memory can be allocated such as off-chip memory. *--end note*]
The **system topology** can be discovered by calling `this_system::discover_topology`. This will discover all **execution resources** available within the system and construct the **system topology** DAG, describing a read-only snapshot at the point of the call, and then return an `execution_resource` object exposing the **system execution resource**.

> [*Note:* The intention is that the actual implementation details of a resource topology are described in an execution context when required. This allows the execution resource objects to be lightweight objects that serve as identifiers that are only referenced. *--end note*]
A call to `this_system::discover_topology` may invoke C++ library, system or third party library API calls required to discover certain **execution resources**. However, `this_system::discover_topology` must be thread safe and must initialize and finalize any OS or third-party state before returning.

### System topology
### Execution resources

An `execution_resource` is a lightweight structure which identifies a particular **execution resource** within a snapshot of the **system topology**. It can be queried for whether the associated **execution resource** can allocate memory via `can_place_memory`, whether the associated **execution resource** can execute work via `can_place_agents`, and for a name via `name`.

The system topology is made up of a number of system-level `execution_resource`s, which can be queried through `this_system::get_resources` which returns a `std::vector`. A run-time library may initialize the `execution_resource`s available within the system dynamically. However, `this_system::get_resources` must be thread safe and must initialize and finalize any third-party or OS state before returning.
An `execution_resource` object can be queried for a pointer to its parent `execution_resource` via `member_of`, and can also be iterated over for its child `execution_resource`s via `begin` and `end`.

Below *(Listing 3)* is an example of iterating over the system-level resources and printing out their capabilities.
An `execution_resource` object can also be queried for the amount of concurrency it can provide, that is, the total number of **threads of execution** supported by the associated **execution resource**.

> [*Note:* An **execution resource** is not limited to resources which execute work, but also a general resource where no execution can take place but memory can be allocated, such as off-chip memory. *--end note*]
Below *(Listing 3)* is an example of iterating over every **execution resource** within the **system topology** and printing out their capabilities.

```cpp
for (auto res : execution::this_system::get_resources()) {
std::cout << res.name() `\n`;
std::cout << res.can_place_memory() << `\n`;
std::cout << res.can_place_agents() << `\n`;
std::cout << res.concurrency() << `\n`;
void print_topology(const execution::execution_resource &resource, int indent = 0) {
  for (int i = 0; i < indent; i++) { std::cout << " "; }
  std::cout << resource.name() << ": " << resource.can_place_memory() << ", "
            << resource.can_place_agents() << ", " << resource.concurrency() << "\n";
  for (const execution::execution_resource child : resource) {
    print_topology(child, indent + 1);
  }
}

int main(int argc, char * argv[]) {
  auto systemResource = this_system::discover_topology();
  print_topology(systemResource);
}
```
*Listing 3: Example of querying all the system level execution resources*
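Because the proposed `execution_resource` has no public constructor and no shipping implementation, the traversal cannot be run as written. The following is a minimal, self-contained sketch of the same idea using a hypothetical `mock_resource` class, which is an assumption of this example and not part of the paper. It models only `name`, `size`, `begin`, `end`, `operator[]` and a recursive `concurrency`; `member_of`, `can_place_memory` and `can_place_agents` are omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mock of an iterable resource node; NOT the proposed class.
class mock_resource {
 public:
  using size_type = std::size_t;
  // Pointers into the child vector serve as random access const iterators.
  using const_iterator = const mock_resource *;

  mock_resource(std::string name, size_type own_threads,
                std::vector<mock_resource> children = {})
      : name_(std::move(name)),
        own_threads_(own_threads),
        children_(std::move(children)) {}

  const std::string &name() const noexcept { return name_; }
  size_type size() const noexcept { return children_.size(); }
  const_iterator begin() const noexcept { return children_.data(); }
  const_iterator end() const noexcept { return children_.data() + children_.size(); }

  const mock_resource &operator[](size_type child) const noexcept {
    assert(child < children_.size());
    return children_[child];
  }

  // Threads owned by this node plus those of all descendants, recursively.
  size_type concurrency() const noexcept {
    size_type total = own_threads_;
    for (const mock_resource &child : *this) { total += child.concurrency(); }
    return total;
  }

 private:
  std::string name_;
  size_type own_threads_;
  std::vector<mock_resource> children_;
};

// Depth-first walk writing one line per resource, indented two spaces per level.
inline void print_topology(const mock_resource &resource, std::ostream &out,
                           int indent = 0) {
  out << std::string(static_cast<std::size_t>(indent) * 2, ' ') << resource.name()
      << ": " << resource.concurrency() << "\n";
  for (const mock_resource &child : resource) {
    print_topology(child, out, indent + 1);
  }
}
```

Taking each child by `const` reference in the range-based `for` avoids copying a whole subtree at every level, which matters for the mock because it owns its children; whether a copy is cheap for the proposed lightweight `execution_resource` is left to the implementation.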
@@ -298,14 +320,13 @@ The `affinity_query` class template provides an abstraction for a relative affin
Below *(listing 4)* is an example of how to query the relative affinity between two `execution_resource`s.
```cpp
auto systemLevelResources = execution::this_system::get_resources();
auto memberResources = systemLevelResources.resources();
auto systemResource = this_system::discover_topology();
auto relativeLatency01 = execution::affinity_query<execution::affinity_operation::read,
execution::affinity_metric::latency>(memberResources[0], memberResources[1]);
execution::affinity_metric::latency>(systemResource[0], systemResource[1]);
auto relativeLatency02 = execution::affinity_query<execution::affinity_operation::read,
execution::affinity_metric::latency>(memberResources[0], memberResources[2]);
execution::affinity_metric::latency>(systemResource[0], systemResource[2]);
auto relativeLatency = relativeLatency01 > relativeLatency02;
```
@@ -320,27 +341,27 @@ The `execution_context` class provides an abstraction for managing a number of l
Below *(Listing 5)* is an example of how this extended interface could be used to construct an *execution context* from an *execution resource* which is retrieved from the *system’s resource topology*. Once an *execution context* is constructed it can then still be queried for its *execution resource*, and that *execution resource* can be further partitioned.

```cpp
auto &resources = execution::this_system::get_resources();
auto systemResource = std::this_system::discover_topology();

execution::execution_context execContext(resources[0]);
execution::execution_context execContext(systemResource[0]);

auto &systemLevelResource = execContext.resource();
auto &execResource = execContext.resource();

// resource[0] should be equal to execResource
// systemResource[0] should be equal to execResource

for (auto res : systemLevelResource.resources()) {
std::cout << res.name() << `\n`;
for (const execution::execution_resource &res : execResource) {
std::cout << res.name() << "\n";
}
```
*Listing 5: Example of constructing an execution context from an execution resource*
When creating an `execution_context` from a given `execution_resource`, the executors and allocators associated with it are bound to that `execution_resource`. For example, when creating an `execution_context` from a CPU socket resource, all executors associated with the given socket will spawn execution agents with affinity to the socket partition of the system *(Listing 6)*.
```cpp
auto cList = std::execution::this_system::get_resources();
auto systemResource = std::this_system::discover_topology();
// findASocketResource is a user-defined function that finds a
// resource that is a CPU socket in the given resource list
auto& socket = findASocketResource(cList);
auto& socket = findASocketResource(systemResource);
execution_context eC{socket}; // Associated with the socket
auto executor = eC.executor(); // By transitivity, associated with the socket too
auto socketAllocator = eC.allocator(); // Retrieve an allocator to the closest memory node
@@ -378,18 +399,32 @@ The `execution_resource` which underlies the current thread of execution can be
class execution_resource {
public:

using value_type = execution_resource;
using pointer = execution_resource *;
using const_pointer = const execution_resource *;
using iterator = see-below;
using const_iterator = see-below;
using reference = execution_resource &;
using const_reference = const execution_resource &;
using size_type = std::size_t;

execution_resource() = delete;
execution_resource(const execution_resource &);
execution_resource(execution_resource &&);
execution_resource &operator=(const execution_resource &);
execution_resource &operator=(execution_resource &&);
~execution_resource();

size_t concurrency() const noexcept;
size_type size() const noexcept;

const_iterator begin() const noexcept;
const_iterator end() const noexcept;

std::vector<resource> resources() const noexcept;
const_reference operator[](std::size_t child) const noexcept;

const execution_resource member_of() const noexcept;
const_pointer member_of() const noexcept;

size_t concurrency() const noexcept;

std::string name() const noexcept;

@@ -455,7 +490,7 @@ The `execution_resource` which underlies the current thread of execution can be
/* This system */

namespace this_system {
std::vector<execution_resource> resources() noexcept;
const execution_resource discover_topology();
}

/* This thread */
@@ -494,9 +529,21 @@ The `execution_resource` class provides an abstraction over a system's hardware,

> [*Note:* Creating an `execution_resource` may require initializing the underlying software abstraction when the `execution_resource` is constructed, in order to discover other `execution_resource`s accessible through it. However, an `execution_resource` is nonowning. *--end note*]
### `execution_resource` member types

iterator

*Requires:* `iterator` to model `RandomAccessIterator` with the value type `execution_resource::value_type`.

const_iterator

*Requires:* `const_iterator` to model `RandomAccessIterator` with the value type `execution_resource::value_type`.

iterator_traits<>iterator_category
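The random access requirement on `iterator` and `const_iterator` can be checked at compile time through `std::iterator_traits`. The sketch below is illustrative only: `models_random_access` is a hypothetical helper invented for this example, and since the paper leaves the iterator types as *see-below*, `std::vector<int>::const_iterator` and a raw pointer are used as stand-ins for whatever an implementation would choose.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Hypothetical helper (not part of the paper): true when the iterator type
// advertises the random access category through std::iterator_traits.
template <class It>
constexpr bool models_random_access() {
  using category = typename std::iterator_traits<It>::iterator_category;
  return std::is_base_of<std::random_access_iterator_tag, category>::value;
}

// Stand-ins for execution_resource::const_iterator: both satisfy the check.
static_assert(models_random_access<std::vector<int>::const_iterator>(),
              "vector iterators are random access");
static_assert(models_random_access<const int *>(),
              "pointers are random access");

// A bidirectional iterator would not meet the stated requirement.
static_assert(!models_random_access<std::list<int>::const_iterator>(),
              "list iterators are only bidirectional");
```

A check of this kind only inspects the advertised category; it does not verify that the iterator actually provides constant time `+`, `-` and `[]`.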

### `execution_resource` constructors

execution_resource();
execution_resource() = delete;

> [*Note:* An implementation of `execution_resource` is permitted to provide non-public constructors to allow other objects to construct them. *--end note*]
@@ -517,31 +564,43 @@ The `execution_resource` class provides an abstraction over a system's hardware,

*Returns:* The total concurrency available to this resource. More specifically, the number of *threads of execution* collectively available to this `execution_resource` and any resources which are *members of*, recursively.

std::vector<resource> resources() const noexcept;
size_type size() const noexcept;

*Returns:* All `execution_resource`s which are *members of* this resource.
*Returns:* The number of child `execution_resource`s.

const execution_resource &member_of() const noexcept;
const_iterator begin() const noexcept;

*Returns:* The `execution_resource` which this resource is a *member of*.
*Returns:* A const iterator to the beginning of the child `execution_resource`s.

const_iterator end() const noexcept;

*Returns:* A const iterator to the end of the child `execution_resource`s.

const_reference operator[](std::size_t child) const noexcept;

*Returns:* A const reference to the specified child `execution_resource`.

const_pointer member_of() const noexcept;

*Returns:* The parent `execution_resource`.

std::string name() const noexcept;

*Returns:* An implementation defined string.

bool can_place_memory() const noexcept;

*Returns:* If this resource is capable of allocating memory with affinity, 'true'.
*Returns:* If the associated **execution resource** is capable of allocating memory with affinity, 'true'.

bool can_place_agent() const noexcept;

*Returns:* If this resource is capable of execute with affinity, 'true'.
*Returns:* If the associated **execution resource** is capable of executing work with affinity, 'true'.

## Class `execution_context`

The `execution_context` class provides an abstraction for managing a number of lightweight execution agents executing work on an `execution_resource` and any `execution_resource`s encapsulated by it. The `execution_resource` which an `execution_context` encapsulates is referred to as the *contained resource*.

### `execution_context` types
### `execution_context` member types

using executor_type = see-below;

@@ -638,17 +697,19 @@ The `affinity_query` class template provides an abstraction for a relative affin
## Free functions

### `this_system::get_resources`
### `this_system::discover_topology`

The free function `this_system::discover_topology` is provided for discovering the **system topology**.

The free function `this_system::get_resources` is provided for retrieving the `execution_resource`s which encapsulate the hardware platforms available within the system. We refer to these resources as the *system level resources*.
const execution_resource discover_topology();

std::vector<execution_resource> resources() noexcept;
*Returns:* An `execution_resource` object exposing the **system execution resource**.

*Returns:* An `std::vector` containing all *system level resources*.
*Requires:* If `this_system::discover_topology().size() > 0`, `this_system::discover_topology()[0]` be the `execution_resource` used by `std::thread`. Calls to `this_system::discover_topology()` may not introduce a data race with any other call to `this_system::discover_topology()`.

*Requires:* If `this_system::get_resources().size() > 0`, `this_system::get_resources()[0]` be the `execution_resource` use by `std::thread`. The value returned by `this_system::get_resources()` be the same at any point after the invocation of `main`.
*Effects:* Discovers all **execution resources** available within the system and constructs the **system topology** DAG, describing a read-only snapshot at the point of the call.

> [*Note:* Returning a `std::vector` allows users to potentially manipulate the container of `execution_resource`s after it is returned. We may want to replace this at a later date with an alternative type which is more restrictive, such as a range or span. *--end note*]
*Throws:* Any exception thrown as a result of **system topology** discovery.

### `this_thread::get_resource`
