Doc edits #29

Merged
merged 17 commits into from
May 23, 2024
99 changes: 49 additions & 50 deletions qos_intro.adoc
@@ -3,79 +3,79 @@

Quality of Service (QoS) is defined as the minimal end-to-end performance that
is guaranteed in advance by a service level agreement (SLA) to a workload. A
workload may be a single application, a group of applications, a virtual machine,
a group of virtual machines, or a combination of those. The performance may
be measured in the form of metrics such as instructions per cycle (IPC), latency
workload can be a single application, a group of applications, a virtual machine,
a group of virtual machines, or a combination of those. The performance is measured
in the form of metrics such as instructions per cycle (IPC), latency
of servicing work, etc.

Various factors such as the available cache capacity, memory bandwidth,
interconnect bandwidth, CPU cycles, system memory, etc. affect the performance
in a computing system that runs multiple workloads concurrently. Furthermore,
when there is arbitration for shared resources, the prioritization of the
workloads' requests against other competing requests may also affect the
performance of the workload. Such interference due to resource sharing may lead
interconnect bandwidth, CPU cycles, system memory, and so on affect the performance
of a computing system that runs multiple workloads concurrently. Furthermore,
during arbitration for shared resources, the prioritization of the
workloads' requests against other competing requests might also affect the
performance of the workload. Such interference due to resource sharing can lead
to unpredictable workload performance cite:[PTCAMP].

When multiple workloads are running concurrently on modern processors with large
core counts, multiple cache hierarchies, and multiple memory controllers, the
performance of a workload becomes less deterministic or even non-deterministic.
This is because the performance depends on the behavior of all the other
workloads in the machine that contend for the shared resources, leading to
performance of a workload can become less deterministic or even non-deterministic.
This situation occurs because the workload performance depends on the behavior of the
other workloads in the machine that contend for the shared resources, which can lead to
interference. In many deployment scenarios, such as public cloud servers, the
workload owner may not be in control of the type and placement of other
workload owner might not be in control of the type and placement of other
workloads in the platform.

System software can control some of these resources available to the workload,
such as the number of hardware threads made available for execution, the amount
of system memory allocated to the workload, the number of CPU cycles provided
for execution, etc.
of system memory allocated to the workload, and the number of CPU cycles provided
for execution.

System software needs additional tools to control interference to a workload
and thereby reduce the variability in performance experienced by one workload
and thereby reduce the variability in the performance that is experienced by one workload
due to other workloads' cache capacity usage, memory bandwidth usage,
interconnect bandwidth usage, etc. through a resource allocation capability.
interconnect bandwidth usage, and so on through a resource allocation capability.
The resource allocation capability enables system software to reserve capacity
and/or bandwidth to meet the performance goals of the workload. Such controls
enable improving the utilization of the system by collocating workloads while
can improve the utilization of the system by co-locating workloads while
minimizing the interference caused by one workload to another cite:[HERACLES].

Effective use of the resource allocation capability requires hardware to provide
Effective use of the resource allocation capability requires the hardware to provide
a resource monitoring capability by which the resource requirements of a
workload needed to meet a certain performance goal can be characterized. A
typical use model involves profiling the resource usage of the workload using
typical use model involves profiling the resource usage of the workload by using
the resource monitoring capability and establishing resource allocations for the
workload using the resource allocation capability.
workload by using the resource allocation capability.

The allocation may be in the form of capacity or bandwidth, depending on the type
The allocation might be in the form of capacity or bandwidth, depending on the type
of resource. For caches, TLBs, and directories, the resource allocation is in
the form of storage capacity. For interconnects and memory controllers, the
resource allocation is in the form of bandwidth.

Workloads generate different types of accesses to shared resources. For example,
some of the requests may be for accessing instructions and others may be to
access data operated on by the workload. Certain data accesses may not have
temporal locality whereas others may have a high probability of reuse. In some
cases it is desirable to provide differentiated treatment to each type of access
by providing unique resource allocation to each access type.
some of the requests might be for accessing instructions and other requests might be
to access data that is operated on by the workload. Certain data accesses might not have
temporal locality, whereas others can have a high probability of reuse. In some
cases, you can provide differentiated treatment to each type of access by providing
unique resource allocation to each access type.

RISC-V Capacity and Bandwidth Controller QoS Register Interface (CBQRI)
specification specifies:
specification specifies the following identifiers and interfaces:

. QoS identifiers to identify workloads that originate requests to the shared
resources. These QoS identifiers include an identifier for resource allocation
configurations and an identifier for the monitoring counters used to monitor
resource usage. These identifiers accompany each request made by the workload
to the shared resource. <<QOS_ID>> specifies the mechanism to associate the
identifiers with workloads.
resource usage. These identifiers accompany each request that is made by the
workload to the shared resource. <<QOS_ID>> specifies the mechanism to associate
the identifiers with workloads.
. Access-type identifiers to accompany each request to access a shared resource to
allow differentiated treatment of each access-type (e.g., code vs. data,
etc.). The access-types are defined in <<QOS_ID>>.
. Register interface for capacity allocation in controllers such as shared
caches, directories, etc. The capacity allocation register interface is
allow differentiated treatment of each access-type (for example, code vs. data).
The access-types are defined in <<QOS_ID>>.
. Register interface for capacity allocation in controllers, such as shared
caches, directories, and so on. The capacity allocation register interface is
specified in <<CC_QOS>>.
. Register interface for capacity usage monitoring. The capacity usage
monitoring register interface is specified in <<CC_QOS>>.
. Register interface for bandwidth allocation in controllers such as
. Register interface for bandwidth allocation in controllers, such as
interconnect and memory controllers. The bandwidth allocation register
interface is specified in <<BC_QOS>>.
. Register interface for bandwidth usage monitoring. The bandwidth
@@ -84,39 +84,38 @@ specification specifies:
The capacity and bandwidth controller register interfaces for resource
allocation and usage monitoring are defined as memory-mapped registers. Each
controller that supports CBQRI provides a set of registers that are
memory-mapped starting at an 8-byte aligned physical address. The memory-mapped
registers may be accessed using naturally aligned 4-byte or 8-byte memory
memory-mapped, starting at an 8-byte aligned physical address. The memory-mapped
registers can be accessed by using naturally aligned 4-byte or 8-byte memory
accesses. The controller behavior for register accesses where the address is not
aligned to the size of the access, or if the access spans multiple registers, or
if the size of the access is not 4 bytes or 8 bytes, is `UNSPECIFIED`. A 4-byte
access to a register must be single-copy atomic. Whether an 8-byte access to a
CBQRI register is single-copy atomic is `UNSPECIFIED`, and such an access may
CBQRI register is single-copy atomic is considered `UNSPECIFIED`. This type of access can
appear, internally to the CBQRI implementation, as if two separate 4-byte
accesses were performed.
accesses are performed.
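
The access rules above can be sketched as a small check. This is only an illustration: the helper name `cbqri_access_is_defined` and the idea of passing a byte offset from the 8-byte aligned register base are assumptions made for this example, not part of CBQRI. It covers the size and alignment conditions only, because the register layout needed to detect an access that spans multiple registers is not given in this section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not defined by CBQRI): returns true when an access of
 * `size` bytes at byte offset `offset` from the 8-byte aligned register base
 * is a naturally aligned 4-byte or 8-byte access. Any other combination has
 * UNSPECIFIED controller behavior according to the text above. */
static inline bool cbqri_access_is_defined(uint64_t offset, unsigned size)
{
    if (size != 4 && size != 8)
        return false;           /* only 4-byte and 8-byte accesses are defined */
    return (offset % size) == 0; /* address must be aligned to the access size */
}
```

Because the register block starts at an 8-byte aligned address, checking the offset is equivalent to checking the full physical address.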

[NOTE]
====
The CBQRI registers are defined in such a way that software can perform two
individual 4 byte accesses, or hardware can perform two independent 4 byte
transactions resulting from an 8 byte access, to the high and low halves of the
register as long as the register semantics, with regards to side-effects, are
respected between the two software accesses, or two hardware transactions,
The CBQRI registers are defined so that the software can perform two
individual 4-byte accesses or the hardware can perform two independent 4-byte
transactions to the high and low halves of the register, as long as the register
semantics are respected between the two software accesses or two hardware transactions,
respectively.
====
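
As a minimal sketch of the software side of this note, the following helpers access one 8-byte register as two naturally aligned 4-byte accesses. The function names are hypothetical, and the low-half-first ordering is an assumption chosen for illustration: the order that respects a given register's semantics depends on that register's definition. The sketch also assumes a little-endian hart, so that the low half sits at the lower address and no byte reversal is needed.

```c
#include <stdint.h>

/* Hypothetical helper: read an 8-byte register as two 4-byte accesses.
 * Assumes a little-endian hart; the low half is at offset 0 and the high
 * half is at offset 4. */
static inline uint64_t cbqri_read64_split(const volatile void *reg)
{
    const volatile uint32_t *p = (const volatile uint32_t *)reg;
    uint32_t lo = p[0];   /* first 4-byte access: low half */
    uint32_t hi = p[1];   /* second 4-byte access: high half */
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: write an 8-byte register as two 4-byte accesses.
 * The low-then-high order is an assumption; a register with side effects
 * might require a specific order. */
static inline void cbqri_write64_split(volatile void *reg, uint64_t value)
{
    volatile uint32_t *p = (volatile uint32_t *)reg;
    p[0] = (uint32_t)value;          /* low half */
    p[1] = (uint32_t)(value >> 32);  /* high half */
}
```

The `volatile` qualifier keeps the compiler from merging or dropping the two accesses. On a real platform, additional ordering fences might be needed, which this sketch does not show.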

The controller registers have little-endian byte order (even if all harts are
The controller registers use little-endian byte order (even if all harts are
big-endian-only).

[NOTE]
====
Big-endian-configured harts that make use of the register interface may
Big-endian-configured harts that make use of the register interface might
implement the `REV8` byte-reversal instruction defined by the Zbb extension. If
`REV8` is not implemented, then endianness conversion may be implemented using a
`REV8` is not implemented, then endianness conversion might be implemented by using a
sequence of instructions.
====
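
When `REV8` is not available, the "sequence of instructions" mentioned in this note could look like the following shift-and-mask byte reversal. This is a sketch only, and the function names `bswap32` and `bswap64` are illustrative. A big-endian hart would apply such a reversal to a value after reading it from a register, or before writing it.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value by using shifts and masks. */
static inline uint32_t bswap32(uint32_t x)
{
    return (x >> 24) |
           ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) |
           (x << 24);
}

/* Reverse the byte order of a 64-bit value by swapping the two 32-bit
 * halves and reversing the bytes within each half. */
static inline uint64_t bswap64(uint64_t x)
{
    return ((uint64_t)bswap32((uint32_t)x) << 32) |
           bswap32((uint32_t)(x >> 32));
}
```

A compiler that targets a hart with Zbb might recognize this pattern and emit `REV8`, but that depends on the toolchain and flags.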

A controller may support a subset of capabilities defined by CBQRI. When a
capability is not supported the registers and/or fields used to configure and/or
control such capabilities are hardwired to 0. Each controller supports a
A controller can support a subset of capabilities that are defined by CBQRI. When a
capability is not supported, the registers and/or fields used to configure and/or
control such capabilities are hardwired to `0`. Each controller supports a
capabilities register to enumerate the supported capabilities.
