Transport

An RPMI transport is an abstraction over a physical medium used to send and receive messages between the application processors (APs) and the platform microcontroller (PuC). It provides bi-directional communication between a RISC-V privilege-level of application processors and a platform microcontroller. The application processors can have multiple RPMI transport instances with a platform microcontroller. A platform can also have multiple microcontrollers, each with its own RPMI transport instance, as shown in [fig_intro_trans_topology].

An RPMI transport instance consists of two logical bi-directional channels for message delivery, as shown in the Bi-directional Communication figure below. Each channel is capable of transferring messages in request-response pairs. A channel which transfers a request message from the application processors (APs) to the platform microcontroller (PuC) and the response/acknowledgement back in the opposite direction is called an A2P channel. Similarly, the channel for request messages from the platform microcontroller (PuC) to the application processors (APs) is called a P2A channel. The P2A channel also transfers notification messages to the application processors.

An RPMI transport instance must implement the A2P channel but the P2A channel is optional. Platforms which do not require requests and notification messages from the platform microcontroller can avoid implementing the P2A channel.

The current RPMI specification only defines a shared memory based transport but other transport types can be added in the future.

Figure 1. Bi-directional Communication

Doorbell Interrupt

An RPMI transport may also provide optional doorbell interrupts for application processors and/or the platform microcontroller to signal the arrival of new messages. This doorbell interrupt can be either a message-signaled interrupt (MSI) or a wired interrupt. An RPMI implementation may ignore the doorbell mechanism of the RPMI transport and always poll for the arrival of new messages.

A2P Doorbell

The A2P doorbell is a signal for new messages from the application processors (APs) to the platform microcontroller (PuC).

The platform should support A2P doorbell interrupt triggering from application processors through either a write operation or a read-modify-write sequence on a memory-mapped register, which can be easily discovered by the application processors using hardware description mechanisms such as device tree or ACPI.
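The two trigger styles mentioned above (a plain write versus a read-modify-write) can be sketched as follows. This is only an illustration under stated assumptions: the register is modelled as 4 little-endian bytes inside a `bytearray`, and the function name `ring_doorbell` and its `offset`, `value`, and `mask` parameters are hypothetical, standing in for whatever the platform's hardware description provides.

```python
import struct
from typing import Optional


def ring_doorbell(mmio: bytearray, offset: int, value: int,
                  mask: Optional[int] = None) -> None:
    """Trigger a doorbell on a simulated 32-bit memory-mapped register.

    Hypothetical helper: with mask=None the register is simply written;
    otherwise only the bits selected by mask are changed (read-modify-write).
    """
    if mask is None:
        new = value & 0xFFFFFFFF
    else:
        (old,) = struct.unpack_from("<I", mmio, offset)
        # Keep the bits outside the mask, replace the bits inside it.
        new = (old & ~mask & 0xFFFFFFFF) | (value & mask)
    struct.pack_into("<I", mmio, offset, new)
```

On real hardware the access would be a single volatile load/store to the discovered register address; the `bytearray` here only stands in for that register.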

P2A Doorbell

The P2A doorbell is a signal for new messages from the platform microcontroller (PuC) to the application processors (APs).

If the P2A doorbell is a wired interrupt then the platform must provide a way for the platform microcontroller to trigger the interrupt, and the application processors must discover it using standard hardware description mechanisms such as device tree or ACPI.

If the P2A doorbell is an MSI then the application processors must configure the MSI on the platform microcontroller side using RPMI messages defined by the BASE service group.

Fast-channels

Fast-channels are special shared memory-based channels used in scenarios requiring lower latency and faster processing of requests from application processors to the platform microcontroller.

The layout and request format of fast-channels are service group specific and only a few service groups may support fast-channels. A service group that supports fast-channels:

  • May only enable some services to be used over fast-channels

  • Must provide the physical address and other attributes (such as an optional fast-channel doorbell) of the fast-channels via a service defined by the service group

Note
To avoid caching side-effects, the platform can configure the fast-channel shared memory as non-cacheable or IO memory for both the application processors and the platform microcontroller.

Shared Memory Transport

The RPMI shared memory transport defines a mechanism to exchange messages via shared memory which can be on-chip SRAM or a reserved portion of DRAM or some device memory. The RPMI shared memory transport does not specify where the shared memory resides in a platform, but it must be accessible from both the application processors and the platform microcontroller.

Note
To avoid caching side-effects, the platform can configure the shared memory as non-cacheable or IO memory for both the application processors and the platform microcontroller.

All data sent or received through the RPMI shared memory transport must follow little-endian byte-order.
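As a small illustration of the byte-order rule, the sketch below stores and loads a 32-bit value in little-endian order regardless of the host's native endianness. The helper names `put_u32` and `get_u32` are hypothetical, and a `bytearray` stands in for the shared memory region.

```python
import struct


def put_u32(shmem: bytearray, offset: int, value: int) -> None:
    """Store a 32-bit unsigned value at offset in little-endian order."""
    struct.pack_into("<I", shmem, offset, value)


def get_u32(shmem: bytearray, offset: int) -> int:
    """Load a 32-bit unsigned little-endian value from offset."""
    return struct.unpack_from("<I", shmem, offset)[0]
```

For example, storing `0x11223344` places the byte `0x44` at the lowest address and `0x11` at the highest.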

The Shared Memory Transport Architecture figure below shows the high-level architecture of the RPMI shared memory transport. The layout and attributes of an RPMI shared memory transport may be static for the platform microcontroller but must be discoverable by the application processors through hardware description mechanisms such as device tree or ACPI.

Figure 2. Shared Memory Transport Architecture

Queue Types

The RPMI shared memory transport consists of four unidirectional queues. The type of messages and the direction of message delivery are fixed for each RPMI shared memory transport queue. The Shared Memory Transport Queues table below provides a more detailed description of all RPMI shared memory transport queues.

Table 1. Shared Memory Transport Queues

Name    | Message Type           | Description
A2P REQ | REQUEST                | The request message queue from the application processor to the platform microcontroller.
P2A ACK | ACKNOWLEDGEMENT        | The acknowledgement message queue from the platform microcontroller to the application processor.
P2A REQ | REQUEST & NOTIFICATION | The request message queue from the platform microcontroller to the application processor. This queue is also used for sending the notification messages.
A2P ACK | ACKNOWLEDGEMENT        | The acknowledgement message queue from the application processor to the platform microcontroller.

The A2P REQ queue is paired with the P2A ACK queue to form the A2P channel of the RPMI shared memory transport. Similarly, the P2A REQ queue is paired with the A2P ACK queue to form the P2A channel of the RPMI shared memory transport. The Shared Memory Transport Message Flow figure below shows the high-level flow of messages in an RPMI shared memory transport.

Figure 3. Shared Memory Transport Message Flow
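The queue roles and their pairing into channels can be captured as a small lookup structure. The sketch below is illustrative only: the `QueueRole` type, its field names, and the `queues_of` helper are hypothetical, while the queue names, message types, directions, and pairings are taken from the table and paragraph above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class QueueRole:
    name: str                       # queue name as used in the table
    message_types: Tuple[str, ...]  # message types carried by the queue
    producer: str                   # side that enqueues messages
    consumer: str                   # side that dequeues messages
    channel: str                    # channel the queue belongs to


QUEUES = (
    QueueRole("A2P REQ", ("REQUEST",), "AP", "PuC", "A2P"),
    QueueRole("P2A ACK", ("ACKNOWLEDGEMENT",), "PuC", "AP", "A2P"),
    QueueRole("P2A REQ", ("REQUEST", "NOTIFICATION"), "PuC", "AP", "P2A"),
    QueueRole("A2P ACK", ("ACKNOWLEDGEMENT",), "AP", "PuC", "P2A"),
)


def queues_of(channel: str) -> List[str]:
    """Return the names of the queues that form the given channel."""
    return [q.name for q in QUEUES if q.channel == channel]
```

Note that a queue's name reflects the direction in which its messages travel, not the channel it belongs to: the P2A ACK queue is part of the A2P channel.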

Queue Layout

An RPMI shared memory queue is divided into M contiguous slots of equal size which are used to form a circular queue. The size of each slot (or slot size) must be a power-of-2 and must be at least 64 bytes. The slot size is the same across all RPMI shared memory queues, and the physical address of each slot must be aligned to a slot size boundary.

Note
The slot size should match the maximum cache line size used in a platform. The slot size is required to be a power-of-2 with a minimum value of 64 bytes because the usual CPU cache line size is 64 bytes or some other power-of-2 value.
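The slot size and alignment rules can be checked mechanically. The following is a minimal sketch with hypothetical function names; because slots are contiguous and of equal size, checking that the queue base address is aligned to the slot size is enough to establish alignment for every slot.

```python
MIN_SLOT_SIZE = 64  # minimum slot size in bytes, as stated above


def is_valid_slot_size(slot_size: int) -> bool:
    """True if slot_size is a power of 2 and at least 64 bytes."""
    is_power_of_two = slot_size > 0 and (slot_size & (slot_size - 1)) == 0
    return is_power_of_two and slot_size >= MIN_SLOT_SIZE


def slots_are_aligned(queue_base: int, slot_size: int) -> bool:
    """True if every slot of a queue starting at queue_base is aligned.

    Slot k starts at queue_base + k * slot_size, so all slots are aligned
    exactly when the base address is a multiple of the slot size.
    """
    return queue_base % slot_size == 0
```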

The slots of the RPMI shared memory queue are assigned sequentially increasing indices starting with 0. The slot at index 0 is referred to as the head slot and the slot at index 1 is referred to as the tail slot. The remaining (M - 2) slots of the RPMI shared memory queue are message slots. The first 4 bytes of the head slot are used as the head of the circular queue, which contains a (slot index - 2) value pointing to the message slot from which the next message can be dequeued. The first 4 bytes of the tail slot are used as the tail of the circular queue, which contains a (slot index - 2) value pointing to the message slot into which the next message can be enqueued. For example, a head value of 0 refers to the slot at index 2, which is the first message slot. A pictorial view of the RPMI shared memory queue internals is shown in the Shared Memory Queue Internals figure below.

Note
The head and tail are kept in separate slots so that they never share a cache line, which allows cache maintenance to be done separately for the head and the tail.
Figure 4. Shared Memory Queue Internals
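The relationship between the head and tail values and the slot they refer to can be sketched as below. The constant and function names are hypothetical, and a `bytearray` stands in for the queue's shared memory; the 4-byte fields are read as little-endian values, in line with the byte-order rule of the shared memory transport.

```python
import struct

HEAD_SLOT = 0        # slot index holding the head value
TAIL_SLOT = 1        # slot index holding the tail value
FIRST_MSG_SLOT = 2   # head/tail values are stored as (slot index - 2)


def read_head(queue: bytearray, slot_size: int) -> int:
    """Read the head value from the first 4 bytes of the head slot."""
    return struct.unpack_from("<I", queue, HEAD_SLOT * slot_size)[0]


def read_tail(queue: bytearray, slot_size: int) -> int:
    """Read the tail value from the first 4 bytes of the tail slot."""
    return struct.unpack_from("<I", queue, TAIL_SLOT * slot_size)[0]


def message_slot_offset(value: int, slot_size: int) -> int:
    """Byte offset, from the queue base, of the slot a head/tail value names."""
    return (value + FIRST_MSG_SLOT) * slot_size
```

With a 64-byte slot size, a tail value of 3 names the slot at index 5, which starts 320 bytes from the queue base.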

A message consumer dequeues a pending message from the message slot pointed to by the head of the RPMI shared memory queue, whereas a message producer enqueues a new message at the message slot pointed to by the tail of the RPMI shared memory queue. If there are no messages in the RPMI shared memory queue then the message consumer must wait for messages to be available. If all message slots in the RPMI shared memory queue are occupied then the message producer must wait for messages to be consumed. The ownership of head and tail is mutually exclusive: only the message consumer should update the head and only the message producer should update the tail of the RPMI shared memory queue.

Note
For example, only the application processors should enqueue new messages and update the tail of the A2P REQ queue, whereas only the platform microcontroller should dequeue messages and update the head of the A2P REQ queue.
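The producer and consumer rules described above can be sketched as a single-process model. Several things here are assumptions rather than statements from the text: the class and method names are hypothetical, a `bytearray` stands in for shared memory, and the queue is treated as full when advancing the tail would make it equal to the head, a common circular-queue convention that leaves one message slot unused (the text does not spell out how a full queue is detected). Memory barriers and cache maintenance, which a real implementation would need, are omitted.

```python
import struct
from typing import Optional


class ShmemQueue:
    """Illustrative model of one queue; not a reference implementation."""

    def __init__(self, mem: bytearray, slot_size: int):
        self.mem = mem
        self.slot_size = slot_size
        self.msg_slots = len(mem) // slot_size - 2  # M - 2 message slots

    def _get(self, slot: int) -> int:
        return struct.unpack_from("<I", self.mem, slot * self.slot_size)[0]

    def _set(self, slot: int, value: int) -> None:
        struct.pack_into("<I", self.mem, slot * self.slot_size, value)

    def enqueue(self, msg: bytes) -> bool:
        """Producer side. Returns False if the queue is full (caller waits)."""
        if len(msg) > self.slot_size:
            raise ValueError("message does not fit in one slot")
        head, tail = self._get(0), self._get(1)
        next_tail = (tail + 1) % self.msg_slots
        if next_tail == head:  # assumed full condition, see lead-in
            return False
        offset = (tail + 2) * self.slot_size
        self.mem[offset:offset + len(msg)] = msg  # write the message first
        self._set(1, next_tail)                   # then publish via the tail
        return True

    def dequeue(self) -> Optional[bytes]:
        """Consumer side. Returns None if the queue is empty (caller waits)."""
        head, tail = self._get(0), self._get(1)
        if head == tail:
            return None
        offset = (head + 2) * self.slot_size
        msg = bytes(self.mem[offset:offset + self.slot_size])
        self._set(0, (head + 1) % self.msg_slots)  # only the consumer moves head
        return msg
```

The producer writes only the tail and the consumer writes only the head, so neither side needs a lock on the other's index.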

Queue Placement

The RPMI shared memory transport divides the underlying shared memory region into two parts, where one part belongs to the A2P channel and the other belongs to the P2A channel. The shared memory region sizes of the A2P and P2A channels can be different. For each channel (A2P or P2A), the corresponding REQ and ACK queues must be of the same size and hence have an equal number of slots (or queue capacity). The size of each RPMI shared memory queue must be a multiple of the slot size.

Note
A platform should provide sufficient shared memory for all RPMI shared memory queues so that the number of slots (queue capacity) does not become a bottleneck in message communication. It is recommended that the number of slots in queues belonging to A2P channel should be proportional to the number of application processors accessing the A2P channel.

The RPMI shared memory queues can be placed anywhere in the underlying shared memory region but there must be no overlap among the queues. The Recommended Placement of Queues in Shared Memory below shows a recommended way of placing RPMI queues in shared memory.

Note
A platform may allocate separate non-contiguous shared memory regions for queues, which may require multiple PMA entries to define the memory attributes. To avoid this, the platform can allocate a contiguous region for all four queues. For example, the platform may allocate 4096 bytes of shared memory for all four queues so that the memory attributes can be covered with a single PMA entry.
Figure 5. Recommended Placement of Queues in Shared Memory
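The placement rules of this section (queue size a multiple of the slot size, equal REQ and ACK sizes within a channel, and no overlap) can be checked with a short routine. This is a sketch only: the function name, the dictionary layout mapping a queue name to a `(base, size)` pair, and the wording of the reported problems are all hypothetical.

```python
from typing import Dict, List, Tuple

# Queue pairing per channel, as described for the shared memory transport.
CHANNELS = {"A2P": ("A2P REQ", "P2A ACK"), "P2A": ("P2A REQ", "A2P ACK")}


def placement_problems(queues: Dict[str, Tuple[int, int]],
                       slot_size: int) -> List[str]:
    """Return a list of rule violations; an empty list means none found."""
    problems = []
    for name, (_base, size) in queues.items():
        if size % slot_size != 0:
            problems.append(f"{name}: size is not a multiple of the slot size")
    for channel, (req, ack) in CHANNELS.items():
        if req in queues and ack in queues and queues[req][1] != queues[ack][1]:
            problems.append(f"{channel}: REQ and ACK queue sizes differ")
    # Sort by start address; any overlap then shows up between neighbours.
    spans = sorted((base, base + size, name)
                   for name, (base, size) in queues.items())
    for (_, end_a, name_a), (start_b, _, name_b) in zip(spans, spans[1:]):
        if start_b < end_a:
            problems.append(f"{name_a} overlaps {name_b}")
    return problems
```

As a usage example, a contiguous 4096-byte region split into four 1024-byte queues at offsets 0, 1024, 2048, and 3072 passes all three checks with a 64-byte slot size.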

Queue Discovery

The slot size of the RPMI shared memory queues may be fixed for the platform microcontroller, but the application processors must discover it through hardware description mechanisms such as device tree or ACPI. Similarly, the physical base address and size of each RPMI shared memory queue may be fixed for the platform microcontroller, but the application processors must discover them through hardware description mechanisms such as device tree or ACPI.

The total number of slots in each RPMI shared memory queue can easily be calculated by dividing the queue size by the slot size.

Note
Example calculation

X bytes : Queue shared memory size.
M = (X / slot-size) : Total slot count in a queue
(M-2) : Message slot count (2 slots less for `HEAD` and `TAIL`)
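The example calculation above can be written as a small helper. The function name `slot_counts` is hypothetical; it rejects queue sizes that are not a multiple of the slot size, since the queue size is required to be such a multiple.

```python
from typing import Tuple


def slot_counts(queue_size: int, slot_size: int) -> Tuple[int, int]:
    """Return (M, M - 2): total slots and message slots of one queue."""
    if queue_size % slot_size != 0:
        raise ValueError("queue size must be a multiple of the slot size")
    total = queue_size // slot_size
    return total, total - 2  # two slots are used for the head and the tail
```

For instance, a 1024-byte queue with 64-byte slots has 16 slots in total, of which 14 are message slots.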