An RPMI transport is an abstraction over a physical medium used to send and receive messages between the application processors (APs) and the platform microcontroller (PuC). It provides bi-directional communication between a RISC-V privilege level of the application processors and a platform microcontroller. The application processors can have multiple RPMI transport instances with a platform microcontroller. A platform can also have multiple microcontrollers, each with its own RPMI transport instance, as shown in [fig_intro_trans_topology] below.
An RPMI transport instance consists of two logical bi-directional channels for message delivery as shown in the Bi-directional Communication below. Each channel is capable of transferring messages in request-response pairs. A channel which transfers a request message from the application processors (APs) to the platform microcontroller (PuC) and the response/acknowledgement back in the opposite direction is called an A2P channel. Similarly, the channel for request messages from the platform microcontroller (PuC) to the application processors (APs) is called a P2A channel. The P2A channel also transfers notification messages to the application processors.
An RPMI transport instance must implement the A2P channel but the P2A channel is optional. Platforms which do not require requests and notification messages from the platform microcontroller can avoid implementing the P2A channel.
The current RPMI specification only defines a shared memory based transport but other transport types can be added in the future.
An RPMI transport may also provide optional doorbell interrupts for application processors and/or the platform microcontroller to signal the arrival of new messages. This doorbell interrupt can be either a message-signaled interrupt (MSI) or a wired interrupt. The RPMI implementations may ignore the doorbell mechanism of RPMI transport and always use a polling mechanism to check the arrival of new messages.
The A2P doorbell is a signal for new messages from the application processors (APs) to the platform microcontroller (PuC).
The platform should support A2P doorbell interrupt triggering from application processors through either a write operation or a read-modify-write sequence on a memory-mapped register, which can be easily discovered by the application processors using hardware description mechanisms such as device tree or ACPI.
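The two triggering styles can be illustrated with a minimal sketch that models the memory-mapped doorbell register as a 32-bit field inside a `bytearray`. The register width, offset, mask, and value used here are assumptions for illustration only; on a real platform they would be discovered from device tree or ACPI, and the access would go to device memory rather than a Python buffer.

```python
import struct

def ring_doorbell_write(mmio: bytearray, offset: int, value: int) -> None:
    """Trigger the doorbell with a plain 32-bit write (hypothetical register)."""
    struct.pack_into("<I", mmio, offset, value & 0xFFFFFFFF)

def ring_doorbell_rmw(mmio: bytearray, offset: int, mask: int, value: int) -> None:
    """Trigger the doorbell with a read-modify-write sequence.

    Only the bits selected by `mask` are changed; all other bits of the
    register keep their current value.
    """
    (current,) = struct.unpack_from("<I", mmio, offset)
    updated = (current & ~mask) | (value & mask)
    struct.pack_into("<I", mmio, offset, updated & 0xFFFFFFFF)
```

The read-modify-write variant matters when the doorbell bit shares a register with unrelated fields that must be preserved.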
The P2A doorbell is a signal for new messages from the platform microcontroller (PuC) to the application processors (APs).
If the P2A doorbell is a wired interrupt then the platform must provide a way for the platform microcontroller to trigger the interrupt, and the application processors must discover it using standard hardware description mechanisms such as device tree or ACPI.
If the P2A doorbell is an MSI then the application processors must configure the MSI on the platform microcontroller side using RPMI messages defined by the BASE service group.
Fast-channels are special shared memory-based channels used in scenarios requiring lower latency and faster processing of requests from application processors to the platform microcontroller.
The layout and request format of fast-channels are service group specific and only a few service groups may support fast-channels. A service group that supports fast-channels:
- May only enable some services to be used over fast-channels
- Must provide the physical address and other attributes (such as an optional fast-channel doorbell) of the fast-channels via services defined by the service group
Note
|
To avoid the caching side-effects, the platform can configure the fast-channel shared memory as non-cacheable or IO memory for both the application processors and the platform microcontroller. |
The RPMI shared memory transport defines a mechanism to exchange messages via shared memory which can be on-chip SRAM or a reserved portion of DRAM or some device memory. The RPMI shared memory transport does not specify where the shared memory resides in a platform, but it must be accessible from both the application processors and the platform microcontroller.
Note
|
To avoid the caching side-effects, the platform can configure the shared memory as non-cacheable or IO memory for both the application processor and the platform microcontroller. |
All data sent or received through the RPMI shared memory transport must follow little-endian byte-order.
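As a small illustration of the byte-order rule, the sketch below stores and loads a 32-bit value in little-endian order using Python's `struct` module. The helper names `put_u32_le` and `get_u32_le` are hypothetical and exist only for this example.

```python
import struct

def put_u32_le(buf: bytearray, offset: int, value: int) -> None:
    """Store a 32-bit unsigned value at `offset`, least significant byte first."""
    struct.pack_into("<I", buf, offset, value)

def get_u32_le(buf: bytearray, offset: int) -> int:
    """Load a 32-bit unsigned little-endian value from `offset`."""
    return struct.unpack_from("<I", buf, offset)[0]
```

Writing `0x11223344` at offset 0 leaves the bytes `44 33 22 11` in memory, regardless of the native byte order of the processor that runs the code.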
The Shared Memory Transport Architecture below shows the high-level architecture of the RPMI shared memory transport. The layout and attributes of an RPMI shared memory transport may be static for the platform microcontroller but must be discoverable by the application processors through hardware description mechanisms such as device tree or ACPI.
The RPMI shared memory transport consists of four unidirectional queues. The type of messages and the direction of message delivery are fixed for each RPMI shared memory transport queue. The Shared Memory Transport Queues below provides a more detailed description of all RPMI shared memory transport queues.
Name | Message Type | Description |
---|---|---|
A2P REQ | REQUEST | The request message queue from the application processor to the platform microcontroller. |
P2A ACK | ACKNOWLEDGEMENT | The acknowledgement message queue from the platform microcontroller to the application processor. |
P2A REQ | REQUEST & NOTIFICATION | The request message queue from the platform microcontroller to the application processor. This queue is also used for sending the notification messages. |
A2P ACK | ACKNOWLEDGEMENT | The acknowledgement message queue from the application processor to the platform microcontroller. |
The A2P REQ queue is paired with the P2A ACK queue to form the A2P channel of the RPMI shared memory transport. Similarly, the P2A REQ queue is paired with the A2P ACK queue to form the P2A channel of the RPMI shared memory transport. The Shared Memory Transport Message Flow below shows the high-level flow of messages in an RPMI shared memory transport.
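The queue directions and their pairing into channels can be written down as a small lookup structure. This is only a sketch: the `QueueRole` type and the `"AP"`/`"PuC"` labels are names invented for this example, while the queue names and message types come from the table above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QueueRole:
    producer: str         # side that enqueues messages
    consumer: str         # side that dequeues messages
    message_types: tuple  # message types carried by the queue

QUEUES = {
    "A2P REQ": QueueRole("AP", "PuC", ("REQUEST",)),
    "P2A ACK": QueueRole("PuC", "AP", ("ACKNOWLEDGEMENT",)),
    "P2A REQ": QueueRole("PuC", "AP", ("REQUEST", "NOTIFICATION")),
    "A2P ACK": QueueRole("AP", "PuC", ("ACKNOWLEDGEMENT",)),
}

# Each channel is a (request queue, acknowledgement queue) pair.
CHANNELS = {
    "A2P": ("A2P REQ", "P2A ACK"),
    "P2A": ("P2A REQ", "A2P ACK"),
}
```

Within a channel, the producer of the REQ queue is always the consumer of the paired ACK queue, which is what makes the channel a request-response path.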
An RPMI shared memory queue is divided into M contiguous slots of equal size which are used to form a circular queue. The size of each slot (or slot size) must be a power-of-2 and must be at least 64 bytes. The slot size is the same across all RPMI shared memory queues and the physical address of each slot must be aligned at a slot size boundary.
Note
|
The slot size should match with the maximum cache line size used in a
platform. The requirement of power-of-2 slot size with minimum value of
64 bytes is because usual CPU cache line size is 64 bytes or some
power-of-2 value.
|
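The slot size and alignment rules above can be checked with two short predicates. This is a minimal sketch; the function names are hypothetical and the checks cover only the constraints stated in this section.

```python
MIN_SLOT_SIZE = 64  # minimum slot size in bytes, as stated above

def is_valid_slot_size(slot_size: int) -> bool:
    """True if slot_size is a power of two and at least 64 bytes."""
    is_power_of_two = slot_size > 0 and (slot_size & (slot_size - 1)) == 0
    return is_power_of_two and slot_size >= MIN_SLOT_SIZE

def is_slot_aligned(slot_addr: int, slot_size: int) -> bool:
    """True if the slot's physical address sits on a slot size boundary."""
    return slot_addr % slot_size == 0
```

For example, 64 and 128 are accepted, while 32 (too small) and 96 (not a power of two) are rejected.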
The slots of the RPMI shared memory queue are assigned sequentially increasing indices starting with 0. The slot at index 0 is referred to as the head slot and the slot at index 1 is referred to as the tail slot. The remaining (M - 2) slots of the RPMI shared memory queue are message slots.
The first 4 bytes of the head slot are used as the head of the circular queue, which contains a (slot index - 2) value pointing to the message slot from where the next message can be dequeued. The first 4 bytes of the tail slot are used as the tail of the circular queue, which contains a (slot index - 2) value pointing to the message slot from where the next message can be enqueued. The pictorial view of the RPMI shared memory queue internals is shown in the Shared Memory Queue Internals below.
Note
|
The requirement of keeping head and tail in separate slots is
to prevent both head and tail from using the same cache line, so that cache
maintenance can be done separately for head and tail.
|
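The layout described above (head in slot 0, tail in slot 1, stored values offset by 2 from the real slot index) can be sketched as follows. The queue is modeled as a `bytearray` starting at the queue base, and the helper names are assumptions made for this example.

```python
import struct

HEAD_SLOT = 0           # slot index holding the head value
TAIL_SLOT = 1           # slot index holding the tail value
FIRST_MESSAGE_SLOT = 2  # message slots start after the head and tail slots

def read_head(queue: bytearray, slot_size: int) -> int:
    """Read the head value from the first 4 bytes of the head slot."""
    return struct.unpack_from("<I", queue, HEAD_SLOT * slot_size)[0]

def read_tail(queue: bytearray, slot_size: int) -> int:
    """Read the tail value from the first 4 bytes of the tail slot."""
    return struct.unpack_from("<I", queue, TAIL_SLOT * slot_size)[0]

def message_slot_offset(stored_index: int, slot_size: int) -> int:
    """Byte offset of the message slot that a head/tail value points to.

    The stored value is (slot index - 2), so the real slot index is
    stored_index + 2.
    """
    return (stored_index + FIRST_MESSAGE_SLOT) * slot_size
```

With a 64-byte slot size, a stored value of 0 refers to the slot at byte offset 128, which is the first message slot.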
A message consumer dequeues a pending message from the message slot pointed to by the head of the RPMI shared memory queue, whereas a message producer enqueues a new message at the message slot pointed to by the tail of the RPMI shared memory queue. If there are no messages in the RPMI shared memory queue then the message consumer must wait for messages to be available. If all message slots in the RPMI shared memory queue are occupied then the message producer must wait for messages to be consumed. The ownership of head and tail is mutually exclusive: only the message consumer should update the head and only the message producer should update the tail of the RPMI shared memory queue.
Note
|
For example, only application processors should enqueue new messages
and update tail of the A2P REQ queue whereas only platform microcontroller
should dequeue messages and update head of the A2P REQ queue.
|
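The producer and consumer rules above can be sketched as a small in-memory model. Several points are assumptions of this sketch and are not stated in the text: the class name `ShmQueue`, treating head equal to tail as "empty", keeping one message slot unused so that "full" can be told apart from "empty", and zero-padding raw message bytes to the slot size. Real implementations also need cache maintenance and memory ordering, which a Python model cannot show.

```python
import struct

class ShmQueue:
    """Toy model of one RPMI shared memory queue (illustrative only)."""

    def __init__(self, slot_size: int, num_slots: int):
        self.slot_size = slot_size
        self.num_msg_slots = num_slots - 2  # slot 0 = head, slot 1 = tail
        self.mem = bytearray(slot_size * num_slots)

    def _get(self, slot: int) -> int:
        return struct.unpack_from("<I", self.mem, slot * self.slot_size)[0]

    def _set(self, slot: int, value: int) -> None:
        struct.pack_into("<I", self.mem, slot * self.slot_size, value)

    def enqueue(self, msg: bytes) -> bool:
        """Producer side: write at tail, then advance tail. False if full."""
        if len(msg) > self.slot_size:
            raise ValueError("message larger than slot size")
        head, tail = self._get(0), self._get(1)
        if (tail + 1) % self.num_msg_slots == head:
            return False  # assumed full condition: producer must wait
        off = (tail + 2) * self.slot_size
        self.mem[off:off + self.slot_size] = msg.ljust(self.slot_size, b"\0")
        self._set(1, (tail + 1) % self.num_msg_slots)  # only producer writes tail
        return True

    def dequeue(self):
        """Consumer side: read at head, then advance head. None if empty."""
        head, tail = self._get(0), self._get(1)
        if head == tail:
            return None  # assumed empty condition: consumer must wait
        off = (head + 2) * self.slot_size
        msg = bytes(self.mem[off:off + self.slot_size])
        self._set(0, (head + 1) % self.num_msg_slots)  # only consumer writes head
        return msg
```

Because each index has exactly one writer, the producer and consumer never race on the same field; each side only reads the other's index.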
The RPMI shared memory transport divides the underlying shared memory region into two parts where one part belongs to the A2P channel and the other belongs to the P2A channel. The shared memory region sizes of the A2P and P2A channels can be different. For each channel (A2P or P2A), the corresponding REQ and ACK queues must be of the same size and hence have an equal number of slots (or queue capacity). The size of each RPMI shared memory queue must be a multiple of the slot size.
Note
|
A platform should provide sufficient shared memory for all RPMI shared memory queues so that the number of slots (queue capacity) does not become a bottleneck in message communication. It is recommended that the number of slots in queues belonging to A2P channel should be proportional to the number of application processors accessing the A2P channel. |
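The per-channel size rules can be expressed as a single check. This is a minimal sketch with a hypothetical function name; it covers only the two constraints stated above (equal REQ and ACK queue sizes, and queue size being a multiple of the slot size).

```python
def channel_sizes_valid(req_size: int, ack_size: int, slot_size: int) -> bool:
    """Check the REQ/ACK queue sizes of one channel (A2P or P2A).

    Both queues must have the same size, and that size must be a
    non-zero multiple of the slot size.
    """
    return req_size == ack_size and req_size > 0 and req_size % slot_size == 0
```

The A2P and P2A channels are checked independently, since their sizes are allowed to differ from each other.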
The RPMI shared memory queues can be placed anywhere in the underlying shared memory region but there must be no overlap among the queues. The Recommended Placement of Queues in Shared Memory below shows a recommended way of placing RPMI queues in shared memory.
Note
|
A platform may allocate separate non-contiguous shared memory regions
for queues which may require multiple PMA entries to define the memory attributes.
To avoid this the platform can allocate contiguous regions for all four queues.
For example, the platform may allocate 4096 bytes of shared memory for all
four queues and memory attributes can be covered with single PMA entry.
|
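The no-overlap rule can be checked by comparing every pair of queue regions. The sketch below assumes each queue is described by a (base address, size) pair, as would be discovered from device tree or ACPI; the function name and the example base address are hypothetical.

```python
from itertools import combinations

def find_overlaps(queues: dict) -> list:
    """Return the pairs of queue names whose memory regions overlap.

    `queues` maps a queue name to a (base_address, size_in_bytes) tuple.
    Two half-open ranges [a, a+sa) and [b, b+sb) overlap when each one
    starts before the other ends.
    """
    overlaps = []
    for (name_a, (base_a, size_a)), (name_b, (base_b, size_b)) in combinations(
        queues.items(), 2
    ):
        if base_a < base_b + size_b and base_b < base_a + size_a:
            overlaps.append((name_a, name_b))
    return overlaps

# Example: 4096 contiguous bytes split evenly across the four queues.
BASE = 0x8000_0000  # hypothetical base address
LAYOUT = {
    "A2P REQ": (BASE + 0 * 1024, 1024),
    "P2A ACK": (BASE + 1 * 1024, 1024),
    "P2A REQ": (BASE + 2 * 1024, 1024),
    "A2P ACK": (BASE + 3 * 1024, 1024),
}
```

The contiguous layout above produces no overlaps and spans a single 4096-byte region, which matches the single-PMA-entry case mentioned in the note.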
The slot size of the RPMI shared memory queues may be fixed for the platform microcontroller but the application processors must discover it through hardware description mechanisms such as device tree or ACPI. Similarly, the physical base address and size of each RPMI shared memory queue may be fixed for the platform microcontroller but the application processors must discover it through hardware description mechanisms such as device tree or ACPI.
The total number of slots in each RPMI shared memory queue can easily be calculated by dividing the queue size by the slot size.
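As a worked instance of this calculation, the sketch below derives the total slot count and the number of message slots (total minus the head and tail slots). The function names are hypothetical, and the 1024-byte queue size is just an example value.

```python
def total_slots(queue_size: int, slot_size: int) -> int:
    """Number of slots M in a queue of queue_size bytes."""
    if queue_size % slot_size != 0:
        raise ValueError("queue size must be a multiple of the slot size")
    return queue_size // slot_size

def message_slots(queue_size: int, slot_size: int) -> int:
    """Number of message slots: M minus the head slot and the tail slot."""
    return total_slots(queue_size, slot_size) - 2
```

A 1024-byte queue with 64-byte slots has 16 slots in total, of which 14 are message slots.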