Merge pull request #243 from ved-rivos/errata_updates
Clarification updates to IOMMU v1.0.0
ved-rivos authored Sep 11, 2024
2 parents 8be67ba + 0076b5f commit 5446859
Showing 18 changed files with 560 additions and 229 deletions.
6 changes: 1 addition & 5 deletions src/images/ddt-base.svg
6 changes: 1 addition & 5 deletions src/images/ddt-ext.svg
6 changes: 1 addition & 5 deletions src/images/guest-OS.svg
6 changes: 1 addition & 5 deletions src/images/hypervisor.svg
6 changes: 1 addition & 5 deletions src/images/msi-imsic.svg
6 changes: 1 addition & 5 deletions src/images/non-virt-OS.svg
6 changes: 1 addition & 5 deletions src/images/pdt.svg
48 changes: 48 additions & 0 deletions src/iommu.bib
@@ -17,3 +17,51 @@ @electronic{AIA
title = {RISC-V Advanced Interrupt Architecture},
url = {https://github.com/riscv/riscv-aia}
}
@electronic{CFI,
title = {RISC-V Shadow Stacks and Landing Pads},
url = {https://github.com/riscv/riscv-cfi}
}
@electronic{PR243,
title = {Clarification updates to IOMMU v1.0.0},
url = {https://github.com/riscv-non-isa/riscv-iommu/pull/243/commits}
}
@electronic{CBQRI,
title = {RISC-V Capacity and Bandwidth QoS Register Interface},
url = {https://github.com/riscv-non-isa/riscv-cbqri}
}
@article{PTCAMP,
author = {Du Bois, Kristof and Eyerman, Stijn and Eeckhout, Lieven},
title = {Per-Thread Cycle Accounting in Multicore Processors},
year = {2013},
issue_date = {January 2013},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {9},
number = {4},
issn = {1544-3566},
url = {https://doi.org/10.1145/2400682.2400688},
doi = {10.1145/2400682.2400688},
journal = {ACM Trans. Archit. Code Optim.},
month = {jan},
articleno = {29},
numpages = {22},
}
@inproceedings{HERACLES,
author = {Lo, David and Cheng, Liqun and Govindaraju, Rama and Ranganathan, Parthasarathy and Kozyrakis, Christos},
title = {Heracles: Improving Resource Efficiency at Scale},
year = {2015},
isbn = {9781450334020},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/2749469.2749475},
doi = {10.1145/2749469.2749475},
booktitle = {Proceedings of the 42nd Annual International Symposium on Computer Architecture},
pages = {450–462},
numpages = {13},
location = {Portland, Oregon},
series = {ISCA '15}
}
@electronic{SSQOSID,
title = {RISC-V Quality-of-Service (QoS) Identifiers},
url = {https://github.com/riscv/riscv-ssqosid}
}
120 changes: 82 additions & 38 deletions src/iommu_data_structures.adoc
@@ -140,6 +140,8 @@ is not prohibited by this specification.
The DDT is a 1, 2, or 3-level radix-tree indexed using the device directory
index (DDI) bits of the `device_id` to locate a `DC`.
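As a rough illustration of the radix-tree indexing described above, the sketch below splits a `device_id` into its three DDI fields. The bit ranges are an assumption that is not stated in this excerpt: they follow from 4 KiB table pages, 8-byte non-leaf entries, and 32-byte (base) or 64-byte (extended) `DC` sizes. The names `DDI_FIELDS` and `split_device_id` are hypothetical.

```python
# Assumed DDI bit ranges as (low bit, width) per DC format.
# These values are not given in this excerpt; they are derived from
# 4 KiB pages holding 8-byte non-leaf entries and 32/64-byte DCs.
DDI_FIELDS = {
    "base": [(0, 7), (7, 9), (16, 8)],
    "extended": [(0, 6), (6, 9), (15, 9)],
}


def split_device_id(device_id: int, fmt: str = "base") -> list[int]:
    """Return [DDI[0], DDI[1], DDI[2]] for a device_id of up to 24 bits."""
    if not 0 <= device_id < (1 << 24):
        raise ValueError("device_id must fit in 24 bits")
    return [
        (device_id >> low) & ((1 << width) - 1)
        for low, width in DDI_FIELDS[fmt]
    ]
```

In a 3-level walk, `DDI[2]` would index the root table, `DDI[1]` the next level, and `DDI[0]` the leaf page that holds the `DC`.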

<<<

The following diagrams illustrate the DDT radix-tree. The PPN of the root
device-directory-table is held in a memory-mapped register called the
device-directory-table pointer (`ddtp`).
@@ -150,7 +152,7 @@ next device-directory-table.
A valid leaf device-directory-table entry holds the device-context (`DC`).

.Three, two and single-level device directory with extended format `DC`
image::ddt-ext.svg[width=800,height=400]
image::ddt-ext.svg[width=800,height=400, align="center"]
//["ditaa",shadows=false, separation=false, font=courier, fontsize: 16]
//....
// +-------+-------+-------+ +-------+-------+ +-------+
@@ -174,7 +176,7 @@ image::ddt-ext.svg[width=800,height=400]
//....
.Three, two and single-level device directory with base format `DC`
image::ddt-base.svg[width=800,height=400]
image::ddt-base.svg[width=800,height=400, align="center"]
//["ditaa",shadows=false, separation=false, font=courier, fontsize: 16]
//....
// +-------+-------+-------+ +-------+-------+ +-------+
@@ -213,6 +215,8 @@ A valid (`V==1`) non-leaf DDT entry provides the PPN of the next level DDT.
], config:{lanes: 2, hspace:1024, fontsize: 16}}
....
<<<
==== Leaf DDT entry
The leaf DDT page is indexed by `DDI[0]` and holds the device-context (`DC`).
@@ -312,17 +316,22 @@ Such addresses also cannot be routed within the device when peer-to-peer
transactions within the device (e.g. between functions of a device) are
supported.
Use of `T2GPA` set to 1 may not be compatible with devices that implement caches
tagged by the translated address returned in response to a PCIe ATS Translation
Request.
====
<<<
[NOTE]
====
Hypervisors that configure `T2GPA` to 1 must ensure through protocol-specific
means that translated accesses are routed through the host such that the IOMMU
may translate the GPA and then route the transaction based on PA to memory or
to a peer device. For PCIe, for example, the Access Control Service (ACS) must
be configured to always redirect peer-to-peer (P2P) requests upstream to the
host.
Use of `T2GPA` set to 1 may not be compatible with devices that implement caches
tagged by the translated address returned in response to a PCIe ATS Translation
Request.
As an alternative to setting `T2GPA` to 1, the hypervisor may establish a trust
relationship with the device if authentication protocols are supported by the
device. For PCIe, for example, the PCIe component measurement and authentication
@@ -406,8 +415,7 @@ When `SXL` is 1, the following rules apply:
* If the first-stage is not Bare, then a page fault corresponding to the original
access type occurs if the `IOVA` has bits beyond bit 31 set to 1.
* If the second-stage is not Bare, then a guest page fault corresponding to the
original access type occurs if the incoming GPA has bits beyond bit 33 set to
1.
original access type occurs if the incoming GPA has bits beyond bit 33 set to 1.
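The two range rules for `SXL` = 1 can be sketched as simple bit tests. This is a minimal illustration only: the function names are hypothetical, and the caller is assumed to have already established that the relevant stage is not Bare.

```python
def iova_out_of_range_sxl1(iova: int) -> bool:
    """True if the IOVA has any bit beyond bit 31 set (first stage not Bare)."""
    return (iova >> 32) != 0


def gpa_out_of_range_sxl1(gpa: int) -> bool:
    """True if the GPA has any bit beyond bit 33 set (second stage not Bare)."""
    return (gpa >> 34) != 0
```

A `True` result corresponds to the page fault (first stage) or guest page fault (second stage) for the original access type.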
===== IO hypervisor guest address translation and protection (`iohgatp`)
@@ -437,11 +445,11 @@ encodings are as follows:
[[IOHGATP_MODE_ENC]]
.Encodings of `iohgatp.MODE` field
[width=75%]
[%header, cols="3,3,20"]
[%autowidth,float="center",align="center"]
[%header, cols="^3,^3,20"]
|===
3+^| `fctl.GXL=0`
|Value | Name | Description
^|Value ^| Name ^| Description
| 0 | `Bare` | No translation or protection.
| 1-7 | -- | Reserved for standard use.
| 8 | `Sv39x4` | Page-based 41-bit virtual addressing (2-bit extension
@@ -476,6 +484,7 @@ the PTEs from the first page table or the second page table. These are the only
expected behaviors.
====
[[DC_TA]]
===== Translation attributes (`ta`)
.Translation attributes (`ta`) field
@@ -484,7 +493,9 @@ expected behaviors.
{reg: [
{bits: 12, name: 'reserved'},
{bits: 20, name: 'PSCID'},
{bits: 32, name: 'reserved'},
{bits: 8, name: 'reserved'},
{bits: 12, name: 'RCID'},
{bits: 12, name: 'MCID'},
], config:{lanes: 2, hspace: 1024, fontsize: 16}}
....
@@ -494,6 +505,21 @@ fences on a per-address-space basis. The `PSCID` field in `ta` is used as the
address-space ID if `DC.tc.PDTV` is 0 and the `iosatp.MODE` field is not `Bare`.
When `DC.tc.PDTV` is 1, the `PSCID` field in `ta` is ignored.
The `RCID` and `MCID` fields are added by the QoS ID extension. If
`capabilities.QOSID` is 0, these bits are reserved and must be set to 0.
IOMMU-initiated requests for accessing the following data structures use the
value configured in the `RCID` and `MCID` fields of `DC.ta`.
* Process directory table (`PDT`)
* Second-stage page table
* First-stage page table
* MSI page table
* Memory-resident interrupt file (`MRIF`)
The `RCID` and `MCID` configured in `DC.ta` are provided to the IO bridge on
successful address translations. The IO bridge should associate these QoS IDs
with device-initiated requests.
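To make the updated `ta` layout concrete, the sketch below extracts `PSCID`, `RCID`, and `MCID` from a 64-bit `ta` value. It assumes the register diagram lists fields starting from bit 0: 12 reserved bits, 20 `PSCID` bits, 8 reserved bits, then 12 bits each for `RCID` and `MCID`. The helper names and the `rcid_width`/`mcid_width` parameters are hypothetical. They stand in for whatever ID widths a given IOMMU supports, which this excerpt does not specify.

```python
def decode_ta(ta: int) -> dict[str, int]:
    """Extract the named fields of DC.ta, assuming LSB-first field order."""
    return {
        "PSCID": (ta >> 12) & 0xFFFFF,  # bits 31:12
        "RCID": (ta >> 40) & 0xFFF,     # bits 51:40
        "MCID": (ta >> 52) & 0xFFF,     # bits 63:52
    }


def qos_ids_too_wide(ta: int, rcid_width: int, mcid_width: int) -> bool:
    """True if RCID or MCID needs more bits than the assumed supported widths."""
    fields = decode_ta(ta)
    return fields["RCID"] >= (1 << rcid_width) or fields["MCID"] >= (1 << mcid_width)
```

A check along the lines of `qos_ids_too_wide` models the "wider than that supported by the IOMMU" condition that the same commit adds to the "DDT entry misconfigured" list.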
===== First-Stage context (`fsc`)
If `DC.tc.PDTV` is 0, the `DC.fsc` field holds the `iosatp` that provides
the controls for first-stage address translation and protection.
@@ -524,11 +550,11 @@ address.
[[IOSATP_MODE_ENC]]
.Encodings of `iosatp.MODE` field
[width=75%]
[%header, cols="3,3,20"]
[%autowidth,float="center",align="center"]
[%header, cols="^3,^3,20"]
|===
3+^| `DC.tc.SXL=0`
|Value | Name | Description
^|Value ^| Name ^| Description
| 0 | `Bare` | No translation or protection.
| 1-7 | -- | Reserved for standard use.
| 8 | `Sv39` | Page-based 39-bit virtual addressing.
@@ -571,11 +597,11 @@ directly edit the PDT to associate a virtual-address space identified by a
first-stage page table with a `process_id`.
[[PDTP_MODE_ENC]]
.Encoding of `pdtp.MODE` field
[width=75%]
[%header, cols="3,3,20"]
.Encodings of `pdtp.MODE` field
[%autowidth,float="center",align="center"]
[%header, cols="^3,^3,20"]
|===
|Value | Name | Description
^|Value ^| Name ^| Description
| 0 | `Bare` | No first-stage address translation or protection.
| 1 | `PD8` | 8-bit process ID enabled. The directory has 1 level with
256 entries. The bits 19:8 of `process_id` must be 0.
@@ -607,11 +633,13 @@ defined by the Advanced Interrupt Architecture specification.
The `msiptp.MODE` field is used to select the MSI address translation scheme.
.Encoding of `msiptp.MODE` field
[width=75%]
[%header, cols="3,3,20"]
<<<
.Encodings of `msiptp.MODE` field
[%autowidth,float="center",align="center"]
[%header, cols="^3,^3,20"]
|===
|Value | Name | Description
^|Value ^| Name ^| Description
| 0 | `Off` | Recognition of accesses to
a virtual interrupt file using MSI address mask and
pattern is not performed.
@@ -706,6 +734,8 @@ misconfigured" (cause = 259).
. `DC.tc.SBE` value is not a legal value. If `fctl.BE` is writable
then `DC.tc.SBE` may be 0 or 1. If `fctl.BE` is not writable then
`DC.tc.SBE` must be the same as `fctl.BE`.
. `capabilities.QOSID` is 1 and `DC.ta.RCID` or `DC.ta.MCID` values
are wider than that supported by the IOMMU.
[NOTE]
====
@@ -882,6 +912,8 @@ misconfigured" (cause = 267).
. `DC.tc.SXL` is 1 and `PC.fsc.MODE` is not one of the supported modes
.. `capabilities.Sv32` is 0 and `PC.fsc.MODE` is `Sv32`
<<<
[NOTE]
====
Some `PC` fields hold supervisor physical addresses or
@@ -991,7 +1023,9 @@ The process to translate an `IOVA` is as follows:
. Translation process is complete
When checking the `U` bit in a second-stage PTE, the transaction is treated as
not requesting supervisor privilege.
not requesting supervisor privilege. The `pte.xwr=010` encoding, as specified by
the Zicfiss cite:[CFI] extension for the Shadow Stack page type in single-stage
and VS-stage page tables, remains a reserved encoding for IO transactions.
When the translation process reports a fault, and the request is an Untranslated
request or a Translated request, the IOMMU requests the IO bridge to abort the
@@ -1151,8 +1185,8 @@ file and translating the address using the MSI page table is as follows:
process are equivalent to that of a regular RISC-V second-stage PTE with
`R`=`W`=`U`=1 and `X`=0. Similar to a second-stage PTE, when checking the `U`
bit, the transaction is treated as not requesting supervisor privilege.
. If the transaction is an Untranslated or Translated read-for-execute then stop
and report "Instruction access fault" (cause = 1).
.. If the transaction is an Untranslated or Translated read-for-execute then stop
and report "Instruction access fault" (cause = 1).
. MSI address translation process is complete.
[NOTE]
@@ -1182,6 +1216,8 @@ PTEs atomically. When updating of A and D bits in second-stage PTEs is enabled
memory access from the device using the translated address becomes globally
visible.
<<<
[NOTE]
====
The A and D bits are never cleared by the IOMMU. If the supervisor software does
@@ -1246,18 +1282,21 @@ process-context is 0 then a Success response with R and W bits set to 0 is
generated.
If the translation could be successfully completed but the requested
permissions are not present (Execute requested but no execute permission;
permissions are not present in either stage (Execute requested but no execute permission;
no-write not requested and no write permission; no read permission)
then a Success response is returned with the denied permission (R, W or X)
set to 0 and the other permission bits set to the value determined from the
page tables. The X permission is granted only if the R permission is also
granted. Execute-only translations are not compatible with PCIe ATS as PCIe
requires read permission to be granted if the execute permission is granted.
granted and the execute permission was requested. Execute-only translations are
not compatible with PCIe ATS as PCIe requires read permission to be granted
if the execute permission is granted.
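One possible reading of the revised permission rule is sketched below. It is not the specification's algorithm. It assumes the inputs are the effective R, W, and X permissions already combined across both stages, and it does not model the no-write request flag. The function name is hypothetical.

```python
def ats_success_permissions(pt_r: bool, pt_w: bool, pt_x: bool,
                            exec_requested: bool) -> tuple[bool, bool, bool]:
    """Return (R, W, X) for a Success response to an ATS translation request.

    R and W follow the page-table permissions. X is granted only when the
    page tables allow execute, R is also granted, and execute was requested.
    """
    r = pt_r
    w = pt_w
    x = pt_x and r and exec_requested
    return r, w, x
```

With this reading, an execute-only mapping yields a response with X = 0, which matches the statement that execute-only translations are not compatible with PCIe ATS.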
When a Success response is generated for an ATS translation request, no fault
records are reported to software through the fault/event reporting mechanism,
even when the response indicates no access was granted or some permissions were
denied.
denied. Conversely, when a UR or CA response is generated for an ATS translation
request, the corresponding fault is reported to software through the fault/event
reporting mechanism.
If the translation request has an address determined to be an MSI address using
the rules defined by the <<MSI_ID>> but the MSI PTE is configured in MRIF
@@ -1346,11 +1385,14 @@ of "Page Request".
a "Page Request Group Response" message to the device.
When the IOMMU generates the response, the status field of the response depends
on the cause of the error.
on the cause of the error. If a fault condition prevents locating a valid device
context then the `PRPR` value assumed is 0.
<<<
The status is set to Response Failure if the following faults are encountered:
* `ddtp.iommu_mode` is `Off`
* `ddtp.iommu_mode` is `Off` (cause = 256)
* DDT entry load access fault (cause = 257)
* DDT entry misconfigured (cause = 259)
* DDT entry not valid (cause = 258)
@@ -1359,8 +1401,8 @@ The status is set to Response Failure if the following faults are encountered:
The status is set to Invalid Request if the following faults are encountered:
* `ddtp.iommu_mode` is `Bare`
* `EN_PRI` is set to 0
* `ddtp.iommu_mode` is `Bare` (cause = 260)
* `EN_PRI` is set to 0 (cause = 260)
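The cause-to-status mapping listed above can be sketched as a small lookup. It covers only the cause codes visible in this excerpt, so any cause hidden by the collapsed diff context is rejected rather than guessed. The names are hypothetical, and `None` is used here to mean that no fault was encountered.

```python
from typing import Optional

# Only the causes shown in this excerpt; the full list may be longer.
RESPONSE_FAILURE_CAUSES = {256, 257, 258, 259}
INVALID_REQUEST_CAUSES = {260}  # covers both `Bare` mode and EN_PRI == 0


def page_request_response_status(cause: Optional[int]) -> str:
    """Map a fault cause to the "Page Request Group Response" status."""
    if cause is None:
        return "Success"
    if cause in RESPONSE_FAILURE_CAUSES:
        return "Response Failure"
    if cause in INVALID_REQUEST_CAUSES:
        return "Invalid Request"
    raise ValueError(f"cause {cause} is not covered by this sketch")
```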
The status is set to Success if no other faults were encountered but the
"Page Request" could not be queued due to the page-request queue being full
@@ -1399,6 +1441,8 @@ the following conditions:
* "Page Request" could not be queued due to the page-request queue being full
(`pqt == pqh - 1`) or had an overflow (`pqcsr.pqof == 1`).
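The queue-full condition `pqt == pqh - 1` is a circular-buffer test, so the comparison has to wrap at the queue size. A minimal sketch, assuming head and tail are entry indices and `num_entries` is the queue capacity (a hypothetical parameter, since the queue size is configured elsewhere):

```python
def page_request_queue_full(pqt: int, pqh: int, num_entries: int) -> bool:
    """True if advancing the tail by one entry would make it equal the head."""
    return (pqt + 1) % num_entries == pqh
```

For example, with 8 entries the queue is full when `pqt` is 7 and `pqh` is 0. This is the wrapped form of `pqt == pqh - 1`.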
<<<
[[CACHING]]
=== Caching in-memory data structures
@@ -1424,10 +1468,10 @@ more IDs to tag the cached entries to identify a specific entry or a
group of entries.
.Identifiers used to tag IOATC entries
[width=90%]
[%autowidth,float="center",align="center"]
[%header, cols="8,10,10"]
|===
|Data Structure cached |IDs used to tag entries | Invalidation command
^|Data Structure cached ^|IDs used to tag entries ^| Invalidation command
|Device Directory Table |`device_id` | <<IDDT, IODIR.INVAL_DDT>>
|Process Directory Table|`device_id`, `process_id` | <<IPDT, IODIR.INVAL_PDT>>
|First-stage page table
@@ -1498,8 +1542,8 @@ determined by `fctl.BE` or by `DC.tc.SBE` as follows:
[[ENDIAN_CONFIG]]
.Endianness of memory access to data structures
[width=75%]
[%header, cols="16,8"]
[%autowidth,float="center",align="center"]
[%header, cols="10,8"]
|===
^|Data Structure ^| Controlled by
| Device directory table | `fctl.BE`
6 changes: 3 additions & 3 deletions src/iommu_debug.adoc
@@ -42,10 +42,10 @@ when the process completes (successfully or due to encountering a fault). When
the `Go/Busy` bit goes from 1 to 0, a response is valid in the `tr_response`
register.

The IOMMU behavior is `UNSPECIFIED` if:
When the `Go/Busy` bit is 1, the IOMMU behavior is `UNSPECIFIED` if:

* The `tr_req_iova` or `tr_req_ctl` are modified when the `Go/Busy` bit is 1.
* IOMMU configurations such as `ddtp.iommu_mode`, etc. are modified.
* The `tr_req_iova` or `tr_req_ctl` are modified.
* IOMMU configurations, such as `ddtp.iommu_mode`, are modified.

The time to complete a translation request through this debug interface is
`UNSPECIFIED` but is required to be finite. If the IOMMU is serving translation