RISC-V is a highly configurable standard. Numerous unnamed implementation options exist, and the presence or absence of hundreds of named optional extensions alters the behavior of the architecture. Additionally, implementations are free to add custom, non-standard extensions on top of those ratified by RISC-V International (RVI). As a result, two equally valid implementations of RISC-V can have wildly different specifications.
This makes creating and maintaining generic RISC-V specifications a daunting task. A few challenging issues that arise include:
- The RISC-V standard actually specifies two different architectures, namely RV32 and RV64.
- The presence/absence of standard instructions and CSRs is affected by both unnamed implementation options (e.g., the number of implemented PMP registers) and named extensions.
- The behavior of architecturally-visible state is dependent on the set of implemented extensions. For example, the legal values of satp.MODE depend on whether or not the Sv32, Sv39, Sv48, and/or Sv57 extensions are implemented.
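To make the satp.MODE example concrete, here is a minimal Python sketch of how the legal MODE encodings could be derived from a configuration. The function name and the set-of-strings configuration format are illustrative assumptions and are not part of this repository; the MODE encodings are the ones given in the RISC-V privileged specification, and Bare is assumed to be legal in every configuration.

```python
# MODE encodings from the RISC-V privileged specification.
# Sv32 exists only on RV32; Sv39/Sv48/Sv57 exist only on RV64.
SATP_MODES = {
    32: {"Sv32": 1},
    64: {"Sv39": 8, "Sv48": 9, "Sv57": 10},
}


def legal_satp_modes(xlen, extensions):
    """Return the sorted satp.MODE values that are legal for a configuration.

    Hypothetical helper: `extensions` is a set of implemented extension names.
    """
    modes = {0}  # assumption: Bare (no translation) is always legal
    for ext, encoding in SATP_MODES[xlen].items():
        if ext in extensions:
            modes.add(encoding)
    return sorted(modes)


print(legal_satp_modes(64, {"S", "Sv39", "Sv48"}))  # [0, 8, 9]
print(legal_satp_modes(32, {"S", "Sv32"}))          # [0, 1]
```

Two configurations that differ only in their extension list end up with different legal values for the same field, which is the kind of variation the generic specification has to capture.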
To tame this challenge, this specification generator takes the following approach:
- The generic architecture (in the arch/ folder) is described in a way that covers all implementation options. As much as possible, the architecture is defined in a structured way that can be easily parsed using any programming language with a YAML library. Complex behavior (e.g., instruction operation) and some configuration-dependent metadata is specified in an architecture definition language (IDL) that can formally define the architecture in arbitrary ways.
- An implementation of RISC-V is specified as a configuration containing all unnamed parameters and a list of named supported extensions (in the cfgs/ folder).
- A tool, included in this repository, can generate an implementation-specific specification by applying the configuration to the generic spec. Behaviors that are not relevant are left out, and behaviors that are affected by multiple extensions are merged into a single description.
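The idea of applying a configuration to the generic spec can be sketched in a few lines of Python. This is not the tool shipped in the repository; the dictionaries below are hypothetical in-memory stand-ins for the YAML under arch/ and cfgs/, and only the definedBy key is taken from the instruction format described in this document.

```python
# Hypothetical stand-in for a few instruction entries of the generic spec.
GENERIC_INSTRUCTIONS = {
    "add": {"definedBy": "I"},
    "mul": {"definedBy": "M"},
    "c.addi": {"definedBy": "C"},
}


def implemented_instructions(generic, extensions):
    """Keep only the instructions whose defining extension is implemented."""
    result = {}
    for name, data in generic.items():
        defined_by = data["definedBy"]
        # definedBy may be a single extension name or a list of names
        if isinstance(defined_by, str):
            defined_by = [defined_by]
        if any(ext in extensions for ext in defined_by):
            result[name] = data
    return result


config = {"extensions": {"I", "C"}}  # assumed shape of a configuration
kept = implemented_instructions(GENERIC_INSTRUCTIONS, config["extensions"])
print(sorted(kept))  # ['add', 'c.addi']
```

Because the M extension is not in this configuration, mul is left out of the result, mirroring how irrelevant behavior is dropped from an implementation-specific specification.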
The architecture is specified in a series of YAML files for Extensions, Instructions, and Control and Status Registers (CSRs). Each extension/instruction/CSR has its own file.
+----------------------------------------------------------------------------------------------+
| Config (cfgs/NAME)                                                                           |
| +-------------------------+ +-------------------------+ +---------------------------------+  |
| | Implementation Options  | | Extension List          | | Implementation Overlay          |  |
| | (cfgs/NAME/params.yaml) | | (cfgs/NAME/params.yaml) | | (cfgs/NAME/arch_overlay/*.yaml) |  |
| +-------------------------+ +-------------------------+ +---------------------------------+  |
+------------------------|-----|--------------------------------|------------------------------+
                         |     |                                |
+----------------+       v     v                                v
|{s} Generic     |   /----------\                          /----------\
| Arch Spec      |-->|  Render  |                          |  Render  |
| (arch)         |   |  (ERB)   :                          |  (ERB)   :
+----------------+   \----------/                          \----------/
                          |                                     |
                          v                                     v
           +-----------------------------+    +-----------------------------------+
           | {s} Rendered Arch Spec      |    | {s} Rendered Overlay              |
           | (gen/NAME/.rendered_arch)   |    | (gen/NAME/.rendered_overlay_arch) |
           +-----------------------------+    +-----------------------------------+
                          |                                     |
                          v                                     |
             /-------------------------\                        |
             | Overlay                 |<-----------------------+
             | (gen/NAME/.merged_arch) :
             \-------------------------/
                          |
                          |
                    /-----------\
                    | Normalize |
                    \-----------/
                          |
                          v
           +-----------------------------+
           | {s} Implementation-specific |
           | Architecture Spec           |
           | (gen/NAME/arch)             |
           +-----------------------------+
The normalization step transitively determines all extensions, instructions, and CSRs that are implemented (e.g., because some extensions have implied dependencies) and ensures that missing fields are filled in with their defaults.
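A minimal Python sketch of what such a normalization pass could look like follows. The function names, the implication table, and the default values are assumptions made for illustration; only the dependency of H on S comes from the extension example in this document.

```python
def implied_closure(extensions, implies):
    """Return `extensions` plus everything they transitively imply."""
    result = set(extensions)
    worklist = list(extensions)
    while worklist:
        ext = worklist.pop()
        for dep in implies.get(ext, ()):
            if dep not in result:
                result.add(dep)
                worklist.append(dep)
    return result


def fill_defaults(entry, defaults):
    """Return a copy of `entry` with missing keys taken from `defaults`."""
    return {**defaults, **entry}


# Illustrative implication table (assumed, not read from the repository).
IMPLIES = {"H": ["S"], "S": ["U"]}

print(sorted(implied_closure({"H"}, IMPLIES)))  # ['H', 'S', 'U']
print(fill_defaults({"long_name": "Add"}, {"long_name": "", "access": "always"}))
```

The worklist loop visits each extension once, so chains of implied dependencies of any depth are resolved without recursion.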
All specification data is written in YAML. The data for Extensions, Instructions, and CSRs each follow their own schema, documented below. The files are checked for validity using JSON Schema, and the precise schemas are located in the schemas/ directory.
H: # (1)
  type: privileged # (2)
  versions: # (3)
  - version: 1.0
    ratification_date: 2019-12
    requires: [S, '>= 1.12'] # (4)
  interrupt_codes: # (5)
  - num: 2
    name: Virtual supervisor software interrupt
  - num: 6
    name: Virtual supervisor timer interrupt
  - num: 10
    name: Virtual supervisor external interrupt
  - num: 12
    name: Supervisor guest external interrupt
  exception_codes: # (6)
  - num: 10
    name: Environment call from VS-mode
  - num: 20
    name: Instruction guest page fault
  - num: 21
    name: Load guest page fault
  - num: 22
    name: Virtual instruction
  - num: 23
    name: Store/AMO guest page fault
  description: | # (7)
    An Asciidoc description...
(1) Name of the extension, which must follow the RVI naming scheme.
(2) Extension type: privileged or unprivileged
(3) List of versions
(4) [Optional] Declares a dependency on another extension (may be a list if there is more than one dependency).
(5) [Optional] List of asynchronous interrupts added by this extension
(6) [Optional] List of synchronous exceptions added by this extension
(7) A description of the extension, as Asciidoc source
add: # (1)
  long_name: Add
  description: | # (2)
    Add the value in rs1 to rs2, and store the result in rd.
    Any overflow is thrown away.
  encoding: # (3)
    mask: 0000000----------000-----0110011
    fields:
    - name: rs2
      location: 24-20
    - name: rs1
      location: 19-15
    - name: rd
      location: 11-7
  definedBy: I # (4)
  assembly: xd, xs1, xs2 # (5)
  access: # (6)
    s: always
    u: always
    vs: always
    vu: always
  operation(): X[rd] = X[rs1] + X[rs2]; # (7)
(1) The instruction mnemonic, in lowercase
(2) Asciidoc description of the instruction
(3) Encoding of the instruction. 'mask' specifies the values and position of opcode fields, and 'fields' specifies the locations of decode variables.
(4) Extension that defines this instruction. May be a list if the instruction is defined by multiple extensions.
(5) Assembly format, to be used by ISS/disassembler/compiler/etc.
(6) Per-mode access rights (always, sometimes, or none). When 'sometimes', a field 'access-detail' should also be provided.
(7) Formal definition of the instruction operation, in IDL
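As an illustration of how a consumer might use the encoding data, the Python sketch below matches a 32-bit instruction word against the mask of add and pulls out its decode variables. The helper names are hypothetical and this is not the decoder used by the repository's tooling; it assumes the mask is written most-significant bit first and that '-' marks a bit owned by a decode field.

```python
def matches_mask(encoding, mask):
    """True if every fixed bit ('0' or '1') of `mask` equals the bit in `encoding`."""
    bits = format(encoding, f"0{len(mask)}b")
    if len(bits) != len(mask):
        return False  # encoding is wider than the mask
    return all(m == "-" or m == b for m, b in zip(mask, bits))


def extract(encoding, location):
    """Extract a contiguous field given as 'msb-lsb' (e.g. '24-20')."""
    msb, lsb = (int(x) for x in location.split("-"))
    return (encoding >> lsb) & ((1 << (msb - lsb + 1)) - 1)


ADD_MASK = "0000000----------000-----0110011"
insn = 0x002081B3  # add x3, x1, x2

print(matches_mask(insn, ADD_MASK))  # True
print(extract(insn, "24-20"))        # 2 (rs2)
print(extract(insn, "19-15"))        # 1 (rs1)
print(extract(insn, "11-7"))         # 3 (rd)
```

The word 0x402081B3 (sub x3, x1, x2) differs only in bit 30, which the mask fixes to 0, so matches_mask rejects it.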
Some instructions have decode fields that cannot take a certain value. This is especially common in the C extension where, for example, some register specifier fields can be anything but x0. That can be represented by adding a not_mask key to the encoding:
c.addi
encoding:
  mask: 000-----------01
  not_mask: ----00000------- # rs1/rd cannot be 0
  fields:
  - name: imm
    location: 12|6-2
  - name: rs1_rd
    location: 11-7
The not_mask key can also be a list when more than one value is prohibited (e.g., c.lui prohibits both x0 and x2 for rd).
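One way to read mask and not_mask together is sketched below in Python: a word decodes as the instruction when it matches the mask and matches none of the prohibited patterns. The helper names are hypothetical, and the interpretation of not_mask as "reject when all its fixed bits match" is an assumption based on the c.addi comment above.

```python
def bits_match(encoding, pattern):
    """True if all fixed bits of `pattern` (MSB first, '-' = don't care) match."""
    bits = format(encoding, f"0{len(pattern)}b")
    return len(bits) == len(pattern) and all(
        p == "-" or p == b for p, b in zip(pattern, bits)
    )


def decodes_as(encoding, mask, not_mask=()):
    """Match `mask`, then reject any encoding that hits a prohibited pattern."""
    if isinstance(not_mask, str):
        not_mask = [not_mask]  # a single pattern or a list are both accepted
    if not bits_match(encoding, mask):
        return False
    return not any(bits_match(encoding, pattern) for pattern in not_mask)


C_ADDI_MASK = "000-----------01"
C_ADDI_NOT_MASK = "----00000-------"  # rs1/rd cannot be 0

print(decodes_as(0x0405, C_ADDI_MASK, C_ADDI_NOT_MASK))  # True: rd is x8
print(decodes_as(0x0005, C_ADDI_MASK, C_ADDI_NOT_MASK))  # False: rd is x0
```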
Some fields are shifted before use, and can be represented using the left_shift key:
jal
encoding:
  mask: -------------------------1101111
  fields:
  - name: imm
    # lsb of the immediate is always zero, so it isn't encoded in the instruction
    # this is also an example of representing decode variables that are split in the
    # encoding
    location: 31|19-12|20|30-21
    left_shift: 1
  - name: rd
    location: 11-7
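The split location syntax and left_shift can be interpreted as in the following Python sketch: the '|'-separated parts are concatenated from most to least significant, and the result is shifted left. The function name is hypothetical, and sign extension is deliberately left out because the format shown here does not say how it is expressed.

```python
def extract_field(encoding, location, left_shift=0):
    """Assemble a decode variable from a location such as '31|19-12|20|30-21'."""
    value = 0
    for part in location.split("|"):
        if "-" in part:
            msb, lsb = (int(x) for x in part.split("-"))
        else:
            msb = lsb = int(part)  # a single bit position
        width = msb - lsb + 1
        chunk = (encoding >> lsb) & ((1 << width) - 1)
        value = (value << width) | chunk  # earlier parts are more significant
    return value << left_shift


JAL_IMM = "31|19-12|20|30-21"

print(extract_field(0x008000EF, JAL_IMM, left_shift=1))  # 8    (jal x1, 8)
print(extract_field(0x0010006F, JAL_IMM, left_shift=1))  # 2048 (jal x0, 2048)
print(extract_field(0x008000EF, "11-7"))                 # 1    (rd)
```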
marchid
marchid: # (1)
  long_name: Machine Architecture ID
  address: 0xf12 # (2)
  priv_mode: M # (3)
  length: MXLEN # (4)
  description: | # (5)
    Asciidoc description
  definedBy: Sm # (6)
  fields: # (7)
    Architecture:
      location_rv32: 31-0 # (8)
      location_rv64: 63-0
      type: RO # (9)
      description: Vendor-specific microarchitecture ID.
      reset_value(): return ARCH_ID; # (10)
(1) CSR name
(2) CSR address (used by CSRs that are not indirect)
(3) Least-privileged mode required to access the CSR
(4) Length of the CSR, in bits. Can either be an integer (e.g. 32, 64), or 'MXLEN', 'SXLEN', or 'VSXLEN' when the length is equal to the XLEN in M, S, or VS mode, respectively.
(5) Asciidoc description
(6) Defining extension. Can be a list when more than one extension defines the CSR.
(7) List of fields in the CSR
(8) Location. In this case, the location changes with XLEN, so location_rv32 and location_rv64 are used. When the location does not change, use the single key location.
(9) Type of the field. See below for more information.
(10) Reset value. In this case, the reset value is determined by the configuration, so it is specified as an IDL function.
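A consumer of the CSR data has to resolve the XLEN-dependent keys before it can use them. The Python sketch below shows one possible way to do that; the function names and the `xlens` dictionary are assumptions made for illustration, while the key names come from the marchid example.

```python
def field_location(field, xlen):
    """Return the bit range of a CSR field for the given XLEN (32 or 64)."""
    if "location" in field:
        return field["location"]  # location does not depend on XLEN
    return field[f"location_rv{xlen}"]


def csr_length(length, xlens):
    """Resolve a CSR length that is an integer or 'MXLEN'/'SXLEN'/'VSXLEN'."""
    if isinstance(length, int):
        return length
    return xlens[length]


architecture = {"location_rv32": "31-0", "location_rv64": "63-0", "type": "RO"}
xlens = {"MXLEN": 64, "SXLEN": 64, "VSXLEN": 32}  # assumed per-mode XLENs

print(field_location(architecture, 32))  # 31-0
print(field_location(architecture, 64))  # 63-0
print(csr_length("MXLEN", xlens))        # 64
```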
CSR fields are given a type, which does not necessarily correspond to the WARL/WLRL types in the RVI specs. We use a different format here because the RVI CSR types are vague and inconsistent. The types are:
Type | Meaning |
---|---|
RO | Read-only |
RO-H | Read-only, and hardware updates the field |
RW | Read-write |
RW-R | Read-write, but only a restricted set of values are allowed |
RW-H | Read-write, and hardware updates the field |
RW-RH | Read-write, only a restricted set of values are allowed, and hardware updates the field |
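The six type names decompose into three independent properties, which the following Python sketch makes explicit. The function and the property names are hypothetical; the mapping itself is read directly from the table above.

```python
VALID_FIELD_TYPES = {"RO", "RO-H", "RW", "RW-R", "RW-H", "RW-RH"}


def describe_field_type(field_type):
    """Split a CSR field type into the properties listed in the table."""
    if field_type not in VALID_FIELD_TYPES:
        raise ValueError(f"unknown CSR field type: {field_type}")
    base, _, suffix = field_type.partition("-")
    return {
        "software_writable": base == "RW",
        "restricted_values": "R" in suffix,
        "hardware_updated": "H" in suffix,
    }


print(describe_field_type("RO"))
print(describe_field_type("RW-RH"))
```

Checking against a fixed set of names first means that a combination the table does not list, such as RO-R, is rejected instead of being silently decomposed.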
In many cases, the values of CSR and/or CSR field data are configuration dependent. Some of that is covered directly by the data model (e.g., with location_rv32, location_rv64), but some cases are too complex to express with YAML. For this reason, many of the keys can be specified as IDL functions. See the schema documentation and examples in the arch/csr folder for more information.
Some keys that only apply to certain CSRs are not shown above.