Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement format-specific encoding mechanism #546

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

ThinkOpenly
Copy link
Contributor

This is a draft, with some extraneous commits included that greatly facilitate testing. The only important commit is marked "[DRAFT]", and reviews should be confined to that.

There's still a lot of content, so if I may direct the reviewers eyes:

  • model/riscv_insts_begin.sail: This sets up the new global data content. In particular:
    • union instruction_input: A tagged union for all of the inputs for each instruction format.
    • scattered mapping fmtencdec: A new mapping from ast to instruction_input. Intended to replace scattered mapping encdec.
    • enum Format: one enum value per format.
    • scattered function opcode2format: maps 7-bit opcode values to the associated format enum.
  • `model/riscv_insts_end.sail:
    • mapping fmt2bits: A new mapping from instruction_input to the 32-bit opcode value. This contains one bidirectional mapping for each format. This is a central location for enforcing opcode layouts, including field order and width, as well as important things like clipping low-order bits of field values where those bits have presumed/enforced values.
  • model/riscv_insts_base.sail, model/riscv_insts_next.sail, model/riscv_insts_zicsr.sail:
    • Examples of use of opcode2format to explicitly bind an instruction to its respective format.
    • Examples of use of fmtencdec: to map ast inputs and constant values to the appropriate instruction_input tagged union member.
    • Note here that the encdec (renamed oldencdec) is obsoleted, and can be ignored, but I left it in for comparison/review purposes.

The current instruction encoding mechanism for RISC-V in Sail uses a simple
mapping, from “ast” to a 32-bit opcode value (ignoring compressed
instructions for now). For example:
```
union clause ast = RISCV_JALR : (bits(12), regidx, regidx)

mapping clause encdec = RISCV_JALR(imm, rs1, rd)
  <-> imm @ rs1 @ 0b000 @ rd @ 0b1100111
```

This is certainly functional.

However, what’s missing from this representation is:
- Any notion of the opcode format
- Any enforcement mechanism for the order of the respective bit fields with
  respect to the opcode format
- Any enforcement mechanism for the sizes of the respective bit fields

A *wrong* encoding representation would readily compile:
```
mapping clause encdec = RISCV_JALR(imm, rs1, rd)
  <-> 0b1100111 @ rd @ rs1 @ 0b000 @ imm /* WRONG */
```
This would (hopefully) be caught in runtime testing, of course.

This (draft) PR proposes an encoding mechanism with enhancements:
- Each opcode is clearly associated with an encoding format
- Each instruction only provides a structured list of encoding-agnostic opcode
  inputs
- A format-specific, instruction-agnostic encoding mechanism is provided

This forces instructions in the Sail code to clearly identify their formats,
and allows them to provide the associated inputs for the encoding without
concern for the specifics of the encoding.
It also isolates the encoding scheme to a small set of format-specific encodings
separate from the instruction definitions.

An example for format-specific encoding:
```
enum Format = { R_Format, U_Format, I_Format, J_Format, S_Format, B_Format, Unknown_Format }
val opcode2format : bits(7) -> Format
scattered function opcode2format

union instruction_input = {
  IFormat: { imm: bits(12), rs1: regidx, funct3: bits(3), rd: regidx, opcode: bits(7) },
[...]
}

val encdec : ast <-> instruction_input
scattered mapping encdec

mapping fmt2bits : instruction_input <-> bits(32) = {
  IFormat(struct { imm = imm, rs1 = rs1, funct3 = funct3, rd = rd, opcode = opcode }) if opcode2format(opcode) == I_Format
    <-> imm @ rs1 @ funct3 @ rd @ opcode if opcode2format(opcode) == I_Format,
[...]
}
```

An example of a instruction’s encoding information:
```
function clause opcode2format 0b1100111 = I_Format

mapping clause encdec = RISCV_JALR(imm, rs1, rd)
  <-> IFormat(struct { imm = imm, rs1 = rs1, funct3 = 0b000, rd = rd, opcode = 0b1100111 })
```
@ThinkOpenly ThinkOpenly changed the title Upstream formats Implement format-specific encoding mechanism Sep 10, 2024
@ThinkOpenly
Copy link
Contributor Author

Also, please see the commit message in the "[DRAFT]" commit, which explains the rationale for this work.

Copy link

github-actions bot commented Sep 10, 2024

Test Results

230 tests   - 482    57 ✅  - 655   0s ⏱️ ±0s
  1 suites  -   5     0 💤 ±  0 
  1 files   ±  0   173 ❌ +173 

For more details on these failures, see this check.

Results for commit 738896b. ± Comparison against base commit d36ea53.

This pull request removes 482 tests.
Building 32-bit RISCV C emulator
Building 32-bit RISCV OCaml emulator
Building 32-bit RISCV RVFI C emulator
Building 64-bit RISCV OCaml emulator
Building 64-bit RISCV RVFI C emulator
C-32 rv32mi-p-breakpoint.elf
C-32 rv32mi-p-csr.elf
C-32 rv32mi-p-illegal.elf
C-32 rv32mi-p-ma_addr.elf
C-32 rv32mi-p-ma_fetch.elf
…

♻️ This comment has been updated with latest results.

@ThinkOpenly
Copy link
Contributor Author

Test Results

230 tests   - 482     0 ✅  - 712   0s ⏱️ ±0s   1 suites  -   5     0 💤 ±  0    1 files   ±  0   230 ❌ +230 

This is expected. This Draft PR has a working framework, but is far from complete/ready.

@ThinkOpenly
Copy link
Contributor Author

Test Results

230 tests   - 482    57 ✅  - 655   0s ⏱️ ±0s   1 suites  -   5     0 💤 ±  0    1 files   ±  0   173 ❌ +173 

Actually, some passes were expected. I had an issue with the first attempt because I'm still using Sail 0.17.1 and the CI uses 0.18, and I didn't properly end some of the new scattered mapping instances. (Now fixed.)

@Alasdair
Copy link
Collaborator

I'm not sure I understand the purpose of the opcode2format function. It seems like the format enum just mirrors the instruction format already encoded in the union, I would have expected it to match table 70 (RISC-V base opcode map) in the ISA manual for the uncompressed instructions and table 36 (RVC opcode map instructions) for the compressed ones.

@Alasdair
Copy link
Collaborator

Alasdair commented Sep 10, 2024

I think I would expect something like:

scattered enum opcode_map

enum clause opcode_map = LOAD
enum clause opcode_map = LOAD_FP
enum clause opcode_map = MISC_MEM
enum clause opcode_map = OP_IMM
enum clause opcode_map = AUIPC
enum clause opcode_map = OP_IMM_32
enum clause opcode_map = STORE
enum clause opcode_map = STORE_FP
enum clause opcode_map = AMO
enum clause opcode_map = OP
enum clause opcode_map = LUI
enum clause opcode_map = OP_32
enum clause opcode_map = MADD
enum clause opcode_map = MSUB
enum clause opcode_map = NMSUB
enum clause opcode_map = NMADD
enum clause opcode_map = OP_FP
enum clause opcode_map = OP_V
enum clause opcode_map = BRANCH
enum clause opcode_map = JALR
enum clause opcode_map = RESERVED
enum clause opcode_map = JAL
enum clause opcode_map = SYSTEM
enum clause opcode_map = OP_VE

// The RISC-V base opcode map for inst[6..2], where inst[1..0] == 0b11
// If inst[4..2] is 0b111, then this is reserved for opcodes wider than 32-bits.
scattered mapping opcode_map_bits : opcode_map <-> bits(5)

mapping clause opcode_map_bits = LOAD <-> 0b00000
mapping clause opcode_map_bits = LOAD_FP <-> 0b00001
// 0b00010 is for custom ISA extensions (custom-0)
mapping clause opcode_map_bits = MISC_MEM <-> 0b00011
mapping clause opcode_map_bits = OP_IMM <-> 0b00100
mapping clause opcode_map_bits = AUIPC <-> 0b00101
mapping clause opcode_map_bits = OP_IMM_32 <-> 0b00110

mapping clause opcode_map_bits = STORE <-> 0b01000
mapping clause opcode_map_bits = STORE_FP <-> 0b01001
// 0b01010 is for custom ISA extensions (custom-1)
mapping clause opcode_map_bits = AMO <-> 0b01011
mapping clause opcode_map_bits = OP <-> 0b01100
mapping clause opcode_map_bits = LUI <-> 0b01101
mapping clause opcode_map_bits = OP_32 <-> 0b01110

mapping clause opcode_map_bits = MADD <-> 0b10000
mapping clause opcode_map_bits = MSUB <-> 0b10001
mapping clause opcode_map_bits = NMSUB <-> 0b10010
mapping clause opcode_map_bits = NMADD <-> 0b10011
mapping clause opcode_map_bits = OP_FP <-> 0b10100
mapping clause opcode_map_bits = OP_V <-> 0b10101
// 0b10110 is for custom ISA extensions or for RV128 (custom-2/rv128)

mapping clause opcode_map_bits = BRANCH <-> 0b11000
mapping clause opcode_map_bits = JALR <-> 0b11001
mapping clause opcode_map_bits = RESERVED <-> 0b11010
mapping clause opcode_map_bits = JAL <-> 0b11011
mapping clause opcode_map_bits = SYSTEM <-> 0b11100
mapping clause opcode_map_bits = OP_VE <-> 0b11101
// 0b11110 is for custom ISA extensions or for RV128 (custom-3/rv128)

Names might need to change, but this is following the structure of the ISA manual. It needs to be scattered so extensions could slot into the custom opcode spaces, even though it's very verbose...

@ThinkOpenly
Copy link
Contributor Author

Thanks for the comments, @Alasdair. My ignorance is showing, I know, but I don't understand how the above mappings help if opcode2format is removed/replaced. None of what you posted includes references to opcode formats.

I tried a very simple experiment (based on my very limited Sail knowledge), and removed the conditions from one of the fmt2bits mappings:

  UFormat(struct { imm = imm, rd = rd, opcode = opcode }) /* if opcode2format(opcode) == U_Format */
    <-> imm @ rd @ opcode /* if opcode2format(opcode) == U_Format */,

... and then at runtime, the mappings are broken:

[...]
Pattern match failure in fmtencdec_backwards
mem[X,0x0000000000001004] -> 0x8593
mem[X,0x0000000000001006] -> 0x0202

How would your suggestions be used to discriminate formats based on opcode?

@jrtc27
Copy link
Collaborator

jrtc27 commented Sep 11, 2024

My concern in general is that opcode formats are a general rule of thumb, not a rigid scheme that every instruction must adhere to in the exact same way (even in the base I extension, shifts aren't really I-type, only if you squint, despite what the manual says). And I'm not sure how much reducing the repetition even matters, nor if it improves readability.

@ThinkOpenly
Copy link
Contributor Author

My concern in general is that opcode formats are a general rule of thumb, not a rigid scheme that every instruction must adhere to in the exact same way

This is easily accommodated, I think, by adding more fmt2bits mappings for the odd cases. (And may be a flag to whoever finds the need to do so that they're coloring outside the lines.)

I would imagine, also, that someone implementing a decoder would prefer well-defined formats.

(even in the base I extension, shifts aren't really I-type, only if you squint, despite what the manual says).

This is because of the extra "shift amount" bit for RV64, noted as a "specialization of the I-type format"? Yeah.

And I'm not sure how much reducing the repetition even matters, nor if it improves readability.

Reducing the repetition of what? The bit concatenations? Reducing repetition of that was not a goal. Adding structure to the encoding mechanism was a goal. Refactoring that into a few per-format encoding schemes was an obvious way to do that, with additional benefits.

Improving readability was not an explicit goal, although I'd argue the new code is more pleasing to the eye than the old code:

-mapping clause encdec = RISCV_JAL(imm_19 @ imm_7_0 @ imm_8 @ imm_18_13 @ imm_12_9 @ 0b0, rd)
-   <-> imm_19 : bits(1) @ imm_18_13 : bits(6) @ imm_12_9 : bits(4) @ imm_8 : bits(1) @ imm_7_0 : bits(8) @ rd @ 0b1101111
 
+function clause opcode2format 0b1101111 = J_Format
+mapping clause fmtencdec = RISCV_JAL(imm, rd)
+  <-> JFormat(struct { imm = imm, rd = rd, opcode = 0b1101111 })

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants