This section describes in general terms the information which must be passed from the RISC-V hart to the trace encoder for the purposes of Instruction Trace, and distinguishes between what is mandatory, and what is optional.
The following information is mandatory:
-
The number of instructions that are being retired;
-
Whether there has been an exception or interrupt, and if so the cause (from the ucause/scause/vscause/mcause CSR) and trap value (from the utval/stval/vstval/mtval CSR).
The register set to output should be the set that is updated as a result of the exception (i.e. the set associated with the privilege level immediately following the exception);
-
The current privilege level of the RISC-V hart;
-
The instruction_type of retired instructions for:
-
Jumps with a target that cannot be inferred from the source code;
-
Taken and nontaken branches;
-
Return from exception or interrupt (*ret instructions).
-
-
The instruction_address for:
-
Jumps with a target that cannot be inferred from the source code;
-
The instruction retired immediately after a jump with a target that cannot be inferred from the source code (also referred to as the target or destination of the jump);
-
Taken and nontaken branches;
-
The last instruction retired before an exception or interrupt;
-
The first instruction retired following an exception or interrupt;
-
The last instruction retired before a privilege change;
-
The first instruction retired following a privilege change.
-
The following information is optional:
-
Context or Time information:
-
The context and/or hart ID and/or time;
-
The type of action to take when context or time data changes.
-
-
The instruction_type of instructions for:
-
Calls with a target that cannot be inferred from the source code;
-
Calls with a target that can be inferred from the source code;
-
Jumps with a target that cannot be inferred from the source code;
-
Jumps with a target that can be inferred from the source code;
-
Returns with a target that cannot be inferred from the source code;
-
Returns with a target that can be inferred from the source code;
-
Co-routine swap;
-
Other jumps which don’t fit any of the above classifications with a target that cannot be inferred from the source code;
-
Other jumps which don’t fit any of the above classifications with a target that can be inferred from the source code.
-
-
If context or time is supported then the instruction_address for:
-
The last instruction retired before a context or a time change;
-
The first instruction retired following a context or time change.
-
-
Whether jump targets are sequentially inferable or not.
The mandatory information is the bare-minimum required to implement the branch trace algorithm outlined in [Algorithm]. The optional information facilitates alternative or improved trace algorithms:
-
Implicit return mode (see [sec:implicit-return]) requires the encoder to keep track of the number of nested function calls, and to do this it must be aware of all calls and returns regardless of whether the target can be inferred or not;
-
A simpler algorithm useful for basic code profiling would only report function calls and returns, again regardless of whether the target can be inferred or not;
-
Branch prediction techniques can be used to further improve the encoder efficiency, particularly for loops (see [sec:branch-prediction]). This requires the encoder to be aware of the address of all branches, whether they are taken or not.
-
Uninferable jumps can be treated as inferable (which don’t need to be reported in the trace output) if both the jump and the preceding instruction which loads the target into a register have been traced.
Jumps are classified as inferable, or uninferable. An inferable jump has a target which can be deduced from the binary executable or representation thereof (e.g. ELF). For the purposes of this specification, the following strict definition applies:
If the target of a jump is supplied via a constant embedded within the jump opcode, it is classified as inferable. Jumps which are not inferable are by definition uninferable.
However, there are some jump targets which can still be deduced from the binary executable by considering pairs of instructions even though by the above definition they are classified as uninferable. Specifically, jump targets that are supplied via
-
an lui or c.lui (a register which contains a constant), or
-
an auipc (a register which contains a constant offset from the PC).
Such jump targets are classified as sequentially inferable if the pair of instructions are retired consecutively (i.e. the auipc, lui or c.lui immediately precedes the jump). Note: the restriction that the instructions are retired consecutively is necessary in order to minimize the additional signalling needed between the hart and the encoder, and should have a minimal impact on trace efficiency as it is anticipated that consecutive execution will be the norm. Support for sequentially inferable jumps is optional.
Jumps may optionally be further classified according to the recommended calling convention:
-
Calls:
-
jal x1;
-
jal x5;
-
jalr x1, rs where rs != x5;
-
jalr x5, rs where rs != x1;
-
c.jalr rs1 where rs1 != x5;
-
c.jal.
-
-
Jumps:
-
jal x0;
-
c.j;
-
jalr x0, rs where rs != x1 and rs != x5;
-
c.jr rs1 where rs1 != x1 and rs1 != x5.
-
-
Returns:
-
jalr rd, rs where (rs == x1 or rs == x5) and rd != x1 and rd != x5;
-
c.jr rs1 where rs1 == x1 or rs1 == x5.
-
-
Co-routine swap:
-
jalr x1, x5;
-
jalr x5, x1;
-
c.jalr x5.
-
-
Other:
-
jal rd where rd != x0 and rd != x1 and rd != x5;
-
jalr rd, rs where rs != x1 and rs != x5 and rd != x0 and rd != x1 and rd != x5.
-
The encoder is intended to encode the instructions executed on a single hart.
It is however commonplace for a RISC-V core to contain multiple harts. This can be supported by the core in several different ways:
-
Implement a separate instance of the interface per hart. Each instance can be connected to a separate encoder instance, allowing all harts to be traced concurrently. Alternatively, external muxing may be used in conjunction with a single encoder in order to trace one particular hart at a time;
-
Implement a singe interface for the core, with muxing inside the core to select which hart to connect to the interface.
(Whilst it is technically feasible to use a single encoder with multiple harts operating in a fine-grained multi-threaded configuration, the frequent context changes that would occur as a result of thread-switching would result in extremely poor encoding efficiency, and so this configuration is not recommended.)
This section describes the interface between a RISC-V hart and the trace encoder that conveys the information described in the section Instruction Trace Interface requirements. Signals are assigned to one of the following groups:
-
M: Mandatory. The interface must include an instance of this signal.
-
O: Optional. The interface may include an instance of this signal.
-
MR: Mandatory, may be replicated. For harts that can retire a maximum of N "special" instructions per clock cycle, the interface must include N instances of this signal.
-
OR: Optional, may be replicated. For harts that can retire a maximum of N "special" per clock cycle, the interface must include zero or N instances of this signal.
-
BR: Block, may be replicated. Mandatory for harts that can retire multiple instructions in a block. Replication as per OR. If omitted, the interface must include SR group signals instead.
-
SR: Single, may be replicated. Mandatory for harts that can only retire one instruction in a block. Replication as per OR (see Alternative multiple-retirement interface configurations). If omitted, the interface must include BR group signals instead.
"Special" instructions are those that require itype to be non-zero.
Signal | Group | Function |
---|---|---|
itype[itype_width_p-1:0] |
MR |
Termination type of the instruction block. Encoding given in Instruction Type (itype) encoding (see Jump classification and target inference for definitions of codes 6 - 15). |
cause[ecause_width_p-1:0] |
M |
Exception or interrupt cause (ucause/scause/ vscause/mcause). Ignored unless itype=1 or 2. |
tval[iaddress_width_p-1:0] |
M |
The associated trap value, e.g. the faulting virtual address for address exceptions, as would be written to the utval/stval/vstval/mtval CSR. Future optional extensions may define tval to provide ancillary information in cases where it currently supplies zero. Ignored unless itype=1. |
priv[privilege_width_p-1:0] |
M |
Privilege level for all instructions retired on this cycle. Encoding given in Privilege level (priv) encoding. Codes 4-7 optional. |
iaddr[iaddress_width_p-1:0] |
MR |
The address of the 1st instruction retired in this block. Invalid if iretire=0 unless itype=1, in which case it indicates the address of the instruction which incurred the exception. |
context[context_width_p-1:0] |
O |
Context for all instructions retired on this cycle. |
time[time_width_p-1:0] |
O |
Time generated by the core. |
ctype[ctype_width_p-1:0] |
O |
Reporting behavior for context. Encoding given in Table Context type ctype values and corresponding actions. Codes 2-3 optional. |
sijump |
OR |
If itype indicates that this block ends with an uninferable discontinuity, setting this signal to 1 indicates that it is sequentially inferable and may be treated as inferable by the encoder if the preceding auipc, lui or c.lui has been traced. Ignored for itype codes other than 6, 8, 10, 12 or 14. |
Instruction interface signals and Instruction interface signals - multiple retirement per block list the signals in the interface designed to efficiently support retirement of multiple instructions per cycle. The following discussion describes the multiple-retirement behavior. However, for harts that can only retire one instruction at a time, the signalling can be simplified, and this is discussed subsequently in Simplifications for single-retirement.
Signal | Group | Function |
---|---|---|
iretire[iretire_width_p-1:0] |
BR |
Number of halfwords represented by instructions retired in this block. |
ilastsize[ilastsize_width_p-1:0] |
BR |
The size of the last retired instruction is 2ilastsize half-words. |
Signal | Group | Function |
---|---|---|
iretire[0:0] |
SR |
Number of instructions retired in this block (0 or 1). |
ilastsize[ilastsize_width_p-1:0] |
SR |
The size of the retired instruction is 2ilastsize half-words. |
Value | Description |
---|---|
0 |
Final instruction in the block is none of the other named itype codes |
1 |
Exception. An exception that traps occurred following the final retired instruction in the block |
2 |
Interrupt. An interrupt that traps occurred following the final retired instruction in the block |
3 |
Exception or interrupt return |
4 |
Nontaken branch |
5 |
Taken branch |
6 |
Uninferable jump if itype_width_p is 3, reserved otherwise |
7 |
reserved |
8 |
Uninferable call |
9 |
Inferrable call |
10 |
Uninferable jump |
11 |
Inferrable jump |
12 |
Co-routine swap |
13 |
Return |
14 |
Other uninferable jump |
15 |
Other inferable jump |
Value | Description |
---|---|
0 |
U |
1 |
S/HS |
2 |
reserved |
3 |
M |
4 |
D (debug mode) |
5 |
VU |
6 |
VS |
7 |
reserved |
The information presented in a block represents a contiguous block of instructions starting at iaddr, all of which retired in the same cycle. Note if itype is 1 or 2 (indicating an exception or an interrupt), the number of instructions retired may be zero. cause and tval are only defined if itype is 1 or 2. If iretire=0 and itype=0, the values of all other signals are undefined.
iretire contains the number of (16-bit) half-words represented by instructions retired in this block, and ilastsize the size of the last instruction. Half-words rather than instruction count enables the encoder to easily compute the address of the last instruction in the block without having access to the size of every instruction in the block.
itype can be 3 or 4 bits wide. If itype_width_p is 3, a single code (6) is used to indicate all uninferable jumps. This is simpler to implement, but precludes use of the implicit return mode (see [sec:implicit-return]), which requires jump types to be fully classified.
Whilst iaddr is typically a virtual address, it does not affect the encoder’s behavior if it is a physical address.
For harts that can retire a maximum of N non-zero itype values per clock cycle, the signal groups MR, OR and either BR or SR must be replicated N times. Typically N is determined by the maximum number of branches that can be retired per clock cycle. Signal group 0 represents information about the oldest instruction block, and group N-1 represents the newest instruction block. The interface supports no more than one privilege change, context change, exception or interrupt per cycle and so signals in groups M and O are not replicated. Furthermore, itype can only take the value 1 or 2 in one of the signal groups, and this must be the newest valid group (i.e. iretire and itype must be zero for higher numbered groups). If fewer than N groups are required in a cycle, then lower numbered groups must be used first. For example, if there is one branch, use only group 0, if there are two branches, instructions up to the 1st branch must be reported in group 0 and instructions up to the 2nd branch must be reported in group 1 and so on.
sijump is optional and may be omitted if the hart does not implement the logic to detect sequentially inferable jumps. If the encoder offers an sijump input it must also provide a parameter to indicate whether the input is connected to a hart that implements this capability, or tied off. This is to ensure the decoder can be made aware of the hart’s capability. Enabling sequentially inferable jump mode in the encoder and decoder when the hart does not support it will prevent correct reconstruction by the decoder.
The context and/or the time field can be used to convey any additional information to the decoder. For example:
-
The address space and virtual machine IDs (ASID and VMID respectively). Where present it is recommended these values be wired to bits [15:0] and [29:16];
-
The software thread ID;
-
The process ID from an operating system;
-
It could be used to convey the values of CSRs to the decoder by setting context to the CSR number and value when a CSR is written;
-
In cases where a single encoder is being shared amongst multiple harts (see Relationship between RISC-V core and the encoder), it could also be used to indicate the hart ID, in cases where the hart ID can be changed dynamically.
-
Time from within the hart
Context type ctype values and corresponding actions specifies the actions for the various ctype values. A typical behavior would be for this signal to remain zero except on the 1st retirement after a context change or when a time value should be reported. ctype_width_p may be 1 or 2. The reduced width option only provides support for reporting context changes imprecisely.
Type | Value | Actions |
---|---|---|
Unreported |
0 |
No action (don’t report context). |
Report context imprecisely |
1 |
An example would be a SW thread or operating system process change. Report the new context value at the earliest convenient opportunity. It is reported without any address information, and the assumption is that the precise point of context change can be deduced from the source code (e.g. a CSR write). |
Report context precisely |
2 |
Report the address of the 1st instruction retired in this block, and the new context. If there were unreported branches beforehand, these need to be reported first. Treated the same as a privilege change. |
Report context as an asynchronous discontinuity |
3 |
An example would be a change of hart. Need to report the last instruction retired on the previous context, as well as the 1st on the new context. Treated the same as an exception. |
For harts that can only retire one instruction at a time, the interface can be simplified to the signals listed in Instruction interface signals and Instruction interface signals - single retirement per block. The simplifications can be summarized as follows:
-
iretire simply indicates whether an instruction retired or not;
Note: ilastsize is still needed in order to determine the address of the next instruction, as this is the predicted return address for implicit return mode (see [sec:implicit-return]).
The parameter retires_p which indicates to the encoder the maximum number of instructions that can be retired per cycle can be used by an encoder capable of supporting single or multiple retirement to select the appropriate interpretation of iretire.
For a hart that can retire multiple instructions per cycle, but no more than one branch, the preferred solution is to use one instance of signals from groups BR, MR and OR. However, if the hart can retire N branches in a cycle, N instances of signals from groups MR, OR and either SR or BR must be used (each instance can be either a single instruction or a block).
If the hart can retire N instructions per cycle, but only one branch, it is allowed (though not recommended) to provide explicit details of every instruction retired by using N instances of signals from groups SR, MR and OR.
Optional sideband signals may be included to provide additional functionality, as described in Optional sideband encoder input signals and Optional sideband encoder output signals.
Note, any user defined information that needs to be output by the encoder will need to be applied via the context input.
Signal | Group | Function |
---|---|---|
impdef[impdef_width_p-1:0] |
O |
Implementation defined sideband signals. A typical use for these would be for filtering (see [ch:filtering]. |
trigger[2+:0] |
[1:0]: O |
A pulse on bit 0 will cause the encoder to start tracing, and continue until further notice, subject to other filtering criteria also being met. A pulse on bit 1 will cause the encoder to stop tracing until further notice. See Using trigger outputs from the Debug Module). |
halted |
O |
Hart is halted. Upon assertion, the encoder will output a packet to report the address of the last instruction retired before halting, followed by a support packet to indicate that tracing has stopped. Upon deassertion, the encoder will start tracing again, commencing with a synchronization packet. Note: If this signal is not provided, it is strongly recommended that Debug mode can be signalled via a 3-bit privilege signal. This will allow tracing in Debug mode to be controlled via the optional filtering capabilities. |
reset |
O |
Hart is in reset. Provided the encoder is in a different reset domain to the hart, this allows the encoder to indicate that tracing has ended on entry to reset, and restarted on exit. Behavior is as described above for halt. |
Signal | Group | Function |
---|---|---|
stall |
O |
Stall request to hart. Some applications may require lossless trace, which can be achieved by using this signal to stall the hart if the trace encoder is unable to output a trace packet (for example due to back-pressure from the packet transport infrastructure). |
The debug module of the RISC-V hart may have a trigger unit. This defines a match control register (mcontrol) containing a 4-bit action field, and reserves codes 2 - 5 of this field for trace use. These action codes are hereby defined as shown in table Debug Module trigger support (mcontrol action). If implemented, each action must generate a pulse on an output from the hart, on the same cycle as the instruction which caused the trigger is retired.
Value | Description |
---|---|
2 |
Trace-on. This should be connected to trigger[0] if the encoder provides it. |
3 |
Trace-off. This should be connected to trigger[1] if the encoder provides it. |
4 |
Trace-notify. This should be connected to trigger[1 + blocks:2] if the encoder provides it. This will cause the encoder to output a packet containing the address of the last instruction in the block if it is enabled. One bit per block. |
Trace-on and Trace-off actions provide a means for the hart to control when tracing starts and stops. It is recommended that tracing starts from the oldest instruction retired in the cycle that Trace-on is asserted, and stops following the newest instruction retired in the cycle that Trace-off is asserted (subject to any optional filtering).
Trace-notify provides means to ensure that a specified instruction is explicitly reported (subject to any optional filtering). This capability is sometimes known as a watchpoint.
Retired | Instruction Trace Block |
---|---|
1000: divuw |
iretire=7, iaddr=0x1000, itype=8 |
0940: addi |
iretire=3, iaddr=0x0940, itype=4 |
0946: c.bnez |
iretire=1, iaddr=0x0946, itype=5 |
0988: lbu |
iretire=4, iaddr=0x0988, itype=0 |
This section describes in general terms the information which must be passed from the RISC-V hart to the trace encoder for the purposes of Data Trace, and distinguishes between what is mandatory, and what is optional.
If Data Trace is not needed in a system then there is no requirement for the RISC-V hart to supply any of the signals in Data Trace Interface.
Data trace supports up to four data access types: load, store, atomic and CSR. Support for both atomic and CSR accesses are independently optional.
The signalling protocol can take one of two forms, depending on the needs of the RISC-V hart: unified or split.
Unified is the simplest form, suitable for simpler, in-order harts. In this form, all information about a data access is signalled by the RISC-V hart in the same cycle that the associated data access instruction is reported on the instruction trace interface.
For harts with out of order or speculative execution capabilities, many loads may be in progress simultaneously, and this approach is not practical as it would require the hart to maintain a large amount of state relating to all the in-progress loads. For this reason, the interface also supports splitting loads into two parts:
-
The request phase provides all the information about the load that originates from the hart (address, size, etc.) when the instruction retires;
-
The response phase provides the data and response status when it has been returned to the hart from the memory system.
The two parts of a split load are associated by use of a transaction ID.
The Zc (code-size reduction) extension introduced push and pop instructions (cm.push, cm.pop, cm.popret and cm.popretz) that each result in multiple loads or stores. To allow the resulting loads or stores to be associated with the correct instruction, these multi-memory-access instructions (and any other future instructions with similar characteristics) must be reported on the instruction trace interface multiple times (once for each individual load or store) using itype 0 except for the final load or store, which must retire using the natural itype for the instruction (for example, a cm.popret instruction must use itype 13 for the final load to signal the return). The instruction address reported will be the same for each occurrence.
The following illustrations show the retirement sequences when a single cm.push or cm.popret is used to push or pop 4 registers from the stack. They assume a RISC-V to encoder interface that can report a block of 1 or more retired instructions and one load or store per cycle. Each comprises 4 elements, and shows the instruction information reported for each load and store. As detailed in section #sec:InstructionTraceInterface[1.2], this takes the form of the address of an instruction, the length of the block (1 for a single instruction) and the type of the final instruction in the block. In each element, ’Block’ indicates a block of 1 or more instructions (i.e. could also be a single instruction), whereas ’Single’ indicates a single instruction (i.e. a block with a length of 1).
A cm.push is equivalent to 4 store instructions:
-
Block - last instruction is cm.push, itype 0 (data trace interface reports 1st store);
-
Single - cm.push, itype 0 (data trace interface reports 2nd store);
-
Single - cm.push, itype 0 (data trace interface reports 3rd store);
-
Block - 1st instruction is cm.push, itype dependent on last instruction in block (data trace interface reports 4th store);
A cm.popret is equivalent to 4 loads and a return:
-
Block - last instruction is cm.popret, itype 0 (data trace interface reports 1st load);
-
Single - cm.popret, itype 0 (data trace interface reports 2nd load);
-
Single - cm.popret, itype 0 (data trace interface reports 3rd load);
-
Single - cm.popret, itype 13 (data trace interface reports 4th load);
If an exception occurs part way through the sequence of loads or stores initiated by such an instruction, and the instruction is re-executed after the exception handler has been serviced, the load or store sequence must recommence from the beginning.
Note
|
This is required for data trace only. If data trace is not implemented, the push or pop may instead be reported just once in the normal way when all associated loads or stores complete successfully. |
This section describes the interface between a RISC-V hart and the trace encoder that conveys the information described in the Data Trace Interface requirements. Signals are assigned to one of the following groups:
-
M: Mandatory. The interface must include an instance of this signal;
-
U: Unified. Mandatory for unified signalling;
-
S: Split. Mandatory for split load signalling;
-
O: Optional. The interface may include an instance of this signal.
All signals in M, U and O groups are only valid when dretire is high. Signals in the S group are valid as indicated in table Data interface signals.
For harts that can retire a maximum of M data accesses per cycle, the implemented signal groups must be replicated M times. If fewer than M groups are required in a cycle, then lower numbered groups must be used first. For example, if there is one data access, use only group 0.
Signal | Group | Function |
---|---|---|
dretire |
M |
Data access retired (when high) |
dtype[dtype_width_p-1:0] |
M |
Data access type. Encoding given in Data access type (dtype) encoding |
daddr[daddress_width_p-1:0] |
M |
The data access address |
dsize[dsize_width_p-1:0] |
M |
The data access size is 2dsize bytes |
data[data_width_p-1:0] |
U |
The data |
iaddr_lsbs[iaddr_lsbs_width_p-1:0] |
O |
LSBs of the data access instruction address. Required if retires_p > 1 |
dblock[dblock_width_p-1:0] |
O |
Instruction block in which the data access instruction is retired. Required if there are replicated instruction block signals |
lrid[lrid_width_p-1:0] |
S |
Load request ID. Valid when dretire is high |
lresp[lresp_width_p-1:0] |
S |
Load response:: None: reserved: Okay. Load successful; ldata valid: Error. Load failed; ldata not valid |
lid[lrid_width_p-1:0] |
S |
Split Load ID. Valid when lresp is non-zero |
sdata[sdata_width_p-1:0] |
S |
Store data. Valid when dretire is high |
ldata[ldata_width_p-1:0] |
S |
Load data. Valid when lresp is non-zero |
Value | Description |
---|---|
0 |
Load |
1 |
Store |
2 |
reserved |
3 |
reserved |
4 |
CSR read-write |
5 |
CSR read-set |
6 |
CSR read-clear |
7 |
reserved |
8 |
Atomic swap |
9 |
Atomic add |
10 |
Atomic AND |
11 |
Atomic OR |
12 |
Atomic XOR |
13 |
Atomic max |
14 |
Atomic min |
15 |
Conditional store failure |
The maximum value of dtype_width_p is 4. However, if only loads and stores are supported, dtype_width_p can be 1. If CSRs are supported but atomics are not, dtype_width_p can be 3.
Atomic and CSR accesses have either both load and store data, or store data and an operand. For CSRs and unified atomics, both values are reported via data, with the store data in the LSBs and the load data or operand in the MSBs.
lrid_width_p is determined by the maximum number of loads that can be in progress simultaneously, such that at any one time there can be no more than one load in progress with a given ID.
iaddr_lsbs and dblock are provided to support filtering of which data accesses to trace based on their instruction address. This is best illustrated by considering the following instruction sequence:
-
load
-
<some non data access instruction>
-
load
-
<some non data access instruction>
-
<some non data access instruction>
Suppose the hart is capable of retiring up to 4 instructions in a cycle, via a single block. Instruction trace is enabled throughout, but the requirement is to collect data trace for the 1st load (instruction 1), and filtering is configured to match the address of this instruction only. However, information about instruction addresses is passed to the encoder at the block level, and the block boundaries are invisible to the decoder. For instruction trace, all instructions in a block are traced if any of the instructions in that block match the filtering criteria. That is fine for instruction trace - the address of the 1st and last traced instruction are output explicitly. There will be some fuzziness about precisely what those addresses will be depending on where the block boundaries fall, but this is not a concern as everything is always self-consistent.
However, that is not the case for data trace. Consider two scenarios:
-
Case 1: 1st block contains instructions 1, 2, 3; second block contains 4, 5
-
Case 2: 1st block contains instructions 1, 2; second block contains 3, 4, 5
Given that iretire is non-zero in the same cycle that the data access retires, the encoder knows the address of the 1st and last instructions in a block, but does not know precisely where in the block the data access is. In both cases, the first block matches the filtering criteria (it contains the address of instruction 1), and the second block does not. But if the encoder traced all the data accesses in the matching block, then in case 1 it would trace both instructions 1 and 3, whereas in the second case it would trace only instruction 1. The decoder has no visibility of the block boundaries so cannot account for this. It is expecting only instruction 1 to be traced, and so may misinterpret instruction 3. If this code is in a loop for example, it will assume that the 2nd traced load is in fact instruction 1 from the next loop iteration, rather than instruction 3 from this iteration.
Providing the LSBs of the data access instruction address allows the decoder to determine precisely whether the data access should be traced or not, and removes the dependency on the block sizes and boundaries. The number of bits required is one more bit than the number required to index within the block because blocks can start on any half-word boundary.
For harts that replicate the block signals to allow multiple blocks to retire per cycle it is also necessary to indicate which block each data access is associated with, so the encoder knows which block address to combine with the LSBs in order to construct the actual data access instruction address. 1 bit for 2 blocks per cycle, 2 bits for 4, and so on.