Add function signature based labeling scheme for landing pad #434
base: cfi-prop
Regarding the mechanism to pass the calculated lpad labels to the static linker for PLT generation, maybe we can just add a symbol for each called function? Currently, the LLVM KCFI mechanism adds similar symbols to allow assembly code to reference labels computed from C function declarations. This approach has the benefit of being easily human-readable when examining the assembly text and object dumps (through the symbol table), and it does not require us to invent a new format for this purpose, which means it is already accepted by existing assemblers and by compilation pipelines that use independent assemblers. The downside is that it adds quite a lot of entries to the symbol table, and the symbol table entry data structure is a bit bloated for this purpose. However, if we decide to pass the symbols along in the relocatables instead of fetching them from shared objects at static link time, maybe we can just advise programmers to strip the symbol tables after linking so the program size can be reduced.
@mylai-mtk thanks for the input. Here are a few options I have in mind:
typedef struct {
    ElfNN_Half ss_boundto; /* Direct bindings, symbol bound to */
    ElfNN_Word ss_sig;     /* Signature string, string index in .riscv.ssstr section. */
} ElfNN_SymSig;

However, options 1 and 2 will have some problems when dealing with undefined weak symbols.

For option 3: a build attribute for symbols is a fairly good fit for the current usage, but no linker seems to implement it at all, so I'm a little hesitant to choose this option. It is also bound to the symbol, so in theory it could handle undefined weak symbols well.

For option 4: similar to option 3 in many aspects, but it uses a customized section.
Hi @kito-cheng, thanks for the reply.
Regarding this option, I don't see any structure that looks like the "build attribute" you mentioned that works with symbols in the "Attributes" section of the linked document. Can you be more specific on what existing format we have that you're referring to?
I don't think we need LTO with KCFI. KCFI works much like Zicfilp, except that it's software-emulated. In KCFI, the label used at the call site comes from the called target's apparent signature as seen from the caller, which should be available in C programs most of the time. The only exception I know of is K&R-style functions, which technically do not have signatures and accept a plethora of argument type combinations. However, intentional K&R-style functions are rare IMHO, and should probably use a special label rule if we are to handle them.
I wasn't talking about generating symbol at the same address as the function symbol. In KCFI, the label symbols are associated to the called functions by name, e.g. for a What I proposed was to add something like |
We need to have a way (define macro?) to know which labeling scheme (simple/complex) is in use for the current compilation, so assembly files can know which label to use. This is needed for libc implementation. |
(Nitpicking...) I propose that we move away from the name of "complex" when using function signatures as the label content. Though we have a "simple" labeling scheme and it's natural to use its opposite term "complex" to name our new scheme, which is not simple, I think the term "complex" covers too many possibilities and thus reveals too little about what it actually is. Also, avoiding the "complex" term allows us to define more label schemes in the future without the embarrassment of feeling like defining another new "complex" scheme. Based on these points, I propose to name this current new label scheme using the name "func_sig" or "mangled_sig" (the underscore may be removed), which is precise and tells what it really does. |
Good suggestion, let me rename it to function signature / func_sig :) |
Created a PR for adding the macro: riscv-non-isa/riscv-c-api-doc#76
e4ae5de to c3c4adc
Changes:
1: lpad <hash-value-for-function>
   auipc t3, %pcrel_hi(function@.got.plt)
   l[w|d] t3, %pcrel_lo(1b)(t3)
   lui t2, <hash-value-for-function>
It's needed for a direct call to the PLT followed by an indirect tail call from the PLT to the target...
Fix spelling aupic -> auipc
Signed-off-by: XYenChi <[email protected]>
Changes:
riscv-elf.adoc (outdated)
The label value is derived from the lower 20 bits of the MD5 hash result of the
function signature string. If the lower 20 bits are all zeros, the higher 20
bits are used. If all 32 bits are zeros, the lower 20 bits of the MD5 hash
MD5 results in a 128-bit number, so I guess you mean 'If all 128 bits are zero' here.
But since MD5 gives a 128-bit number, would you consider taking other parts of the number if both hi20(MD5) and low20(MD5) are zero?
Oh yeah, let me update the rule. I thought MD5 was 32 bits, but it should be 128 bits...
riscv-elf.adoc (outdated)
If less than 20 bits are available in the final segment, the highest 20 bits of
the MD5 hash result will be used. If all 128 bits are zeros, the lower 20 bits
Would you mind if we use the lowest 8 bits (and zero-extend it to 20 bits) in the final segment in case the lowest 120 bits are all zero? This saves an additional 12-bit logical left shift (and some book keeping) if the following algorithm is used to implement this paragraph:
uint128_t MD5 = ...;
while (MD5) {
if (MD5 & 0xFFFFF) return MD5 & 0xFFFFF;
MD5 >>= 20;
}
Yeah, I don't think this makes much difference, let me update that later :P
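For readers who want to try the segment scan discussed above, here is a minimal C sketch of it. It is not text from the PR. Standard C has no `uint128_t`, so the digest is held in a hypothetical `md5_u128` struct of two 64-bit halves. The helper names `shr20` and `lpad_label_from_md5` are made up for illustration. How the 16 MD5 digest bytes map onto the 128-bit integer is not stated in this thread and is left to the caller. The all-zero case returns 0 only as a placeholder, because the quoted rule for that case is cut off above.

```c
#include <stdint.h>

/* Hypothetical container for the 128-bit MD5 result (not from the PR).
   The byte order used to build it from the digest is an open assumption. */
typedef struct {
    uint64_t hi; /* bits 127..64 */
    uint64_t lo; /* bits 63..0 */
} md5_u128;

/* Logical right shift of the 128-bit value by 20 bits. */
static md5_u128 shr20(md5_u128 v)
{
    md5_u128 r;
    r.lo = (v.lo >> 20) | (v.hi << 44); /* low 20 bits of hi move into the top of lo */
    r.hi = v.hi >> 20;
    return r;
}

/* Scan 20-bit segments from the low end and return the first non-zero one.
   The final segment holds only 8 bits (128 = 6 * 20 + 8), so it comes out
   zero-extended to 20 bits without extra work.
   Returns 0 when all 128 bits are zero; this is a placeholder, since the
   rule for that case is truncated in the thread. */
static uint32_t lpad_label_from_md5(md5_u128 md5)
{
    while (md5.hi != 0 || md5.lo != 0) {
        uint32_t seg = (uint32_t)(md5.lo & 0xFFFFF);
        if (seg != 0)
            return seg;
        md5 = shr20(md5);
    }
    return 0;
}
```

With this shape, a digest whose only set bits are the top 8 yields those 8 bits directly as the label, which is the behavior the review comment asks for.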
fix riscv-abi.adoc format
Define two bits for landing pad and shadow stack; we plan to define a third bit, `GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_COMPLEX`, for the complex labeling scheme.
Changes:
- Rename `GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_SIMPLE` to `GNU_PROPERTY_RISCV_FEATURE_1_CFI_LP_UNLABELED`
- Fix wrong offset in the first PLT stubs for the simple landing pad PLT.
Function signature based labeling scheme, following the "Function types" mangling rule defined in the Itanium C++ ABI, with a few specific rules:
- The `main` function uses the signature `(int, pointer to pointer to char) returning int` (`FiiPPcE`).
- `_dl_runtime_resolve` uses zero for the landing pad.
- {Cpp} member functions should use the "Pointer-to-member types" mangling rule defined in the _Itanium {Cpp} ABI_ <<itanium-cxx-abi>>.
- Virtual functions in {Cpp} should use the member function type of the base class that first defined the virtual function.
- If a virtual function is inherited from more than one base class, it should use the type of the first base class. Thunk functions will use the type of the corresponding base class.

Co-authored-by: Ming-Yi Lai <[email protected]>
Changes:
- Rename complex labeling scheme to function signature based labeling scheme
- Fix the PLT stubs
- Add labeling rule for `main` and `_dl_runtime_resolve`.
- Clarify the rule for virtual functions inherited from more than one base class.
- Special rule for the return type of member functions.
- Special rule for class destructors.
- <exception-spec> should be ignored.
- Static functions should follow the same rules as normal functions.
- wchar_t is platform dependent.
- Functions with an empty parameter list are treated as `void` (`v`).
- Add note to mention covariant return types
Use a zero-filled value if fewer than 20 bits remain
c870b64 to b523e16
NOTE: This is a chained PR, based on #417.
TODO: We haven't added a mechanism for generating the right PLT yet. We may need a specialized section to record the function signature or hash, so that the static linker can generate the right label at PLT entries.
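Function signature based labeling scheme, following the "Function types" mangling rule defined in the Itanium C++ ABI, with a few specific rules:
- The `main` function uses the signature `(int, pointer to pointer to char) returning int` (`FiiPPcE`).
- `_dl_runtime_resolve` uses zero for the landing pad.
- C++ member functions should use the "Pointer-to-member types" mangling rule defined in the Itanium C++ ABI.
- Virtual functions in C++ should use the member function type of the base class that first defined the virtual function.
- If a virtual function is inherited from more than one base class, it should use the type of the first base class. Thunk functions will use the type of the corresponding base class.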