Xclbin to ELF flow migration #8581
Overall looks good. I am not completely done reviewing, but maybe address my first points and I will review again.
xrt::xclbin::kernel
get_kernel(const xrt::hw_context& hwctx);

xrt::module
get_module(const xrt::hw_context& hwctx, const std::string& kname);
Please add comments. In the context of the changes in this PR, it is not clear to me that xrt::xclbin::kernel makes any sense? If a hwctx is created from an elf, as the example in the description shows, then there is no xclbin so what does this function return?
Hi Soren, the get_kernel function was added by mistake and I forgot to remove it.
I will also document the newly added functions. Thanks.
xrt::module
get_module(const std::string& name);

std::string
get_kernel_signature(const xrt::module& module);

std::string
get_kernel_name(const xrt::module& module);

// Get partition size if ELF has info
uint32_t
get_partition_size(const xrt::module& module);
Please comment the new functions.
m_partition_size = xrt_core::module_int::get_partition_size(m_module_map.begin()->second);
m_hdl = m_core_device->create_hw_context(m_partition_size, m_cfg_param, mode);
Can these lines be moved up before the local module is moved into the map? That would get rid of the begin()->second, which doesn't read too well.
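A minimal sketch of the reordering suggested here, using stand-in types rather than the real XRT classes (module_stub and ctx_stub are hypothetical names introduced only for illustration). The idea is to read what is needed from the local object first and only then move it into the map, so no begin()->second lookup is required afterwards:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for xrt::module; holds only what the sketch needs.
struct module_stub {
  std::string kernel_name;
  uint32_t partition_size;
};

// Hypothetical stand-in for the hw context implementation.
struct ctx_stub {
  std::map<std::string, module_stub> module_map;
  uint32_t partition_size = 0;

  explicit ctx_stub(module_stub mod)
  {
    // Read everything needed from the local object first ...
    partition_size = mod.partition_size;
    std::string kname = mod.kernel_name;

    // ... then hand ownership to the map; `mod` is not used after this point.
    module_map.emplace(std::move(kname), std::move(mod));
    assert(module_map.size() == 1);
  }
};
```

Whether the real constructor can be reordered this way depends on code not shown in the snippet above.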
, m_mode{mode}
{
  auto module = xrt::module(elf);
  auto kernel_name = xrt_core::module_int::get_kernel_name(module);
It's not clear to me how the module knows the kernel name as opposed to the elf? What is a kernel name in this case? Is there only one kernel name in an elf? This code may be fine, but comments are needed to answer my questions :-)
All the ELF parsing is done in xrt_module.cpp; this was our design from the beginning, even before this change.
Whenever we create an xrt::module object, the ELF is parsed and the info is stored in the module object for future retrieval.
Yes, as per the spec there will be only one kernel per ELF, but there can be multiple control codes.
I will add more comments in the code wherever possible.
if (m_partition_size != part_size)
  throw std::runtime_error("can not add config to ctx with different configuration\n");

for (const auto& m : m_module_map) {
This confuses me. The key for m_module_map was created from xrt_core::module_int::get_kernel_name, so I would infer that the 'key' (kernel_name) that matches the kernel name of the 'value' (module) identifies exactly that module. So why not a simple m_module_map.find(kname), returning the module if the result is not the end iterator?
Yes Soren, you are right. I will modify the code.
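The lookup suggested in this thread could look roughly like the sketch below. It is not the PR's code: module_stub and find_module are hypothetical names, and the error message is made up. It only illustrates that when the map key is the module's own kernel name, a direct find() replaces the loop over all entries:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for xrt::module; holds only what the sketch needs.
struct module_stub {
  std::string kernel_name;
};

// Return the module registered under `kname`, or throw if there is none.
inline const module_stub&
find_module(const std::map<std::string, module_stub>& module_map, const std::string& kname)
{
  auto itr = module_map.find(kname);
  if (itr == module_map.end())
    throw std::runtime_error("no module found for kernel name '" + kname + "'");

  // Invariant discussed in the review: the key equals the module's kernel name.
  assert(itr->second.kernel_name == kname);
  return itr->second;
}
```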
  return tokens;
}

void
Suggested change: replace void with static void
Can't make it static, Soren, as it uses member variables (the args vector in this case).
}

void
construct_elf_kernel_args(const std::string& kernel_name)
This is getting rather complicated. A lot of xclbin complexity was hidden in xrt::xclbin to avoid this kind of parsing in different places. I am not completely understanding what is going on here, but the point of xrt::elf was to mirror xrt::xclbin from an API point of view. Would that be possible?
Yes Soren, I can move this functionality to xrt_module.cpp as that file is used for elf parsing
kernel_properties::mailbox_type
get_mailbox_from_ini(const std::string& kname)
{
  static auto mailbox_kernels = xrt_core::config::get_mailbox_kernels();
  return (mailbox_kernels.find("/" + kname + "/") != std::string::npos)
    ? xrt_core::xclbin::kernel_properties::mailbox_type::inout
    : xrt_core::xclbin::kernel_properties::mailbox_type::none;
}

// Kernel auto restart counter offset
// Needed until meta-data support (Vitis-1147)
kernel_properties::restart_type
get_restart_from_ini(const std::string& kname)
{
  static auto restart_kernels = xrt_core::config::get_auto_restart_kernels();
  return (restart_kernels.find("/" + kname + "/") != std::string::npos)
    ? 1
    : 0;
}
I would rather not. Instead of rebuilding the properties, please cache a pointer to the existing properties that have all the information
bool
get_sw_reset_from_ini(const std::string& kname)
{
  static auto reset_kernels = xrt_core::config::get_sw_reset_kernels();
  return (reset_kernels.find("/" + kname + "/") != std::string::npos);
}
ditto
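For readers unfamiliar with the helpers under discussion: they all test whether "/" + kname + "/" occurs in a configuration string. The sketch below isolates that matching step. It assumes the configuration value is a '/'-delimited list such as "/add/mult/", which is inferred from the snippets above rather than confirmed; is_listed is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Returns true if `kname` appears as a whole entry in a '/'-delimited list
// such as "/add/mult/". Wrapping the name in delimiters keeps a kernel named
// "add" from matching an entry named "addone".
inline bool
is_listed(const std::string& delimited_list, const std::string& kname)
{
  // Assumption: kernel names never contain the delimiter themselves.
  assert(kname.find('/') == std::string::npos);
  return delimited_list.find("/" + kname + "/") != std::string::npos;
}
```

This only illustrates the matching; it does not address the review request to reuse the existing cached properties instead of rebuilding them.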
kernel_properties::mailbox_type
get_mailbox_from_ini(const std::string& kname);

kernel_properties::restart_type
get_restart_from_ini(const std::string& kname);

bool
get_sw_reset_from_ini(const std::string& kname);
Let's avoid this.
@hlaccabu recently added some code to enable the elf flow in xrt-smi validate for npu3. Please work with him to make sure that this change doesn't break the existing code.
Good to go from the driver perspective, as long as the driver's xrt submodule (and the corresponding xrt_plugin.deb) contains these changes, as well as the XRT .deb.
@hlaccabu can you please check that your feature doesn't break with these changes?
Yep, you're all set. The xrt-smi tests that I enabled for npu3 still run.
Signed-off-by: rbramand <[email protected]>
Problem solved by the commit
This PR enables a new flow where an ELF file is the input instead of an xclbin.
spec - https://confluence.amd.com/display/AIE/AIE+Compiler+Artifacts#AIECompilerArtifacts-Flow2
spec of new config elf - https://confluence.amd.com/pages/viewpage.action?pageId=1485002156#Profile/Config/CtrlcodeElfSpec-Config.elf
The new ELF contains the partition size and the kernel signature.
It has multiple .ctrltext and .ctrldata sections and multiple .pdi sections.
Each .ctrltext section represents one control code; the PDI addresses and the kernel args need to be patched into the control code buffer.
The control code to run is selected when creating the xrt::kernel object, using a kernel name of the form <kernel_name>:<ctrl_code_id>. For example, "DPU:0" runs the control code from section .ctrltext0 and "DPU:1" runs the control code from section .ctrltext1.
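To make the naming convention concrete, here is a minimal sketch of splitting such a name and deriving the section name. split_kernel_name and ctrltext_section are hypothetical helpers, not functions from this PR, and rejecting names without a ":<id>" suffix is an assumption; the actual implementation may handle that case differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper: split "<kernel_name>:<ctrl_code_id>" into its parts,
// e.g. "DPU:1" yields {"DPU", 1}.
inline std::pair<std::string, uint32_t>
split_kernel_name(const std::string& name)
{
  const auto pos = name.rfind(':');
  if (pos == std::string::npos || pos == 0 || pos + 1 == name.size())
    throw std::invalid_argument("expected <kernel_name>:<ctrl_code_id>, got '" + name + "'");

  const std::string id_str = name.substr(pos + 1);
  assert(!id_str.empty());  // guaranteed by the check above

  // Accept decimal digits only, so inputs like "-1" or "1x" are rejected.
  const bool all_digits = std::all_of(id_str.begin(), id_str.end(),
    [](unsigned char c) { return std::isdigit(c) != 0; });
  if (!all_digits)
    throw std::invalid_argument("control code id is not a number: '" + id_str + "'");

  return {name.substr(0, pos), static_cast<uint32_t>(std::stoul(id_str))};
}

// Hypothetical helper: name of the section holding the selected control code,
// following the ".ctrltext<id>" naming described above.
inline std::string
ctrltext_section(uint32_t id)
{
  return ".ctrltext" + std::to_string(id);
}
```

With these helpers, "DPU:1" splits into the kernel name "DPU" and the id 1, which maps to section ".ctrltext1".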
Sample test case :
XRT first class object changes:
xrt::hw_context - added new APIs to create a context from an ELF instead of an xclbin
xrt::kernel - added a new ext API to create a kernel object from a kernel name; added changes to construct kernel args from the kernel signature
xrt::elf - can now take the new ELF with OS ABI 70 as input
xrt::module - added changes to parse the new ELF; also refactored code
Bug / issue (if any) fixed, which PR introduced the bug, how it was discovered
This is a new feature
How problem was solved, alternative solutions (if any) and why they were rejected
A hw context can be created either with an xclbin (traditional flow) or with an xrt::elf. A new constructor is provided for the latter.
A new xrt::ext::kernel constructor is provided that takes the ctx and a kernel name as input.
Added code to parse the new ELF file and patch it according to the spec.
The rest of the flow remains the same.
Risks (if any) associated with the changes in the commit
Low to medium
Tested the existing flows, but more testing with all the available test cases is needed.
What has been tested and how, request additional testing if necessary
Tested the new application flow on aie2p simnow (this needs flow-related changes in the amdxdna shim and firmware that are yet to be merged, so testing used local drops).
Tested existing test cases on phoenix hw (linux); the tests pass, so the existing flow didn't break.
TODO: check whether existing aie2ps test cases work
Documentation impact (if any)
Added doxygen comments in the code for the new APIs. We may need to document the new flow once it has stabilized.