Xclbin to ELF flow migration #8581

Open
wants to merge 3 commits into
base: master

rbramand-xilinx
Collaborator

@rbramand-xilinx rbramand-xilinx commented Oct 29, 2024

Problem solved by the commit

This PR enables a new flow in which an ELF file is the input instead of an xclbin.
spec - https://confluence.amd.com/display/AIE/AIE+Compiler+Artifacts#AIECompilerArtifacts-Flow2

spec of new config elf - https://confluence.amd.com/pages/viewpage.action?pageId=1485002156#Profile/Config/CtrlcodeElfSpec-Config.elf

The new ELF carries the partition size and the kernel signature.
It has multiple .ctrltext and .ctrldata sections and multiple .pdi sections.
Each .ctrltext section represents one control code; the pdi addresses and the kernel args need to be patched into the control code buffer.
The control code to run is selected when the xrt::kernel object is created, using a kernel name of the form <kernel_name>:<ctrl_code_id>. For example, "DPU:0" runs with the control code from section .ctrltext0 and "DPU:1" runs with the control code from section .ctrltext1.
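The name-to-section mapping described above can be sketched as follows. This is only an illustration, not the code in this PR: `kernel_id`, `parse_kernel_name` and `ctrltext_section` are hypothetical names, and the `.ctrltext<N>` section naming is assumed from the description.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper type (not part of XRT): the two halves of a
// "<kernel_name>:<ctrl_code_id>" string.
struct kernel_id {
  std::string name;         // e.g. "DPU"
  std::uint32_t ctrl_code;  // e.g. 1
};

// Split "DPU:1" into {"DPU", 1}.
// Throws std::invalid_argument if the separator or the id is missing or malformed.
inline kernel_id
parse_kernel_name(const std::string& full_name)
{
  const auto pos = full_name.rfind(':');
  if (pos == std::string::npos || pos == 0 || pos + 1 == full_name.size())
    throw std::invalid_argument("expected <kernel_name>:<ctrl_code_id>, got '" + full_name + "'");

  const std::string id_str = full_name.substr(pos + 1);
  if (id_str.find_first_not_of("0123456789") != std::string::npos)
    throw std::invalid_argument("control code id must be a non-negative integer: '" + id_str + "'");

  return {full_name.substr(0, pos), static_cast<std::uint32_t>(std::stoul(id_str))};
}

// Section assumed to hold the selected control code (".ctrltext<N>").
inline std::string
ctrltext_section(const kernel_id& kid)
{
  return ".ctrltext" + std::to_string(kid.ctrl_code);
}
```

With this sketch, `parse_kernel_name("DPU:1")` yields the name `DPU` and id `1`, and `ctrltext_section` maps that id to `.ctrltext1`.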

Sample test case:

#include <string>

#include "xrt/xrt_device.h"
#include "xrt/xrt_hw_context.h"
#include "xrt/xrt_kernel.h"
#include "xrt/xrt_elf.h"

int main(int argc, char** argv)
{
    if (argc < 2)
        return 1;   // usage: <app> <path to elf>

    std::string elf_path {argv[1]};   // elf path
    auto elf = xrt::elf(elf_path);

    auto device = xrt::device(0);
    auto ctx = xrt::hw_context(device, elf);

    // "<kernel_name>:<ctrl_code_id>": kernel DPU with control code 1
    std::string kernel_name = "DPU:1";
    auto kernel = xrt::ext::kernel(ctx, kernel_name);

    // create args (arg1, arg2 ... are placeholders)
    auto run = kernel(arg1, arg2 ...);
    run.wait();

    return 0;
}

XRT first-class object changes:
xrt::hw_context - added new APIs to create a context from an ELF instead of an xclbin
xrt::kernel - added a new ext API to create a kernel object from a kernel name; kernel args are now constructed from the kernel signature
xrt::elf - can now take the new ELF with OS ABI 70 as input
xrt::module - added parsing of the new ELF and refactored the code

Bug / issue (if any) fixed, which PR introduced the bug, how it was discovered

This is a new feature

How problem was solved, alternative solutions (if any) and why they were rejected

A hw context can be created either with an xclbin (traditional flow) or with an xrt::elf. A new constructor is provided for the latter.
A new xrt::ext::kernel constructor is provided that takes the ctx and the kernel name as input.
Added code to parse the new ELF file and patch it according to the spec.
The rest of the flow remains the same.

Risks (if any) associated with the changes in the commit

Low to medium
Tested the existing flows, but more testing with all the available test cases is needed.

What has been tested and how, request additional testing if necessary

Tested the new application flow on aie2p simnow. This needs matching changes in the amdxdna shim and firmware, which are yet to be merged, so testing used local drops.

Tested existing test cases on Phoenix hardware (Linux); the tests pass, so the existing flow did not break.

TODO: check whether existing aie2ps test cases work

Documentation impact (if any)

Added doxygen comments in the code for the new APIs. We may need to document the new flow once it has stabilized.

@rbramand-xilinx rbramand-xilinx changed the title from "Enable new XRT test case flow without xclbin" to "Xclbin to ELF flow migration" on Oct 29, 2024
Collaborator

@stsoe stsoe left a comment

Overall looks good. I am not completely done reviewing, but maybe address my first points and I will review again.

Comment on lines 47 to 51
xrt::xclbin::kernel
get_kernel(const xrt::hw_context& hwctx);

xrt::module
get_module(const xrt::hw_context& hwctx, const std::string& kname);
Collaborator

Please add comments. In the context of the changes in this PR, it is not clear to me that xrt::xclbin::kernel makes any sense? If a hwctx is created from an elf, as the example in the description shows, then there is no xclbin so what does this function return?

Collaborator Author

Hi Soren, the get_kernel function was added by mistake and I forgot to remove it.
I will also document the newly added functions. Thanks.

Comment on lines 60 to 74
xrt::module
get_module(const std::string& name);

std::string
get_kernel_signature(const xrt::module& module);

std::string
get_kernel_name(const xrt::module& module);

// Get partition size if ELF has info
uint32_t
get_partition_size(const xrt::module& module);
Collaborator

Please comment the new functions.

m_partition_size = xrt_core::module_int::get_partition_size(m_module_map.begin()->second);
m_hdl = m_core_device->create_hw_context(m_partition_size, m_cfg_param, mode);
Collaborator

Can these lines be moved up before the local module is moved into the map? That would get rid of the begin()->second which doesn't read too well.
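A minimal sketch of the reordering suggested here, using stand-in types rather than the real XRT classes. `fake_module` and `fake_hw_context` are hypothetical; the point is only that everything needed from the local module is read before it is moved into the map, so no `begin()->second` lookup is required afterwards.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Stand-in for xrt::module (hypothetical); the real class parses the ELF.
struct fake_module {
  std::string kernel_name;
  std::uint32_t partition_size;
};

// Stand-in for the hw_context implementation (hypothetical).
class fake_hw_context {
  std::map<std::string, fake_module> m_module_map;
  std::uint32_t m_partition_size = 0;

public:
  explicit fake_hw_context(fake_module module)
  {
    // Query the local object first ...
    m_partition_size = module.partition_size;
    auto kernel_name = module.kernel_name;  // copy the key before the move

    // ... then hand ownership to the map.
    m_module_map.emplace(std::move(kernel_name), std::move(module));
  }

  std::uint32_t partition_size() const { return m_partition_size; }
  std::size_t module_count() const { return m_module_map.size(); }
};
```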

, m_mode{mode}
{
auto module = xrt::module(elf);
auto kernel_name = xrt_core::module_int::get_kernel_name(module);
Collaborator

It's not clear to me how the module knows the kernel name as opposed to the elf? What is a kernel name in this case? Is there only one kernel name in an elf? This code may be fine, but comments are needed to answer my questions :-)

Collaborator Author

All the ELF parsing is done in xrt_module.cpp; this has been our design from the beginning, even before this change.
So whenever we create an xrt::module object, the ELF is parsed and the info is stored in the module object for later retrieval.
Yes, as per the spec there will be only one kernel per ELF, but there can be multiple control codes.
I will add more comments in the code wherever possible.

if (m_partition_size != part_size)
throw std::runtime_error("can not add config to ctx with different configuration\n");

for (const auto& m : m_module_map) {
Collaborator

This confuses me. The key for m_module_map was created from xrt_core::module_int::get_kernel_name, so I would infer that the 'key' (kernel_name) that matches the kernel name of the 'value' (module) would be that module exactly? So why not a simple m_module_map.find(kname) and return the module if not end iterator?

Collaborator Author

Yes Soren, you are right. I will modify the code.
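For illustration, the direct lookup proposed in this thread could look roughly like the sketch below. `module_stub` and `find_module` are hypothetical stand-ins, and the error message is made up; the actual change may differ.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Stand-in for xrt::module (hypothetical).
struct module_stub {
  std::string kernel_name;
};

// Look the module up by its key instead of iterating over the whole map.
// The map key is assumed to be the kernel name reported by the module itself.
inline const module_stub&
find_module(const std::map<std::string, module_stub>& module_map, const std::string& kname)
{
  auto it = module_map.find(kname);
  if (it == module_map.end())
    throw std::runtime_error("no module found for kernel '" + kname + "'");
  return it->second;
}
```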

return tokens;
}

void
Collaborator

Suggested change
- void
+ static void

Collaborator Author

@rbramand-xilinx rbramand-xilinx Oct 30, 2024

It can't be made static, Soren, as it uses member variables (the args vector in this case).

}

void
construct_elf_kernel_args(const std::string& kernel_name)
Collaborator

This is getting rather complicated. A lot of xclbin complexity was hidden in xrt::xclbin to avoid this kind of parsing in different places. I don't completely understand what is going on here, but the point of xrt::elf was to mirror xrt::xclbin from an API point of view. Would that be possible?

Collaborator Author

Yes Soren, I can move this functionality to xrt_module.cpp as that file is used for elf parsing

Comment on lines +962 to +980
kernel_properties::mailbox_type
get_mailbox_from_ini(const std::string& kname)
{
static auto mailbox_kernels = xrt_core::config::get_mailbox_kernels();
return (mailbox_kernels.find("/" + kname + "/") != std::string::npos)
? xrt_core::xclbin::kernel_properties::mailbox_type::inout
: xrt_core::xclbin::kernel_properties::mailbox_type::none;
}

// Kernel auto restart counter offset
// Needed until meta-data support (Vitis-1147)
kernel_properties::restart_type
get_restart_from_ini(const std::string& kname)
{
static auto restart_kernels = xrt_core::config::get_auto_restart_kernels();
return (restart_kernels.find("/" + kname + "/") != std::string::npos)
? 1
: 0;
}
Collaborator

I would rather not. Instead of rebuilding the properties, please cache a pointer to the existing properties, which already have all the information.

Comment on lines +983 to +988
bool
get_sw_reset_from_ini(const std::string& kname)
{
static auto reset_kernels = xrt_core::config::get_sw_reset_kernels();
return (reset_kernels.find("/" + kname + "/") != std::string::npos);
}
Collaborator

ditto
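For readers unfamiliar with the three ini helpers quoted above, they share one matching idiom: the kernel name is wrapped in slashes before the search. The sketch below shows why, assuming the configured value is a slash-delimited list such as "/k1/k2/"; `is_listed` is a hypothetical name and the list format is an assumption.

```cpp
#include <string>

// kernel_list is assumed to look like "/k1/k2/", with every name wrapped
// in slashes. Searching for "/" + kname + "/" matches whole names only,
// so "DPU" does not match an entry named "DPU2".
inline bool
is_listed(const std::string& kernel_list, const std::string& kname)
{
  return kernel_list.find("/" + kname + "/") != std::string::npos;
}
```

This only illustrates what the quoted helpers do; the review feedback is to reuse the cached properties rather than add such helpers.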

Comment on lines +427 to +435
kernel_properties::mailbox_type
get_mailbox_from_ini(const std::string& kname);

kernel_properties::restart_type
get_restart_from_ini(const std::string& kname);

bool
get_sw_reset_from_ini(const std::string& kname);

Collaborator

Let's avoid.

@AShivangi
Collaborator

@hlaccabu recently added some code to enable the elf flow in xrt-smi validate for npu3. Please work with him to make sure that this change doesn't break the existing code.

Collaborator

@hlaccabu hlaccabu left a comment

Good to go from the driver perspective, as long as the driver's xrt submodule (and the corresponding xrt_plugin.deb) contains these changes, as well as the XRT .deb.

@rbramand-xilinx
Collaborator Author

> Good to go from the driver perspective, as long as the driver's xrt submodule (and the corresponding xrt_plugin.deb) contains these changes, as well as the XRT .deb.

@hlaccabu can you please check that your feature doesn't break with these changes?

@hlaccabu
Collaborator

> > Good to go from the driver perspective, as long as the driver's xrt submodule (and the corresponding xrt_plugin.deb) contains these changes, as well as the XRT .deb.
>
> @hlaccabu can you please check that your feature doesn't break with these changes?

Yep you're all set, the xrt-smi tests that I enabled for npu3 still run

rbramand added 2 commits November 1, 2024 17:26
Signed-off-by: rbramand <[email protected]>