Run constructor Run wait and run start hook for temporal sharing usecase#9721
Conversation
Call xrt_core::xdp::run_constructor from the xrt::run constructor, passing the run pointer and hw_context_impl pointer. The hook dispatches to aie::profile::run_constructor via dlsym-registered callback (aieProfileRunConstructor), gated on get_aie_profile(). This enables XDP plugins to attach per-run resources at run construction time. Made-with: Cursor
Thread the run_impl::get_uid() value as a uint32_t through the entire hook chain so the per-run CT file is named with the actual run UID (e.g. aie_profile_ctx_1_run_3.ct), matching the dtrace dump naming convention (dtrace_dump_ctx_1_run_3_<timestamp>.py). Made-with: Cursor
Pass kernel name and ELF handle (from xrt::module) through the run_constructor callback chain so the AIE profile plugin can extract SAVE_TIMESTAMP locations directly from the ELF at run construction time. Made-with: Cursor
Expose Debug.aie_dtrace and AIE_dtrace_settings (interval, tile- and graph-based interface tile metrics) in config_reader.h for xrt.ini. Register the xdp_aie_dtrace_plugin_xdna module loader in profile.cpp when the XDP VE2 build supplies that shared library. Bump the src/runtime_src/xdp submodule to pick up the AIE dtrace plugin and related XDP profile changes. Made-with: Cursor
Delete get_aie_profile_settings_dtrace_debug(); AIE dtrace uses Debug.aie_dtrace only (xdp submodule). Made-with: Cursor
Add xrt_core::xdp::run_start and run_wait APIs and load aieDtraceRunStart / aieDtraceRunWait from the dtrace plugin when aie_dtrace is enabled. Call run_start before kernel submit from xrt::run::start, and run_wait after xrt::run::wait / wait2 return. Use const_cast for wait paths because wait is const. Bump src/runtime_src/xdp submodule to include the plugin exports and hooks. Made-with: Cursor
Bump src/runtime_src/xdp to include aie_profile cleanup (remove AieProfileCTWriter, ddr/read/write_bandwidth handling, and generateCTForRun). Made-with: Cursor
Points src/runtime_src/xdp at commit adding METRIC_* bandwidth constants, isInterfaceTileBandwidthMetricSet(), and VE2 AieProfileCTWriter updates. Made-with: Cursor
This reverts commit befc23e.
Made-with: Cursor
Remove aie::profile run_constructor callback registration and the get_aie_profile() branch in xrt_core::xdp::run_constructor. Per-run setup is handled solely by the aie_dtrace plugin when enabled. Submodule src/runtime_src/xdp: AIE profile VE2 CT writer removal and related cleanup (see submodule commit). Made-with: Cursor
Points xdp at commit that derives UC spans from aiebu op_loc and extends the last UC through the maximum configured counter column. Made-with: Cursor
Signed-off-by: Garima Dhaked <garima.dhaked@amd.com>
Guard run_constructor, run_start, and run_wait hooks with XDP_VE2_BUILD preprocessor conditional to exclude this code from non-VE2 builds. These hooks are only needed for AIE dtrace functionality on VE2 XDNA platforms. Changes: - Wrap aie::dtrace namespace in profile.cpp with #ifdef XDP_VE2_BUILD - Wrap run hook functions in xrt_core::xdp namespace with same guard - Add XDP_VE2_BUILD compile definition to core_common_api_library_objects in CMakeLists.txt for VE2 builds - Guard run hook calls in xrt_kernel.cpp (run constructor, start, wait) - Update copyright year in profile.h Made-with: Cursor
Signed-off-by: Garima Dhaked <garima.dhaked@amd.com>
…e check Replace compile-time XDP_VE2_BUILD guards with runtime aie_dtrace config check for run_constructor, run_start, and run_wait hooks. Introduce run_info struct to encapsulate hook arguments and add make_run_info helper for cleaner call sites. Made-with: Cursor
Signed-off-by: Garima Dhaked <garima.dhaked@amd.com>
…e check Replace compile-time XDP_VE2_BUILD guards with runtime aie_dtrace config check for run_constructor, run_start, and run_wait hooks. Introduce run_info struct to encapsulate hook arguments and add make_run_info helper in profile.h for cleaner call sites. Remove XDP_VE2_BUILD compile definition from CMakeLists.txt since hooks are now controlled at runtime via aie_dtrace config. Made-with: Cursor
I, Garima Dhaked <garima.dhaked@amd.com>, hereby add my Signed-off-by to this commit and all preceding commits in this pull request that were authored by me. This remediation covers the following commits by Garima Dhaked: 6d09bc8 xdp: refactor run hooks to use run_info struct with runtime aie_dtrace check ab6e2a7 xdp: refactor run hooks to use run_info struct with runtime aie_dtrace check 9129d2a xdp: Compile run hooks only for VE2 XDNA builds 4e88252 Update xdp submodule for aie_dtrace CT writer UC column handling a2fa7ca xdp: route run_constructor to AIE dtrace only 1fcd5b3 xdp: bump submodule (revert a0d7280 CT writer / bandwidth drop) 339e681 Revert "xdp: bump submodule for aie_profile CT bandwidth metric cleanup" befc23e xdp: bump submodule for aie_profile CT bandwidth metric cleanup 50b4d53 xdp: aie_profile drops CT writer and bandwidth metrics 02335bc Wire AIE dtrace run start and run wait hooks from xrt::run 5572676 Remove AIE_profile_settings.dtrace_debug config reader 9c455aa XRT: Wire AIE dtrace config and load xdp_aie_dtrace_plugin_xdna Signed-off-by: Garima Dhaked <garima.dhaked@amd.com> I, Jyotheeswar Ganne <Jyotheeswar.Ganne@amd.com>, hereby add my Signed-off-by to this commit and all preceding commits in this pull request that were authored by me. This remediation covers the following commits by Jyotheeswar Ganne: 21b8c68 Thread kernel_name and elf_handle through run_constructor XDP hook ae60a20 Pass run UID through run_constructor hook for CT filename b441e2b Add run_constructor XDP hook in xrt::run for per-run plugin callbacks Signed-off-by: Jyotheeswar Ganne <Jyotheeswar.Ganne@amd.com> Made-with: Cursor
Refactor the dtrace run hooks to simplify call sites in xrt_kernel.cpp by moving all run info extraction logic into profile.cpp. Changes: - Add run-level accessor functions to kernel_int.h (get_run_uid, get_run_hwctx, get_run_kernel_name, get_run_module, get_run_state) - Simplify run_constructor, run_start, and run_wait to take xrt::run directly instead of run_info struct - Remove run_info struct and make_run_info helper from profile.h - Move config check and info extraction into profile.cpp Call sites in xrt_kernel.cpp are now simplified to: xrt_core::xdp::run_constructor(*this); xrt_core::xdp::run_start(*this); xrt_core::xdp::run_wait(*this); Signed-off-by: Garima Dhaked <garima.dhaked@amd.com> Made-with: Cursor
stsoe
left a comment
There was a problem hiding this comment.
Thanks, looks better. But please don't change the function bodies of xrt::run APIs, please move your code to the implementation (alloc_run, run_imp::start(), and so forth.).
stsoe
left a comment
There was a problem hiding this comment.
I am still suggesting changes here.
I would like xdp/profile.h, which is included in xrt_kernel.cpp, to define a struct like this
namespace <what-ever-your-namespace-is> {
struct xrt_kernel_data
{
uint32_t uid;
std::string name;
xrt::hw_context hwctx;
xrt::module mod;
};
}
This struct is visible to xrt_kernel.cpp because it includes profile.h
In kernel_int.h, forward declare this struct and declare a function to fill it:
namespace <what-ever-your-namespace-is> {
struct xrt_kernel_data; // forward decl
}
namespace xrt_core::kernel_int {
...
void
get_xdp_kernel_data(xrt_kernel_data*);
}
In xrt_kernel.cpp, define the function:
namespace xrt_cpre::kernel_int {
...
void
get_xdp_kernel_data(xrt_kernel_data* data)
{
data->uid = ...;
data->name = ...;
data->hwctx = ...;
data->mod = ...;
}
This accomplishes the same as you have now, but creates only one new internal function, which by its name is a little more specific. I realize that this may be very close to the make_info() function you had earlier, but this data exchange is simpler and much cleaner.
We still need the API level functions cleaned up and moved appropriately into impl functions.
Move XDP profiling hooks for run_start and run_wait from the public xrt::run API level to the internal run_impl implementation level. This ensures these hooks are called regardless of whether the C++ API or C API is used. The run_constructor hook remains at the xrt::run API level because the XDP plugin needs to call methods on the xrt::run object (specifically set_dtrace_control_file). Changes: - Add xrt_kernel_data struct with ert_state field for run state - Add implementation-level XDP hooks: - run_start(const xrt_kernel_data&) - run_wait(const xrt_kernel_data&) - Add get_xdp_kernel_data() helper to run_impl class - Call run_start and run_wait hooks from run_impl methods - Keep run_constructor at xrt::run level (requires xrt::run object) - Remove unused run_wait(const xrt::run&) overload and get_run_state() This enables XDP profiling for run_start and run_wait when using the C API (xrtRunStart, xrtRunWait) which calls run_impl methods directly. Signed-off-by: Garima Dhaked <garima.dhaked@amd.com> Made-with: Cursor
4a10857 to
c50e528
Compare
garimadhaked
left a comment
There was a problem hiding this comment.
I have modified the code as per suggestion did move the hooks to implementation. But for run constructor hook I could not move because the set_dtrace_control_file() is xrt::run obejct method.
Refactor XDP profiling hooks to operate at the handle level (run_impl*)
rather than requiring xrt::run objects. This provides consistency with
how other XDP hooks work in the codebase.
4d1d6e5 to
560d837
Compare
Signed-off-by: Garima Dhaked <garima.dhaked@amd.com>
|
Updated xdp submodule to pick latest changes. PR can be merged now. |
stsoe
left a comment
There was a problem hiding this comment.
Still not quite there, some relative minor adjustments.
Refactor the XDP profiling hooks (run_constructor, run_start, run_wait) to accept const xrt::run_impl* instead of xrt_kernel_data struct. This simplifies the API and moves data extraction into the callback functions themselves, avoiding unnecessary work when callbacks are not registered. Changes: - Update profile.h/cpp: Change hook signatures to take run_impl* - Update kernel_int.h: Replace xrt::run overload with run_impl* version - Update xrt_kernel.cpp: Simplify call sites to pass 'this' directly - Add get_xdp_kernel_data(run_impl*) to extract kernel data internally - Populate ert_state in xrt_kernel_data via run_impl->state() The aie::dtrace namespace functions now check if callbacks are registered before extracting kernel data, improving efficiency when profiling is disabled. Signed-off-by: Garima Dhaked <garima.dhaked@amd.com> Made-with: Cursor
9a1ce34 to
ac1b950
Compare
Remove the unused private get_xdp_kernel_data() member function from run_impl class. This function is no longer needed since XDP profiling hooks now use the exported xrt_core::kernel_int::get_xdp_kernel_data() function that takes a run_impl pointer directly. Signed-off-by: Garima Dhaked <garima.dhaked@amd.com> Made-with: Cursor
ac1b950 to
581a810
Compare
Remove unnecessary header includes that are not used in kernel_int.h: - xrt_module.h: not referenced in the header - xrt_hw_context.h: already included transitively via xrt_kernel.h - ert.h: not referenced in the header Signed-off-by: Garima Dhaked <garima.dhaked@amd.com> Made-with: Cursor
stsoe
left a comment
There was a problem hiding this comment.
Looks great now. Thanks!
One nit, please remove the TRACE call from kernel_int::set_dtrace...().
Signed-off-by: Garima Dhaked <garima.dhaked@amd.com>
Fix : Added XDP dtrace plugin support for temporal sharing usecase.
Testing: Verified few examples with and without XDP dtrace plugin. Verified temporal sharing example as well.