Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Dynamic Selection] Adding sycl profiling for auto tune policy #1464

Merged
merged 31 commits into from
Apr 30, 2024

Conversation

AnuyaWelling2801
Copy link
Contributor

@AnuyaWelling2801 AnuyaWelling2801 commented Mar 22, 2024

Changes made :

  1. Adding an async_waiter_base to ensure type erasure for the selection type and enable lazy reporting
  2. Adding a choice to use use_event_profiling = q.template has_propertysycl::property::queue::enable_profiling()
  3. Using lazy report to report time instead of host task for auto_tune policy
  4. Added more tests in auto_tune_policy_sycl for devices with event profiling enabled.
  5. Changing the report method for auto_tune policy

@AnuyaWelling2801 AnuyaWelling2801 marked this pull request as draft March 22, 2024 17:22
Copy link
Contributor

@vossmjp vossmjp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to take a closer look at thread safety and also find a non-trait-based approach to checking for lazy_report.

@@ -229,6 +229,9 @@ class auto_tune_policy
}
else
{
if constexpr(lazy_report_v<Backend>){
backend_->lazy_report();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are holding a lock on state_->m_ when calling the lazy_report function. Seems unnecessary and perhaps dangerous as a general pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -142,6 +195,15 @@ class sycl_backend
return global_rank_;
}

void lazy_report(){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is not thread safe. You might have concurrent accesses to the async_waiter array that change its size -- calls to lazy_report and submit (which calls add_waiter).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Created a new structure async_wait_arr.

std::vector<async_waiter_base*> async_waiter_arr;

template<typename T>
void add_waiter(T *t){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you protecting this? Is it thread safe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

});
});
return async_waiter{e2};
return async_waiter{e2, new SelectionHandle(s)};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we don't currently have a policy that requires reporting of both task completion and time, but this implementation would not support it because of this return. If we assumed they were mutually exclusive, the next condition would be an "else if", but its just an "if" so that both might be supported.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -299,6 +311,14 @@ struct report_value
template <typename S, typename Info>
inline constexpr bool report_value_v = report_value<S, Info>::value;

template <typename S>
struct lazy_report_value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is the first trait for a backend -- i.e. that a backend requires a call to lazy report. Backends are not as visible as policies so likely don't want a trait in this header. End users cannot access the backend, only the policy implementation can.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how to check if the backend has certain functions without traits. Created a new namespace called backend_traits for checking presence of lazy_report and has_enable_profiling in the backend from the policy

@AnuyaWelling2801 AnuyaWelling2801 added this to the 2022.6.0 milestone Mar 25, 2024
@AnuyaWelling2801 AnuyaWelling2801 changed the title Adding sycl profiling for auto tune policy [Dynamic Selection] Adding sycl profiling for auto tune policy Mar 26, 2024
@AnuyaWelling2801 AnuyaWelling2801 marked this pull request as ready for review March 27, 2024 16:25
SergeyKopienko

This comment was marked as resolved.

@SergeyKopienko

This comment was marked as resolved.

SergeyKopienko

This comment was marked as outdated.

AnuyaWelling2801 and others added 2 commits April 26, 2024 10:29
* test/parallel_api/dynamic_selection/sycl/test_auto_tune_policy_sycl.pass.cpp - fix USM shared memory leaks

Signed-off-by: Sergey Kopienko <[email protected]>

* test/parallel_api/dynamic_selection/sycl/test_auto_tune_policy_sycl.pass.cpp - create unique task names for h.single_task([](){});

Signed-off-by: Sergey Kopienko <[email protected]>

---------

Signed-off-by: Sergey Kopienko <[email protected]>
@SergeyKopienko

This comment was marked as resolved.

SergeyKopienko and others added 2 commits April 26, 2024 11:36
…aram (#1539)

* include/oneapi/dpl/internal/dynamic_selection_traits.h - fix traits to specify report method second parameter type

Signed-off-by: Sergey Kopienko <[email protected]>

* Change second parameter type of auto_tune_selection_type::report method - to report_duration (std::chrono::milliseconds)

Signed-off-by: Sergey Kopienko <[email protected]>

---------

Signed-off-by: Sergey Kopienko <[email protected]>
@@ -150,6 +150,10 @@ struct dynamic_load_policy
selection_type
select(Args&&...)
{
if constexpr (backend_traits::lazy_report_v<Backend>)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AnuyaWelling2801 Shouldn't this file need to include the backend_traits.h? Is it somehow including indirectly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backend_traits.h file is being included indirectly since the tests include the file oneapi/dpl/dynamic_selection. This file includes auto_tune_policy,h (which contains the file backend_traits.h) before the file dynamic_load_policy.h

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@akukanov akukanov dismissed their stale review April 29, 2024 14:15

The critical issues have been addressed

SergeyKopienko and others added 3 commits April 30, 2024 10:30
…... args) function (#1547)

* include/oneapi/dpl/internal/dynamic_selection_impl/sycl_backend.h - simplify sycl_backend::submit(SelectionHandle s, Function&& f, Args&&... args) function

Signed-off-by: Sergey Kopienko <[email protected]>

* include/oneapi/dpl/internal/dynamic_selection_impl/sycl_backend.h - remove not required type name

Signed-off-by: Sergey Kopienko <[email protected]>

---------

Signed-off-by: Sergey Kopienko <[email protected]>
{
std::chrono::steady_clock::time_point t0;
if constexpr (report_value_v<SelectionHandle, execution_info::task_time_t>)
const auto t0 = report_clock_type::now();
Copy link
Contributor

@akukanov akukanov Apr 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I agree with the change Sergey proposed to remove if constexpr around taking the initial timing snapshot. It makes the code concise, but is that better in this case? Performance-wise it might be OK if we assume the overhead of std::chrono::steady_clock::now() is negligible, but it provokes questions like "why are the conditions to set and to use the variable different, including even constexpr checks; is it by purpose?".

Maybe using the ternary operator ?: is a good compromise here; it is not constexpr but hopefully the compiler can optimize it anyway. Though if I am the only one concerned and others are not, then it's fine to keep the code as now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok as is. Let's keep as is. We can always revisit it if we see some performance impact later.

Copy link
Contributor

@vossmjp vossmjp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All issues are resolved, So looks good to me now.

{
std::chrono::steady_clock::time_point t0;
if constexpr (report_value_v<SelectionHandle, execution_info::task_time_t>)
const auto t0 = report_clock_type::now();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok as is. Let's keep as is. We can always revisit it if we see some performance impact later.

@AnuyaWelling2801 AnuyaWelling2801 merged commit 5ea755c into main Apr 30, 2024
19 of 20 checks passed
@AnuyaWelling2801 AnuyaWelling2801 deleted the dev/awelling/auto_tune_sycl_profiling branch April 30, 2024 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants