Add alternative implementation of device timer to SyclTimer class #1872
base: master
View rendered docs @ https://intelpython.github.io/dpctl/pulls/1872/index.html
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_149 ran successfully.
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_150 ran successfully.
`device_timer` keyword argument controls the type of tasks submitted.
With `device_timer="queue_barrier"`, queue barrier tasks are used. With
`device_timer="order_manager"`, a single empty body task is inserted
instead relying on order manager (used by `dpctl.tensor` operations) to
order these tasks so that they fence operations performed within
timer's context.
It may be nice to add some of the details from this PR description, i.e.,
"Tasks will follow an order of `[prior_tasks] -> [fence_start_task] -> [compute_tasks] -> [fence_end_task] -> [subsequent_tasks]` for some `prior_tasks` started before timing, `compute_tasks` which are being timed, `subsequent_tasks` which are performed after timing, and `fence_start_task` and `fence_end_task` used to find the delta time."
I think this is pretty insightful on its own, though it may belong in documentation or examples rather than this docstring.
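The ordering described above can be illustrated with a small pure-Python model. This is only a sketch: the `Task` class, the `run_in_order` helper, and the durations are hypothetical, and the choice to subtract the end of the start fence from the start of the end fence is an assumption about how a delta could be derived, not a statement of what `SyclTimer` does internally.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    duration_ns: int  # hypothetical device-side duration


def run_in_order(tasks):
    """Run tasks one after another on a model device clock.

    Returns a dict mapping task name to (start_ns, end_ns).
    """
    clock = 0
    profile = {}
    for task in tasks:
        start = clock
        clock += task.duration_ns
        profile[task.name] = (start, clock)
    return profile


# [prior_tasks] -> [fence_start_task] -> [compute_tasks]
#               -> [fence_end_task] -> [subsequent_tasks]
tasks = [
    Task("prior_task", 500),
    Task("fence_start_task", 0),  # empty body task
    Task("compute_task_1", 1200),
    Task("compute_task_2", 800),
    Task("fence_end_task", 0),  # empty body task
    Task("subsequent_task", 300),
]

profile = run_in_order(tasks)

# Time between the two fences covers only the compute tasks;
# prior and subsequent tasks fall outside the measured window.
device_dt_ns = profile["fence_end_task"][0] - profile["fence_start_task"][1]
print(device_dt_ns)
```

In this model the measured delta equals the sum of the two compute task durations, which is the property the fence tasks are meant to provide.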
It may also be good to specify the circumstances under which users will want to use `order_manager` vs. `queue_barrier`, and mention the improved accuracy of the `order_manager` implementation.
SyclTimer now supports the device_timer keyword argument with two values: "queue_barrier" (the legacy behavior) and a new "order_manager", which is based on the sequential order manager and inserts an empty task into the manager to record the start and end of the block of timed code. Docstring of SyclTimer updated. All data attributes needed for the timer to function are now created during class instance construction.
Check different device_timer values, test argument validation, and test cumulative timing.
52e211c to 9322201
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_174 ran successfully.
This PR adds Python API to submit an empty body single task to a queue.

`dpctl.SyclTimer` is modified to accept a `device_timer` keyword argument, with supported values being `"queue_barrier"` (legacy behavior, the default) and `"order_manager"`.

With `"order_manager"`, the timer submits empty body single tasks (fence tasks) to the queue, using the order manager to order them so as to fence timed submissions. For example, execution of the following snippet:

results in a task graph `[prior_tasks] -> [fence_start_task] -> [compute_tasks] -> [fence_end_task] -> [subsequent_tasks]`.

The timer uses profiling data from events associated with fence tasks to estimate the execution time of compute tasks as measured by the device's timer.

`device_timer="order_manager"` is useful for timing `dpctl.tensor` operations, which leverage the order manager.
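
A minimal sketch of timing `dpctl.tensor` operations with the new keyword might look like the following. The helper name `time_block`, the array size, and the choice of operations are illustrative assumptions. The sketch also assumes a dpctl build that includes this change and a device whose queue supports profiling, and it returns `None` when either is missing.

```python
def time_block(device_timer="order_manager"):
    """Time a small dpctl.tensor workload.

    Returns (host_dt, device_dt), or None if dpctl, a suitable device,
    or the `device_timer` keyword is unavailable.
    """
    try:
        import dpctl
        import dpctl.tensor as dpt
    except ImportError:
        return None

    try:
        # Device timing relies on event profiling data,
        # so the queue is created with profiling enabled.
        q = dpctl.SyclQueue(property="enable_profiling")
        timer = dpctl.SyclTimer(device_timer=device_timer)

        with timer(q):
            # Work submitted inside the context is what gets timed.
            x = dpt.ones(1_000_000, dtype="f4", sycl_queue=q)
            y = dpt.sum(x)

        host_dt, device_dt = timer.dt
    except Exception:
        # No usable device, or an older dpctl without `device_timer`.
        return None

    return host_dt, device_dt


result = time_block("order_manager")
print(result)
```

Passing `"queue_barrier"` instead selects the legacy behavior, so the same helper can be used to compare the two modes on a given device.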