-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add distributed ranges as experimental feature. #1479
base: main
Are you sure you want to change the base?
Conversation
…o dist-ranges_cleanup
* cleanups * cleanups of include paths * cleanups of include paths and error messages
…synch, + comments
…trait for some oneDPL types
This reverts commit cb7259d.
…zation for some oneDPL types.
… zip_forward_iteratoris not used in the device code
Co-authored-by: Łukasz Ślusarczyk <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left the comments and questions regarding to for_each
implementation (sp/algorithms/for_each.hpp
)
include/oneapi/dpl/internal/distributed_ranges_impl/sp/algorithms/for_each.hpp
Outdated
Show resolved
Hide resolved
namespace oneapi::dpl::experimental::dr::sp | ||
{ | ||
|
||
template <typename ExecutionPolicy, distributed_range R, typename Fn> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
What kinds of execution policy are supported? (by design and by this implementation)
As far the C++20 concepts (distributed_range
at least) are already used, it makes sense to add constrains on an execution policy as well.
In oneDPL, there isoneapi::dpl::is_execution_policy_v
for example. -
Does it make sense to add a constrain on
Fn
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The first line in the method is static_assert that the only supported execution policy is
static_assert( // currently only one policy supported
std::is_same_v<std::remove_cvref_t<ExecutionPolicy>, sycl_device_collection>);
IMO this is enough and simple
- I don't think so. There are no constrains for functor in std::for_each
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Having "concepts" for some parameter(s) in a signature and having "static_assert" for the other parameter(s) - not consistent approach. Or we use type requirements (concepts and contains) in a algo signature or don't use at all. As far as we use C++20, I would vote for "concepts and constrains" usage. A user of oneDPL usually sees into the documentation where a signature there is.
- But there is in std::ranges::for_each
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Originally posted my comment about par_unseq
here, but have moved it below.)
We basically only have one type of device policy, which is a sycl_device_collection
. The standard library doesn't have any constraints for ExecutionPolicy
, which is why we have things as they are now.
I'd be fine with adding a is_execution_policy_v
requirement or execution_policy
concept.
include/oneapi/dpl/internal/distributed_ranges_impl/sp/algorithms/for_each.hpp
Show resolved
Hide resolved
|
||
for (auto&& segment : ranges::segments(r)) | ||
{ | ||
auto&& q = __detail::queue(ranges::rank(segment)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand just sycl device policy is supported for current implementtaion..
So, in the future, what will be queue in case of a host policy, std::execution::par,
for example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be a separate code without queues and the code is selected by constexpr. We will see. For now there is no need to design the solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, probably it makes sense to add a constrain on an execution policy, like
is_sycl_device_collection_v
?
auto first = stdrng::begin(local_segment); | ||
|
||
auto event = dr::__detail::parallel_for(q, sycl::range<>(stdrng::distance(local_segment)), | ||
[=](auto idx) { fn(*(first + idx)); }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't insist... but first[idx]
instead of *(first + idx)
more readable and shorter, IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank's for spotting this, applied in #1758 here and in a few other places
include/oneapi/dpl/internal/distributed_ranges_impl/sp/algorithms/for_each.hpp
Show resolved
Hide resolved
__detail::wait(events); | ||
} | ||
|
||
template <typename ExecutionPolicy, distributed_iterator Iter, typename Fn> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same comment (see above) regarding ExecutionPolicy
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same answer, for now static_assert is enough I think
|
||
template <distributed_range R, typename Fn> | ||
void | ||
for_each(R&& r, Fn fn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- That re-call of
for_each
w/o policy tofor_each(par_unseq)
is done by design? Is the some explanation and statement in the documentation? - According to the static_assert (line 34) above
par_unseq
is not currently supported... - Taking into account 1) and 2) does whether it make sense to release non-policy
for_each
versions at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These algorithms are restricted by distributed range/iterator concepts. It's a question if we want to keep it this way in the long run, but let's keep the approach as-is for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This par_unseq is from our namespace which is
inline sycl_device_collection par_unseq;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inline sycl_device_collection par_unseq;
It totally confuses any C++ developers which are aware of C+17 execution policies.
(and it is a potential name conflict...)
I would suggest renaming one in something proper name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain what's problematic about using the name par_unseq
as a catch-all execution policy to execute across multiple devices?
From my understanding of the spec, par_unseq
requires:
- Execution in "unordered fashion" in "unspecified threads of execution," and
- These threads must provide "weakly parallel forward progress guarantees."
From what I understand, threads in SYCL have weakly parallel forward progress guarantees. Is it the "threads of execution" part that would be violated?
Other libraries like Nvidia's stdpar use par_unseq
to mean "execute in parallel, including on GPUs if that's where the data is." That's what we're doing here, and it seems like a reasonable choice to me. Is there a convention in oneDPL that par_unseq
is CPU only, or is there another good reason to not use par_unseq
as the "sane default" that includes potentially running on a GPU if that's where the data is resident?
std::size_t idx = 0; | ||
for (auto&& segs : zipped_segments) | ||
{ | ||
auto&& [in_segment, out_segment] = segs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this line can be moved within the if
scope. Or perhaps this loop can be rewritten to have a if (idx == 0) continue;
at the top and remove the following if
entirely.
h.depends_on(event); | ||
h.single_task([=]() { | ||
stdrng::range_value_t<O> value = *src_iter; | ||
*dst_iter = value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove the explicit temporary here by rewriting as follows:
*dst_iter = value; | |
*dst_iter = *src_iter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these changes should be fine (#1766), but need to be tested on PVC to double-check before merging.
@lslusarczyk could you check if things look okay on your end?
init(R&& devices) requires(std::is_same_v<sycl::device, std::remove_cvref_t<stdrng::range_value_t<R>>>) | ||
{ | ||
__detail::devices_.assign(stdrng::begin(devices), stdrng::end(devices)); | ||
__detail::global_context_ = new sycl::context(__detail::devices_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can tell, there is nothing preventing a user from calling this init
function multiple times, which would result in a memory leak here. Would it help if the context was stored in a shared_ptr
or unique_ptr
? Or is there something else that I am missing?
include/oneapi/dpl/internal/distributed_ranges_impl/sp/algorithms/exclusive_scan.hpp
Show resolved
Hide resolved
include/oneapi/dpl/internal/distributed_ranges_impl/sp/algorithms/reduce.hpp
Show resolved
Hide resolved
include/oneapi/dpl/internal/distributed_ranges_impl/sp/algorithms/reduce.hpp
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these tests are a good start but could be expanded to increase the test coverage. E.g.:
- Testing with 10 elements only seems limited for what DR is designed for. The other oneDPL tests use loops with increasing counts.
- It looks like we test with
int
only. Different types and potentially a custom type might identify some gaps. - It looks like we test with
std::plus
and the custommax
operator only.
|
||
TYPED_TEST_SUITE(Reduce, AllTypes); | ||
|
||
TYPED_TEST(Reduce, Range) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be called RangeMax
instead of the test below?
TYPED_TEST(Reduce, Range) { | ||
Ops1<TypeParam> ops(10); | ||
|
||
auto max = [](double x, double y) { return std::max(x, y); }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use TypeParam
instead of double
?
Ops1<TypeParam> ops(10); | ||
|
||
auto max = [](double x, double y) { return std::max(x, y); }; | ||
EXPECT_EQ(std::reduce(ops.vec.begin(), ops.vec.end(), 3, max), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The init
is set to the integer 3 independent of TypeParam
. This could change the result depending on TypeParam
. Is this by design? And do we need tests with init
of type TypeParam
?
xp::reduce(ops.dist_vec, 3, max)); | ||
} | ||
|
||
TYPED_TEST(Reduce, Max) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this test be called RangePlus
?
xp::reduce(ops.dist_vec, 3, std::plus{})); | ||
} | ||
|
||
TYPED_TEST(Reduce, Iterators) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this test be called IteratorsPlus
? And why don't we test max
with iterators?
* concepts.hpp moved and included in dpl/distributed_ranges * fixes after merge --------- Co-authored-by: Łukasz Ślusarczyk <[email protected]>
* sp::views::enumerate and __detail::enumarate merged * update * update * update * C++23 enumerate_view used instead of our implementation * cleanup * update * detail/enumerate.hpp should be removed * self applied comments * fix compilation in g++13 --------- Co-authored-by: Łukasz Ślusarczyk <[email protected]>
namespace oneapi::dpl::experimental::dr | ||
{ | ||
|
||
template <typename T> | ||
struct is_ref_view : std::false_type | ||
{ | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the "view detectors" considered a part of public API? If not, better move those to __detail
.
template <typename T> | ||
struct is_iota_view : std::false_type | ||
{ | ||
}; | ||
template <std::weakly_incrementable W> | ||
struct is_iota_view<stdrng::iota_view<W>> : std::true_type | ||
{ | ||
}; | ||
|
||
template <typename T> | ||
inline constexpr bool is_iota_view_v = is_iota_view<T>{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_iota_view
is not used anywhere, including the original DR repository.
#if (defined __cpp_lib_ranges_slide) | ||
|
||
template <typename T> | ||
struct is_sliding_view<stdrng::slide_view<T>> : std::true_type | ||
{ | ||
}; | ||
template <typename T> | ||
inline constexpr bool is_sliding_view_v = is_sliding_view<std::remove_cvref_t<T>>::value; | ||
|
||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should not is_sliding_view_v
be moved out of #if...#endif
?
template <typename T> | ||
struct is_zip_view : std::false_type | ||
{ | ||
}; | ||
|
||
#if (defined _cpp_lib_ranges_zip) | ||
template <typename... Views> | ||
struct is_zip_view<stdrng::zip_view<Views...>> : std::true_type | ||
{ | ||
}; | ||
|
||
#endif | ||
template <typename T> | ||
inline constexpr bool is_zip_view_v = is_zip_view<T>::value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_zip_view
is not used anywhere, including the original DR repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While detail/ranges.hpp
suggests that there are some internals of the DR implementation, in fact, this file contains the implementation of public customization points for DR: ranges::rank
and ranges::segments
. So I would suggest to call it appropriately - maybe cpos.hpp
, or split into rank.hpp
and segments.hpp
. And, since it contains public APIs, I would also place this file into the root directory instead of detail
.
// This file incorporates work covered by the following copyright and permission | ||
// notice: | ||
// | ||
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | ||
// See https://llvm.org/LICENSE.txt for license information. | ||
// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this part of the copyright header should not be used within DR, since it does not inherit nor use any LLVM work. It should only apply to the standard parallel algorithms in oneDPL.
|
||
auto local_segment = __detail::local(segment); | ||
|
||
auto first = stdrng::begin(local_segment); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is local_segment
a range? if yes, why do we need to get an iterator and use the iterator in parallel_for instead of the range (local_segment)?
Probably, the code below is better?
auto event = dr::__detail::parallel_for(q, sycl::range<>(stdrng::distance(local_segment)),
[=](auto idx) { fn(local_segment[idx]); });
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would probably work, but I'm not 100% sure it would work all the time. We'd need to make sure that local_segment
is a non-owning, trivially copyable view in all cases.
One other consideration is that in some cases the view might be bigger than the iterator, which could add some small overhead.
* Relative paths in includes * format fix * fixes in __detail namespace * removed unnamed namespaces * fixes in namespaces * ranges::local_or_identity in place of __detail::local * minor fix * ranges.hpp update * ranges.hpp update * local_or_identity made __detail * removed is_localizable --------- Co-authored-by: Łukasz Ślusarczyk <[email protected]> Co-authored-by: Łukasz Ślusarczyk <[email protected]>
Co-authored-by: Łukasz Ślusarczyk <[email protected]>
|
||
template <std::contiguous_iterator Iter> | ||
requires(!std::is_const_v<std::iter_value_t<Iter>> && std::is_trivially_copyable_v<std::iter_value_t<Iter>>) sycl::event | ||
fill_async(Iter first, Iter last, const std::iter_value_t<Iter>& value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Firstly, I would ask why the implementation of "fill" algo is not consistent with others, for_each
for example?
- Why there is no
oneapi::dpl::experimental::dr::sp::for_each_async
, but there isfill_async
? - Why sync version
fill
doesn't have ExecutionPolicy parameter, butfor_each
has one?
What exactly versions (sync/async) and signatures do we want to release?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We originally implemented fill using SYCL fill
. The asynchronous version of fill is a single-GPU algorithm that originally called SYCL's q.fill()
. However, SYCL's fill fails on buffers larger than 2^31, so we implemented a work-around.
In the future we'd perhaps like to go back to SYCL's q.fill()
, as in theory you might be able to implement a fill
faster than with a for_each
. However, right now, it's pretty similar to for_each
, and we could probably just implement it directly with for_each
.
Lack of execution policy is just an oversight.
auto | ||
fill(DR&& r, const T& value) | ||
{ | ||
fill_async(r, value).wait(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest having very simple implementation based on already implemented oneapi::dpl::experimental::dr::spfor_each
and remove the all other implementation stuff from the file. We avoid huge code duplication.
oneapi::dpl::experimental::dr::for_each(std::forward<DR>(r), [value](auto& x) { x = value;});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment above. I think this is a good idea (for now), although the current code structure would make it very easy to return to using SYCL's fill
, which in theory could be more efficient.
auto | ||
fill(Iter first, Iter last, const T& value) | ||
{ | ||
fill_async(stdrng::subrange(first, last), value).wait(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oneapi::dpl::experimental::dr::for_each(first, last, [value](auto& x) { x = value;});
? (see my comment above)
} | ||
|
||
// https://github.com/oneapi-src/distributed-ranges/issues/787 | ||
//TYPED_TEST(IotaView, Copy) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since a github issue already created , probably it makes sense to remove that commented code below?
{ | ||
auto iota_view = stdrng::views::iota(value, T(value + stdrng::distance(r))); | ||
|
||
for_each(par_unseq, views::zip(iota_view, r), [](auto&& elem) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that for_each call with par_unseq
leads to a static_assert within oneapi::dpl::experimental::dr::sp::for_each
implementation....
* An experiment to use std ranges unconditionally * Change to a macro that defines the shim header name * Change #ifndef to #ifdef * Remove extra underscore in the namespace macro name * Rework to use a single macro, according to the review suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was not able to look a lot from the first attempt but left some comments. My suggestion is to reach me in Teams in case you have questions (just to speed things up)
#include "oneapi/dpl/internal/common_config.h" | ||
#include "oneapi/dpl/pstl/onedpl_config.h" | ||
|
||
#if __cplusplus > 202002L |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This macro doesn't work on Microsoft. Let's use _ONEDPL___cplusplus
or whatever it's called
|
||
#include "detail/ranges.hpp" | ||
|
||
namespace oneapi::dpl::experimental::dr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The biggest question is why those concepts are public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is good to make these concepts public. At the very least, it makes the documentation clearer, but also it allows someone to implement their own variations of distributed containers, etc. And it certainly facilitates possible discussions.
CMakeLists.txt
Outdated
# if C++23 or newer, include Distributed Ranges (experimental) | ||
if (CMAKE_CXX_STANDARD GREATER_EQUAL 23) | ||
set(ONEDPL_USE_DR TRUE) | ||
message(STATUS "Adding Distributed Ranges to the project") | ||
else() | ||
message(STATUS "C++23 required to use Distributed Ranges in oneDPL") | ||
endif() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The biggest problem that I have with requiring c++23 for distributed ranges is that this API is an experimental one. That means ideally we want to get feedback from the users. From the past we know that our experimental API is not very popular. Requiring c++23 has (almost) killed our chances to get any feedback.
With regard to zip_view
complexity. I think @kboyarinov has the implementation. You can ask him (when he is back from vacation) if it's still the case. I believe he will be more than happy to donate one.
template <typename... Args> | ||
requires(sizeof...(Args) >= 1 && | ||
!((sizeof...(Args) == 1 && (std::is_same_v<nonconst_iterator, std::decay_t<Args>> || ...)) || | ||
(std::is_same_v<const_iterator, std::decay_t<Args>> || ...) || | ||
(std::is_same_v<nonconst_accessor_type, std::decay_t<Args>> || ...) || | ||
(std::is_same_v<const_accessor_type, std::decay_t<Args>> || ...)) && | ||
std::is_constructible_v<accessor_type, Args...>) iterator_adaptor(Args&&... args) | ||
: accessor_(std::forward<Args>(args)...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if my comment is going to be wrong but that's because this part is absolutely unreadable.
First please confirm the design intent here. As far as I understand sizeof...(Args)
should apply to all ||
conditions. So that !(sizeof...(Args) == 1 && (is_same<const_iterator> || is_same<non_const_iterator, etc>
(of course this is pseudo code). What I wrote also means that we need to check only the first argument in parameter pack, not all of them.
If that's the design intent and my ability to parse parentheses is not that bad:) this is not what is written for this constructor.
I will give my suggestion that should work for my understanding of the design intent and we can discuss it further on, if necessary. Probably I don't understand the design intent but then I need an explanation.
template <typename... Args> | |
requires(sizeof...(Args) >= 1 && | |
!((sizeof...(Args) == 1 && (std::is_same_v<nonconst_iterator, std::decay_t<Args>> || ...)) || | |
(std::is_same_v<const_iterator, std::decay_t<Args>> || ...) || | |
(std::is_same_v<nonconst_accessor_type, std::decay_t<Args>> || ...) || | |
(std::is_same_v<const_accessor_type, std::decay_t<Args>> || ...)) && | |
std::is_constructible_v<accessor_type, Args...>) iterator_adaptor(Args&&... args) | |
: accessor_(std::forward<Args>(args)...) | |
template <typename Arg, typename... Args> | |
requires(sizeof...(Args) > 0 | |
&& !std::is_same_v<nonconst_iterator, std::decay_t<Arg> | |
&& !std::is_same_v<const_iterator, std::decay_t<Arg>> | |
&& !std::is_same_v<nonconst_accessor_type, std::decay_t<Arg>> | |
&& !std::is_same_v<const_accessor_type, std::decay_t<Arg>> | |
&& std::is_constructible_v<accessor_type, Arg, Args...>) | |
iterator_adaptor(Arg&& arg, Args&&... args) | |
: accessor_(std::forward<Arg>(arg), std::forward<Args>(args)...) | |
{ | |
} |
The places of &&
this is just an example. While we have to use extra parameter (both template and function) look how simpler the requires
condition is in the suggestion.
bool | ||
operator!=(const_iterator other) const | ||
{ | ||
return !(*this == other); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If our minimal required standard is c++20 we don't need operator!=
auto | ||
segments() const noexcept requires(ranges::__detail::has_segments_method<accessor_type>) | ||
{ | ||
return accessor_.segments(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does accessor_.segments()
never throw? If not it's better to have noexcept(noexcept(accessor_.segments())
template <typename I> | ||
concept remote_iterator = std::forward_iterator<I> && requires(I& iter) | ||
{ | ||
ranges::rank(iter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we potentially end up with the name conflict for ranges
namespace name? My guess is we should not based on my knowledge of lookup rules but do we want to change this line and the lines below to dr::ranges
just to be safe?
To be honest I don't have a strong preference
} | ||
|
||
auto | ||
segments() const noexcept requires(ranges::__detail::has_segments_method<accessor_type>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the point of having requires
-clause here. C++ doesn't case an error when users don't touch a function. The argument that concepts provide better diagnostics is very agruable.
{ | ||
|
||
template <typename> | ||
inline constexpr bool disable_rank = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a public customization point? In other words, do we allow the users to specialize this variable?
// OR, if not available, | ||
// 2) r.begin().rank(), if iterator is `remote_iterator` | ||
template <stdrng::forward_range R> | ||
requires((has_rank_method<R> && !disable_rank<std::remove_cv_t<R>>) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need this requires clause in my opinion because you have all if constexpr
checks for within the function It just pollutes the code. If the intent is to provide better diagnostics what I would do is writing a __dependent_false
.
So it's
template <typename... Args>
inline constexpr __dependent_false = false;
And then, between line 84 and 85 I would put
static_assert(__dependent_false<R>)
// or another trick without __dependent_false.
static_assert(sizeof(R) == 0); // which is always false
To me it's much simpler than listing all the conditions twice. You have several such places in this PR
* Updates from Adam's comments * Update for clang-format * Slight fix * Fix for `inclusive_scan` / `exclusive_scan` of size 0 * enabled back empty test in xclusive scans --------- Co-authored-by: Łukasz Ślusarczyk <[email protected]>
This draft PR adds distributed ranges as an experimental feature, inside the
onedpl::experimental::dr
namespace.