Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add distributed ranges as experimental feature. #1479

Open
wants to merge 99 commits into
base: main
Choose a base branch
from

Conversation

BenBrock
Copy link
Contributor

@BenBrock BenBrock commented Apr 4, 2024

This draft PR adds distributed ranges as an experimental feature, inside the onedpl::experimental::dr namespace.

Copy link
Contributor

@MikeDvorskiy MikeDvorskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left the comments and questions regarding to for_each implementation (sp/algorithms/for_each.hpp)

namespace oneapi::dpl::experimental::dr::sp
{

template <typename ExecutionPolicy, distributed_range R, typename Fn>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. What kinds of execution policy are supported? (by design and by this implementation)
    As far the C++20 concepts (distributed_range at least) are already used, it makes sense to add constrains on an execution policy as well.
    In oneDPL, there is oneapi::dpl::is_execution_policy_v for example.

  2. Does it make sense to add a constrain on Fn as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The first line in the method is static_assert that the only supported execution policy is
static_assert( // currently only one policy supported
        std::is_same_v<std::remove_cvref_t<ExecutionPolicy>, sycl_device_collection>);

IMO this is enough and simple

  1. I don't think so. There are no constrains for functor in std::for_each

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Having "concepts" for some parameter(s) in a signature and having "static_assert" for the other parameter(s) - not consistent approach. Or we use type requirements (concepts and contains) in a algo signature or don't use at all. As far as we use C++20, I would vote for "concepts and constrains" usage. A user of oneDPL usually sees into the documentation where a signature there is.
  2. But there is in std::ranges::for_each

Copy link
Contributor Author

@BenBrock BenBrock Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Originally posted my comment about par_unseq here, but have moved it below.)

We basically only have one type of device policy, which is a sycl_device_collection. The standard library doesn't have any constraints for ExecutionPolicy, which is why we have things as they are now.

I'd be fine with adding a is_execution_policy_v requirement or execution_policy concept.


for (auto&& segment : ranges::segments(r))
{
auto&& q = __detail::queue(ranges::rank(segment));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand just sycl device policy is supported for current implementtaion..
So, in the future, what will be queue in case of a host policy, std::execution::par, for example?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be a separate code without queues and the code is selected by constexpr. We will see. For now there is no need to design the solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, probably it makes sense to add a constrain on an execution policy, like
is_sycl_device_collection_v?

auto first = stdrng::begin(local_segment);

auto event = dr::__detail::parallel_for(q, sycl::range<>(stdrng::distance(local_segment)),
[=](auto idx) { fn(*(first + idx)); });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't insist... but first[idx] instead of *(first + idx) more readable and shorter, IMHO.

Copy link
Contributor

@lslusarczyk lslusarczyk Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank's for spotting this, applied in #1758 here and in a few other places

__detail::wait(events);
}

template <typename ExecutionPolicy, distributed_iterator Iter, typename Fn>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same comment (see above) regarding ExecutionPolicy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same answer, for now static_assert is enough I think


template <distributed_range R, typename Fn>
void
for_each(R&& r, Fn fn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. That re-call of for_each w/o policy to for_each(par_unseq) is done by design? Is the some explanation and statement in the documentation?
  2. According to the static_assert (line 34) above par_unseq is not currently supported...
  3. Taking into account 1) and 2) does whether it make sense to release non-policy for_each versions at all?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These algorithms are restricted by distributed range/iterator concepts. It's a question if we want to keep it this way in the long run, but let's keep the approach as-is for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This par_unseq is from our namespace which is

inline sycl_device_collection par_unseq;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inline sycl_device_collection par_unseq;

It totally confuses any C++ developers which are aware of C+17 execution policies.
(and it is a potential name conflict...)
I would suggest renaming one in something proper name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what's problematic about using the name par_unseq as a catch-all execution policy to execute across multiple devices?

From my understanding of the spec, par_unseq requires:

  • Execution in "unordered fashion" in "unspecified threads of execution," and
  • These threads must provide "weakly parallel forward progress guarantees."

From what I understand, threads in SYCL have weakly parallel forward progress guarantees. Is it the "threads of execution" part that would be violated?

Other libraries like Nvidia's stdpar use par_unseq to mean "execute in parallel, including on GPUs if that's where the data is." That's what we're doing here, and it seems like a reasonable choice to me. Is there a convention in oneDPL that par_unseq is CPU only, or is there another good reason to not use par_unseq as the "sane default" that includes potentially running on a GPU if that's where the data is resident?

std::size_t idx = 0;
for (auto&& segs : zipped_segments)
{
auto&& [in_segment, out_segment] = segs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this line can be moved within the if scope. Or perhaps this loop can be rewritten to have a if (idx == 0) continue; at the top and remove the following if entirely.

h.depends_on(event);
h.single_task([=]() {
stdrng::range_value_t<O> value = *src_iter;
*dst_iter = value;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the explicit temporary here by rewriting as follows:

Suggested change
*dst_iter = value;
*dst_iter = *src_iter;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these changes should be fine (#1766), but need to be tested on PVC to double-check before merging.

@lslusarczyk could you check if things look okay on your end?

init(R&& devices) requires(std::is_same_v<sycl::device, std::remove_cvref_t<stdrng::range_value_t<R>>>)
{
__detail::devices_.assign(stdrng::begin(devices), stdrng::end(devices));
__detail::global_context_ = new sycl::context(__detail::devices_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell, there is nothing preventing a user from calling this init function multiple times, which would result in a memory leak here. Would it help if the context was stored in a shared_ptr or unique_ptr? Or is there something else that I am missing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these tests are a good start but could be expanded to increase the test coverage. E.g.:

  1. Testing with 10 elements only seems limited for what DR is designed for. The other oneDPL tests use loops with increasing counts.
  2. It looks like we test with int only. Different types and potentially a custom type might identify some gaps.
  3. It looks like we test with std::plus and the custom max operator only.


TYPED_TEST_SUITE(Reduce, AllTypes);

TYPED_TEST(Reduce, Range) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be called RangeMax instead of the test below?

TYPED_TEST(Reduce, Range) {
Ops1<TypeParam> ops(10);

auto max = [](double x, double y) { return std::max(x, y); };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use TypeParam instead of double?

Ops1<TypeParam> ops(10);

auto max = [](double x, double y) { return std::max(x, y); };
EXPECT_EQ(std::reduce(ops.vec.begin(), ops.vec.end(), 3, max),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The init is set to the integer 3 independent of TypeParam. This could change the result depending on TypeParam. Is this by design? And do we need tests with init of type TypeParam?

xp::reduce(ops.dist_vec, 3, max));
}

TYPED_TEST(Reduce, Max) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this test be called RangePlus?

xp::reduce(ops.dist_vec, 3, std::plus{}));
}

TYPED_TEST(Reduce, Iterators) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this test be called IteratorsPlus? And why don't we test max with iterators?

lslusarczyk and others added 2 commits August 1, 2024 21:44
* concepts.hpp moved and included in dpl/distributed_ranges

* fixes after merge

---------

Co-authored-by: Łukasz Ślusarczyk <[email protected]>
lslusarczyk and others added 2 commits August 2, 2024 12:06
* sp::views::enumerate and __detail::enumarate merged

* update

* update

* update

* C++23 enumerate_view used instead of our implementation

* cleanup

* update

* detail/enumerate.hpp should be removed

* self applied comments

* fix compilation in g++13

---------

Co-authored-by: Łukasz Ślusarczyk <[email protected]>
Comment on lines +21 to +27
namespace oneapi::dpl::experimental::dr
{

template <typename T>
struct is_ref_view : std::false_type
{
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the "view detectors" considered a part of public API? If not, better move those to __detail.

Comment on lines +36 to +46
template <typename T>
struct is_iota_view : std::false_type
{
};
template <std::weakly_incrementable W>
struct is_iota_view<stdrng::iota_view<W>> : std::true_type
{
};

template <typename T>
inline constexpr bool is_iota_view_v = is_iota_view<T>{};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_iota_view is not used anywhere, including the original DR repository.

Comment on lines +88 to +97
#if (defined __cpp_lib_ranges_slide)

template <typename T>
struct is_sliding_view<stdrng::slide_view<T>> : std::true_type
{
};
template <typename T>
inline constexpr bool is_sliding_view_v = is_sliding_view<std::remove_cvref_t<T>>::value;

#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not is_sliding_view_v be moved out of #if...#endif?

Comment on lines +99 to +112
template <typename T>
struct is_zip_view : std::false_type
{
};

#if (defined _cpp_lib_ranges_zip)
template <typename... Views>
struct is_zip_view<stdrng::zip_view<Views...>> : std::true_type
{
};

#endif
template <typename T>
inline constexpr bool is_zip_view_v = is_zip_view<T>::value;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_zip_view is not used anywhere, including the original DR repo.

Copy link
Contributor

@akukanov akukanov Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While detail/ranges.hpp suggests that there are some internals of the DR implementation, in fact, this file contains the implementation of public customization points for DR: ranges::rank and ranges::segments. So I would suggest to call it appropriately - maybe cpos.hpp, or split into rank.hpp and segments.hpp. And, since it contains public APIs, I would also place this file into the root directory instead of detail.

Comment on lines +8 to +13
// This file incorporates work covered by the following copyright and permission
// notice:
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this part of the copyright header should not be used within DR, since it does not inherit nor use any LLVM work. It should only apply to the standard parallel algorithms in oneDPL.


auto local_segment = __detail::local(segment);

auto first = stdrng::begin(local_segment);
Copy link
Contributor

@MikeDvorskiy MikeDvorskiy Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is local_segment a range? if yes, why do we need to get an iterator and use the iterator in parallel_for instead of the range (local_segment)?
Probably, the code below is better?

auto event = dr::__detail::parallel_for(q, sycl::range<>(stdrng::distance(local_segment)),
                                                [=](auto idx) { fn(local_segment[idx]); });

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would probably work, but I'm not 100% sure it would work all the time. We'd need to make sure that local_segment is a non-owning, trivially copyable view in all cases.

One other consideration is that in some cases the view might be bigger than the iterator, which could add some small overhead.

lslusarczyk and others added 3 commits August 2, 2024 20:39
* Relative paths in includes

* format fix

* fixes in __detail namespace

* removed unnamed namespaces

* fixes in namespaces

* ranges::local_or_identity in place of __detail::local

* minor fix

* ranges.hpp update

* ranges.hpp update

* local_or_identity made __detail

* removed is_localizable

---------

Co-authored-by: Łukasz Ślusarczyk <[email protected]>
Co-authored-by: Łukasz Ślusarczyk <[email protected]>

template <std::contiguous_iterator Iter>
requires(!std::is_const_v<std::iter_value_t<Iter>> && std::is_trivially_copyable_v<std::iter_value_t<Iter>>) sycl::event
fill_async(Iter first, Iter last, const std::iter_value_t<Iter>& value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firstly, I would ask why the implementation of "fill" algo is not consistent with others, for_each for example?

  1. Why there is no oneapi::dpl::experimental::dr::sp::for_each_async, but there is fill_async?
  2. Why sync version fill doesn't have ExecutionPolicy parameter, but for_each has one?
    What exactly versions (sync/async) and signatures do we want to release?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We originally implemented fill using SYCL fill. The asynchronous version of fill is a single-GPU algorithm that originally called SYCL's q.fill(). However, SYCL's fill fails on buffers larger than 2^31, so we implemented a work-around.

In the future we'd perhaps like to go back to SYCL's q.fill(), as in theory you might be able to implement a fill faster than with a for_each. However, right now, it's pretty similar to for_each, and we could probably just implement it directly with for_each.

Lack of execution policy is just an oversight.

auto
fill(DR&& r, const T& value)
{
fill_async(r, value).wait();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest having very simple implementation based on already implemented oneapi::dpl::experimental::dr::spfor_each and remove the all other implementation stuff from the file. We avoid huge code duplication.

oneapi::dpl::experimental::dr::for_each(std::forward<DR>(r), [value](auto& x) { x = value;});

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above. I think this is a good idea (for now), although the current code structure would make it very easy to return to using SYCL's fill, which in theory could be more efficient.

auto
fill(Iter first, Iter last, const T& value)
{
fill_async(stdrng::subrange(first, last), value).wait();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oneapi::dpl::experimental::dr::for_each(first, last, [value](auto& x) { x = value;}); ? (see my comment above)

}

// https://github.com/oneapi-src/distributed-ranges/issues/787
//TYPED_TEST(IotaView, Copy) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since a github issue already created , probably it makes sense to remove that commented code below?

{
auto iota_view = stdrng::views::iota(value, T(value + stdrng::distance(r)));

for_each(par_unseq, views::zip(iota_view, r), [](auto&& elem) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that for_each call with par_unseq leads to a static_assert within oneapi::dpl::experimental::dr::sp::for_each implementation....

lslusarczyk and others added 2 commits August 5, 2024 15:15
* An experiment to use std ranges unconditionally

* Change to a macro that defines the shim header name

* Change #ifndef to #ifdef

* Remove extra underscore in the namespace macro name

* Rework to use a single macro, according to the review suggestion
Copy link
Contributor

@rarutyun rarutyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was not able to look a lot from the first attempt but left some comments. My suggestion is to reach me in Teams in case you have questions (just to speed things up)

#include "oneapi/dpl/internal/common_config.h"
#include "oneapi/dpl/pstl/onedpl_config.h"

#if __cplusplus > 202002L
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This macro doesn't work on Microsoft. Let's use _ONEDPL___cplusplus or whatever it's called


#include "detail/ranges.hpp"

namespace oneapi::dpl::experimental::dr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest question is why those concepts are public?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is good to make these concepts public. At the very least, it makes the documentation clearer, but also it allows someone to implement their own variations of distributed containers, etc. And it certainly facilitates possible discussions.

CMakeLists.txt Outdated
Comment on lines 315 to 321
# if C++23 or newer, include Distributed Ranges (experimental)
if (CMAKE_CXX_STANDARD GREATER_EQUAL 23)
set(ONEDPL_USE_DR TRUE)
message(STATUS "Adding Distributed Ranges to the project")
else()
message(STATUS "C++23 required to use Distributed Ranges in oneDPL")
endif()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest problem that I have with requiring c++23 for distributed ranges is that this API is an experimental one. That means ideally we want to get feedback from the users. From the past we know that our experimental API is not very popular. Requiring c++23 has (almost) killed our chances to get any feedback.

With regard to zip_view complexity. I think @kboyarinov has the implementation. You can ask him (when he is back from vacation) if it's still the case. I believe he will be more than happy to donate one.

Comment on lines +52 to +59
template <typename... Args>
requires(sizeof...(Args) >= 1 &&
!((sizeof...(Args) == 1 && (std::is_same_v<nonconst_iterator, std::decay_t<Args>> || ...)) ||
(std::is_same_v<const_iterator, std::decay_t<Args>> || ...) ||
(std::is_same_v<nonconst_accessor_type, std::decay_t<Args>> || ...) ||
(std::is_same_v<const_accessor_type, std::decay_t<Args>> || ...)) &&
std::is_constructible_v<accessor_type, Args...>) iterator_adaptor(Args&&... args)
: accessor_(std::forward<Args>(args)...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if my comment is going to be wrong but that's because this part is absolutely unreadable.

First please confirm the design intent here. As far as I understand sizeof...(Args) should apply to all || conditions. So that !(sizeof...(Args) == 1 && (is_same<const_iterator> || is_same<non_const_iterator, etc> (of course this is pseudo code). What I wrote also means that we need to check only the first argument in parameter pack, not all of them.

If that's the design intent and my ability to parse parentheses is not that bad:) this is not what is written for this constructor.

I will give my suggestion that should work for my understanding of the design intent and we can discuss it further on, if necessary. Probably I don't understand the design intent but then I need an explanation.

Suggested change
template <typename... Args>
requires(sizeof...(Args) >= 1 &&
!((sizeof...(Args) == 1 && (std::is_same_v<nonconst_iterator, std::decay_t<Args>> || ...)) ||
(std::is_same_v<const_iterator, std::decay_t<Args>> || ...) ||
(std::is_same_v<nonconst_accessor_type, std::decay_t<Args>> || ...) ||
(std::is_same_v<const_accessor_type, std::decay_t<Args>> || ...)) &&
std::is_constructible_v<accessor_type, Args...>) iterator_adaptor(Args&&... args)
: accessor_(std::forward<Args>(args)...)
template <typename Arg, typename... Args>
requires(sizeof...(Args) > 0
&& !std::is_same_v<nonconst_iterator, std::decay_t<Arg>
&& !std::is_same_v<const_iterator, std::decay_t<Arg>>
&& !std::is_same_v<nonconst_accessor_type, std::decay_t<Arg>>
&& !std::is_same_v<const_accessor_type, std::decay_t<Arg>>
&& std::is_constructible_v<accessor_type, Arg, Args...>)
iterator_adaptor(Arg&& arg, Args&&... args)
: accessor_(std::forward<Arg>(arg), std::forward<Args>(args)...)
{
}

The places of && this is just an example. While we have to use extra parameter (both template and function) look how simpler the requires condition is in the suggestion.

Comment on lines +80 to +84
bool
operator!=(const_iterator other) const
{
return !(*this == other);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If our minimal required standard is c++20 we don't need operator!=

auto
segments() const noexcept requires(ranges::__detail::has_segments_method<accessor_type>)
{
return accessor_.segments();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does accessor_.segments() never throw? If not it's better to have noexcept(noexcept(accessor_.segments())

template <typename I>
concept remote_iterator = std::forward_iterator<I> && requires(I& iter)
{
ranges::rank(iter);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we potentially end up with the name conflict for ranges namespace name? My guess is we should not based on my knowledge of lookup rules but do we want to change this line and the lines below to dr::ranges just to be safe?

To be honest I don't have a strong preference

}

auto
segments() const noexcept requires(ranges::__detail::has_segments_method<accessor_type>)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the point of having requires-clause here. C++ doesn't case an error when users don't touch a function. The argument that concepts provide better diagnostics is very agruable.

{

template <typename>
inline constexpr bool disable_rank = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a public customization point? In other words, do we allow the users to specialize this variable?

// OR, if not available,
// 2) r.begin().rank(), if iterator is `remote_iterator`
template <stdrng::forward_range R>
requires((has_rank_method<R> && !disable_rank<std::remove_cv_t<R>>) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need this requires clause in my opinion because you have all if constexpr checks for within the function It just pollutes the code. If the intent is to provide better diagnostics what I would do is writing a __dependent_false.

So it's

template <typename... Args>
inline constexpr __dependent_false = false;

And then, between line 84 and 85 I would put

static_assert(__dependent_false<R>)
// or another trick without __dependent_false.
static_assert(sizeof(R) == 0); // which is always false

To me it's much simpler than listing all the conditions twice. You have several such places in this PR

* Updates from Adam's comments

* Update for clang-format

* Slight fix

* Fix for `inclusive_scan` / `exclusive_scan` of size 0

* enabled back empty test in xclusive scans

---------

Co-authored-by: Łukasz Ślusarczyk <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.