Non-blocking gather_csr_cuda() #305

rBenke · 2022-06-23T19:51:20Z

Is it possible to run gather_csr_cuda() without a CPU-GPU sync?

I can only guess that the problem is at line 248 of csrc/cuda/segment_csr_cuda.cu:

sizes[dim] = indptr.flatten()[-1].cpu().data_ptr<int64_t>()[0];

rusty1s · 2022-06-24T03:34:11Z

We query the last element of indptr to determine the size of the output tensor. I don't think there is any real workaround besides passing in an out tensor as part of the input arguments (already supported). Would that work for you?
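
To illustrate the suggestion above, here is a minimal pure-Python sketch of the gather semantics, not the library's CUDA implementation. The helper gather_csr_reference and the col list are hypothetical names introduced only for this example. It assumes the caller already knows the output length from elsewhere (for instance the number of column indices in the CSR layout, which equals indptr[-1]), so the output can be preallocated without reading indptr[-1] back on the host.

```python
from typing import List, Optional


def gather_csr_reference(src: List[float], indptr: List[int],
                         out: Optional[List[float]] = None) -> List[float]:
    """Hypothetical reference: out[j] = src[i] for indptr[i] <= j < indptr[i + 1]."""
    if out is None:
        # Mirrors the line quoted in the issue: the output length is the
        # last entry of indptr. On CUDA, reading that value on the host
        # is what forces the sync.
        out = [0.0] * indptr[-1]
    for i in range(len(indptr) - 1):
        for j in range(indptr[i], indptr[i + 1]):
            out[j] = src[i]
    return out


src = [10.0, 20.0, 30.0]
indptr = [0, 2, 2, 5]
col = [1, 2, 0, 1, 2]  # hypothetical CSR column indices; len(col) == indptr[-1]

# Preallocate from a size that is already known, so indptr[-1] is never read.
out = [0.0] * len(col)
result = gather_csr_reference(src, indptr, out=out)
```

With the real library, the equivalent would be allocating out on the GPU with a shape taken from tensor metadata and passing it through the out argument mentioned above. Whether the rest of the call is then fully free of syncs is not something this sketch verifies.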

rBenke · 2022-06-24T10:45:12Z

I'd have to look more closely at PyG, but in general, what I'm trying to achieve is a fully non-blocking GAT forward pass (with a CSR data layout). I'd appreciate any hints if you know how to do this; otherwise I'll take some time to figure it out next week.

github-actions · 2022-12-22T01:22:21Z

This issue had no activity for 6 months. It will be closed in 2 weeks unless there is some new activity. Is this issue already resolved?

github-actions bot added the stale label Dec 22, 2022

rusty1s added enhancement and removed stale labels Dec 22, 2022
