
Non-blocking gather_csr_cuda() #305

Open
rBenke opened this issue Jun 23, 2022 · 3 comments

Comments

@rBenke

rBenke commented Jun 23, 2022

Is it possible to make gather_csr_cuda() run without a CPU-GPU sync?

My guess is that the sync comes from line 248 of csrc/cuda/segment_csr_cuda.cu:

sizes[dim] = indptr.flatten()[-1].cpu().data_ptr<int64_t>()[0];
@rusty1s
Owner

rusty1s commented Jun 24, 2022

We read the last element of indptr to determine the size of the out tensor along dim. Because indptr lives on the GPU, that read is a device-to-host copy, which is what forces the sync. I don't think there is a real workaround besides passing a preallocated out tensor as an input argument (already supported). Would that work for you?
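To make the trade-off concrete, here is a minimal CPU sketch in C++ of the two paths described above. `gather_csr_sketch` is a hypothetical name and this is not the torch_scatter source; it only assumes the gather semantics implied by the quoted line 248: row i of `src` is copied into output rows `indptr[i]` to `indptr[i+1]`, so the output length is the last entry of `indptr`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical CPU mirror of the two code paths (not the torch_scatter
// source). `src` holds num_rows x cols values, row-major; row i is copied
// into output rows [indptr[i], indptr[i+1]).
std::vector<float> gather_csr_sketch(
    const std::vector<float>& src, const std::vector<int64_t>& indptr,
    int64_t cols, std::optional<std::vector<float>> out = std::nullopt) {
  assert(!indptr.empty() && cols > 0);
  if (!out) {
    // No `out` given: size it from the last indptr entry. In the CUDA
    // version this corresponds to the `.cpu()` read of indptr[-1], the
    // device-to-host copy that blocks the host.
    out.emplace(static_cast<size_t>(indptr.back() * cols));
  }
  // With a caller-provided `out`, its size was chosen on the host by the
  // caller (e.g. from an edge count it already knows), so nothing has to
  // be read back before the gather runs. at() is bounds-checked, so a
  // wrongly sized `out` throws instead of writing out of bounds.
  const int64_t num_rows = static_cast<int64_t>(indptr.size()) - 1;
  for (int64_t i = 0; i < num_rows; ++i)
    for (int64_t j = indptr[i]; j < indptr[i + 1]; ++j)
      for (int64_t c = 0; c < cols; ++c)
        out->at(static_cast<size_t>(j * cols + c)) =
            src.at(static_cast<size_t>(i * cols + c));
  return std::move(*out);
}
```

Calling it with `indptr = {0, 2, 3}` and no `out` takes the sizing path; passing `std::vector<float>(3)` as `out` gives the same result without consulting `indptr.back()` for the allocation. The cost is that the caller has to know the size up front.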

@rBenke
Author

rBenke commented Jun 24, 2022

I'd have to look more closely at PyG, but in general, what I'm trying to achieve is a fully non-blocking GAT forward pass (with a CSR data layout). I'd appreciate any hints if you know how to do this; otherwise I'll take some time to figure it out next week.
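For the GAT case, one possible direction (an assumption, not something confirmed in this thread or by PyG) is to record the node and edge counts on the host when the CSR graph is built, and size every intermediate buffer from those counts rather than from `rowptr`. The C++ sketch below runs the per-row steps of a single-head GAT-style attention on the CPU: a gather of per-edge scores, a segment max and segment sum for the softmax, then a weighted segment sum. `CsrGraph` and `gat_csr_sketch` are hypothetical names, the feature dimension is 1 for brevity, and on a GPU each step would presumably be its own kernel writing into a preallocated output.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical graph handle: on a GPU the CSR arrays would live in device
// memory, but the two counts are kept on the host when the graph is built,
// so allocating node- and edge-sized buffers never needs a device read.
struct CsrGraph {
  std::vector<int64_t> rowptr;  // length num_nodes + 1, rows = target nodes
  std::vector<int64_t> col;     // length num_edges, source node per edge
  int64_t num_nodes;
  int64_t num_edges;
};

// Single-head, 1-feature GAT-style aggregation over CSR rows:
// e_ij = LeakyReLU(a_dst[i] + a_src[j]), alpha_ij = softmax over row i,
// out[i] = sum_j alpha_ij * x[j]. Buffers are sized from num_nodes and
// num_edges, never from rowptr.back().
std::vector<float> gat_csr_sketch(const CsrGraph& g, const std::vector<float>& x,
                                  const std::vector<float>& a_src,
                                  const std::vector<float>& a_dst,
                                  float negative_slope = 0.2f) {
  assert(static_cast<int64_t>(g.rowptr.size()) == g.num_nodes + 1);
  std::vector<float> score(static_cast<size_t>(g.num_edges));
  std::vector<float> out(static_cast<size_t>(g.num_nodes), 0.0f);
  for (int64_t i = 0; i < g.num_nodes; ++i) {
    const int64_t begin = g.rowptr[i], end = g.rowptr[i + 1];
    if (begin == end) continue;  // no incoming edges: output stays 0
    float row_max = -std::numeric_limits<float>::infinity();
    for (int64_t e = begin; e < end; ++e) {
      const float s = a_dst[i] + a_src[g.col[e]];    // gather per-edge score
      score[e] = s > 0.0f ? s : negative_slope * s;  // LeakyReLU
      row_max = std::max(row_max, score[e]);         // segment max
    }
    float denom = 0.0f;
    for (int64_t e = begin; e < end; ++e) {
      score[e] = std::exp(score[e] - row_max);  // numerically stable softmax
      denom += score[e];                        // segment sum
    }
    for (int64_t e = begin; e < end; ++e)
      out[i] += (score[e] / denom) * x[g.col[e]];  // weighted segment sum
  }
  return out;
}
```

Every buffer here is sized from `num_nodes` and `num_edges`, so no step needs the last `rowptr` entry from device memory. Whether the rest of a PyG GAT layer can be made sync-free in the same way is not something this sketch settles.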

@github-actions

This issue had no activity for 6 months. It will be closed in 2 weeks unless there is some new activity. Is this issue already resolved?
