Once we use 2^16 or more channels, we get the following error:
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Minimal reproducible code:
This might be because the implementation parallelizes the blocks across the batch and channel axes, with the channels apparently on the grid's y axis, but a CUDA grid only allows up to 2^16-1 blocks along the y axis:
`causal-conv1d/csrc/causal_conv1d_fwd.cu`, line 65 in f8c2467
Flipping the `.x` and `.y` should fix the issue (at least for the non-channel-last implementation).
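
To illustrate the suspected cause without needing a GPU, here is a small sketch that checks a launch grid against the documented CUDA grid-size limits (2^31-1 blocks along x, 2^16-1 along y and z on compute capability 3.0 and later). The helper `grid_is_valid` and the sizes used are hypothetical and not part of causal-conv1d; the assumption that the current launch uses a (batch, channels) grid is inferred from the error, not confirmed.

```python
# Documented CUDA limits on blocks per grid for compute capability 3.0+,
# in (x, y, z) order.
MAX_GRID = (2**31 - 1, 2**16 - 1, 2**16 - 1)


def grid_is_valid(grid):
    """Return True if every grid dimension is within the CUDA limit.

    Hypothetical helper for illustration; missing dimensions default to 1,
    as they do for a CUDA dim3.
    """
    padded = tuple(grid) + (1,) * (3 - len(grid))
    return all(1 <= n <= limit for n, limit in zip(padded, MAX_GRID))


batch, channels = 2, 2**16  # hypothetical sizes that trigger the error

# Assumed current layout: batch on x, channels on y.
print(grid_is_valid((batch, channels)))  # False: y exceeds 2^16-1

# Proposed layout: channels on x, batch on y.
print(grid_is_valid((channels, batch)))  # True
```

With the flipped layout the kernel would also have to swap its reads of `blockIdx.x` and `blockIdx.y`, and the batch size would then be the dimension capped at 2^16-1.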