
Named Symbol not found (torchchat #1298) #1110

Open
mikekgfb opened this issue Oct 17, 2024 · 3 comments · May be fixed by #1112
Labels
good first issue Good for newcomers

Comments

@mikekgfb

Quantizing the model fails with a CUDA error: "named symbol not found".
see pytorch/torchchat#1298

@mikekgfb
Author

CUDA version:
NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2

This ran on Google Colab. Detailed traceback/repro: https://colab.research.google.com/drive/1PRneJBaS5TlJaIgc4Lwv2muiePp6T9Ss?usp=sharing

$nvidia-smi
Thu Oct 17 19:52:30 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P8               9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

$ python torchchat.py generate stories110M --quant torchchat/quant_config/cuda.json --prompt "It was a dark and stormy night, and"
2024-10-17 19:52:38.809413: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-17 19:52:39.031641: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-17 19:52:39.096690: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-17 19:52:39.466437: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-17 19:52:41.795530: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Downloading builder script: 100% 5.67k/5.67k [00:00<00:00, 18.9MB/s]
Downloading https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.pt...
Downloading https://github.com/karpathy/llama2.c/raw/master/tokenizer.model...
Moving model to /root/.torchchat/model-cache/stories110M.
Using device=cuda Tesla T4
Loading model...
Time to load model: 0.40 seconds
Quantizing the model with: {'executor': {'accelerator': 'cuda'}, 'precision': {'dtype': 'bf16'}, 'linear:int4': {'groupsize': 256}}
Time to quantize model: 0.13 seconds
Traceback (most recent call last):
  File "/content/torchchat-1/torchchat.py", line 88, in <module>
    generate_main(args)
  File "/content/torchchat-1/torchchat/generate.py", line 1215, in main
    gen = Generator(
  File "/content/torchchat-1/torchchat/generate.py", line 290, in __init__
    self.model = _initialize_model(self.builder_args, self.quantize, self.tokenizer)
  File "/content/torchchat-1/torchchat/cli/builder.py", line 574, in _initialize_model
    quantize_model(
  File "/content/torchchat-1/torchchat/utils/quantize.py", line 114, in quantize_model
    quantize_(model, int4_weight_only(q_kwargs["groupsize"]))
  File "/usr/local/lib/python3.10/dist-packages/torchao/quantization/quant_api.py", line 462, in quantize_
    _replace_with_custom_fn_if_matches_filter(
  File "/usr/local/lib/python3.10/dist-packages/torchao/quantization/quant_api.py", line 202, in _replace_with_custom_fn_if_matches_filter
    new_child = _replace_with_custom_fn_if_matches_filter(
  File "/usr/local/lib/python3.10/dist-packages/torchao/quantization/quant_api.py", line 202, in _replace_with_custom_fn_if_matches_filter
    new_child = _replace_with_custom_fn_if_matches_filter(
  File "/usr/local/lib/python3.10/dist-packages/torchao/quantization/quant_api.py", line 202, in _replace_with_custom_fn_if_matches_filter
    new_child = _replace_with_custom_fn_if_matches_filter(
  [Previous line repeated 2 more times]
  File "/usr/local/lib/python3.10/dist-packages/torchao/quantization/quant_api.py", line 198, in _replace_with_custom_fn_if_matches_filter
    model = replacement_fn(model)
  File "/usr/local/lib/python3.10/dist-packages/torchao/quantization/quant_api.py", line 392, in insert_subclass
    lin.weight = torch.nn.Parameter(constructor(lin.weight, **kwargs), requires_grad=requires_grad)
  File "/usr/local/lib/python3.10/dist-packages/torchao/quantization/quant_api.py", line 553, in apply_int4_weight_only_quant
    return to_affine_quantized_intx(weight, mapping_type, block_size, target_dtype, quant_min, quant_max, eps, zero_point_dtype=zero_point_dtype, preserve_zero=preserve_zero, zero_point_domain=zero_point_domain, layout_type=layout_type, use_hqq=use_hqq)
  File "/usr/local/lib/python3.10/dist-packages/torchao/dtypes/affine_quantized_tensor.py", line 286, in from_hp_to_intx
    layout_tensor = layout_tensor_ctr(data, scale, zero_point, layout_type)
  File "/usr/local/lib/python3.10/dist-packages/torchao/dtypes/affine_quantized_tensor.py", line 1033, in from_plain
    scale_and_zero = pack_tinygemm_scales_and_zeros(scale, zero_point)
  File "/usr/local/lib/python3.10/dist-packages/torchao/quantization/utils.py", line 322, in pack_tinygemm_scales_and_zeros
    torch.cat(
RuntimeError: CUDA error: named symbol not found
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

@gau-nernst
Collaborator

I believe this is because tinygemm does not support sm75, i.e. the T4.
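For context, a quick way to check is to query the GPU's compute capability and compare it against tinygemm's minimum. This is a sketch: the sm80 (Ampere) threshold is an assumption based on the comment above, not a documented torchao constant; the T4 reports (7, 5).

```python
# Sketch: check whether the current GPU can run tinygemm's int4 kernels.
# The (8, 0) minimum is an assumption, not a documented torchao constant.
MIN_TINYGEMM_CAPABILITY = (8, 0)  # Ampere (sm80) and newer

def supports_tinygemm(capability: tuple) -> bool:
    """Return True if a (major, minor) compute capability meets the assumed minimum."""
    return capability >= MIN_TINYGEMM_CAPABILITY

try:
    import torch
    if torch.cuda.is_available():
        cap = torch.cuda.get_device_capability()  # e.g. (7, 5) on a Tesla T4
        print(f"sm{cap[0]}{cap[1]}: tinygemm supported = {supports_tinygemm(cap)}")
except ImportError:
    pass  # torch not installed; the pure check above is still usable
```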

@msaroufim
Member

We could throw a better error message.
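A hypothetical sketch of what such a guard might look like before the int4 quantization path runs. The function name and the sm80 threshold are assumptions for illustration, not torchao's actual API:

```python
def check_int4_tinygemm_support(capability: tuple) -> None:
    """Raise a descriptive error up front instead of letting the int4
    tinygemm path fail later with the opaque CUDA
    'named symbol not found' error."""
    if capability < (8, 0):  # assumed minimum; a T4 reports (7, 5)
        raise RuntimeError(
            "int4 weight-only quantization relies on tinygemm kernels "
            "that are assumed to require compute capability >= sm80; "
            f"this GPU is sm{capability[0]}{capability[1]}. "
            "Consider a different quantization scheme on this device."
        )
```

Calling this with the device's `torch.cuda.get_device_capability()` before `quantize_` would surface the incompatibility immediately.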

@msaroufim msaroufim added the good first issue Good for newcomers label Oct 18, 2024
@msaroufim msaroufim linked a pull request Oct 18, 2024 that will close this issue