
CUDA Out of Memory #4

Open
Ezharjan opened this issue Dec 29, 2023 · 0 comments
Ezharjan commented Dec 29, 2023

Steps to Reproduce

  1. Run the following command and wait for a while:
python test/test_flores101.py    --lang_pair deu-eng    --retriever random    --ice_num 8    --prompt_template "</E></X>=</Y>"    --model_name facebook/xglm-7.5B    --tokenizer_name facebook/xglm-7.5B    --output_dir output    --output_file test    --seed 43
  2. The run fails with CUDA out of memory; the full traceback is as follows:
Traceback (most recent call last):
  File "/home/alex/Documents/MMT-LLM/test/test_flores101.py", line 122, in <module>
    print(f"BLEU score = {test_flores(args)}")
  File "/home/alex/Documents/MMT-LLM/test/test_flores101.py", line 84, in test_flores
    infr = IclGenInferencer(
  File "/home/alex/Documents/MMT-LLM/openicl/icl_inferencer/icl_gen_inferencer.py", line 63, in __init__
    super().__init__(retriever[0], metric, references, model_name, tokenizer_name, max_model_token_num, model_config, batch_size, accelerator, output_json_filepath, api_name)
  File "/home/alex/Documents/MMT-LLM/openicl/icl_inferencer/icl_base_inferencer.py", line 59, in __init__
    self.model.to(self.device)
  File "/home/alex/anaconda3/envs/mmt/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2460, in to
    return super().to(*args, **kwargs)
  File "/home/alex/anaconda3/envs/mmt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
  File "/home/alex/anaconda3/envs/mmt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/alex/anaconda3/envs/mmt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/alex/anaconda3/envs/mmt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/home/alex/anaconda3/envs/mmt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/home/alex/anaconda3/envs/mmt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB. GPU 0 has a total capacty of 23.68 GiB of which 307.62 MiB is free. Including non-PyTorch memory, this process has 23.20 GiB memory in use. Of the allocated memory 22.94 GiB is allocated by PyTorch, and 4.67 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
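For context, a back-of-envelope estimate (my own rough calculation, not from the maintainers) suggests the model weights alone cannot fit in 24 GB at the default full precision:

```python
# Back-of-envelope GPU memory estimate for the facebook/xglm-7.5B weights.
# Rough sketch only: it ignores activations, optimizer state, and the
# CUDA context overhead, and assumes ~7.5e9 parameters.
GIB = 1024 ** 3
n_params = 7.5e9  # approximate parameter count

fp32_gib = n_params * 4 / GIB  # default torch.float32 load: 4 bytes/param
fp16_gib = n_params * 2 / GIB  # half precision: 2 bytes/param

print(f"fp32 weights: {fp32_gib:.1f} GiB")  # ~27.9 GiB -- exceeds the 23.68 GiB card
print(f"fp16 weights: {fp16_gib:.1f} GiB")  # ~14.0 GiB -- would fit in 24 GB
```

So the fp32 weights (~28 GiB) already exceed the 3090's 23.68 GiB before any activations are allocated, which matches the traceback failing inside `self.model.to(self.device)`.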

Environment

  1. Ubuntu 20.04 LTS
  2. NVIDIA GPU 3090, 24GB Memory
  3. Python 3.10.13
  4. The pip environment was installed according to requirements.txt
  5. CUDA Toolkit and its version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0

Questions

  1. What hardware capability do you recommend? (What hardware did you use for the experiments?)
  2. Do you have any suggestions for avoiding CUDA out-of-memory errors on a 24 GB RTX 3090 machine?
  3. Could you provide some concrete examples for this project, at least for reproducing the results reported in your paper?
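In case it helps others hitting this, two workarounds I plan to try (my assumptions, not confirmed by the maintainers):

```shell
# Option 1 (assumption): reduce allocator fragmentation, as the error
# message itself suggests. This alone probably cannot fix the failure here,
# since the fp32 weights of a 7.5B model exceed 24 GB regardless.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Option 2 (assumption): load the model in half precision, if the project's
# inferencer exposes a way to pass torch_dtype=torch.float16 through to
# transformers' from_pretrained(); half-precision weights (~14 GiB) should
# fit on a 24 GB card.
```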

Thanks~
