
CUDA Out of Memory #4

Open
Ezharjan opened this issue Dec 29, 2023 · 0 comments
Ezharjan commented Dec 29, 2023

Steps to Reproduce

  1. Run the following command and wait for a while:
python test/test_flores101.py    --lang_pair deu-eng    --retriever random    --ice_num 8    --prompt_template "</E></X>=</Y>"    --model_name facebook/xglm-7.5B    --tokenizer_name facebook/xglm-7.5B    --output_dir output    --output_file test    --seed 43
  2. The run fails with CUDA out of memory; the full traceback is as follows:
Traceback (most recent call last):
  File "/home/alex/Documents/MMT-LLM/test/test_flores101.py", line 122, in <module>
    print(f"BLEU score = {test_flores(args)}")
  File "/home/alex/Documents/MMT-LLM/test/test_flores101.py", line 84, in test_flores
    infr = IclGenInferencer(
  File "/home/alex/Documents/MMT-LLM/openicl/icl_inferencer/icl_gen_inferencer.py", line 63, in __init__
    super().__init__(retriever[0], metric, references, model_name, tokenizer_name, max_model_token_num, model_config, batch_size, accelerator, output_json_filepath, api_name)
  File "/home/alex/Documents/MMT-LLM/openicl/icl_inferencer/icl_base_inferencer.py", line 59, in __init__
    self.model.to(self.device)
  File "/home/alex/anaconda3/envs/mmt/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2460, in to
    return super().to(*args, **kwargs)
  File "/home/alex/anaconda3/envs/mmt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
  File "/home/alex/anaconda3/envs/mmt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/alex/anaconda3/envs/mmt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/alex/anaconda3/envs/mmt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/home/alex/anaconda3/envs/mmt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/home/alex/anaconda3/envs/mmt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB. GPU 0 has a total capacty of 23.68 GiB of which 307.62 MiB is free. Including non-PyTorch memory, this process has 23.20 GiB memory in use. Of the allocated memory 22.94 GiB is allocated by PyTorch, and 4.67 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
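For context, a back-of-envelope estimate (my own rough calculation, not from the maintainers) suggests the model weights alone cannot fit in 24 GB at the default full precision:

```python
# Back-of-envelope GPU memory estimate for the facebook/xglm-7.5B weights.
# Rough sketch only: it ignores activations, optimizer state, and the
# CUDA context overhead, and assumes ~7.5e9 parameters.
GIB = 1024 ** 3
n_params = 7.5e9  # approximate parameter count

fp32_gib = n_params * 4 / GIB  # default torch.float32 load: 4 bytes/param
fp16_gib = n_params * 2 / GIB  # half precision: 2 bytes/param

print(f"fp32 weights: {fp32_gib:.1f} GiB")  # ~27.9 GiB -- exceeds the 23.68 GiB card
print(f"fp16 weights: {fp16_gib:.1f} GiB")  # ~14.0 GiB -- would fit in 24 GB
```

So the fp32 weights (~28 GiB) already exceed the 3090's 23.68 GiB before any activations are allocated, which matches the traceback failing inside `self.model.to(self.device)`.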

Environment

  1. Ubuntu 20.04 LTS
  2. NVIDIA GPU 3090, 24GB Memory
  3. Python 3.10.13
  4. The pip environment was installed according to requirements.txt
  5. CUDA Toolkit and its version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0

Questions

  1. What hardware capability do you recommend? (What hardware did you use for the experiments?)
  2. Do you have any suggestions for avoiding CUDA out-of-memory errors on a 24 GB RTX 3090 machine?
  3. Could you provide some concrete examples for this project, at least for reproducing the results reported in your paper?
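In case it helps others hitting this, two workarounds I plan to try (my assumptions, not confirmed by the maintainers):

```shell
# Option 1 (assumption): reduce allocator fragmentation, as the error
# message itself suggests. This alone probably cannot fix the failure here,
# since the fp32 weights of a 7.5B model exceed 24 GB regardless.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Option 2 (assumption): load the model in half precision, if the project's
# inferencer exposes a way to pass torch_dtype=torch.float16 through to
# transformers' from_pretrained(); half-precision weights (~14 GiB) should
# fit on a 24 GB card.
```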

Thanks~
