prevent prompt tensors from accumulating in GPU #84
base: main
Conversation
```python
    del inputs
    output = self._tokenizer.batch_decode(outputs)[0]
    del outputs
```
Thanks, @MichaMucha. Should `del inputs` and `del outputs` be in the finally block instead?
Apologies for the delay.
I suppose if `outputs` failed to generate for some reason, it wouldn't be set, so `del outputs` in the finally block would throw an exception.
Not sure if a similar case can be made for `inputs`, but I wanted to avoid the risk of an unhandled exception.
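For illustration, a minimal stand-alone sketch of that failure mode (the values here are stand-ins for the real tokenizer/model calls):

```python
def generate_and_cleanup():
    try:
        inputs = [1, 2, 3]   # stand-in for the tokenized prompt; this binding succeeds
        outputs = 1 / 0      # "generation" fails, so `outputs` is never bound
    finally:
        del inputs           # fine: `inputs` was bound
        del outputs          # UnboundLocalError: `outputs` was never bound,
                             # and this exception replaces the original one

generate_and_cleanup()
```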
> Not sure if a similar case can be made for `inputs`, but I wanted to avoid the risk of an unhandled exception.

Makes sense, `outputs` isn't defined if it errors out before then. Maybe just move `inputs`?
Can we check if `inputs` and `outputs` are set before `del inputs` and `del outputs` in the finally block?
```python
try:
    inputs = (
        self._tokenizer(prompt, return_tensors='pt')
        .to(self._model.device)
    )
    outputs = self._model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=do_sample,
        temperature=temperature,
        top_k=top_k,
        pad_token_id=self._tokenizer.eos_token_id,
        stopping_criteria=StoppingCriteriaList([stop_criteria]),
    )
    del inputs
    output = self._tokenizer.batch_decode(outputs)[0]
    del outputs

    # remove the context from the output
    output = output[len(prompt):]
finally:
    torch.cuda.empty_cache()
```
Suggested change:

```diff
+inputs = None
+outputs = None
 try:
     inputs = (
         self._tokenizer(prompt, return_tensors='pt')
         .to(self._model.device)
     )
     outputs = self._model.generate(
         **inputs,
         max_new_tokens=max_new_tokens,
         do_sample=do_sample,
         temperature=temperature,
         top_k=top_k,
         pad_token_id=self._tokenizer.eos_token_id,
         stopping_criteria=StoppingCriteriaList([stop_criteria]),
     )
-    del inputs
     output = self._tokenizer.batch_decode(outputs)[0]
-    del outputs
     # remove the context from the output
     output = output[len(prompt):]
 finally:
+    if inputs is not None:
+        del inputs
+    if outputs is not None:
+        del outputs
     torch.cuda.empty_cache()
```
@MichaMucha ☝️
This could help with CUDA OOM errors, especially on consumer-grade hardware: the prompt and output tensors will be freed from VRAM rather than accumulating across calls.
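For reference, a minimal sketch of how the `del` + `empty_cache()` pair frees VRAM (assuming a CUDA device is available; the tensor here is a stand-in for the prompt/output tensors):

```python
import torch

x = torch.randn(1024, 1024, device='cuda')  # stand-in for prompt/output tensors
print(torch.cuda.memory_allocated())        # non-zero: the tensor is live in VRAM
del x                                       # drop the last reference; the caching allocator reclaims it
print(torch.cuda.memory_allocated())        # drops back down
print(torch.cuda.memory_reserved())         # still non-zero: PyTorch keeps the blocks cached
torch.cuda.empty_cache()                    # return the cached blocks to the driver
print(torch.cuda.memory_reserved())         # drops once the cache is emptied
```

Note that `del` alone only returns the memory to PyTorch's caching allocator; `empty_cache()` is what hands it back to the driver so other processes see it as free.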