I would like to checkpoint my module that takes the result of a checkpointed module (cond in the example below) as input.
import torch.nn as nn

class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()

    def forward(self, x, cond=None):
        if cond is not None:
            # do something
        return result
The above module works fine when checkpointed and cond is None. However, when cond is not None, I am getting the following error.
RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed).
Saved intermediate values of the graph are freed when you call .backward() or autograd.grad().
Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.
I have tried the following context managers but none of them works. This is a large module and I prefer to wrap it while ignoring the statements that cannot be checkpointed. Please kindly advise any workarounds. Thank you.
from fairscale.nn.checkpoint.checkpoint_activations import disable_checkpointing
with disable_checkpointing():
    if cond is not None:
        # do something
from fairscale.nn.checkpoint.checkpoint_activations import enable_recomputing
with enable_recomputing():
    if cond is not None:
        # do something
Do you have a complete test script that demonstrates this issue?
It is possible that some tensor needs to be detach()'ed in this case. This is not something we have explicitly tested.
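For reference, here is a minimal sketch of what detaching does to gradient flow. The `producer` and `consumer` modules are hypothetical stand-ins for the module that creates `cond` and the one that uses it; they are not taken from the reporter's code. Note that detaching also stops gradients from reaching whatever produced `cond`, which may or may not be acceptable for the real model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
producer = nn.Linear(8, 8)  # hypothetical module that produces `cond`
consumer = nn.Linear(8, 8)  # hypothetical module that consumes `cond`

x = torch.randn(4, 8)
cond = producer(x)

# cond.detach() shares the same data but is cut off from producer's graph,
# so backward() never walks back into producer.
out = consumer(x) + cond.detach()
out.sum().backward()

producer_got_grad = producer.weight.grad is not None
consumer_got_grad = consumer.weight.grad is not None
```

After this runs, only `consumer` has gradients; `producer.weight.grad` is still `None`.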
Have you also tried PyTorch's native checkpointing module, or raised an issue with the PyTorch folks?
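A rough sketch of what trying the native module could look like, using `torch.utils.checkpoint.checkpoint`. Everything here is an assumption made for illustration: `CondNet` is a hypothetical producer of `cond`, the body of `Test.forward` is a placeholder for "do something", and `use_reentrant=False` needs a reasonably recent PyTorch (1.11 or later). Whether the real module behaves the same way depends on what the actual `cond` branch does.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CondNet(nn.Module):
    """Hypothetical module whose checkpointed output is used as `cond`."""

    def __init__(self, dim=8):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        return torch.tanh(self.proj(x))


class Test(nn.Module):
    """Placeholder version of the module from the report."""

    def __init__(self, dim=8):
        super().__init__()
        self.body = nn.Linear(dim, dim)

    def forward(self, x, cond=None):
        result = self.body(x)
        if cond is not None:
            # Placeholder for "do something" with cond.
            result = result + cond
        return result


torch.manual_seed(0)
cond_net, test = CondNet(), Test()
x = torch.randn(4, 8, requires_grad=True)

# First checkpoint produces cond; second checkpoint consumes it as an input.
cond = checkpoint(cond_net, x, use_reentrant=False)
out = checkpoint(test, x, cond, use_reentrant=False)
out.sum().backward()
```

If this pattern runs cleanly but the fairscale wrapper does not, that difference would be useful to include in a bug report.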