Can we exclude some layer parameters from sharding? #1123
Comments
Thanks for the question and tagging me. No, the params will still be sharded by the outer FSDP wrapper; they are just excluded by the auto_wrap algorithm when it determines the nested wrapping structure. The actual sharding config is determined by the wrapper's process_group argument. If the process_group contains only a single GPU, then it is not sharded. To check the wrapping structure, you can simply print() out the model and examine where the FSDP wrappers are inserted.
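To make the "determined by the process_group" point concrete, here is a minimal toy sketch (not the FairScale API, and with a hypothetical shard_param helper) of how a flat parameter is split across the ranks of a process group. A group containing a single rank ends up holding the whole parameter, i.e. no sharding happens:

```python
# Toy illustration: a flat parameter vector is split evenly across the
# ranks of a process group, padding the tail so every rank holds an
# equal-sized shard. A group of size one keeps the full parameter.

def shard_param(flat_param, world_size, rank):
    """Return this rank's shard of a flattened parameter list."""
    shard_size = -(-len(flat_param) // world_size)  # ceiling division
    padded = flat_param + [0.0] * (shard_size * world_size - len(flat_param))
    return padded[rank * shard_size:(rank + 1) * shard_size]

params = [0.1, 0.2, 0.3, 0.4, 0.5]

# Two-rank group: each rank owns roughly half of the flattened params.
print(shard_param(params, world_size=2, rank=0))  # [0.1, 0.2, 0.3]
print(shard_param(params, world_size=2, rank=1))  # [0.4, 0.5, 0.0]

# Single-rank group: the "shard" is the whole parameter -- not sharded.
print(shard_param(params, world_size=1, rank=0))  # [0.1, 0.2, 0.3, 0.4, 0.5]
```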
@min-xu-ai Thank you for your kind reply. Does that mean the FSDP wrap module will flatten all model parameters and shard them on each rank? If so, what will the behavior of the inner module with the FSDP wrapper be? Regarding "To check the wrapping structure, you can simply print() out the model and examine where the FSDP wrappers are inserted", is there any API to check the specific sharded module parameters, like which parameters the FSDP wrapper layer has …
@min-xu-ai There is another question about the FSDP wrapper on a single GPU, on the …
Both inner and outer wrappers do the same sharding based on the process group each is given. They just own different sets of params, based on which param is wrapped by which wrapper. Flattening or not is a separate argument to the wrappers.
I don't think there is any API for this, but you can inspect them directly as long as you can access the wrapper object.
No idea. This might be a corner-case bug. Maybe you can try PyTorch's version of FSDP and see whether it works better for you.
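Since there is no dedicated API, inspecting the printed module tree is the practical route. Here is a toy sketch (hypothetical Module/Wrapper classes, not the FairScale API) of what "print() the model and see where the wrappers sit" amounts to:

```python
# Toy model tree: a Wrapper stands in for an FSDP wrapper around a
# module. describe() walks the tree and indents by depth, mimicking
# what printing a wrapped model shows you.

class Module:
    def __init__(self, name, children=()):
        self.name, self.children = name, list(children)

class Wrapper(Module):  # stands in for an FSDP wrapper
    def __init__(self, inner):
        super().__init__("FSDP(" + inner.name + ")", inner.children)

def describe(module, depth=0):
    lines = ["  " * depth + module.name]
    for child in module.children:
        lines.extend(describe(child, depth + 1))
    return lines

# Outer wrapper around a model whose layer1 got its own nested wrapper.
model = Wrapper(Module("model", [Wrapper(Module("layer1")), Module("layer2")]))
print("\n".join(describe(model)))
# FSDP(model)
#   FSDP(layer1)
#   layer2
```

Here layer2 has no wrapper of its own, so its params are owned (and still sharded) by the outer FSDP(model) wrapper, matching the earlier answer.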
Thank you for your kind reply, I got it.
This default_auto_wrap_policy function has a parameter exclude_wrap_modules for excluding module types from wrapping. Does that mean the module's parameters will not be sharded? How can I check whether it works or not? @min-xu-ai