
Running the finetune_demo module on Windows 11 reports an error #515

Open
gyjlll opened this issue Jul 30, 2024 · 0 comments


System Info

deepspeed 0.14.0
triton 2.1.0
torch 2.2.1+cu121 (installed from torch-2.2.1+cu121-cp311-cp311-win_amd64.whl)
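
For context, whether the crash below occurs depends on which torch.distributed backends the installed PyTorch wheel was built with. A minimal check using plain PyTorch APIs (the script name is just illustrative) to confirm what a given Windows build actually ships:

```python
# check_backends.py -- illustrative helper, not part of CogVLM.
# Prints which torch.distributed backends this PyTorch build includes.
import torch
import torch.distributed as dist

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("distributed available:", dist.is_available())
print("NCCL built in:", dist.is_nccl_available())  # False in Windows wheels
print("Gloo built in:", dist.is_gloo_available())  # True in typical builds
```

On Windows wheels such as torch-2.2.1+cu121-cp311-cp311-win_amd64.whl, is_nccl_available() returns False, which matches the traceback below.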

Who can help?

finetune_demo: @1049451037

Information

  • The official example scripts
  • My own modified scripts and tasks

Reproduction

[2024-07-30 17:30:18,378] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[2024-07-30 17:30:23,857] [WARNING] No training data specified
[2024-07-30 17:30:23,857] [WARNING] No train_iters (recommended) or epochs specified, use default 10k iters.
[2024-07-30 17:30:23,857] [INFO] using world size: 1 and model-parallel size: 1
[2024-07-30 17:30:23,857] [INFO] > padded vocab (size: 100) with 28 dummy tokens (new size: 128)
Traceback (most recent call last):
File "D:\PycharmProjects\CogVLM-main\finetune_demo\finetune_cogagent_demo.py", line 260, in
args = get_args(args_list)
^^^^^^^^^^^^^^^^^^^
File "D:\conda3\envs\cogvlm\Lib\site-packages\sat\arguments.py", line 442, in get_args
initialize_distributed(args)
File "D:\conda3\envs\cogvlm\Lib\site-packages\sat\arguments.py", line 513, in initialize_distributed
torch.distributed.init_process_group(
File "D:\conda3\envs\cogvlm\Lib\site-packages\torch\distributed\c10d_logger.py", line 86, in wrapper
func_return = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "D:\conda3\envs\cogvlm\Lib\site-packages\torch\distributed\distributed_c10d.py", line 1184, in init_process_group
default_pg, _ = _new_process_group_helper(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\conda3\envs\cogvlm\Lib\site-packages\torch\distributed\distributed_c10d.py", line 1302, in _new_process_group_helper
raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in
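
The root cause is that NCCL is Linux-only, so Windows builds of PyTorch cannot initialize a process group with the nccl backend; sat's initialize_distributed ends up requesting it and PyTorch raises. A minimal sketch of a workaround (my own illustration, not an official fix from the repo; the env-var defaults are assumptions for single-process local use) that falls back to gloo, the backend Windows wheels do ship:

```python
# Illustrative workaround: pick gloo when NCCL is not built in.
import os
import torch.distributed as dist

# Single-process rendezvous defaults (assumed values for local use).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

backend = "nccl" if dist.is_nccl_available() else "gloo"  # gloo on Windows
dist.init_process_group(backend=backend, rank=0, world_size=1)
print("initialized with backend:", dist.get_backend())
dist.destroy_process_group()
```

Applying the same idea here would mean making the init_process_group call in sat/arguments.py (line 513 in the traceback) select gloo on Windows, or passing a backend option on the command line if the launcher exposes one.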

Expected behavior

yes
