System Info / 系統信息
DeepSpeed 0.14.0
Triton 2.1.0
Installed torch-2.2.1+cu121-cp311-cp311-win_amd64.whl
Who can help? / 谁可以帮助到您?
finetune_demo: @1049451037
Information / 问题信息
Reproduction / 复现过程
[2024-07-30 17:30:18,378] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[2024-07-30 17:30:23,857] [WARNING] No training data specified
[2024-07-30 17:30:23,857] [WARNING] No train_iters (recommended) or epochs specified, use default 10k iters.
[2024-07-30 17:30:23,857] [INFO] using world size: 1 and model-parallel size: 1
[2024-07-30 17:30:23,857] [INFO] > padded vocab (size: 100) with 28 dummy tokens (new size: 128)
Traceback (most recent call last):
  File "D:\PycharmProjects\CogVLM-main\finetune_demo\finetune_cogagent_demo.py", line 260, in <module>
    args = get_args(args_list)
           ^^^^^^^^^^^^^^^^^^^
  File "D:\conda3\envs\cogvlm\Lib\site-packages\sat\arguments.py", line 442, in get_args
    initialize_distributed(args)
  File "D:\conda3\envs\cogvlm\Lib\site-packages\sat\arguments.py", line 513, in initialize_distributed
    torch.distributed.init_process_group(
  File "D:\conda3\envs\cogvlm\Lib\site-packages\torch\distributed\c10d_logger.py", line 86, in wrapper
    func_return = func(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^
  File "D:\conda3\envs\cogvlm\Lib\site-packages\torch\distributed\distributed_c10d.py", line 1184, in init_process_group
    default_pg, _ = _new_process_group_helper(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\conda3\envs\cogvlm\Lib\site-packages\torch\distributed\distributed_c10d.py", line 1302, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in
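For context: Windows wheels of PyTorch are built without the NCCL backend, so initializing the default process group with backend="nccl" raises exactly this error. The snippet below is a minimal sketch, not taken from the SAT/CogVLM code, that checks which backends the installed wheel provides and falls back to gloo for a single-process group; the rendezvous address and port are placeholder values.

import torch.distributed as dist

# Windows and macOS wheels are typically built without NCCL, so this prints False there.
print("NCCL available:", dist.is_nccl_available())
print("Gloo available:", dist.is_gloo_available())

# Assumption: a single-process group is enough to reproduce the failure;
# the TCP address/port below are placeholders, not values from the issue.
backend = "nccl" if dist.is_nccl_available() else "gloo"
dist.init_process_group(
    backend=backend,
    init_method="tcp://127.0.0.1:29500",
    rank=0,
    world_size=1,
)
print("Initialized process group with backend:", backend)
dist.destroy_process_group()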
Expected behavior / 期待表现
The fine-tuning demo should initialize the distributed backend and start training on Windows instead of raising "Distributed package doesn't have NCCL built in".