It looks like something went wrong with NCCL or with memory.
I am using Docker and experimenting with the AISHELL example. I have run the script run.sh in "DeepSpeech/examples/aishell/s0".
I am getting the errors shown below, and I am not able to debug them. It would be very helpful if proper steps and manuals were provided.
Errors:
```
E0819 07:01:47.576184 7372 pybind.cc:1584] Invalid CUDAPlace(1), must inside [0, 1), because GPU number on your machine is 1
W0819 07:01:47.595293 7371 gen_comm_id_helper.cc:120] connect addr=127.0.0.1:45449 failed 1 times with reason: Connection refused retry after 0.5 seconds
E0819 07:01:47.598155 7373 pybind.cc:1584] Invalid CUDAPlace(2), must inside [0, 1), because GPU number on your machine is 1
E0819 07:01:47.598351 7374 pybind.cc:1584] Invalid CUDAPlace(3), must inside [0, 1), because GPU number on your machine is 1
C++ Traceback (most recent call last):
0 paddle::imperative::NCCLParallelContext::Init()
1 paddle::imperative::NCCLParallelContext::BcastNCCLId(std::vector<ncclUniqueId, std::allocator<ncclUniqueId> >&, int, int)
2 void paddle::platform::SendBroadCastCommID(std::vector<std::string, std::allocator<std::string > >, std::vector<ncclUniqueId, std::allocator<ncclUniqueId> >)
3 paddle::framework::SignalHandle(char const*, int)
4 paddle::platform::GetCurrentTraceBackString[abi:cxx11]()
Error Message Summary:
FatalError: `Termination signal` is detected by the operating system.
[TimeInfo: *** Aborted at 1629356507 (unix time) try "date -d @1629356507" if you are using GNU date ***]
[SignalInfo: *** SIGTERM (@0x1CA5) received by PID 7371 (TID 0x7f80c4287740) from PID 7333 ***]
```
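The last two log lines record when the worker was stopped and which process stopped it. The Python sketch below is an equivalent of `date -u -d @1629356507` that also converts the `@0x1CA5` field to decimal. The helper name `abort_time_utc` is hypothetical and is not part of DeepSpeech or Paddle. The sketch also assumes the container clock runs on UTC. That is common for Docker images, but you should check it.

```python
from datetime import datetime, timezone

# Values copied verbatim from the TimeInfo and SignalInfo lines of the log.
ABORT_UNIX_TIME = 1629356507
SIGNAL_ADDR = 0x1CA5  # the "(@0x1CA5)" field printed after SIGTERM


def abort_time_utc(unix_time: int) -> str:
    """Format a Unix timestamp like the 'E0819 07:01:47' log prefix (MMDD HH:MM:SS), in UTC.

    Hypothetical helper for illustration; assumes the log timestamps are UTC.
    """
    return datetime.fromtimestamp(unix_time, tz=timezone.utc).strftime("%m%d %H:%M:%S")


if __name__ == "__main__":
    print(abort_time_utc(ABORT_UNIX_TIME))  # 0819 07:01:47
    print(SIGNAL_ADDR)                      # 7333, same as the "from PID" value
```

Under that assumption, the abort falls in the same second as the `Invalid CUDAPlace` errors. Also, 0x1CA5 equals 7333, the PID after `from PID`. On Linux, the sender's PID of a `kill()`-generated signal shares storage with the address field of `siginfo_t`, which likely explains why the two match. Together these suggest that worker 7371 was terminated by another process (PID 7333) rather than crashing on its own.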
I am not able to trace the source of the problem.
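Three of the error lines report the same thing. A worker requested `CUDAPlace(1)`, `CUDAPlace(2)` or `CUDAPlace(3)`, but Paddle only accepts indices in `[0, 1)` because the machine has one GPU. The sketch below restates that range check in plain Python. `invalid_gpu_ids` is a hypothetical helper, not a Paddle or DeepSpeech function. It assumes the requested devices are written as a comma-separated list such as `0,1,2,3`, which is the format `CUDA_VISIBLE_DEVICES` uses. Inside the container, the actual GPU count can be read from `nvidia-smi -L`.

```python
from typing import List


def invalid_gpu_ids(device_list: str, gpu_count: int) -> List[int]:
    """Return the requested GPU indices outside the valid range [0, gpu_count).

    device_list: comma-separated indices, e.g. "0,1,2,3" (assumed format).
    gpu_count:   number of GPUs actually visible on the machine.
    """
    requested = [int(tok) for tok in device_list.split(",") if tok.strip()]
    return [i for i in requested if not 0 <= i < gpu_count]


if __name__ == "__main__":
    # Four workers asking for GPUs 0-3 on a machine with a single GPU.
    print(invalid_gpu_ids("0,1,2,3", 1))  # [1, 2, 3]
    print(invalid_gpu_ids("0", 1))        # []
```

If the check reports any indices, one plausible reading of the log follows. The training was launched with more workers than there are GPUs, and the workers assigned to GPUs 1 to 3 failed on the invalid place. The remaining worker (PID 7371) then got `Connection refused` while broadcasting the NCCL ID, which matches frames 0 to 2 of the traceback. The first thing to try would be limiting the device list to the GPUs that actually exist. Where `run.sh` sets that list is not shown in this thread.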