GPU support #52
Comments
Hi @kormoczi, you can call nlu.load with GPU enabled. Make sure you enable GPU mode in the very first call to NLU, otherwise it will not be enabled.
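A minimal sketch of what that looks like, assuming the `gpu=True` flag on `nlu.load` (the model reference is the Hungarian-to-English translator used later in this thread):

```python
import nlu

# GPU mode must be requested in the very first nlu.load of the session;
# the underlying Spark session is started once and cannot be switched later.
pipe = nlu.load('hu.translate_to.en', gpu=True)
print(pipe.predict('Sziasztok, mi a helyzet?'))
```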
After some struggles with CUDA/Python/Ubuntu versions, I think the basic system is finally fine and I could run some basic tests on the GPU. NLU, however, fails with:

```
2021-06-07 12:46:56.905136: E external/org_tensorflow/tensorflow/core/common_runtime/session.cc:91] Failed to create session: Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid
```

Do you have any idea or suggestion?
Thank you for sharing, taking a closer look.
Hi @kormoczi,
Hi @C-K-Loan,
The error comes up during nlu.load already. I am not sure, but this error message ("Status: device kernel image is invalid") looks similar to an issue I had recently with another project. That was a PyTorch-based project, and I had to match the CUDA and torch versions and install the torch build with the appropriate CUDA support. Thanks for your help!
Hi @kormoczi, let me know if you have more trouble after installing CUDA 11.2.
Hi @C-K-Loan,
Since NLU is based on Spark NLP, the Spark NLP GPU requirements apply.
Since the latest NLU is on Spark NLP 3.x, you should go with the first option. Make sure you follow the TensorFlow instructions for installing/setting up the GPU correctly, especially the library path setup. PS: As for why Google Colab with CUDA 11.x can work with Spark NLP 3.x: Colab simply has all of the CUDA 10.1, 10.2, and 11.x dynamic libraries available on the path, so Spark NLP finds them regardless of the default CUDA version. You should be able to see those CUDA libraries on your library path as well.
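A quick way to check which CUDA runtime libraries are actually discoverable — a minimal sketch, assuming a Linux machine with CUDA installed in the usual locations (adjust the paths for your setup):

```python
import os
import glob

# The library search path that the JVM and TensorFlow will inherit.
print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "<not set>"))

# Look for CUDA runtime libraries in common install locations.
for pattern in ("/usr/local/cuda*/lib64/libcudart.so*",
                "/usr/lib/x86_64-linux-gnu/libcudart.so*"):
    for lib in glob.glob(pattern):
        print(lib)
```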
Also, this is a nice thread to read when GPU setup becomes tricky: https://spark-nlp.slack.com/archives/CA118BWRM/p1620933399356800
Sorry, I am a little bit confused right now. |
@maziyarpanahi
@kormoczi my bad, what @maziyarpanahi suggested is correct. This looks most likely like a TensorFlow installation issue. Maybe try verifying that TensorFlow has access to the GPU: https://stackoverflow.com/questions/38009682/how-to-tell-if-tensorflow-is-using-gpu-acceleration-from-inside-python-shell
The thread posted by @maziyarpanahi is visible once you join the Slack channel (https://join.slack.com/t/spark-nlp/shared_invite/zt-lutct9gm-kuUazcyFKhuGY3_0AMkxqA); it is our community, with over 2000 people helping each other. Hope this helps!
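For reference, a minimal check along the lines of that Stack Overflow thread (a generic TensorFlow sketch, not NLU-specific):

```python
import tensorflow as tf

# An empty list means TensorFlow either lacks GPU support
# or cannot find the CUDA/cuDNN libraries.
print(tf.config.list_physical_devices('GPU'))
print(tf.test.gpu_device_name())
```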
@C-K-Loan I have double-checked the TensorFlow install (as described in the link you provided) with a small Python script, and the output looks OK.
But NLU does not work: the script gives the error mentioned at the beginning of this thread (the error happens during nlu.load). I think these are not the same TensorFlow installs, though; the first one is installed with pip, while the second one is pulled in by NLU as a jar package...
By the way, I have now installed NLU based on the colab_setup.sh script. The value of LD_LIBRARY_PATH was "/usr/local/nvidia/lib:/usr/local/nvidia/lib64"; I replaced it with "/usr/local/cuda/lib64", but no change either (there is no directory named /usr/local/nvidia).
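One detail worth checking here: LD_LIBRARY_PATH is picked up when the JVM (and with it Spark NLP's bundled TensorFlow) starts, so it has to be set before the first nlu.load in the process. A minimal sketch, assuming your CUDA 11.x libraries live in /usr/local/cuda/lib64 and the gpu=True flag mentioned earlier in the thread:

```python
import os

# Set this before nlu/pyspark launch the JVM in this process;
# the JVM subprocess inherits the environment at launch time.
os.environ["LD_LIBRARY_PATH"] = "/usr/local/cuda/lib64:" + os.environ.get("LD_LIBRARY_PATH", "")

import nlu
pipe = nlu.load('hu.translate_to.en', gpu=True)
```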
Hi @kormoczi,
Could you test a couple of other TensorFlow-based models and see if this error occurs? Please let me know if you get the same errors or if this only happens on translate. This could be related to the SentenceDetectorDL. Alternatively, please try the following:

```python
pipe_translate = nlu.load('hu.translate_to.en')
pipe_translate.components
pipe_translate.components.remove(pipe_translate.components[1])
pipe_translate.predict('Hello world', output_level='document')
```

This will remove the SentenceDetectorDL, which is causing the error in the pipeline for you.
Hi,
I am using the Marian models for translation. It works fine, but I am assuming it runs only on the CPU. I am using the following code:

```python
pipe_translate = nlu.load('hu.translate_to.en')
translate = pipe_translate.predict("Sziasztok, mi a helyzet?")
```

The predict part takes about 5 seconds, and I have an A100 GPU; I don't think it should take that long. I can't figure out how to use the GPU, or how to check whether it is being used (print(tf.test.gpu_device_name()) shows that the GPU is there). Where can I find some documentation/info about this issue? I had some issues with the CUDA and Java installation, but right now those look fine.
Thanks
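A way to sanity-check whether the GPU actually speeds things up — a minimal sketch, assuming the gpu=True flag discussed above, and timing only a warm predict (the first call includes model download and graph initialization, so it should not be timed):

```python
import time
import nlu

pipe = nlu.load('hu.translate_to.en', gpu=True)
pipe.predict("Sziasztok, mi a helyzet?")  # warm-up: model load / graph init

start = time.time()
pipe.predict("Sziasztok, mi a helyzet?")
print(f"warm predict took {time.time() - start:.2f}s")
```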