docker-compose-triton-gpu fails due to "PTX was compiled with an unsupported toolchain" #29

Open
rg314 opened this issue Jun 3, 2022 · 2 comments

rg314 commented Jun 3, 2022

Describe the bug
After following the docker-compose-triton-gpu.yml instructions for the PyTorch example, the server fails to spin up. The service fails with the following error:

model_repository_manager.cc:1152] failed to load 'test_model_pytorch' version 1: Internal: unable to create stream: the provided PTX was compiled with an unsupported toolchain.

To Reproduce
Steps to reproduce the behavior:

  1. Run the pytorch example in "Train and Deploy Keras model with Nvidia Triton Engine"

Expected behavior
The service spins up without the model_repository_manager.cc:1152 error message.

Screenshots
n/a

Desktop (please complete the following information):

  • OS: Ubuntu 20.04.1
  • Virtualization version: (docker --version & docker-compose --version)
    Docker version 20.10.16, build aa7e414
    docker-compose version 1.29.2, build 5becea4c

Additional context
See similar issue here: triton-inference-server/server#3877

rg314 commented Jun 3, 2022

To fix this issue I had to update my Nvidia drivers to 510.

For clarity: originally, my Nvidia driver version was incompatible with the Triton server. To figure this out I ran the Nvidia Triton image directly in Docker:

docker run -it --gpus=all nvcr.io/nvidia/tritonserver:22.02-py3

If you get the following error:

This container was built for NVIDIA Driver Release 510.39 or later, but version 470.103.01 was detected and compatibility mode is UNAVAILABLE.

You'll need to update your Nvidia drivers.
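If it helps, here is a rough sketch of how the two versions in that banner could be pulled out and compared in a script. The function names `parse_driver_banner` and `version_ge` are made up for illustration, the pattern assumes the banner wording shown above, and `sort -V` assumes GNU coreutils (as on Ubuntu).

```shell
# Print "<required> <detected>" parsed from a Triton banner line.
# Prints nothing if the line does not match the wording shown above.
parse_driver_banner() {
    printf '%s\n' "$1" | sed -n \
        's/.*NVIDIA Driver Release \([0-9][0-9.]*\) or later, but version \([0-9][0-9.]*\) was detected.*/\1 \2/p'
}

# Return 0 if dotted version $1 is greater than or equal to $2.
# Relies on version sort (GNU sort -V): the smaller version sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

On the host, the installed driver could be checked against the required release with something like `version_ge "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" 510.39`, where 510.39 is the value reported by the banner for the 22.02 image.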

To fix this issue I updated the drivers on my base OS i.e.

sudo apt install nvidia-driver-510 -y
sudo reboot

Then it worked. The docker-compose logs from the clearml-serving-triton container (i.e. from running docker-compose -f docker/docker-compose-triton-gpu.yml logs -f) did not make this clear. It might be good to surface this as an explicit error in the logs.
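As a stopgap, the logs can be scanned for the two messages quoted in this thread. This is only a sketch: `find_driver_mismatch` is a hypothetical helper, not something clearml-serving provides, and it only knows the two strings seen here.

```shell
# Read log text on stdin. If a known driver-mismatch message is present,
# print a hint and return 0; otherwise print nothing and return 1.
find_driver_mismatch() {
    if grep -q -e 'PTX was compiled with an unsupported toolchain' \
               -e 'compatibility mode is UNAVAILABLE'; then
        echo 'Possible NVIDIA driver mismatch: the host driver may be older than the Triton image requires.'
        return 0
    fi
    return 1
}
```

For example: `docker-compose -f docker/docker-compose-triton-gpu.yml logs | find_driver_mismatch` (without `-f`, so the command terminates once the existing logs have been read).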

bmartinn commented Jun 7, 2022

Thanks @rg314 !
This is exactly the fix.
BTW, note that we just released v1.0.0. There is no need to change the Nvidia drivers (v510+); the Triton version is now 22.04,
but based on Nvidia's release notes, the next version of Triton (22.05) will need another driver bump.
