Colab GPU Status Mismatch: Connected vs. Waiting #4768

Open
Vijayjangra21 opened this issue Aug 9, 2024 · 1 comment

Vijayjangra21 commented Aug 9, 2024

Colab GPU Status Mismatch: Connected vs. Waiting

Description:
I encountered an issue in Google Colab where the interface shows conflicting information about the GPU status during model training. The training output indicates that a GPU is in use, as shown by the GPU memory usage and processing details it prints. However, the Colab status bar shows "Connecting" with a green dot, and a message at the bottom reads "Waiting to finish the current execution," implying that the session is not fully connected to a GPU or is in a waiting state.

Steps to Reproduce:

  1. Start a new Google Colab session with GPU enabled.
  2. Begin training a deep learning model (e.g., YOLO) that utilizes GPU resources.
  3. Observe the GPU usage in the training output, confirming that GPU memory is being utilized.
  4. Note the status bar at the top of the interface, which inconsistently shows "Connecting" and the message "Waiting to finish the current execution" at the bottom.
  5. Restart the laptop during the session and return to Google Colab. Observe that the interface fails to correctly show the status or continue the process.

Expected Behavior:
The interface should correctly reflect the GPU status. If the GPU is being used, the status should show as "Connected" with a green dot, without indicating that the session is waiting to finish.

Observed Behavior:
The status bar shows conflicting information, suggesting that the GPU is not fully connected or the session is in a waiting state, despite GPU usage being displayed in the training logs.

Environment:

  • Google Colab (latest version as of August 9, 2024)
  • GPU enabled session
  • Training a model using PyTorch with CUDA enabled
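
To record what PyTorch itself sees, independent of the status bar, something like the following could be called from inside the training code. This is a minimal sketch, not part of the original report: the function name `gpu_report` and the dictionary keys are hypothetical, and the guarded import is only there so the snippet also runs where PyTorch is not installed.

```python
try:
    import torch  # the report states PyTorch with CUDA is in use
except ImportError:  # assumption: allow the sketch to run without PyTorch
    torch = None


def gpu_report():
    """Describe the CUDA device PyTorch is using (hypothetical helper)."""
    if torch is None or not torch.cuda.is_available():
        return {"cuda_available": False}
    device = torch.cuda.current_device()
    return {
        "cuda_available": True,
        "device_name": torch.cuda.get_device_name(device),
        # Memory held by live tensors, in MiB.
        "allocated_mib": torch.cuda.memory_allocated(device) / 2**20,
        # Memory held by PyTorch's caching allocator, in MiB.
        "reserved_mib": torch.cuda.memory_reserved(device) / 2**20,
    }


print(gpu_report())
```

Printing this once per epoch would put the device name and memory figures in the training output next to the loss values, which makes it easier to compare against whatever the status bar shows at the same moment.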

Screenshot:
Attached is a screenshot showing the conflicting status messages during GPU usage.

Impact:
This issue creates confusion regarding the actual status of the GPU connection, leading to uncertainty about whether the training process is running correctly. It may also cause unnecessary interruptions if users believe that the session is not functioning properly.

Additional Notes:
This bug seems to be related to the UI's status display rather than the actual GPU functionality, as the training continues to utilize GPU resources despite the status mismatch.
Screenshot 2024-08-08 233042
Screenshot 2024-08-09 114814
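
One way to back up the note above with numbers that do not come from the training framework is to query `nvidia-smi` directly. The sketch below is an illustration under assumptions, not something from the original report: the helper names `parse_gpu_line` and `query_gpus` are hypothetical, and it assumes `nvidia-smi` is on the PATH of the runtime (it returns an empty list otherwise).

```python
import shutil
import subprocess

# CSV query for memory use and utilization, without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_line(line):
    """Parse one CSV line such as '1024, 15360, 87' (hypothetical helper)."""
    used, total, util = (int(field.strip()) for field in line.split(","))
    return {
        "memory_used_mib": used,
        "memory_total_mib": total,
        "utilization_pct": util,
    }


def query_gpus():
    """Return one dict per GPU, or an empty list if nvidia-smi is unusable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []
    gpus = []
    for line in result.stdout.splitlines():
        try:
            gpus.append(parse_gpu_line(line))
        except ValueError:
            # Skip blank lines and fields reported as non-numeric.
            continue
    return gpus


print(query_gpus())
```

If the memory and utilization values are non-zero while the status bar still shows a waiting state, that would be consistent with the mismatch being in the status display rather than in the GPU connection.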

@cperry-goog

Thanks for the feedback. Tracking this internally at b/361574572.
