Fix issues with stopping VMs #554

Merged
merged 4 commits into from
Mar 6, 2024

Conversation

hoh
Member

@hoh hoh commented Mar 5, 2024

  • Fix: A VM was only marked as stopping after all runs completed
  • Fix: Executions in a stopping state could be returned by `get_running_vm(...)`

Waiting for all runs to complete before marking the VM as stopping may cause the stop code to run simultaneously in multiple tasks, leading to bugs related to the resources used by the VMs.

Solution: First set that the VM is stopping, then wait for all runs to complete.
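The ordering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the aleph-vm code: the `Execution` class and the `stopping`, `runs_done`, and `teardown_count` names are hypothetical.

```python
import asyncio


class Execution:
    """Hypothetical stand-in for a VM execution; not the aleph-vm class."""

    def __init__(self) -> None:
        self.stopping = False
        self.runs_done = asyncio.Event()
        self.teardown_count = 0  # how many times resources were released

    async def stop(self) -> None:
        # Set the flag *before* the first await, so a concurrent caller
        # sees it and returns early instead of running the teardown too.
        if self.stopping:
            return
        self.stopping = True
        await self.runs_done.wait()  # wait for all runs to complete
        self.teardown_count += 1  # release resources exactly once


async def main() -> int:
    execution = Execution()
    # Two tasks try to stop the same execution concurrently.
    first = asyncio.create_task(execution.stop())
    second = asyncio.create_task(execution.stop())
    await asyncio.sleep(0)  # let both tasks start
    execution.runs_done.set()  # all runs have now completed
    await asyncio.gather(first, second)
    return execution.teardown_count


teardown_count = asyncio.run(main())
```

If the flag were set only after `await self.runs_done.wait()`, both tasks in this sketch would pass the check and both would reach the teardown step.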
@github-actions github-actions bot added the BLACK This PR has critical implications and must be reviewed by a senior engineer. label Mar 5, 2024
Calling `get_running_vm(...)` could return an execution that is in a stopping state, causing issues in the next steps of the process.
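A lookup that excludes stopping executions might look like the sketch below. Only the name `get_running_vm` comes from this PR; the `Execution` fields, the `Pool.executions` dictionary, and the signature are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Execution:
    """Hypothetical stand-in; the real aleph-vm execution model differs."""

    vm_hash: str
    is_running: bool = True
    is_stopping: bool = False


class Pool:
    """Hypothetical pool that maps a VM hash to its execution."""

    def __init__(self) -> None:
        self.executions: Dict[str, Execution] = {}

    def get_running_vm(self, vm_hash: str) -> Optional[Execution]:
        """Return the execution only if it is running and not being stopped."""
        execution = self.executions.get(vm_hash)
        if execution and execution.is_running and not execution.is_stopping:
            return execution
        return None


pool = Pool()
pool.executions["abc"] = Execution(vm_hash="abc")
pool.executions["def"] = Execution(vm_hash="def", is_stopping=True)
```

Here `pool.get_running_vm("abc")` returns the execution, while `pool.get_running_vm("def")` returns `None` because that execution is already stopping.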

codecov bot commented Mar 5, 2024

Codecov Report

Attention: Patch coverage is 23.07692%, with 10 lines in your changes missing coverage. Please review.

Project coverage is 34.76%. Comparing base (2401133) to head (c1c438c).
Report is 2 commits behind head on main.

Files                               Patch %   Lines
src/aleph/vm/orchestrator/run.py    0.00%     4 Missing ⚠️
src/aleph/vm/pool.py                20.00%    4 Missing ⚠️
src/aleph/vm/models.py              50.00%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #554      +/-   ##
==========================================
+ Coverage   34.74%   34.76%   +0.02%     
==========================================
  Files          52       52              
  Lines        4738     4741       +3     
  Branches      553      554       +1     
==========================================
+ Hits         1646     1648       +2     
- Misses       3074     3075       +1     
  Partials       18       18              


hoh added 2 commits March 5, 2024 18:27
Solution: Make `pool.get_running_vm` synchronous by removing the `async` keyword from the function definition.
Solution: Reuse `pool.get_running_vm` that checks if the execution is being stopped.
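The commits do not state why the function was made synchronous. One general property of such a change, shown in the sketch below, is that a plain `def` returns its result directly, whereas calling an `async def` without `await` yields a coroutine object that is always truthy. The `AsyncPool` and `SyncPool` classes are hypothetical and only contrast the two shapes.

```python
import inspect
from typing import Dict, Optional


class AsyncPool:
    """Hypothetical 'before' shape: the lookup is a coroutine function."""

    def __init__(self) -> None:
        self.executions: Dict[str, str] = {}

    async def get_running_vm(self, vm_hash: str) -> Optional[str]:
        return self.executions.get(vm_hash)


class SyncPool:
    """Hypothetical 'after' shape: the lookup is a plain function."""

    def __init__(self) -> None:
        self.executions: Dict[str, str] = {}

    def get_running_vm(self, vm_hash: str) -> Optional[str]:
        return self.executions.get(vm_hash)


# Without `await`, the async version returns a coroutine object, not a result.
pending = AsyncPool().get_running_vm("missing")
async_result_is_coroutine = inspect.iscoroutine(pending)
pending.close()  # avoid a "never awaited" warning in this demo

# The synchronous version returns the value directly and needs no `await`.
sync_result = SyncPool().get_running_vm("missing")
```

A synchronous lookup can also be reused from both synchronous and asynchronous callers, which fits the second commit's aim of reusing `pool.get_running_vm`.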
@aleph-im aleph-im deleted a comment from github-actions bot Mar 6, 2024
Member Author

hoh commented Mar 6, 2024

Only the Codecov checks are failing.

@hoh hoh merged commit d347031 into main Mar 6, 2024
19 of 20 checks passed
@hoh hoh deleted the hoh-fix-stopping-vm branch March 6, 2024 13:49
Labels
BLACK This PR has critical implications and must be reviewed by a senior engineer.