run on 4 cores and timeout after 12 hours #909

Closed
wants to merge 1 commit into from

Conversation

mikemhenry
Contributor

Checklist

  • Added a news entry

Developers certificate of origin

@mikemhenry
Contributor Author

codecov bot commented Jul 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.36%. Comparing base (dd0be23) to head (d185608).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #909      +/-   ##
==========================================
- Coverage   94.17%   92.36%   -1.81%     
==========================================
  Files         134      134              
  Lines        9800     9800              
==========================================
- Hits         9229     9052     -177     
- Misses        571      748     +177     
Flag          Coverage Δ
fast-tests    92.36% <ø> (?)
slow-tests    ?
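
The percentages in the diff above follow from the hit and line counts. The sketch below recomputes them, assuming the report truncates to two decimals rather than rounding (an assumption that happens to reproduce the figures shown; rounding would give 92.37% for the head). The function name is hypothetical and not part of Codecov.

```python
def coverage_basis_points(hits: int, lines: int) -> int:
    """Coverage in hundredths of a percent, truncated (assumed behaviour)."""
    # Integer arithmetic avoids floating-point surprises near a boundary.
    return hits * 10_000 // lines


# Counts taken from the coverage diff above.
LINES = 9800
base = coverage_basis_points(9229, LINES)   # base (main)
head = coverage_basis_points(9052, LINES)   # head (#909)
delta = head - base

print(f"base  {base / 100:.2f}%")
print(f"head  {head / 100:.2f}%")
print(f"delta {delta / 100:+.2f}%")

# Hits and misses should always add up to the total line count.
assert 9229 + 571 == LINES
assert 9052 + 748 == LINES
```

The drop is consistent with the flag table: only the fast-tests flag reported coverage for this head commit, so lines exercised only by the slow tests count as misses.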


@mikemhenry
Contributor Author

This worked great! It only failed because it expected a GPU, so the nvidia-smi commands failed: = 2 failed, 855 passed, 40 skipped, 1 xfailed, 2 xpassed, 3594 warnings, 3 rerun in 4059.65s (1:07:39) =. That is way faster using -n 4 than -n auto.

@mikemhenry
Contributor Author

(or did I mess up and make sure the log tests run?)

@IAlibay
Member

IAlibay commented Jul 29, 2024

> This worked great! It only failed because it expected a GPU, so the nvidia-smi commands failed: = 2 failed, 855 passed, 40 skipped, 1 xfailed, 2 xpassed, 3594 warnings, 3 rerun in 4059.65s (1:07:39) =. That is way faster using -n 4 than -n auto.

@mikemhenry the run for this PR took 4h, not 1h (I think you're looking at my logs 😅).

Main difference is that I used a newer runner and removed coverage options (which we wouldn't use anyways).

@mikemhenry
Contributor Author

That is exactly what happened. When we switch to GPUs, we probably want the timeout to be 1.5x or 2x what the "normal" run length ends up being, so that we have a failsafe if things go loopy but the limit isn't so strict that it trips on ordinary runtime fluctuations.
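
As a rough illustration of that rule of thumb, the sketch below turns a typical run length into a timeout in minutes. It assumes the 4h run mentioned above stands in for the "normal" length, which may not hold once the tests move to GPU runners, and the function name is hypothetical. The 12-hour ceiling is the value from this PR's title.

```python
import math


def timeout_minutes(normal_run_hours: float, safety_factor: float) -> int:
    """Timeout in whole minutes: the normal run length scaled by a safety factor."""
    return math.ceil(normal_run_hours * 60 * safety_factor)


NORMAL_RUN_HOURS = 4.0          # assumption: the 4h run reported in this thread
CURRENT_TIMEOUT_MIN = 12 * 60   # the 12-hour timeout proposed in this PR

for factor in (1.5, 2.0):
    minutes = timeout_minutes(NORMAL_RUN_HOURS, factor)
    print(f"{factor}x -> {minutes} min ({minutes / 60:.1f} h), "
          f"below current timeout: {minutes < CURRENT_TIMEOUT_MIN}")
```

With these inputs both factors land well under 12 hours (6h and 8h), so a tighter limit would still leave headroom for slower runs.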

@mikemhenry mikemhenry closed this Jul 31, 2024
@mikemhenry mikemhenry deleted the gpu-runner-tweaks branch October 4, 2024 15:01