Alert when jobs fail #569

Merged
merged 31 commits into from
Oct 4, 2024

Conversation

Razzmatazzz
Member

Razzmatazzz commented Sep 18, 2024

I'm troubleshooting exactly what causes jobs to fail to start.

So far, I have determined that jobs fail to start because the job is already running. I believe there are two likely culprits for this.

First, it's possible that queries to the DB are stalling, so the job waiting on those queries never finishes. I've mocked up and deployed a timeout mechanism to guard against this.

Second, it's possible that the Cloudflare KV put calls are stalling. There is currently no timeout set for them. If the DB timeout fix doesn't work, I'll try adding one next. Upon further reflection, I think this is the more likely culprit. We used to occasionally get job error warnings about the put requests failing, but we no longer get those. For mysterious Cloudflare reasons, I would guess the requests now just remain pending forever rather than throwing errors.
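
For illustration, here is a minimal sketch of the kind of timeout guard described above. It is not the code from this PR: `withTimeout`, `stalledPut`, and the 50 ms limit are hypothetical names and values chosen for the example. The idea is to race the real request against a timer so that a request that stays pending forever turns into an error the job can report.

```javascript
// Hypothetical helper (not from this PR): reject if `promise` has not
// settled within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => {
            reject(new Error(`${label} timed out after ${ms}ms`));
        }, ms);
    });
    // Whichever promise settles first decides the outcome. The timer is
    // always cleared so it cannot keep the process alive afterwards.
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Stand-in for a request that never resolves or rejects.
const stalledPut = () => new Promise(() => {});

async function demo() {
    try {
        await withTimeout(stalledPut(), 50, 'KV put');
        return 'completed';
    } catch (error) {
        // A job could send its failure alert here instead of hanging.
        return error.message;
    }
}

demo().then((message) => console.log(message)); // prints "KV put timed out after 50ms"
```

Note that a guard like this only stops the caller from waiting. It does not cancel the underlying request, which would need something like an `AbortController` if the client supports it.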

Razzmatazzz requested a review from a team as a code owner September 18, 2024 14:47
Contributor

👋 Thanks for opening a pull request!

If you are new, please check out the trimmed down summary of our deployment process below:

  1. 👀 Observe the CI jobs and tests to ensure they are passing

  2. ✔️ Obtain an approval/review on this pull request

  3. 🚀 Branch deploy your pull request to production

    Comment .deploy on this pull request to trigger a deploy. If anything goes wrong, rollback with .deploy main

  4. 🎉 Merge!

@Razzmatazzz
Member Author

.deploy

Contributor

Deployment Triggered 🚀

Razzmatazzz, started a branch deployment to production

You can watch the progress here 🔗

Branch: alert-job-failed

Contributor

Deployment Results ✅

Razzmatazzz successfully deployed branch alert-job-failed to production

@Razzmatazzz
Member Author

.deploy

Contributor

Deployment Triggered 🚀

Razzmatazzz, started a branch deployment to production

You can watch the progress here 🔗

Branch: alert-job-failed

Contributor

Deployment Results ✅

Razzmatazzz successfully deployed branch alert-job-failed to production

@Razzmatazzz
Member Author

.deploy

Contributor

Deployment Triggered 🚀

Razzmatazzz, started a branch deployment to production

You can watch the progress here 🔗

Branch: alert-job-failed

Contributor

Deployment Results ✅

Razzmatazzz successfully deployed branch alert-job-failed to production

src/tarkov-data-manager/jobs/index.mjs Dismissed
src/tarkov-data-manager/jobs/index.mjs Dismissed
src/tarkov-data-manager/jobs/index.mjs Dismissed
@Razzmatazzz
Member Author

.deploy

Contributor

Deployment Triggered 🚀

Razzmatazzz, started a branch deployment to production

You can watch the progress here 🔗

Branch: alert-job-failed

Contributor

Deployment Results ✅

Razzmatazzz successfully deployed branch alert-job-failed to production

@Razzmatazzz
Member Author

.deploy

Contributor

Deployment Triggered 🚀

Razzmatazzz, started a branch deployment to production

You can watch the progress here 🔗

Branch: alert-job-failed

Contributor

Deployment Results ✅

Razzmatazzz successfully deployed branch alert-job-failed to production

src/tarkov-data-manager/modules/preset-data.mjs Dismissed
src/tarkov-data-manager/modules/preset-data.mjs Dismissed
@Razzmatazzz
Member Author

.deploy

Contributor

github-actions bot commented Oct 3, 2024

Deployment Triggered 🚀

Razzmatazzz, started a branch deployment to production

You can watch the progress here 🔗

Branch: alert-job-failed

Contributor

github-actions bot commented Oct 3, 2024

Deployment Results ✅

Razzmatazzz successfully deployed branch alert-job-failed to production

@Razzmatazzz
Member Author

.deploy

Contributor

github-actions bot commented Oct 3, 2024

Deployment Triggered 🚀

Razzmatazzz, started a branch deployment to production

You can watch the progress here 🔗

Branch: alert-job-failed

Contributor

github-actions bot commented Oct 3, 2024

Deployment Results ✅

Razzmatazzz successfully deployed branch alert-job-failed to production

@Razzmatazzz
Member Author

.deploy

Contributor

github-actions bot commented Oct 3, 2024

Deployment Triggered 🚀

Razzmatazzz, started a branch deployment to production

You can watch the progress here 🔗

Branch: alert-job-failed

Contributor

github-actions bot commented Oct 3, 2024

Deployment Results ✅

Razzmatazzz successfully deployed branch alert-job-failed to production

@Razzmatazzz
Member Author

.deploy

Contributor

github-actions bot commented Oct 3, 2024

Deployment Triggered 🚀

Razzmatazzz, started a branch deployment to production

You can watch the progress here 🔗

Branch: alert-job-failed

Contributor

github-actions bot commented Oct 3, 2024

Deployment Results ✅

Razzmatazzz successfully deployed branch alert-job-failed to production

@Razzmatazzz
Member Author

.deploy

Contributor

github-actions bot commented Oct 3, 2024

Deployment Triggered 🚀

Razzmatazzz, started a branch deployment to production

You can watch the progress here 🔗

Branch: alert-job-failed

Contributor

github-actions bot commented Oct 3, 2024

Deployment Results ✅

Razzmatazzz successfully deployed branch alert-job-failed to production

Razzmatazzz merged commit 39a9c7a into main on Oct 4, 2024
5 checks passed
Razzmatazzz deleted the alert-job-failed branch October 4, 2024 19:42