There are already 2 running jobs in the pool. Kade gets a request from Hamlet to PredictionJobsController#create. The controller kicks off a PredictionJobSubmissionJob. When Batch::Prediction::CreateJob.new(prediction_job).run is called, the bajor_client reaches out to bajor and receives a 409 Conflict, since bajor will not schedule another job: "Active Jobs are running in the batch system - please wait till they are fininshed processing." The PredictionJobSubmissionJob raises an exception and Sidekiq requeues the job.
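A minimal sketch of that submission path, assuming a plain Sidekiq worker; only PredictionJobSubmissionJob, the Batch::Prediction::CreateJob call, and the 409 behaviour come from this report, while the PredictionJob lookup and the Sidekiq::Job include are assumptions for illustration:

```ruby
require 'sidekiq'

class PredictionJobSubmissionJob
  include Sidekiq::Job

  def perform(prediction_job_id)
    prediction_job = PredictionJob.find(prediction_job_id) # assumed lookup

    # While other batch jobs are active, bajor answers 409 Conflict, the
    # bajor_client raises, the exception escapes perform, and Sidekiq puts
    # the job on its retry queue with the same prediction_job_id argument.
    Batch::Prediction::CreateJob.new(prediction_job).run
  end
end
```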
When the job re-runs, the job status is checked, but only for submitted? and complete?. Since the previous job failed (it is neither submitted nor complete), the worker continues to run. A failed worker should simply retry, passing the same argument (the prediction job ID).
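Roughly, the re-run guard looks like the sketch below; only submitted? and complete? come from this report, and the assumption is that the check short-circuits at the top of perform:

```ruby
def perform(prediction_job_id)
  prediction_job = PredictionJob.find(prediction_job_id) # assumed lookup

  # A job whose previous attempt failed is neither submitted nor complete,
  # so it slips past this guard and the worker submits again.
  return if prediction_job.submitted? || prediction_job.complete?

  Batch::Prediction::CreateJob.new(prediction_job).run
end
```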
Instead, for some reason new PredictionJobs are created in kade as the retries run. Each new PredictionJob is then retried individually, so a single failure queues multiple jobs in the Sidekiq retry queue, each with an incrementing prediction job ID.