Bug 1144223 - Make the job queue choose jobs fairly
Summary: Make the job queue choose jobs fairly
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Bugzilla
Classification: Community
Component: Internal Tools
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: 4.4
Assignee: PnT DevOps Devs
QA Contact: tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-09-19 03:50 UTC by Jason McDonald
Modified: 2018-12-09 06:29 UTC

Fixed In Version: 4.4.6026
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-10-27 02:19:26 UTC
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
CPAN 99075 0 None None None Never

Description Jason McDonald 2014-09-19 03:50:37 UTC
Description of problem:
So far we've had two incidents where the job queue got into a state where it wasn't keeping up with the inflow of Rules Engine jobs and we had to manually delete all jobs from the queue.

Diagnosis showed that the queue contained a large number of jobs that were blocked because the queue contained older jobs for the same bugs.  The job queue processor was stuck because it kept choosing the blocked jobs (and having to decline them) rather than choosing unblocked jobs.

This happens because the decision of which job to choose from the queue is made based only on the priority of the job (useless, as all jobs currently have the same priority) and whether the job's grab_until and run_after times are in the past.  This effectively turns the job queue into a pool of jobs where most jobs are "runnable", and because of the way MySQL works, the same set of jobs keeps getting selected (and declined), ad infinitum.
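
To make the failure mode concrete, here is a minimal Python simulation of the selection rule described above. It is only a sketch, not the actual job queue code: the Job class, the job_id and bug_id fields, the batch limit, and the use of a stable list order as a stand-in for whatever row order MySQL happens to return are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int          # hypothetical: lower id means the job was queued earlier
    bug_id: int          # hypothetical: the bug this job belongs to
    priority: int = 0    # all jobs share one priority, as in the report
    run_after: int = 0
    grab_until: int = 0


def pick_runnable(jobs, now, limit):
    """Pick up to `limit` jobs using only priority and the two time checks."""
    runnable = [j for j in jobs if j.run_after <= now and j.grab_until <= now]
    # Stable sort: jobs with equal priority stay in storage order,
    # so the age of a job never influences the choice.
    runnable.sort(key=lambda j: -j.priority)
    return runnable[:limit]


def is_blocked(job, jobs):
    """A job is blocked while an older job for the same bug is still queued."""
    return any(o.bug_id == job.bug_id and o.job_id < job.job_id for o in jobs)


def run_rounds(jobs, rounds, limit):
    """Process the queue for a number of rounds; return ids of completed jobs."""
    completed = []
    for now in range(1, rounds + 1):
        for job in pick_runnable(jobs, now, limit):
            if is_blocked(job, jobs):
                continue  # declined: the job stays exactly where it was
            jobs.remove(job)
            completed.append(job.job_id)
    return completed


# Storage order puts two blocked jobs ahead of the job that blocks them.
queue = [Job(11, 1), Job(12, 1), Job(1, 1), Job(2, 2)]
stuck = run_rounds(queue, rounds=5, limit=2)
```

In this toy setup every round selects jobs 11 and 12, both are declined because job 1 is still queued, and neither job 1 nor the unrelated job 2 is ever reached.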

The solution:
The job queue needs to be an actual queue.  The oldest eligible jobs need to be processed first, and if a job needs to be declined it must go to the end of the queue, either by resetting the insert_time to the current time or discarding the job and creating a new one.  The former option may be easier, as jobs have other attributes that need to be preserved if the job is declined (e.g. the retry count).
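
The proposed behaviour can be sketched in Python as follows. This is an illustration under stated assumptions, not the fix that was shipped: the Job class, the job_id, bug_id and retries fields, the decline helper, and the batch limit are hypothetical names, and integer timestamps stand in for real times.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int          # hypothetical: lower id means the job was created earlier
    bug_id: int          # hypothetical: the bug this job belongs to
    insert_time: int     # position in the queue; oldest first
    run_after: int = 0
    grab_until: int = 0
    retries: int = 0     # must survive a decline


def pick_oldest(jobs, now, limit):
    """Pick up to `limit` eligible jobs, oldest insert_time first."""
    eligible = [j for j in jobs if j.run_after <= now and j.grab_until <= now]
    return sorted(eligible, key=lambda j: (j.insert_time, j.job_id))[:limit]


def is_blocked(job, jobs):
    """A job is blocked while an older job for the same bug is still queued."""
    return any(o.bug_id == job.bug_id and o.job_id < job.job_id for o in jobs)


def decline(job, now):
    """Send a declined job to the end of the queue, keeping its other attributes."""
    job.insert_time = now


def run_rounds(jobs, start, rounds, limit):
    """Process the queue for a number of rounds; return ids of completed jobs."""
    completed = []
    for now in range(start, start + rounds):
        for job in pick_oldest(jobs, now, limit):
            if is_blocked(job, jobs):
                decline(job, now)
            else:
                jobs.remove(job)
                completed.append(job.job_id)
    return completed


# Job 1 is not eligible until time 102, so jobs 11 and 12 are declined at first.
queue = [
    Job(11, 1, insert_time=11, retries=2),
    Job(12, 1, insert_time=12),
    Job(1, 1, insert_time=1, run_after=102),
    Job(2, 2, insert_time=2),
]
done = run_rounds(queue, start=100, rounds=4, limit=2)
```

Resetting insert_time in place, rather than deleting and re-creating the job, means attributes such as the retry count carry over without extra bookkeeping, which matches the reasoning given above for preferring the former option.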

Comment 3 Rony Gong 🔥 2014-10-10 12:00:59 UTC
Verified on the QA environment (bzweb01-qe) with version 4.4.6026-4.
Result: Pass
1. Generated a large number of jobs by executing the attachment.
2. In the build with the fix, the bugzilla service consumes jobs at a rate of 1500/h, and the "Declined job" message almost never appears in the log.
3. In the build without the fix, the bugzilla service consumes jobs very slowly and eventually hangs, and the "Declined job" message appears many times in the log.

