Bug 1400568

Summary: Canceled jobs get processed anyway
Product: [Community] Candlepin
Reporter: Shayne Riley <sriley>
Component: candlepin
Assignee: candlepin-bugs
Status: CLOSED WORKSFORME
QA Contact: Katello QA List <katello-qa-list>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 0.9.51
CC: csnyder, khowell, redakkan, skallesh, sriley
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-12-08 16:26:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Shayne Riley 2016-12-01 14:01:43 UTC
Description of problem:

If I cancel a job, I do not expect it to run later. However, this is exactly what happens.

I discovered this after canceling several thousand jobs in the queue. When I came back later, I looked at the list of jobs and saw that the number of canceled jobs was lower than before. I looked up the status of a job I *knew* I had canceled, and it was now listed as "FINISHED".


Version-Release number of selected component (if applicable):


How reproducible:

Always, though if your job queue is processed quickly, it may be hard to cancel a job before it gets processed.


Steps to Reproduce:
1. Create a job, for example by refreshing an owner's pools. Make a note of the job id.
2. Using the job id, cancel the job before it gets processed. The job status should be "CANCELED".
3. Wait for the job queue to catch up. If you retrieve the job details at the right times, the job status will switch to "RUNNING" and then to "FINISHED".
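
The check in step 3 can be expressed as a small helper over the statuses observed while polling a job. This is only a sketch: `find_cancel_violation` is a hypothetical name, "CANCELED", "RUNNING" and "FINISHED" are the status strings quoted in this report, and "CREATED" is an assumed placeholder for whatever the job reports before it is canceled.

```python
from typing import Optional, Sequence


def find_cancel_violation(statuses: Sequence[str]) -> Optional[int]:
    """Return the index of the first non-CANCELED status seen after CANCELED.

    Returns None if the job stayed canceled or was never canceled at all.
    """
    canceled_seen = False
    for index, status in enumerate(statuses):
        if status == "CANCELED":
            canceled_seen = True
        elif canceled_seen:
            # The job left the CANCELED state, which is the reported bug.
            return index
    return None


# Statuses in the order described in step 3 ("CREATED" is a placeholder).
observed = ["CREATED", "CANCELED", "CANCELED", "RUNNING", "FINISHED"]
print(find_cancel_violation(observed))
```

A non-None result means the job was seen in another state after it had been canceled. Polling can miss short-lived states, so a None result does not prove the job never ran.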

Actual results:

A canceled job will be processed.


Expected results:

A canceled job doesn't get processed and remains canceled.
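
One way to picture the expected behavior is a worker that re-checks a job's status at the moment it picks the job up, and skips anything already canceled. The toy in-memory model below is purely illustrative: `Job`, `cancel`, and `drain` are hypothetical names, "CREATED" is an assumed placeholder status, and none of this is taken from how Candlepin or Quartz actually schedule jobs.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Job:
    job_id: str
    status: str = "CREATED"  # placeholder initial status, assumed for this sketch


def cancel(job: Job) -> None:
    """Mark a job as canceled, but only if it has not started yet."""
    if job.status == "CREATED":
        job.status = "CANCELED"


def drain(queue: Deque[Job]) -> List[str]:
    """Process every queued job and return the ids of the jobs that ran."""
    ran = []
    while queue:
        job = queue.popleft()
        if job.status == "CANCELED":
            continue  # expected behavior: a canceled job is never run
        job.status = "RUNNING"
        # ... the real work (e.g. refreshing pools) would happen here ...
        job.status = "FINISHED"
        ran.append(job.job_id)
    return ran


jobs = [Job("a"), Job("b"), Job("c")]
queue = deque(jobs)
cancel(jobs[1])
print(drain(queue))
print([job.status for job in jobs])
```

In this model the canceled job "b" is dequeued but never moves to "RUNNING" or "FINISHED", which matches the expected result stated above.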

Comment 1 Shayne Riley 2016-12-01 14:24:14 UTC
Hmm. On further evaluation, this is not as reproducible as I thought. Perhaps the Quartz state of our Candlepin instance was messed up enough that it was processing the canceled jobs.

As it stands, my canceled jobs are now staying canceled, which is good.

If the project maintainer wants to mark this bug as could not reproduce, I'm fine with that for now.

Comment 2 Chris Snyder 2016-12-05 15:20:49 UTC
Do you have any more details on what caused the strange quartz state in this instance? For example, was there heavy load on the candlepin instance? Perhaps there were a lot of jobs created and cancelled quickly?

Comment 3 Shayne Riley 2016-12-05 15:52:23 UTC
This came about because none of the tasks (or at least so it seemed) were getting processed. Querying the database directly, we saw that several jobs were executing but never seemed to finish. Meanwhile, there were 6000+ jobs waiting to run, many of which were over a week old.

Unfortunately it's a bit of a mystery as to why it got backed up the way it did. I have a (weak) theory that a few orgs (with many subscriptions and consumers) had refresh pools tasks that clogged up the works, but that's a shot in the dark.

What I ended up doing was canceling those several thousand jobs. Eventually, we reset tomcat so that we could enable some advanced logging on the app and in quartz, and then the queue started getting processed... including the canceled ones. Thankfully they all processed fast enough that the queue emptied in a matter of hours.

Comment 4 Kevin Howell 2016-12-08 15:23:03 UTC
I'm inclined to close this without a reproducer. With no easy way to reproduce, there's not much we can do here. I am curious, though: which environment was this in? (If it was a dev environment, I'm less concerned.)

Comment 6 Kevin Howell 2016-12-08 16:26:39 UTC
Since this was a non-prod environment, I'm closing for now. Feel free to reopen if you see this again, or if you come up with a reproducer.