| Summary: | Canceled jobs get processed anyway | | |
|---|---|---|---|
| Product: | [Community] Candlepin | Reporter: | Shayne Riley <sriley> |
| Component: | candlepin | Assignee: | candlepin-bugs |
| Status: | CLOSED WORKSFORME | QA Contact: | Katello QA List <katello-qa-list> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 0.9.51 | CC: | csnyder, khowell, redakkan, skallesh, sriley |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-12-08 16:26:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Shayne Riley
2016-12-01 14:01:43 UTC
Hmm. On further evaluation, this is not as reproducible as I thought. Perhaps our Candlepin's quartz state was messed up badly enough that it was processing the canceled jobs. As it stands, my canceled jobs are now staying canceled, which is good. If the project maintainers want to mark this bug as not reproducible, I'm fine with that for now.

Do you have any more details on what caused the strange quartz state in this instance? For example, was there heavy load on the Candlepin instance? Were a lot of jobs created and cancelled in quick succession?

This came about because none of the tasks (or at least it seemed that way) were getting processed. Querying the database directly, we saw that several jobs were marked as executing but never seemed to finish. Meanwhile, there were 6000+ jobs waiting to run, many of them over a week old. Unfortunately, why the queue got backed up the way it did is a bit of a mystery. I have a (weak) theory that refresh-pools tasks for a few orgs with many subscriptions and consumers clogged up the works, but that's a shot in the dark. What I ended up doing was canceling those several thousand jobs. Eventually we restarted Tomcat so that we could enable some advanced logging in the app and in quartz, and then the queue started getting processed... including the canceled jobs. Thankfully they all processed fast enough that the queue emptied in a matter of hours.

I'm inclined to close this without a reproducer; with no easy way to reproduce it, there's not much we can do here. I am curious, though: which environment was this in? (If it was a dev environment, I'm less concerned.)

Since this was a non-prod environment, I'm closing this for now. Feel free to reopen if you see this again, or if you come up with a reproducer.
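
For illustration, the kind of direct database inspection mentioned above might look roughly like the sketch below. This is an assumption-laden example, not the exact queries used on this deployment: it presumes a PostgreSQL backend and that job status rows live in a cp_job table with "state" and "created" columns, which may differ across Candlepin versions.

```python
# Hypothetical sketch of inspecting Candlepin's job queue directly in PostgreSQL.
# Assumes the job table is named cp_job with "state" and "created" columns;
# adjust names to match your actual schema and Candlepin version.
import psycopg2

conn = psycopg2.connect(dbname="candlepin", user="candlepin", host="localhost")
with conn, conn.cursor() as cur:
    # How many jobs are in each state (created/pending/running/canceled/etc.)?
    cur.execute("SELECT state, COUNT(*) FROM cp_job GROUP BY state ORDER BY state")
    for state, count in cur.fetchall():
        print(f"state={state}: {count} jobs")

    # Which jobs are more than a week old (the symptom described above)?
    cur.execute(
        "SELECT id, state, created FROM cp_job "
        "WHERE created < NOW() - INTERVAL '7 days' "
        "ORDER BY created"
    )
    for job_id, state, created in cur.fetchall():
        print(job_id, state, created)
conn.close()
```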
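
Similarly, a rough sketch of canceling a batch of queued jobs through Candlepin's REST API is shown below. The server URL, credentials, and job IDs are placeholders, and whether DELETE /candlepin/jobs/<id> cancels a job (rather than just removing its status record) can vary by version, so treat this as an assumption to verify against your Candlepin's API docs.

```python
# Hypothetical sketch of canceling a batch of queued Candlepin jobs via REST.
# Assumes DELETE /candlepin/jobs/<job_id> cancels the job in this version;
# the server URL, credentials, and job IDs below are placeholders.
import requests

CANDLEPIN = "https://candlepin.example.com:8443/candlepin"
AUTH = ("admin", "admin")  # placeholder credentials

stale_job_ids = ["refresh_pools_1234", "refresh_pools_5678"]  # example IDs

for job_id in stale_job_ids:
    resp = requests.delete(f"{CANDLEPIN}/jobs/{job_id}", auth=AUTH, verify=False)
    print(job_id, resp.status_code)
```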