Bug 1129758
| Summary: | When a worker dies or restarts the tasks assigned to it are not processed | ||
|---|---|---|---|
| Product: | [Retired] Pulp | Reporter: | Barnaby Court <bcourt> |
| Component: | async/tasks | Assignee: | Brian Bouterse <bmbouter> |
| Status: | CLOSED UPSTREAM | QA Contact: | pulp-qe-list |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 2.4.0 | CC: | bmbouter, ipanova, rbarlow, skarmark |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2015-02-28 22:15:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
Description
Barnaby Court
2014-08-13 15:26:22 UTC

I believe the reason other tasks are prevented from running has much more to do with the task status showing either in-progress or waiting than with the fact that a reserved_resource reservation exists for the worker that restarted or died. Given that, this bug implicitly describes two behaviors: first, work that is submitted into Pulp gets "lost" when workers restart or die; second, the status of those tasks is incorrect.

The second behavior has been moved to a different BZ [0] to be fixed in the short term (i.e., Pulp 2.4.1). This BZ should focus on the first behavior only: Pulp loses work that it already knew about. Once that defect is fixed and Pulp no longer loses work, it's important to undo the short-term fix put in place by [0]. Basically, once work restarts properly there should be no reason to proactively mark those tasks as cancelled.

[0]: https://bugzilla.redhat.com/show_bug.cgi?id=1129858

I was able to do some prototyping and made some improvements toward resolving this issue, but it is not finished. We'll need to move away from the dedicated queues feature of Celery and use the CELERY_QUEUES feature instead to create similar auto-deleting, dedicated queues, with the additional options needed to support the alternate-exchange. Beyond the implementation, it still needs a lot of testing on both Qpid and RabbitMQ before it's ready to be put into a PR or have tests written for it. I'm going to focus on some task/story work for a bit, but since I'm so far into this fix I'm leaving it in the assigned state.

I filed an upstream bug [1] with Celery reporting that CELERY_WORKER_DIRECT can lose work. I think solving this in upstream Celery is better than reworking Pulp to have a similar feature, only with durability.

[1]: https://github.com/celery/celery/issues/2492

Moved to https://pulp.plan.io/issues/489
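
For readers unfamiliar with the alternate-exchange idea mentioned in the prototyping comment, here is a minimal toy model of it in plain Python. It is not Pulp code and does not use the Celery or broker APIs; the class and function names (`DirectExchange`, `FallbackExchange`, `dispatch_after_worker_death`) are hypothetical and exist only for illustration. It assumes the simplest case: a task is published to a worker's dedicated, auto-deleting queue after that queue has already disappeared.

```python
from collections import deque


class FallbackExchange:
    """Hypothetical catch-all exchange: keeps every message it is handed."""

    def __init__(self):
        self.queue = deque()

    def publish(self, routing_key, message):
        self.queue.append((routing_key, message))
        return True


class DirectExchange:
    """Toy direct exchange with per-worker queues and an optional alternate."""

    def __init__(self, alternate=None):
        self.alternate = alternate
        self.queues = {}   # routing key -> deque of messages
        self.dropped = []  # messages that could not be routed anywhere

    def worker_connects(self, name):
        self.queues[name] = deque()

    def worker_dies(self, name):
        # Models an auto-delete queue disappearing with its only consumer.
        self.queues.pop(name, None)

    def publish(self, routing_key, message):
        queue = self.queues.get(routing_key)
        if queue is not None:
            queue.append(message)
            return True
        if self.alternate is not None:
            # Unroutable: hand the message to the alternate exchange.
            return self.alternate.publish(routing_key, message)
        self.dropped.append(message)
        return False


def dispatch_after_worker_death(exchange):
    """Publish one task to a worker whose queue has already gone away."""
    exchange.worker_connects("worker-1")
    exchange.worker_dies("worker-1")
    return exchange.publish("worker-1", "sync repo")
```

Without an alternate, the task ends up in `dropped`; with one, it lands in the fallback queue where something could pick it up and re-dispatch it. The sketch only covers messages published after the queue is gone; what happens to messages already sitting in a queue when it is deleted depends on the broker and is exactly the kind of thing the Qpid and RabbitMQ testing mentioned above would need to establish.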