Description of problem:
When the worker is stopped, the running task is marked as error (or another completed state), but the task's message remains in the worker queue. After the worker is restarted, the task is re-run. The same thing happens every time the worker is restarted.
I think this is a regression from bug 1889795. Previously, the task was left in an unfinished state when the worker was stopped, and the "delete_worker" method then cancelled all unfinished tasks.
Reverting the Pulp server RPMs fixed the issue.
Version-Release number of selected component (if applicable):
6.8.4
Steps to Reproduce:
1. Sync any repository and copy the task ID (the first 8 characters), e.g. "b3bd24be".
2. While the repository is syncing, stop the workers:
systemctl stop pulp_workers
3. Check /var/log/messages. You should see:
pulp: pulp.server.async.tasks:INFO: [30a62ff1] Task failed : [b3bd24be-b805-42b9-9f81-69eb4ddaad55] : Worker terminated abnormally while processing task b3bd24be-b805-42b9-9f81-69eb4ddaad55. Check the logs for details
pulp: celery.app.trace:ERROR: [30a62ff1] (15661-20736) Task pulp.server.async.tasks._release_resource[30a62ff1-7d72-497f-ad25-661119917bbb] raised unexpected: AttributeError("'NoneType' object has no attribute 'top'",)
4. Start the pulp_workers again:
systemctl start pulp_workers
5. Check /var/log/messages again. You should see the same task (b3bd24be) run again:
pulp: celery.worker.strategy:INFO: Received task: pulp.server.managers.repo.sync.sync[b3bd24be-b805-42b9-9f81-69eb4ddaad55]
pulp: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._release_resource[30a62ff1-7d72-497f-ad25-661119917bbb]
pulp: pulp.server.db.connection:INFO: Write concern for Mongo connection: {}
...
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Downloading metadata from https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/.
pulp: urllib3.connectionpool:INFO: Starting new HTTPS connection (1): cdn.redhat.com
pulp: nectar.downloaders.threaded:INFO: Download succeeded: https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml.
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Parsing metadata.
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Downloading metadata from https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/.
pulp: urllib3.connectionpool:INFO: Starting new HTTPS connection (1): cdn.redhat.com
pulp: nectar.downloaders.threaded:INFO: Download succeeded: https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml.
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Parsing metadata.
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Downloading metadata from https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/.
pulp: urllib3.connectionpool:INFO: Starting new HTTPS connection (1): cdn.redhat.com
pulp: nectar.downloaders.threaded:INFO: Download succeeded: https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml.
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Parsing metadata.
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Downloading metadata from https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/.
pulp: urllib3.connectionpool:INFO: Starting new HTTPS connection (1): cdn.redhat.com
pulp: nectar.downloaders.threaded:INFO: Download succeeded: https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml.
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Parsing metadata.
6. Repeat steps 2 to 5. The task is re-run every time the workers are restarted.
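To check for the re-run without reading the log by eye, the "Received task" lines from step 5 can be counted per task UUID. This is only a minimal sketch: the function names are made up for illustration, the regular expression is based solely on the log lines quoted above, and it assumes that a task which is not redelivered is logged as received only once.

```python
import re
from collections import Counter

# Pattern derived from the "Received task" lines quoted in step 5.
RECEIVED = re.compile(
    r"celery\.worker\.strategy:INFO: Received task: "
    r"(?P<name>[\w.]+)\[(?P<task_id>[0-9a-f-]{36})\]"
)


def count_received(lines):
    """Return a Counter mapping task UUID -> number of 'Received task' lines."""
    counts = Counter()
    for line in lines:
        match = RECEIVED.search(line)
        if match:
            counts[match.group("task_id")] += 1
    return counts


def rerun_tasks(lines):
    """Return the sorted task UUIDs that a worker received more than once."""
    return sorted(tid for tid, n in count_received(lines).items() if n > 1)
```

For example, `rerun_tasks(open("/var/log/messages", errors="replace"))` would list the UUIDs that were delivered to a worker more than once within that log file.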
Actual results:
The worker re-runs the stopped task.
Expected results:
The worker should cancel the task.
I can confirm Hao's finding.
Sat6.8.4 (pulp-server-2.21.3.3-1):
- stopping the pulp_workers service puts the task into the "error" state
- qpidd's queue contains the message with the task
- starting the service again starts the task again
Sat6.9.0 (pulp-server-2.21.5-2.el7sat.noarch):
- stopping the pulp_workers service puts the task into the "cancelled" state - THIS is the key difference
- qpidd's queue contains the message with the task
- starting the service again, nothing happens (I assume the worker fetches the message, checks the task was cancelled so it ignores it)
Prior Sat6.8.?? (pulp-server-2.21.3-1):
- stopping the pulp_workers service puts the task into the "cancelled" state as well.
So some change between 2.21.3-1 and 2.21.3.3-1 means that stopping the pulp workers no longer cancels the running task but puts it into the error state instead.
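The difference between the versions can be pictured with a toy model. This is not Pulp's code: `restart_worker` and the rule it applies (drop a queued message if its task is already "cancelled", otherwise run it) are hypothetical and only encode the assumption made above about what the worker does after it fetches the message.

```python
from collections import deque


def restart_worker(queue, task_states):
    """Drain the queue as a freshly started worker would in this toy model.

    Hypothetical rule: a message whose task is already "cancelled" is
    dropped; a message in any other state is executed again.
    """
    executed = []
    while queue:
        task_id = queue.popleft()
        if task_states.get(task_id) == "cancelled":
            continue  # task was cancelled when the worker stopped, skip it
        executed.append(task_id)
    return executed


# The message stays in the queue on both versions; only the task state differs.
rerun_684 = restart_worker(deque(["b3bd24be"]), {"b3bd24be": "error"})
rerun_690 = restart_worker(deque(["b3bd24be"]), {"b3bd24be": "cancelled"})
print(rerun_684)  # Sat6.8.4 case: the task is executed again
print(rerun_690)  # Sat6.9.0 case: nothing is executed
```

In this model the leftover message is harmless only if the stored task state tells the worker to skip it, which matches the observation that the "error" state leads to a re-run and the "cancelled" state does not.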
Comment 8 pulp-infra@redhat.com
2021-05-17 09:09:29 UTC
The Pulp upstream bug status is at ASSIGNED. Updating the external tracker on this bug.
Comment 9 pulp-infra@redhat.com
2021-05-17 09:09:30 UTC
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.
Comment 10 Tanya Tereshchenko
2021-05-17 09:15:36 UTC
Hao, Pavel, please give this patch a try: https://patch-diff.githubusercontent.com/raw/pulp/pulp/pull/4023.patch
The code changes from BZ#1919405 handle the failure of the `on_failure` handler. It seems logical to put the task in the error state in such a case.
So with the patch provided, you'll still see the error state, but the task should no longer be re-run.
Comment 12 pulp-infra@redhat.com
2021-05-17 16:21:39 UTC
The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.
Comment 13 pulp-infra@redhat.com
2021-05-17 17:40:15 UTC
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.