Description of problem:
When stopping the worker, the running task state is marked as error (or any completed state) but the message of the task remains in the worker queue. After restarting the worker, the task is re-run. The same thing happens again each time the worker is restarted.

I think this is a regression from bug 1889795. Previously, the task was left in an "unfinished" state when the worker was stopped, and the "delete_worker" method then cancelled all the unfinished tasks.

Reverting the Pulp server RPMs fixed the issue.

Version-Release number of selected component (if applicable):
6.8.4

Steps to Reproduce:
1. Sync any repository and copy the task id (the initial 8 characters), e.g. "b3bd24be".

2. While the repository is syncing, stop the workers:
   systemctl stop pulp_workers

3. Check /var/log/messages. You should see:
   pulp: pulp.server.async.tasks:INFO: [30a62ff1] Task failed : [b3bd24be-b805-42b9-9f81-69eb4ddaad55] : Worker terminated abnormally while processing task b3bd24be-b805-42b9-9f81-69eb4ddaad55. Check the logs for details
   pulp: celery.app.trace:ERROR: [30a62ff1] (15661-20736) Task pulp.server.async.tasks._release_resource[30a62ff1-7d72-497f-ad25-661119917bbb] raised unexpected: AttributeError("'NoneType' object has no attribute 'top'",)

4. Then start the workers:
   systemctl start pulp_workers

5. Check /var/log/messages again. You should see the same task (b3bd24be) run again:
   pulp: celery.worker.strategy:INFO: Received task: pulp.server.managers.repo.sync.sync[b3bd24be-b805-42b9-9f81-69eb4ddaad55]
   pulp: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._release_resource[30a62ff1-7d72-497f-ad25-661119917bbb]
   pulp: pulp.server.db.connection:INFO: Write concern for Mongo connection: {}
   ...
   pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Downloading metadata from https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/.
   pulp: urllib3.connectionpool:INFO: Starting new HTTPS connection (1): cdn.redhat.com
   pulp: nectar.downloaders.threaded:INFO: Download succeeded: https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml.
   pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Parsing metadata.
   pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Downloading metadata from https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/.
   pulp: urllib3.connectionpool:INFO: Starting new HTTPS connection (1): cdn.redhat.com
   pulp: nectar.downloaders.threaded:INFO: Download succeeded: https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml.
   pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Parsing metadata.
   pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Downloading metadata from https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/.
   pulp: urllib3.connectionpool:INFO: Starting new HTTPS connection (1): cdn.redhat.com
   pulp: nectar.downloaders.threaded:INFO: Download succeeded: https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml.
   pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Parsing metadata.
   pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Downloading metadata from https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/.
   pulp: urllib3.connectionpool:INFO: Starting new HTTPS connection (1): cdn.redhat.com
   pulp: nectar.downloaders.threaded:INFO: Download succeeded: https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml.
   pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Parsing metadata.

6. Repeat the steps; the task is re-run again and again.

Actual results:
The worker re-runs the stopped task.

Expected results:
The worker should cancel the task.
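To make steps 3 and 5 easier to check, here is a minimal sketch of a log scanner that reports task ids which were received again after being logged as failed. The regular expressions are derived only from the log lines quoted above, and the function name `find_rerun_tasks` and the `SAMPLE` list are hypothetical; in practice the lines would come from /var/log/messages.

```python
import re

# Patterns derived from the log lines quoted in this report (an assumption:
# other Pulp versions may format these messages differently).
FAILED = re.compile(r"Task failed : \[([0-9a-f-]{36})\]")
RECEIVED = re.compile(r"Received task: ([\w.]+)\[([0-9a-f-]{36})\]")


def find_rerun_tasks(lines):
    """Return ids of tasks that were received again after being logged as failed."""
    failed = set()
    rerun = []
    for line in lines:
        match = FAILED.search(line)
        if match:
            failed.add(match.group(1))
            continue
        match = RECEIVED.search(line)
        if match and match.group(2) in failed and match.group(2) not in rerun:
            rerun.append(match.group(2))
    return rerun


# Hypothetical sample built from the lines in the report.
SAMPLE = [
    "pulp: pulp.server.async.tasks:INFO: [30a62ff1] Task failed : "
    "[b3bd24be-b805-42b9-9f81-69eb4ddaad55] : Worker terminated abnormally",
    "pulp: celery.worker.strategy:INFO: Received task: "
    "pulp.server.managers.repo.sync.sync[b3bd24be-b805-42b9-9f81-69eb4ddaad55]",
    "pulp: celery.worker.strategy:INFO: Received task: "
    "pulp.server.async.tasks._release_resource[30a62ff1-7d72-497f-ad25-661119917bbb]",
]

print(find_rerun_tasks(SAMPLE))
```

An empty result after restarting pulp_workers would suggest the stopped task was not picked up again.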
I can confirm Hao's finding.

Sat6.8.4 (pulp-server-2.21.3.3-1):
- stopping the pulp_workers service puts the task into the "error" state
- qpidd's queue contains the message with the task
- starting the service again starts the task again

Sat6.9.0 (pulp-server-2.21.5-2.el7sat.noarch):
- stopping the pulp_workers service puts the task into the "cancelled" state - THIS is the key difference
- qpidd's queue contains the message with the task
- starting the service again, nothing happens (I assume the worker fetches the message, checks that the task was cancelled, and ignores it)

Prior Sat6.8.?? (pulp-server-2.21.3-1):
- stopping the pulp_workers service puts the task into the "cancelled" state as well.

So some change between 2.21.3-1 and 2.21.3.3-1 makes the difference: stopping the pulp workers no longer cancels the running task, but puts it into the error state instead.
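The assumed behaviour described above (the worker fetches the leftover message, sees the task was cancelled, and ignores it) can be sketched as follows. This is an illustration only, not Pulp's or Celery's actual code: the functions `should_run` and `drain`, the state strings, and the dictionary used as a task-state store are all hypothetical.

```python
# Hypothetical sketch, not Pulp's actual implementation.
CANCELED = "canceled"  # illustrative state name
ERROR = "error"        # illustrative state name


def should_run(task_id, task_states):
    """Run a redelivered message unless its task is already marked cancelled."""
    return task_states.get(task_id) != CANCELED


def drain(queue, task_states, run):
    """Process queued task ids, skipping cancelled ones; return the ids executed."""
    executed = []
    for task_id in queue:
        if should_run(task_id, task_states):
            run(task_id)
            executed.append(task_id)
    return executed


leftover = ["b3bd24be"]  # message still sitting in the queue after the restart

# Task marked cancelled on stop (as on 6.9.0): the message is ignored.
print(drain(leftover, {"b3bd24be": CANCELED}, lambda task_id: None))

# Task marked error on stop (as on 6.8.4): the message is executed again.
print(drain(leftover, {"b3bd24be": ERROR}, lambda task_id: None))
```

Under this assumption, the state recorded when the worker stops is the only thing that decides whether the leftover message is executed on restart, which matches the difference observed between the two versions.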
The Pulp upstream bug status is at ASSIGNED. Updating the external tracker on this bug.
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.
Hao, Pavel, please give this patch a try:
https://patch-diff.githubusercontent.com/raw/pulp/pulp/pull/4023.patch

The code changes from BZ#1919405 handle the failure of the `on_failure` handler. It seems logical to put the task in the error state in that case. So with the patch provided, you will still see the error state, but the task should no longer be re-run.
The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.