Bug 1944539

Summary: [Regression] Task no longer cancel when restarting the pulp worker
Product: Red Hat Satellite Reporter: Hao Chang Yu <hyu>
Component: Pulp Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED CURRENTRELEASE QA Contact: Vladimír Sedmík <vsedmik>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 6.8.0 CC: bmbouter, dalley, ggainey, jjansky, pmoravec, rchan, ttereshc
Target Milestone: Unspecified Keywords: Regression, Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-10-10 13:41:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Hao Chang Yu 2021-03-30 06:50:18 UTC
Description of problem:
When the worker is stopped, the running task's state is marked as error (or another completed state), but the task's message remains in the worker queue. After the worker is restarted, the task is re-run. The same thing happens every time the worker is restarted.

I think this is a regression caused by bug 1889795. Previously, the task was left in an "unfinished" state when the worker was stopped, and the "delete_worker" method then cancelled all the unfinished tasks.
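
The following is a minimal Python sketch of the behaviour described above, not Pulp's actual code. Only the name "delete_worker" comes from this report; the state names, the task dictionaries and the function signature are assumptions made for illustration. It shows why a task that has already been moved to "error" is skipped by the cleanup that would otherwise cancel it.

```python
# Toy model only: the real delete_worker in Pulp has a different signature.
# "running" and "waiting" are assumed names for the unfinished states.
UNFINISHED_STATES = {"waiting", "running"}


def delete_worker(tasks, worker_name):
    """Cancel every unfinished task of the given worker; return their ids."""
    cancelled = []
    for task in tasks:
        if task["worker"] == worker_name and task["state"] in UNFINISHED_STATES:
            task["state"] = "cancelled"
            cancelled.append(task["id"])
    return cancelled


# Earlier behaviour: the task is still unfinished, so the cleanup cancels it.
old = [{"id": "b3bd24be", "worker": "worker-0", "state": "running"}]
print(delete_worker(old, "worker-0"))

# Reported behaviour: the task is already in "error", so nothing is cancelled.
new = [{"id": "b3bd24be", "worker": "worker-0", "state": "error"}]
print(delete_worker(new, "worker-0"))
```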

Reverting the Pulp server RPMs fixed the issue.

Version-Release number of selected component (if applicable):
6.8.4


Steps to Reproduce:
1. Sync any repository and copy the task ID (the initial 8 characters), e.g. "b3bd24be".
2. While the repository is syncing, stop the workers:

systemctl stop pulp_workers

3. Check /var/log/messages. You should see:

pulp: pulp.server.async.tasks:INFO: [30a62ff1] Task failed : [b3bd24be-b805-42b9-9f81-69eb4ddaad55] : Worker terminated abnormally while processing task b3bd24be-b805-42b9-9f81-69eb4ddaad55.  Check the logs for details
pulp: celery.app.trace:ERROR: [30a62ff1] (15661-20736) Task pulp.server.async.tasks._release_resource[30a62ff1-7d72-497f-ad25-661119917bbb] raised unexpected: AttributeError("'NoneType' object has no attribute 'top'",)

4. Then start pulp_workers:

systemctl start pulp_workers


5. Check /var/log/messages again. You should see the same task (b3bd24be) run again:

pulp: celery.worker.strategy:INFO: Received task: pulp.server.managers.repo.sync.sync[b3bd24be-b805-42b9-9f81-69eb4ddaad55]
pulp: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._release_resource[30a62ff1-7d72-497f-ad25-661119917bbb]
pulp: pulp.server.db.connection:INFO: Write concern for Mongo connection: {}
...
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Downloading metadata from https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/.
pulp: urllib3.connectionpool:INFO: Starting new HTTPS connection (1): cdn.redhat.com
pulp: nectar.downloaders.threaded:INFO: Download succeeded: https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml.
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Parsing metadata.
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Downloading metadata from https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/.
pulp: urllib3.connectionpool:INFO: Starting new HTTPS connection (1): cdn.redhat.com
pulp: nectar.downloaders.threaded:INFO: Download succeeded: https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml.
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Parsing metadata.
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Downloading metadata from https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/.
pulp: urllib3.connectionpool:INFO: Starting new HTTPS connection (1): cdn.redhat.com
pulp: nectar.downloaders.threaded:INFO: Download succeeded: https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml.
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Parsing metadata.
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Downloading metadata from https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/.
pulp: urllib3.connectionpool:INFO: Starting new HTTPS connection (1): cdn.redhat.com
pulp: nectar.downloaders.threaded:INFO: Download succeeded: https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml.
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Parsing metadata.

6. Repeat the steps. The task is re-run every time the workers are restarted.
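
To check steps 3 to 6 without reading the log by eye, a small Python helper can count how often each task ID appears in a "Received task" line. This is only a sketch: the regular expression is derived from the log lines quoted above, the function names are made up for this example, and a count above one is taken as a hint (not proof) that the message was delivered to a worker again.

```python
import re
from collections import Counter

# Matches e.g. "Received task: pulp.server.managers.repo.sync.sync[<uuid>]"
RECEIVED = re.compile(r"Received task: (?P<name>[\w.]+)\[(?P<task_id>[0-9a-f-]{36})\]")


def count_received(lines):
    """Count 'Received task' log lines per full task id."""
    counts = Counter()
    for line in lines:
        match = RECEIVED.search(line)
        if match:
            counts[match.group("task_id")] += 1
    return counts


def redelivered_tasks(lines):
    """Return the sorted task ids that were received more than once."""
    return sorted(tid for tid, n in count_received(lines).items() if n > 1)


# Possible usage on the Satellite server (needs read access to the log):
#     with open("/var/log/messages") as log:
#         print(redelivered_tasks(log))
```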


Actual results:
The worker re-runs the stopped task.

Expected results:
The worker should cancel the task.

Comment 6 Pavel Moravec 2021-05-10 13:53:58 UTC
I can confirm Hao's finding.

Sat6.8.4 (pulp-server-2.21.3.3-1):
- stopping pulp_workers service puts the task to "error" state
- qpidd's queue contains the message with the task
- starting the service again starts the task again

Sat6.9.0 (pulp-server-2.21.5-2.el7sat.noarch):
- stopping pulp_workers service puts the task to "cancelled" state - THIS is the key difference
- qpidd's queue contains the message with the task
- starting the service again, nothing happens (I assume the worker fetches the message, checks the task was cancelled so it ignores it)


Earlier Sat6.8.?? (pulp-server-2.21.3-1):
- stopping pulp_workers service puts the task to "cancelled" state as well.


So some change between 2.21.3-1 and 2.21.3.3-1 makes the difference: stopping the pulp workers no longer cancels the running tasks but puts them into the error state instead.
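
The difference between the versions can be illustrated with a short Python sketch. It encodes only the assumption made above for Sat6.9.0, namely that the worker fetches the queued message, sees that the task is already cancelled and ignores it. The function and variable names are hypothetical and this is not how Pulp or Celery implement the check.

```python
# Assumption from the comment above: only "cancelled" tasks are skipped.
SKIPPED_STATES = {"cancelled"}


def handle_message(task_id, task_states, run_task):
    """Run the task for a queued message unless its recorded state says skip."""
    if task_states.get(task_id) in SKIPPED_STATES:
        return False  # message is dropped, nothing runs
    run_task(task_id)
    return True


executed = []

# pulp-server-2.21.5-2: the state is "cancelled", so the leftover message is ignored.
handle_message("b3bd24be", {"b3bd24be": "cancelled"}, executed.append)

# pulp-server-2.21.3.3-1: the state is "error", so the leftover message runs the task again.
handle_message("b3bd24be", {"b3bd24be": "error"}, executed.append)

print(executed)
```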

Comment 8 pulp-infra@redhat.com 2021-05-17 09:09:29 UTC
The Pulp upstream bug status is at ASSIGNED. Updating the external tracker on this bug.

Comment 9 pulp-infra@redhat.com 2021-05-17 09:09:30 UTC
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.

Comment 10 Tanya Tereshchenko 2021-05-17 09:15:36 UTC
Hao, Pavel, please give this patch a try: https://patch-diff.githubusercontent.com/raw/pulp/pulp/pull/4023.patch

The code changes from BZ#1919405 handle the failure of the `on_failure` handler. It seems logical to put the task in the error state in such a case.
So with the patch provided, you'll still see the error state, but the task should no longer be re-run.

Comment 12 pulp-infra@redhat.com 2021-05-17 16:21:39 UTC
The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.

Comment 13 pulp-infra@redhat.com 2021-05-17 17:40:15 UTC
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.