Bug 1944539 - [Regression] Task no longer cancel when restarting the pulp worker
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Pulp
Version: 6.8.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: Unspecified
Assignee: satellite6-bugs
QA Contact: Vladimír Sedmík
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-03-30 06:50 UTC by Hao Chang Yu
Modified: 2022-10-10 13:41 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-10-10 13:41:58 UTC
Target Upstream Version:
Embargoed:

Links
System: Pulp Redmine
ID: 8766
Private: 0
Priority: Normal
Status: MODIFIED
Summary: After restarting services, task is being re-run even if it's in a complete state
Last Updated: 2021-05-17 16:21:38 UTC

Description Hao Chang Yu 2021-03-30 06:50:18 UTC
Description of problem:
When the worker is stopped, the state of the running task is marked as error (or another completed state), but the task's message remains in the worker queue. After the worker is restarted, the task is re-run. The same thing happens every time the worker is restarted.

I think this is a regression from bug 1889795. Previously, the task was left in an "unfinished" state when the worker was stopped, and the "delete_worker" method then cancelled all the unfinished tasks.
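
To illustrate why the order of events matters here, below is a toy model in Python. It is only a sketch: the state names, the task dictionaries, and the function name cancel_unfinished are assumptions made for illustration and are not taken from Pulp's code. Only the idea that "delete_worker" cancels unfinished tasks comes from the description above.

```python
# Toy model only. State names and the function name are hypothetical and
# are NOT Pulp's actual implementation.
UNFINISHED_STATES = {"waiting", "accepted", "running"}


def cancel_unfinished(tasks, worker_name):
    """Cancel every unfinished task owned by worker_name.

    Returns the ids of the tasks that were cancelled. Tasks that are
    already in a completed state (for example "error") are left untouched.
    """
    canceled = []
    for task in tasks:
        if task["worker"] == worker_name and task["state"] in UNFINISHED_STATES:
            task["state"] = "canceled"
            canceled.append(task["id"])
    return canceled
```

In this model, a task that is still "running" when the cleanup runs ends up "canceled", while a task that was already moved to "error" during shutdown is skipped by the cleanup and stays in "error".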

Reverting the Pulp server RPMs fixed the issue.

Version-Release number of selected component (if applicable):
6.8.4


Steps to Reproduce:
1. Sync any repository and copy the task ID (the initial 8 characters), e.g. "b3bd24be"
2. While syncing the repository, stop the workers

systemctl stop pulp_workers

3. Check /var/log/messages. You should see:

pulp: pulp.server.async.tasks:INFO: [30a62ff1] Task failed : [b3bd24be-b805-42b9-9f81-69eb4ddaad55] : Worker terminated abnormally while processing task b3bd24be-b805-42b9-9f81-69eb4ddaad55.  Check the logs for details
pulp: celery.app.trace:ERROR: [30a62ff1] (15661-20736) Task pulp.server.async.tasks._release_resource[30a62ff1-7d72-497f-ad25-661119917bbb] raised unexpected: AttributeError("'NoneType' object has no attribute 'top'",)

4. Then start the pulp_workers

systemctl start pulp_workers


5. Check /var/log/messages again. You should see the same task (b3bd24be) run again:

pulp: celery.worker.strategy:INFO: Received task: pulp.server.managers.repo.sync.sync[b3bd24be-b805-42b9-9f81-69eb4ddaad55]
pulp: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._release_resource[30a62ff1-7d72-497f-ad25-661119917bbb]
pulp: pulp.server.db.connection:INFO: Write concern for Mongo connection: {}
...
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Downloading metadata from https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/.
pulp: urllib3.connectionpool:INFO: Starting new HTTPS connection (1): cdn.redhat.com
pulp: nectar.downloaders.threaded:INFO: Download succeeded: https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml.
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Parsing metadata.
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Downloading metadata from https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/.
pulp: urllib3.connectionpool:INFO: Starting new HTTPS connection (1): cdn.redhat.com
pulp: nectar.downloaders.threaded:INFO: Download succeeded: https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml.
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Parsing metadata.
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Downloading metadata from https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/.
pulp: urllib3.connectionpool:INFO: Starting new HTTPS connection (1): cdn.redhat.com
pulp: nectar.downloaders.threaded:INFO: Download succeeded: https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml.
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Parsing metadata.
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Downloading metadata from https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/.
pulp: urllib3.connectionpool:INFO: Starting new HTTPS connection (1): cdn.redhat.com
pulp: nectar.downloaders.threaded:INFO: Download succeeded: https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml.
pulp: pulp_rpm.plugins.importers.yum.sync:INFO: [b3bd24be] Parsing metadata.

6. Repeat steps 2 to 5. The task is re-run each time.
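
As a convenience for checking the log, the following Python sketch counts how often each task ID appears in a "Received task" line, so that a task received more than once stands out. It assumes the log lines have the same shape as the celery.worker.strategy lines quoted in step 5; the function names are made up for this example.

```python
import re
from collections import Counter

# Matches the celery "Received task" lines quoted in the report and captures
# the dotted task name and the full task UUID inside the square brackets.
RECEIVED_RE = re.compile(
    r"Received task: (?P<name>[\w.]+)\[(?P<task_id>[0-9a-f-]{36})\]"
)


def count_received(lines):
    """Return a Counter mapping task UUID -> number of 'Received task' lines."""
    counts = Counter()
    for line in lines:
        match = RECEIVED_RE.search(line)
        if match:
            counts[match.group("task_id")] += 1
    return counts


def redelivered(lines):
    """Return the sorted task UUIDs that were received more than once."""
    return sorted(tid for tid, n in count_received(lines).items() if n > 1)
```

The functions accept any iterable of lines, so an open file handle for /var/log/messages can be passed directly, for example redelivered(open("/var/log/messages", errors="replace")).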


Actual results:
The worker re-runs the stopped task.

Expected results:
The worker should cancel the task.

Comment 6 Pavel Moravec 2021-05-10 13:53:58 UTC
I can confirm Hao's finding.

Sat6.8.4 (pulp-server-2.21.3.3-1):
- stopping pulp_workers service puts the task to "error" state
- qpidd's queue contains the message with the task
- starting the service again starts the task again

Sat6.9.0 (pulp-server-2.21.5-2.el7sat.noarch):
- stopping pulp_workers service puts the task to "cancelled" state - THIS is the key difference
- qpidd's queue contains the message with the task
- starting the service again, nothing happens (I assume the worker fetches the message, checks the task was cancelled so it ignores it)


Prior Sat6.8.?? (pulp-server-2.21.3-1):
- stopping pulp_workers service puts the task to "cancelled" state as well.


So really, some change between 2.21.3-1 and 2.21.3.3-1 means that stopping the pulp workers no longer cancels the running tasks but puts them in the error state instead.

Comment 8 pulp-infra@redhat.com 2021-05-17 09:09:29 UTC
The Pulp upstream bug status is at ASSIGNED. Updating the external tracker on this bug.

Comment 9 pulp-infra@redhat.com 2021-05-17 09:09:30 UTC
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.

Comment 10 Tanya Tereshchenko 2021-05-17 09:15:36 UTC
Hao, Pavel, please give this patch a try: https://patch-diff.githubusercontent.com/raw/pulp/pulp/pull/4023.patch

The code changes from BZ#1919405 handle the failure of the `on_failure` handler. It seems logical to put the task in the error state in such a case.
So with the patch provided, you'll still see the error state but the task should no longer be re-run.
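
The behaviour discussed in this bug (a leftover queue message is either re-run or ignored depending on the recorded task state) can be modelled with a small Python sketch. This is not the content of the patch and not Pulp's code: the queue, the state names, and the function drain_queue are hypothetical and exist only to show the difference between skipping cancelled tasks and skipping every task in a final state.

```python
from collections import deque

# Hypothetical set of final states, chosen for this illustration only.
FINAL_STATES = frozenset({"finished", "error", "canceled"})


def drain_queue(queue, states, skip_states):
    """Consume every queued task id and return the ids that were run.

    queue       -- deque of task ids left over from before the restart
    states      -- dict mapping task id -> recorded task state
    skip_states -- states for which the message is dropped without running
    """
    ran = []
    while queue:
        task_id = queue.popleft()
        if states.get(task_id) in skip_states:
            continue  # message is consumed, task is not executed again
        states[task_id] = "running"
        ran.append(task_id)
    return ran
```

With states = {"b3bd24be": "error"}, calling drain_queue(deque(["b3bd24be"]), states, {"canceled"}) runs the task again, which matches the re-run seen in this bug. Passing FINAL_STATES as skip_states drops the message instead, and the task stays in the error state.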

Comment 12 pulp-infra@redhat.com 2021-05-17 16:21:39 UTC
The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.

Comment 13 pulp-infra@redhat.com 2021-05-17 17:40:15 UTC
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.

