Bug 1961779

Summary: Task still hangs after a celery worker process abruptly terminates
Product: Red Hat Satellite
Reporter: Tanya Tereshchenko <ttereshc>
Component: Pulp
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED ERRATA
QA Contact: Lai <ltran>
Severity: high
Priority: unspecified
Version: 6.9.0
CC: ggainey, jjeffers, pcreech, rchan, ttereshc
Target Milestone: 6.9.3
Keywords: Regression, Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pulp-2.21.5.2-1
Doc Type: If docs needed, set a value
Story Points: ---
Duplicates: 1962815
Last Closed: 2021-07-01 14:56:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---

Description Tanya Tereshchenko 2021-05-18 16:55:48 UTC
Description of problem:

The hang was not fully fixed in BZ#1889795.
The fix requires this patch: https://github.com/pulp/pulp/pull/4019/files.

Steps to Reproduce:
1. Publish or promote a large content view (CV) that contains many repositories
2. While pulp celery workers are processing the sync/publish tasks, kill some of them with "kill -SIGHUP <pid>"
3. Check /var/log/messages
4. Check CV publish/promote task

The process hangs and never completes the publishing task. Here are the logs:


Feb 11 18:10:34 dhcp-2-174 pulp: py.warnings:WARNING: [ccf4ab5a] (33775-20800)   "MongoClient opened before fork. Create MongoClient "
Feb 11 18:10:34 dhcp-2-174 pulp: py.warnings:WARNING: [ccf4ab5a] (33775-20800)
Feb 11 18:10:34 dhcp-2-174 pulp: pulp.server.async.tasks:INFO: [ccf4ab5a] Task failed : [18ac52ec-9f3e-47bc-b34b-98734bee3656] : Worker terminated abnormally while processing task 18ac52ec-9f3e-47bc-b34b-98734bee3656.  Check the logs for details
Feb 11 18:10:34 dhcp-2-174 pulp: celery.app.trace:ERROR: [ccf4ab5a] (33775-20800) Task pulp.server.async.tasks._release_resource[ccf4ab5a-af22-4d41-84c9-255085b7eded] raised unexpected: UnboundLocalError("local variable 'original_formatted_traceback' referenced before assignment",)
Feb 11 18:10:34 dhcp-2-174 pulp: celery.app.trace:ERROR: [ccf4ab5a] (33775-20800) Traceback (most recent call last):
Feb 11 18:10:34 dhcp-2-174 pulp: celery.app.trace:ERROR: [ccf4ab5a] (33775-20800)   File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 367, in trace_task
Feb 11 18:10:34 dhcp-2-174 pulp: celery.app.trace:ERROR: [ccf4ab5a] (33775-20800)     R = retval = fun(*args, **kwargs)
Feb 11 18:10:34 dhcp-2-174 pulp: celery.app.trace:ERROR: [ccf4ab5a] (33775-20800)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 108, in __call__
Feb 11 18:10:34 dhcp-2-174 pulp: celery.app.trace:ERROR: [ccf4ab5a] (33775-20800)     return super(PulpTask, self).__call__(*args, **kwargs)
Feb 11 18:10:34 dhcp-2-174 pulp: celery.app.trace:ERROR: [ccf4ab5a] (33775-20800)   File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 622, in __protected_call__
Feb 11 18:10:34 dhcp-2-174 pulp: celery.app.trace:ERROR: [ccf4ab5a] (33775-20800)     return self.run(*args, **kwargs)
Feb 11 18:10:34 dhcp-2-174 pulp: celery.app.trace:ERROR: [ccf4ab5a] (33775-20800)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 376, in _release_resource
Feb 11 18:10:34 dhcp-2-174 pulp: celery.app.trace:ERROR: [ccf4ab5a] (33775-20800)     new_task.on_failure(exception, task_id, (), {}, MyEinfo)
Feb 11 18:10:34 dhcp-2-174 pulp: celery.app.trace:ERROR: [ccf4ab5a] (33775-20800)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 779, in on_failure
Feb 11 18:10:34 dhcp-2-174 pulp: celery.app.trace:ERROR: [ccf4ab5a] (33775-20800)     _logger.debug(original_formatted_traceback)
Feb 11 18:10:34 dhcp-2-174 pulp: celery.app.trace:ERROR: [ccf4ab5a] (33775-20800) UnboundLocalError: local variable 'original_formatted_traceback' referenced before assignment
Feb 11 18:10:34 dhcp-2-174 pulp: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._release_resource[308a2cd5-f900-434c-8a02-d2aeb3e86992]
Feb 11 18:10:34 dhcp-2-174 pulp: celery.app.trace:INFO: [fe61a18e] Task pulp.server.managers.repo.unit_association.associate_from_repo[fe61a18e-5a65-4a40-a282-eb614e5e64ef] succeeded in 0.0295602829992s: {'units_successful': [], 'units_failed_signature_filter': []}
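
The UnboundLocalError in the traceback above is the kind of failure Python raises when a local variable is bound on only one code path and then read unconditionally. The sketch below is a hypothetical reduction of that pattern, not the actual pulp.server.async.tasks code; the function names, the einfo stand-in class, and the branch condition are all assumptions made for illustration. The real change is in the pull request linked in the description.

```python
import logging

_logger = logging.getLogger(__name__)


class EinfoWithoutTraceback(object):
    """Hypothetical stand-in for an exception-info object with no traceback."""
    traceback = None


def on_failure_buggy(exc, einfo):
    # Assumed shape of the bug: the variable is bound only inside the branch.
    if getattr(einfo, "traceback", None):
        original_formatted_traceback = einfo.traceback
    # Raises UnboundLocalError whenever the branch above was skipped.
    _logger.debug(original_formatted_traceback)


def on_failure_guarded(exc, einfo):
    # Binding the variable before any branch means the handler cannot
    # fail on the read, whatever einfo contains.
    original_formatted_traceback = getattr(einfo, "traceback", None)
    if original_formatted_traceback:
        _logger.debug(original_formatted_traceback)
    return original_formatted_traceback
```

When a failure handler raises like this, the error it was meant to record may never be written, which would be consistent with the hang reported here.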


Version-Release number of selected component (if applicable):
pulp-server-2.21.5-2.el7sat.noarch

Comment 2 pulp-infra@redhat.com 2021-05-18 17:41:11 UTC
The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.

Comment 3 pulp-infra@redhat.com 2021-05-18 17:41:12 UTC
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.

Comment 4 pulp-infra@redhat.com 2021-05-18 18:27:08 UTC
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.

Comment 6 Brad Buckingham 2021-06-04 17:21:37 UTC
*** Bug 1962815 has been marked as a duplicate of this bug. ***

Comment 7 Lai 2021-06-07 18:44:13 UTC
Steps to test:

1. Publish or promote a large content view (CV) that contains many repositories
2. While pulp celery workers are processing the sync/publish tasks, kill some of them with "kill -SIGHUP <pid>"
3. Check /var/log/messages
4. Verify in the web UI that the foreman task errors out
5. Verify that the task can be resumed and completes successfully

Expected result:
3. The log should not show an error message or traceback
4. The Foreman task should show that the task errored out
5. The task should complete successfully

Actual Result:
3. The log still shows a traceback, but it does not affect anything.
4. Task does show that it errors out as expected.
5. Task completes successfully

Verified on 6.9.3 with pulp-server-2.21.5.2-1.el7sat.noarch
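
Step 3 of the test can be partly automated. The following is a minimal sketch that assumes the syslog line format shown in the description's log excerpt ("pulp: <logger>:<LEVEL>: <message>"); the regular expression and the function names are illustrative and are not part of Pulp or Satellite.

```python
import re

# Pattern assumed from the log excerpt in the description:
# syslog prefix, "pulp:" tag, logger name, level, message.
ERROR_LINE = re.compile(r"pulp: (?P<logger>[\w.]+):ERROR: (?P<message>.*)$")


def find_pulp_errors(lines):
    """Return (logger, message) pairs for every pulp ERROR line."""
    hits = []
    for line in lines:
        match = ERROR_LINE.search(line.rstrip("\n"))
        if match:
            hits.append((match.group("logger"), match.group("message")))
    return hits


def has_unbound_local_error(lines):
    """True if any pulp ERROR line mentions UnboundLocalError."""
    return any("UnboundLocalError" in message
               for _, message in find_pulp_errors(lines))


def scan_file(path="/var/log/messages"):
    """Scan a log file (reading it usually requires root)."""
    with open(path) as handle:
        return find_pulp_errors(handle)
```

Calling scan_file() after killing a worker lists any remaining pulp ERROR lines, such as the traceback noted in the actual result for step 3.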

Comment 13 errata-xmlrpc 2021-07-01 14:56:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Satellite 6.9.3 Async Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2636