Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
Red Hat Satellite engineering is moving the tracking of its product development work on Satellite to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "Satellite project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs will be migrated starting at the end of May. If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "Satellite project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/SAT-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1339288

Summary: pulp celery worker unhandled exception duiring startup when cancelling a task
Product: Red Hat Satellite Reporter: Pavel Moravec <pmoravec>
Component: PulpAssignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED WONTFIX QA Contact: jcallaha
Severity: medium Docs Contact:
Priority: medium    
Version: 6.1.8CC: bkearney, bmbouter, cwelton, daviddavis, dkliban, ggainey, ipanova, ktordeur, mhrivnak, pcreech, rchan, ttereshc
Target Milestone: UnspecifiedKeywords: Reopened, Triaged
Target Release: Unused   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-09-04 18:04:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Pavel Moravec 2016-05-24 14:52:47 UTC
Description of problem:
Under an unknown circumstances, restarting katello-service ended up with _all_ pulp celery workers shut down due to unhandled exception.

Backtrace (might not be 100% correct since it is selected from interleaving 3 backtraces):

May 10 09:57:59 mysatellite celery: Traceback (most recent call last):
May 10 09:57:59 mysatellite celery: File "/usr/bin/celery", line 9, in <module>
May 10 09:57:59 mysatellite celery: load_entry_point('celery==3.1.11', 'console_scripts', 'celery')()
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/celery/__main__.py", line 30, in main
May 10 09:57:59 mysatellite celery: main()
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/celery/bin/celery.py", line 81, in main
May 10 09:57:59 mysatellite celery: cmd.execute_from_commandline(argv)
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/celery/bin/celery.py", line 769, in execute_from_commandline
May 10 09:57:59 mysatellite celery: super(CeleryCommand, self).execute_from_commandline(argv)))
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/celery/bin/base.py", line 306, in execute_from_commandline
May 10 09:57:59 mysatellite celery: return self.handle_argv(self.prog_name, argv[1:])
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/celery/bin/celery.py", line 761, in handle_argv
May 10 09:57:59 mysatellite celery: return self.execute(command, argv)
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/celery/bin/celery.py", line 693, in execute
May 10 09:57:59 mysatellite celery: ).run_from_argv(self.prog_name, argv[1:], command=argv[0])
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/celery/bin/worker.py", line 179, in run_from_argv
May 10 09:57:59 mysatellite celery: return self(*args, **options)
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/celery/bin/base.py", line 269, in __call__
May 10 09:57:59 mysatellite celery: ret = self.run(*args, **kwargs)
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/celery/bin/worker.py", line 212, in run
May 10 09:57:59 mysatellite celery: state_db=self.node_format(state_db, hostname), **kwargs
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/celery/worker/__init__.py", line 100, in __init__
May 10 09:57:59 mysatellite celery: self.setup_instance(**self.prepare_args(**kwargs))
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/celery/worker/__init__.py", line 131, in setup_instance
May 10 09:57:59 mysatellite celery: signals.worker_init.send(sender=self)
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/celery/utils/dispatch/signal.py", line 166, in send
May 10 09:57:59 mysatellite celery: response = receiver(signal=self, sender=sender, **named)
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 521, in cleanup_old_worker
May 10 09:57:59 mysatellite celery: _delete_worker(name, normal_shutdown=True)
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 124, in _delete_worker
May 10 09:57:59 mysatellite celery: cancel(task['task_id'])
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 428, in cancel
May 10 09:57:59 mysatellite celery: controller.revoke(task_id, terminate=True)
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/celery/app/control.py", line 171, in revoke
May 10 09:57:59 mysatellite celery: 'signal': signal}, **kwargs)
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/celery/app/control.py", line 306, in broadcast
May 10 09:57:59 mysatellite celery: limit, callback, channel=channel,
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/kombu/pidbox.py", line 283, in _broadcast
May 10 09:57:59 mysatellite celery: chan = channel or self.connection.default_channel
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 756, in default_channel
May 10 09:57:59 mysatellite celery: self.connection
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 741, in connection
May 10 09:57:59 mysatellite celery: self._connection = self._establish_connection()
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 696, in _establish_connection
May 10 09:57:59 mysatellite celery: conn = self.transport.establish_connection()
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/kombu/transport/qpid.py", line 1701, in establish_connection
May 10 09:57:59 mysatellite celery: conn = self.Connection(**opts)
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/kombu/transport/qpid.py", line 1309, in __init__
May 10 09:57:59 mysatellite celery: self._qpid_conn = establish(**self.connection_options)
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 68, in establish
May 10 09:57:59 mysatellite celery: conn.open(timeout=timeout)
May 10 09:57:59 mysatellite celery: File "<string>", line 6, in open
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 279, in open
May 10 09:57:59 mysatellite celery: self.attach(timeout=timeout)
May 10 09:57:59 mysatellite celery: File "<string>", line 6, in attach
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 297, in attach
May 10 09:57:59 mysatellite celery: if not self._ewait(lambda: self._transport_connected and not self._unlinked(), timeout=timeout):
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 230, in _ewait
May 10 09:57:59 mysatellite celery: self.check_error()
May 10 09:57:59 mysatellite celery: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 223, in check_error
May 10 09:57:59 mysatellite celery: raise e
May 10 09:57:59 mysatellite celery: qpid.messaging.exceptions.ConnectError: [Errno 111] Connection refused

Usual connection attempt failures to the qpid broker are caught properly:

May 10 09:57:56 mysatellite pulp: celery.beat:ERROR: beat: Connection error: [Errno 111] Connection refused. Trying again in 4.0 seconds...

but the above one not.


Version-Release number of selected component (if applicable):
pulp-server-2.6.0.20-1.el7sat.noarch


How reproducible:
??? I guess 100% when reproducer known


Steps to Reproduce:
1. Enforce pulp to cancel a task just after startup
2. Shut down qpidd and pulp
3. Start pulp


Actual results:
Above backtrace causes all pulp_worker-*.service to fail, i.e. no worker after all.


Expected results:
Workers should catch the exception and retry after a while. (or they should try cancelling a task even after a connection to qpid is made)


Additional info:

Comment 2 Bryan Kearney 2016-07-08 20:23:15 UTC
Per 6.3 planning, moving out non acked bugs to the backlog

Comment 5 Brian Bouterse 2016-08-08 16:15:44 UTC
I tried 4 times to reproduce this, but I couldn't. Note this was reported against Pulp 2.6 and I tested against 2.9. Also note that I'm on fedora 24 and satellite's Qpid packages are different based on MRG when upstream is based on Qpid directly.

In my dev environment I ran:

# sync the zoo repo and kill -9 all workers just after it starts
pulp-admin rpm repo sync run --repo-id zoo
sudo pkill -9 -f celery

# verify the task is at running and get the task id as <task_id>
pulp-admin task list

# stop pulp and qpidd
pstop
sudo systemctl stop qpidd

# start all pulp service and immediately after issue the cancel
pstart
pulp-admin tasks cancel --task-id <task_id>

# Observe that the cancel halts waiting on the message bus
# Observe the expected wait-and-retry with backoff behavior from all pulp processes

# Bring qpidd back online
sudo systemctl start qpidd

# Observe all pulp services connect as expected and the task cancellation unblocks and returns
# Observe that no unexpected tracebacks are in the logs

Comment 6 Brian Bouterse 2016-08-08 16:19:33 UTC
In the past month several people have observed unexpected tracebacks raised by qpid.messaging related to connection issues and errors. Also there have been some deadlocking issues [0].

I think this issue needs to be reproduced or re-observed against Sat 6.2.z which is based on Pulp 2.8.z If it does reproduce we need to get assistance and support from Qpid team to troubleshoot it. I don't think there is likely an adjustment to Pulp which will improve this issue.

[0]: https://issues.apache.org/jira/browse/QPID-7317

Comment 7 Brad Buckingham 2016-08-08 16:40:33 UTC
Brian, thank you for the investigation and feedback.

Based on the comments above, I am going to recommend that we move this one to ON_QA for QE verification.

Comment 9 Corey Welton 2016-09-30 20:22:41 UTC
qa -> jcalla

Comment 10 jcallaha 2016-09-30 20:58:00 UTC
I am unable to reproduce this issue in Satellite 6.2.2 Async Snap 2.

Kicked off a repo sync, then stopped pulp services 
for i in "pulp_celerybeat pulp_resource_manager pulp_workers"; do systemctl stop $i; done
stopped qpidd
systemctl stop qpidd

Then re-enabled the pulp services
for i in "pulp_celerybeat pulp_resource_manager pulp_workers"; do systemctl start $i; done

and cancelled the job in UI. The task stopped successfully without needing qpidd to be restarted. restarted qpidd and performed a half dozen variations of syn'ing a repo while restarting services and cancelling the task. No issues encountered with pulp services starting back up.

Comment 11 Pavel Moravec 2017-02-08 18:31:40 UTC
Reopening since hit by a customer on

pulp-server-2.8.7.3-1.el7sat.noarch

I expect no change fixed this in the very latest pulp-server-2.8.7.5-1.el7sat.noarch either.

Current backtrace:

Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232) Unrecoverable error: ConnectError('[Errno 111] Connection refused',)
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232) Traceback (most recent call last):
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/celery/worker/__init__.py", line 206, in start
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     self.blueprint.start(self)
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     self.on_start()
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/celery/apps/worker.py", line 157, in on_start
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     sender=self.hostname, instance=self, conf=self.app.conf,
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/celery/utils/dispatch/signal.py", line 166, in send
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     response = receiver(signal=self, sender=sender, **named)
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/pulp/server/async/app.py", line 52, in initialize_worker
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     tasks._delete_worker(sender, normal_shutdown=True)
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 258, in _delete_worker
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     cancel(task_status['task_id'])
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 591, in cancel
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     controller.revoke(task_id, terminate=True)
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/celery/app/control.py", line 171, in revoke
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     'signal': signal}, **kwargs)
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/celery/app/control.py", line 306, in broadcast
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     limit, callback, channel=channel,
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/kombu/pidbox.py", line 283, in _broadcast
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     chan = channel or self.connection.default_channel
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 756, in default_channel
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     self.connection
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 741, in connection
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     self._connection = self._establish_connection()
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 696, in _establish_connection
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     conn = self.transport.establish_connection()
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/kombu/transport/qpid.py", line 1600, in establish_connection
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     conn = self.Connection(**opts)
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/kombu/transport/qpid.py", line 1261, in __init__
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     self._qpid_conn = establish(**self.connection_options)
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 112, in establish
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     conn.open(timeout=timeout)
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "<string>", line 6, in open
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 323, in open
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     self.attach(timeout=timeout)
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "<string>", line 6, in attach
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 341, in attach
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     if not self._ewait(lambda: self._transport_connected and not self._unlinked(), timeout=timeout):
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 274, in _ewait
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     self.check_error()
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 267, in check_error
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     raise e
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232) ConnectError: [Errno 111] Connection refused


Anyway, qpidd broker was running since:

Feb 05 00:02:08 satellite systemd[1]: Started An AMQP message broker daemon..


It might appear after machine reboot when pulp tasks cancellation has not been processed within pulp properly, yet. That differs from the way QE tried to reproduce.

Need to find out a reliable reproducer, though..

Comment 12 Kenny Tordeurs 2017-02-09 10:42:51 UTC
Hi, 

Was able to reproduce this as follows:

- Before doing anything the current workers on the Satellite:

[root@ktordeur-sat61 ~]# mongo
MongoDB shell version: 2.6.11
connecting to: test
Server has startup warnings: 
2017-01-20T11:32:35.350+0100 [initandlisten] 
2017-01-20T11:32:35.350+0100 [initandlisten] ** WARNING: Readahead for /var/lib/mongodb is set to 4096KB
2017-01-20T11:32:35.350+0100 [initandlisten] **          We suggest setting it to 256KB (512 sectors) or less
2017-01-20T11:32:35.350+0100 [initandlisten] **          http://dochub.mongodb.org/core/readahead
> use pulp_database
switched to db pulp_database
> db.workers.find()
{ "_id" : "scheduler.redhat.com", "last_heartbeat" : ISODate("2017-02-09T09:22:48.320Z") }
{ "_id" : "resource_manager.redhat.com", "last_heartbeat" : ISODate("2017-02-09T09:23:18.102Z") }
{ "_id" : "reserved_resource_worker-2.redhat.com", "last_heartbeat" : ISODate("2017-02-09T09:23:18.414Z") }
{ "_id" : "reserved_resource_worker-0.redhat.com", "last_heartbeat" : ISODate("2017-02-09T09:23:24.513Z") }
{ "_id" : "reserved_resource_worker-3.redhat.com", "last_heartbeat" : ISODate("2017-02-09T09:23:17.931Z") }
{ "_id" : "reserved_resource_worker-1.redhat.com", "last_heartbeat" : ISODate("2017-02-09T09:23:18.695Z") }
> 

- Start sync of ~ 8 repositories
- Reboot the server after starting the sync

- Check workers after reboot Satellite:

Last login: Thu Feb  9 09:29:19 2017 from ovpn-204-128.brq.redhat.com
[root@ktordeur-sat61 ~]# mongo
MongoDB shell version: 2.6.11
connecting to: test
> use pulp_database
switched to db pulp_database
> db.workers.find()
{ "_id" : "scheduler.redhat.com", "last_heartbeat" : ISODate("2017-02-09T09:33:45.014Z") }
{ "_id" : "resource_manager.redhat.com", "last_heartbeat" : ISODate("2017-02-09T09:34:49.869Z") }
> 


Checking logs: 
~~~
Feb  9 10:24:24 ktordeur-sat61 pulp: celery.worker.job:INFO: Task pulp.server.async.tasks._queue_reserved_task[6d640909-13e3-458f-931e-69694bd08bfe] succeeded in 0.0527766929008s: None
Feb  9 11:53:57 ktordeur-sat61 systemd: Started Pulp's Celerybeat.
Feb  9 11:53:57 ktordeur-sat61 systemd: Starting Pulp's Celerybeat...
Feb  9 11:53:57 ktordeur-sat61 systemd: Starting Pulp Celery Workers...
Feb  9 11:53:58 ktordeur-sat61 systemd: Started Pulp Celery Workers.
Feb  9 10:33:42 ktordeur-sat61 pulp: celery.beat:INFO: beat: Starting...
Feb  9 10:33:45 ktordeur-sat61 pulp: celery.beat:ERROR: beat: Connection error: [Errno 111] Connection refused. Trying again in 2.0 seconds...
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808) Unrecoverable error: ConnectError('[Errno 111] Connection refused',)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808) Traceback (most recent call last):
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888) Unrecoverable error: ConnectError('[Errno 111] Connection refused',)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)   File "/usr/lib/python2.7/site-packages/celery/worker/__init__.py", line 206, in start
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888) Traceback (most recent call last):
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)     self.blueprint.start(self)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)   File "/usr/lib/python2.7/site-packages/celery/worker/__init__.py", line 206, in start
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)     self.blueprint.start(self)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)   File "/usr/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)   File "/usr/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)     self.on_start()
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)     self.on_start()
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)   File "/usr/lib/python2.7/site-packages/celery/apps/worker.py", line 157, in on_start
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)   File "/usr/lib/python2.7/site-packages/celery/apps/worker.py", line 157, in on_start
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)     sender=self.hostname, instance=self, conf=self.app.conf,
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)   File "/usr/lib/python2.7/site-packages/celery/utils/dispatch/signal.py", line 166, in send
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)     response = receiver(signal=self, sender=sender, **named)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)     sender=self.hostname, instance=self, conf=self.app.conf,
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)   File "/usr/lib/python2.7/site-packages/celery/utils/dispatch/signal.py", line 166, in send
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)   File "/usr/lib/python2.7/site-packages/pulp/server/async/app.py", line 52, in initialize_worker
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)     response = receiver(signal=self, sender=sender, **named)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)     tasks._delete_worker(sender, normal_shutdown=True)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)   File "/usr/lib/python2.7/site-packages/pulp/server/async/app.py", line 52, in initialize_worker
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 258, in _delete_worker
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)     tasks._delete_worker(sender, normal_shutdown=True)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)     cancel(task_status['task_id'])
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 258, in _delete_worker
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 591, in cancel
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)     cancel(task_status['task_id'])
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)     controller.revoke(task_id, terminate=True)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)   File "/usr/lib/python2.7/site-packages/celery/app/control.py", line 171, in revoke
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)     'signal': signal}, **kwargs)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 591, in cancel
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)   File "/usr/lib/python2.7/site-packages/celery/app/control.py", line 306, in broadcast
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)     controller.revoke(task_id, terminate=True)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)   File "/usr/lib/python2.7/site-packages/celery/app/control.py", line 171, in revoke
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)     limit, callback, channel=channel,
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)     'signal': signal}, **kwargs)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)   File "/usr/lib/python2.7/site-packages/kombu/pidbox.py", line 283, in _broadcast
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)   File "/usr/lib/python2.7/site-packages/celery/app/control.py", line 306, in broadcast
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)     limit, callback, channel=channel,
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)     chan = channel or self.connection.default_channel
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)   File "/usr/lib/python2.7/site-packages/kombu/pidbox.py", line 283, in _broadcast
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)     chan = channel or self.connection.default_channel
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 756, in default_channel
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 756, in default_channel
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)     self.connection
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)     self.connection
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 741, in connection
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)     self._connection = self._establish_connection()
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 741, in connection
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)     self._connection = self._establish_connection()
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 696, in _establish_connection
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 696, in _establish_connection
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)     conn = self.transport.establish_connection()
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)     conn = self.transport.establish_connection()
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)   File "/usr/lib/python2.7/site-packages/kombu/transport/qpid.py", line 1600, in establish_connection
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)   File "/usr/lib/python2.7/site-packages/kombu/transport/qpid.py", line 1600, in establish_connection
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)     conn = self.Connection(**opts)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)     conn = self.Connection(**opts)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)   File "/usr/lib/python2.7/site-packages/kombu/transport/qpid.py", line 1261, in __init__
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)     self._qpid_conn = establish(**self.connection_options)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)   File "/usr/lib/python2.7/site-packages/kombu/transport/qpid.py", line 1261, in __init__
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 112, in establish
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)     self._qpid_conn = establish(**self.connection_options)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)     conn.open(timeout=timeout)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 112, in establish
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)   File "<string>", line 6, in open
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)     conn.open(timeout=timeout)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 323, in open
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)   File "<string>", line 6, in open
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)     self.attach(timeout=timeout)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 323, in open
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)   File "<string>", line 6, in attach
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 341, in attach
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)     self.attach(timeout=timeout)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)     if not self._ewait(lambda: self._transport_connected and not self._unlinked(), timeout=timeout):
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)   File "<string>", line 6, in attach
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 274, in _ewait
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 341, in attach
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)     self.check_error()
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)     if not self._ewait(lambda: self._transport_connected and not self._unlinked(), timeout=timeout):
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 274, in _ewait
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 267, in check_error
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808)     raise e
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)     self.check_error()
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1115-43808) ConnectError: [Errno 111] Connection refused
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 267, in check_error
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888)     raise e
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1113-05888) ConnectError: [Errno 111] Connection refused
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768) Unrecoverable error: ConnectError('[Errno 111] Connection refused',)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768) Traceback (most recent call last):
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)   File "/usr/lib/python2.7/site-packages/celery/worker/__init__.py", line 206, in start
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)     self.blueprint.start(self)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)   File "/usr/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)     self.on_start()
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)   File "/usr/lib/python2.7/site-packages/celery/apps/worker.py", line 157, in on_start
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)     sender=self.hostname, instance=self, conf=self.app.conf,
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)   File "/usr/lib/python2.7/site-packages/celery/utils/dispatch/signal.py", line 166, in send
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)     response = receiver(signal=self, sender=sender, **named)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)   File "/usr/lib/python2.7/site-packages/pulp/server/async/app.py", line 52, in initialize_worker
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)     tasks._delete_worker(sender, normal_shutdown=True)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 258, in _delete_worker
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)     cancel(task_status['task_id'])
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 591, in cancel
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)     controller.revoke(task_id, terminate=True)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)   File "/usr/lib/python2.7/site-packages/celery/app/control.py", line 171, in revoke
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)     'signal': signal}, **kwargs)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)   File "/usr/lib/python2.7/site-packages/celery/app/control.py", line 306, in broadcast
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)     limit, callback, channel=channel,
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)   File "/usr/lib/python2.7/site-packages/kombu/pidbox.py", line 283, in _broadcast
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)     chan = channel or self.connection.default_channel
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 756, in default_channel
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)     self.connection
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 741, in connection
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)     self._connection = self._establish_connection()
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 696, in _establish_connection
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)     conn = self.transport.establish_connection()
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)   File "/usr/lib/python2.7/site-packages/kombu/transport/qpid.py", line 1600, in establish_connection
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)     conn = self.Connection(**opts)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)   File "/usr/lib/python2.7/site-packages/kombu/transport/qpid.py", line 1261, in __init__
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)     self._qpid_conn = establish(**self.connection_options)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 112, in establish
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)     conn.open(timeout=timeout)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)   File "<string>", line 6, in open
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 323, in open
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)     self.attach(timeout=timeout)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)   File "<string>", line 6, in attach
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 341, in attach
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)     if not self._ewait(lambda: self._transport_connected and not self._unlinked(), timeout=timeout):
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 274, in _ewait
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)     self.check_error()
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 267, in check_error
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768)     raise e
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1119-28768) ConnectError: [Errno 111] Connection refused
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.beat:ERROR: beat: Connection error: [Errno 111] Connection refused. Trying again in 4.0 seconds...
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848) Unrecoverable error: ConnectError('[Errno 111] Connection refused',)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848) Traceback (most recent call last):
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)   File "/usr/lib/python2.7/site-packages/celery/worker/__init__.py", line 206, in start
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)     self.blueprint.start(self)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)   File "/usr/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)     self.on_start()
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)   File "/usr/lib/python2.7/site-packages/celery/apps/worker.py", line 157, in on_start
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)     sender=self.hostname, instance=self, conf=self.app.conf,
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)   File "/usr/lib/python2.7/site-packages/celery/utils/dispatch/signal.py", line 166, in send
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)     response = receiver(signal=self, sender=sender, **named)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)   File "/usr/lib/python2.7/site-packages/pulp/server/async/app.py", line 52, in initialize_worker
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)     tasks._delete_worker(sender, normal_shutdown=True)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 258, in _delete_worker
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)     cancel(task_status['task_id'])
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 591, in cancel
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)     controller.revoke(task_id, terminate=True)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)   File "/usr/lib/python2.7/site-packages/celery/app/control.py", line 171, in revoke
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)     'signal': signal}, **kwargs)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)   File "/usr/lib/python2.7/site-packages/celery/app/control.py", line 306, in broadcast
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)     limit, callback, channel=channel,
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)   File "/usr/lib/python2.7/site-packages/kombu/pidbox.py", line 283, in _broadcast
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)     chan = channel or self.connection.default_channel
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 756, in default_channel
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)     self.connection
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 741, in connection
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)     self._connection = self._establish_connection()
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 696, in _establish_connection
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)     conn = self.transport.establish_connection()
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)   File "/usr/lib/python2.7/site-packages/kombu/transport/qpid.py", line 1600, in establish_connection
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)     conn = self.Connection(**opts)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)   File "/usr/lib/python2.7/site-packages/kombu/transport/qpid.py", line 1261, in __init__
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)     self._qpid_conn = establish(**self.connection_options)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 112, in establish
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)     conn.open(timeout=timeout)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)   File "<string>", line 6, in open
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 323, in open
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)     self.attach(timeout=timeout)
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)   File "<string>", line 6, in attach
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 341, in attach
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)     if not self._ewait(lambda: self._transport_connected and not self._unlinked(), timeout=timeout):
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 274, in _ewait
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)     self.check_error()
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 267, in check_error
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848)     raise e
Feb  9 10:33:47 ktordeur-sat61 pulp: celery.worker:ERROR: (1117-18848) ConnectError: [Errno 111] Connection refused
Feb  9 10:33:49 ktordeur-sat61 celery: -------------- resource_manager.redhat.com v3.1.11 (Cipater)
Feb  9 10:33:49 ktordeur-sat61 celery: ---- **** -----
Feb  9 10:33:49 ktordeur-sat61 celery: --- * ***  * -- Linux-3.10.0-514.6.1.el7.x86_64-x86_64-with-redhat-7.3-Maipo
Feb  9 10:33:49 ktordeur-sat61 celery: -- * - **** ---
Feb  9 10:33:49 ktordeur-sat61 celery: - ** ---------- [config]
Feb  9 10:33:49 ktordeur-sat61 celery: - ** ---------- .> app:         tasks:0x36e6490
Feb  9 10:33:49 ktordeur-sat61 celery: - ** ---------- .> transport:   qpid://ktordeur-sat61.usersys.redhat.com:5671//
Feb  9 10:33:49 ktordeur-sat61 celery: - ** ---------- .> results:     disabled
Feb  9 10:33:49 ktordeur-sat61 celery: - *** --- * --- .> concurrency: 1 (prefork)
Feb  9 10:33:49 ktordeur-sat61 celery: -- ******* ----
Feb  9 10:33:49 ktordeur-sat61 celery: --- ***** ----- [queues]
Feb  9 10:33:49 ktordeur-sat61 celery: -------------- .> resource_manager exchange=resource_manager(direct) key=resource_manager
Feb  9 10:33:49 ktordeur-sat61 celery: .> resource_manager.redhat.com.dq exchange=C.dq(direct) key=resource_manager.redhat.com
Feb  9 10:33:49 ktordeur-sat61 pulp: celery.worker.consumer:INFO: Connected to qpid://ktordeur-sat61.usersys.redhat.com:5671//
~~~

- Restart pulp services: # for i in pulp_resource_manager pulp_workers pulp_celerybeat; do service $i restart; done

Workers are back:
~~~
[root@ktordeur-sat61 ~]# mongo
MongoDB shell version: 2.6.11
connecting to: test
> use pulp_database
switched to db pulp_database
> db.workers.find()
{ "_id" : "scheduler.redhat.com", "last_heartbeat" : ISODate("2017-02-09T09:37:46.517Z") }
{ "_id" : "resource_manager.redhat.com", "last_heartbeat" : ISODate("2017-02-09T09:38:16.363Z") }
{ "_id" : "reserved_resource_worker-3.redhat.com", "last_heartbeat" : ISODate("2017-02-09T09:38:16.527Z") }
{ "_id" : "reserved_resource_worker-1.redhat.com", "last_heartbeat" : ISODate("2017-02-09T09:38:16.535Z") }
{ "_id" : "reserved_resource_worker-2.redhat.com", "last_heartbeat" : ISODate("2017-02-09T09:38:16.666Z") }
{ "_id" : "reserved_resource_worker-0.redhat.com", "last_heartbeat" : ISODate("2017-02-09T09:38:16.683Z") }
> 
~~~

Comment 13 Michael Hrivnak 2017-02-13 12:50:05 UTC
On first glance it appears that the worker is starting before qpid is available, and when it tries to do cleanup, that fails.

Brian, what do you think? Presumably the worker should wait during startup for a broker to become available before trying to use it, or be able to retry.

Comment 14 Brian Bouterse 2017-03-08 12:49:09 UTC
The traceback from Comment 12 suggests the following is happening:

1. Pulp is healthy
2. Pulp is given work to do (8 syncs)
3. The server is restarted.
4. Upon restart, the workers start before qpidd
5. The workers try to cancel the tasks that were "lost" due to upstream bug 489 [0].
6. The workers experience an 'Unrecoverable Error' and exit
7. Only the resource manager and celerybeat correctly wait for the broker to become available.

What is strange is that Comment 5 is very similar to this process which I could not reproduce with. I will try once again to reproduce this on upstream Pulp.

[0]: https://pulp.plan.io/issues/489

Comment 15 Michael Hrivnak 2017-03-16 17:28:30 UTC
Brian, do you have an update on this? Maybe it's worth trying to reproduce on a Satellite box. Thanks!

Comment 16 Brian Bouterse 2017-03-24 15:15:16 UTC
I tried to reproduce it on upstream but I could not. Where can I get a satellite system that I can stop/start in such an intrusive way?

Comment 17 Brian Bouterse 2017-03-24 15:23:47 UTC
I'm going to try to reproduce on a Sat 6.2.8 latest EL7 vagrant box.

Comment 18 Brian Bouterse 2017-03-24 17:24:12 UTC
I was able to reproduce this on a satellite 6.2.8 rhel7 box. To reproduce I:

Created a 'zoo' product that sync from the zoo fixture repo https://repos.fedorapeople.org/repos/pulp/pulp/fixtures/rpm/

Ensure that pulp was using 1 worker in /etc/default/pulp_workers. Sync 'zoo' manually once to see it working. Sync it again, and just when the task begins run:

sudo pkill -9 -f reserved_resource_worker

Then stop qpidd with `sudo systemctl stop qpidd`

Then restart the pulp_workers `sudo systemctl restart pulp_workers`

# Observe the traceback in question

Then start qpidd again `sudo systemctl start qpidd`

Trigger a new manual sync and observe that it does not complete. I will try once again to reproduce on upstream.

Comment 19 Brian Bouterse 2017-03-24 17:28:07 UTC
Using Comment 19 I was able to reproduce the issue in upstream also. I will file an upstream bug.

Comment 20 pulp-infra@redhat.com 2017-03-24 18:02:30 UTC
The Pulp upstream bug status is at NEW. Updating the external tracker on this bug.

Comment 21 pulp-infra@redhat.com 2017-03-24 18:02:33 UTC
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.

Comment 22 pulp-infra@redhat.com 2017-06-15 14:32:23 UTC
The Pulp upstream bug status is at ASSIGNED. Updating the external tracker on this bug.

Comment 23 Brian Bouterse 2017-06-28 17:25:41 UTC
After some upstream investigation, I think this bug should have a priority of med and a severity of low. A traceback emitted on startup isn't great, but since the broker is not available and we require that connection to cancel the only thing we can do is wait. This bug is easily worked around by calling restarting pulp_workers after the message bus is online.

Also note that if one worker experiences this issue, other workers can start successfully and allow Satellite to function.

After investigating this in upstream Pulp, we have filed a Celery bug [0]. I want to hear back from them regarding how to properly fix this bug.

[0]: https://github.com/celery/celery/issues/4104

Comment 27 pulp-infra@redhat.com 2017-07-12 14:33:21 UTC
The Pulp upstream bug status is at NEW. Updating the external tracker on this bug.

Comment 28 pulp-infra@redhat.com 2017-07-18 20:46:18 UTC
The Pulp upstream bug status is at ASSIGNED. Updating the external tracker on this bug.

Comment 29 pulp-infra@redhat.com 2017-08-23 18:31:47 UTC
The Pulp upstream bug status is at NEW. Updating the external tracker on this bug.

Comment 31 Bryan Kearney 2018-09-04 18:04:21 UTC
Thank you for your interest in Satellite 6. We have evaluated this request, and we do not expect this to be implemented in the product in the foreseeable future. We are therefore closing this out as WONTFIX. If you have any concerns about this, please feel free to contact Rich Jerrido or Bryan Kearney. Thank you.

Comment 32 pulp-infra@redhat.com 2020-08-31 15:06:14 UTC
The Pulp upstream bug status is at CLOSED - WONTFIX. Updating the external tracker on this bug.