Bug 1096539 - Pulp does not handle interruptions to qpidd
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Pulp
Classification: Retired
Component: z_other
Version: Master
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 2.4.0
Assignee: Brian Bouterse
QA Contact: Preethi Thomas
URL:
Whiteboard:
Depends On: 1096935
Blocks:
Reported: 2014-05-11 21:12 UTC by Randy Barlow
Modified: 2014-08-09 06:55 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-08-09 06:55:58 UTC
Embargoed:



Description Randy Barlow 2014-05-11 21:12:29 UTC
Pulp uses a few different network dependencies, such as MongoDB and a message broker. If those dependencies are unavailable when Pulp starts, or become unavailable after it has started, Pulp does not automatically reconnect. Currently, Pulp must be restarted to reconnect to these services.

This will be a problem for users who wish to deploy these services to a separate host from the Pulp server, as we cannot rely on the init system to handle the network dependencies. Also, it is important that Pulp reliably respond to outages in these services.
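
For illustration only, the kind of automatic reconnection being asked for can be sketched as a retry loop with a growing delay. This is a minimal sketch, not Pulp's or kombu's code: `call_with_retries` is a hypothetical helper, and the parameter names `interval_start`, `interval_step` and `interval_max` are simply borrowed from the kombu call visible in the traceback in comment 2.

```python
import time


def call_with_retries(fun, catch, max_retries=None, interval_start=2.0,
                      interval_step=2.0, interval_max=30.0, sleep=time.sleep):
    """Call fun() until it succeeds, waiting a little longer after each failure.

    Hypothetical helper for illustration; not Pulp's or kombu's implementation.
    """
    interval = interval_start
    retries = 0
    while True:
        try:
            return fun()
        except catch:
            # Give up once the retry budget (if any) is used up.
            if max_retries is not None and retries >= max_retries:
                raise
            sleep(interval)
            # Grow the delay, but never beyond interval_max.
            interval = min(interval + interval_step, interval_max)
            retries += 1
```

With `max_retries=None` the loop keeps trying for as long as the service is down, which is the behaviour wanted for an outage that starts after Pulp is already running.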

Comment 1 Randy Barlow 2014-05-13 13:43:02 UTC
I wanted to clarify that the specific service I've noticed us failing to reconnect to is qpidd. I do want us to verify as part of this ticket that we can handle mongod outages too, but I have not personally experienced issues with Mongo reconnections.

Comment 2 Brian Bouterse 2014-05-17 20:28:06 UTC
I've been testing a small adjustment[0] to the qpid transport for kombu that adds reconnect capability and aligns the initial connection timeout with the connection timeout set by celery.

I want to write tests for it, and then PR it onto the deps branch 'pulp-dep-3.0.15-with-qpid'.

With that code, any celery process instructs qpid.messaging to wait up to 4 seconds (the celery default) as qpid.messaging attempts to connect.  If it cannot connect after 4 seconds you'll get a traceback like this one:

====================START TRACEBACK ================================

May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: Unrecoverable error: Timeout('Connection attach timed out',)
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: Traceback (most recent call last):
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:   File "/usr/lib/python2.7/site-packages/celery/worker/__init__.py", line 206, in start
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:     self.blueprint.start(self)
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:   File "/usr/lib/python2.7/site-packages/celery/bootsteps.py", line 123, in start
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:     step.start(parent)
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:   File "/usr/lib/python2.7/site-packages/celery/bootsteps.py", line 373, in start
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:     return self.obj.start()
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:   File "/usr/lib/python2.7/site-packages/celery/worker/consumer.py", line 278, in start
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:     blueprint.start(self)
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:   File "/usr/lib/python2.7/site-packages/celery/bootsteps.py", line 123, in start
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:     step.start(parent)
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:   File "/usr/lib/python2.7/site-packages/celery/worker/consumer.py", line 478, in start
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:     c.connection = c.connect()
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:   File "/usr/lib/python2.7/site-packages/celery/worker/consumer.py", line 375, in connect
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:     callback=maybe_shutdown,
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:   File "/home/bmbouter/kombu/kombu/connection.py", line 373, in ensure_connection
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:     interval_start, interval_step, interval_max, callback)
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:   File "/home/bmbouter/kombu/kombu/utils/__init__.py", line 243, in retry_over_time
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:     return fun(*args, **kwargs)
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:   File "/home/bmbouter/kombu/kombu/connection.py", line 241, in connect
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:     return self.connection
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:   File "/home/bmbouter/kombu/kombu/connection.py", line 758, in connection
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:     self._connection = self._establish_connection()
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:   File "/home/bmbouter/kombu/kombu/connection.py", line 713, in _establish_connection
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:     conn = self.transport.establish_connection()
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:   File "/home/bmbouter/kombu/kombu/transport/qpid.py", line 1676, in establish_connection
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:     conn = self.Connection(**opts)
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:   File "/home/bmbouter/kombu/kombu/transport/qpid.py", line 1514, in __init__
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:     self._qpid_conn = qpid.messaging.Connection.establish(**self.connection_options)
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 68, in establish
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:     conn.open(timeout=timeout)
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:   File "<string>", line 6, in open
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 273, in open
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:     self.attach(timeout=timeout)
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:   File "<string>", line 6, in attach
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 293, in attach
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR:     raise Timeout("Connection attach timed out")
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: Timeout: Connection attach timed out

===============================================


I think that is the correct behavior because Pulp should not start if it can't work.  If you haven't started the message bus, then celery shouldn't start so that you know to go start the message bus.  This is also consistent with rabbitmq I believe.
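
The fail-fast start-up behaviour described above can be approximated outside of celery with a plain TCP probe. The sketch below is not what kombu or qpid.messaging do internally; `broker_reachable` is a hypothetical helper, and its 4 second default only mirrors the celery default mentioned in this comment.

```python
import socket


def broker_reachable(host, port, timeout=4.0):
    """Return True if a TCP connection to (host, port) opens within timeout seconds.

    Hypothetical helper: it only checks that something is listening, not that
    the listener actually speaks AMQP.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
    sock.close()
    return True
```

A start-up script could call `broker_reachable("localhost", 5672)` (5672 being the default AMQP port, assuming qpidd has not been reconfigured) and refuse to start the workers when it returns False.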

The good news is that once started, you can stop qpidd for any amount of time, and qpid.messaging will wait gracefully in the same way that gofer waits.  Once Qpid returns, the qpid.messaging connection will pick back up.

There is one problem: celery sets its queues to be exclusive, so that when the celery client disconnects the queue is deleted. When qpidd shuts down, those queues are deleted, and when it starts again they are not recreated. The qpid.messaging client expects them to be there and tries to reattach, but the queue cannot be found.  That shows itself as a traceback in celerybeat:


============= START TRACEBACK =======================

May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: Exception in thread Thread-8:
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: Traceback (most recent call last):
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: self.run()
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: File "/home/bmbouter/kombu/kombu/transport/qpid.py", line 1362, in run
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: timeout=FDShimThread.block_timeout)
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: File "<string>", line 6, in fetch
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 1046, in fetch
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: self._ecwait(lambda: not self.draining)
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: result = self._ewait(lambda: self.closed or predicate(), timeout)
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 999, in _ewait
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: self.check_error()
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 988, in check_error
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: raise self.error
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: NotFound: no such queue: celeryev.b33e6f0d-8078-4ac4-9d9c-deee4c3eb136

===============================================================


Or in a celery worker as:

==================== START TRACEBACK =============================

May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: Exception in thread Thread-13:
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: Traceback (most recent call last):
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: self.run()
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: File "/home/bmbouter/kombu/kombu/transport/qpid.py", line 1362, in run
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: timeout=FDShimThread.block_timeout)
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: File "<string>", line 6, in fetch
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 1046, in fetch
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: self._ecwait(lambda: not self.draining)
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: result = self._ewait(lambda: self.closed or predicate(), timeout)
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 999, in _ewait
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: self.check_error()
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 988, in check_error
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: raise self.error
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: NotFound: no such queue: resource_manager.redhat.com.celery.pidbox

==================================================================

If we can recover from these queues getting deleted unexpectedly, I believe qpid.messaging would recover for any number or length of qpidd outages.
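
One way to picture that recovery is to redeclare the queue when the fetch reports it missing and then retry. The following is a minimal sketch against an in-memory stand-in: `ToyBroker`, its methods and `fetch_with_redeclare` are all hypothetical, and `NotFound` is a local class that only mimics the name of the error in the tracebacks above. None of it is the qpid.messaging or kombu API.

```python
class NotFound(Exception):
    """Stand-in for the NotFound error seen in the tracebacks above."""


class ToyBroker:
    """In-memory stand-in for a broker; not the qpid.messaging API."""

    def __init__(self):
        self.queues = {}

    def declare(self, name):
        # Declaring an existing queue is a no-op.
        self.queues.setdefault(name, [])

    def restart(self):
        # Models a broker restart in which every queue is lost.
        self.queues.clear()

    def publish(self, name, message):
        if name not in self.queues:
            raise NotFound("no such queue: %s" % name)
        self.queues[name].append(message)

    def fetch(self, name):
        if name not in self.queues:
            raise NotFound("no such queue: %s" % name)
        queue = self.queues[name]
        return queue.pop(0) if queue else None


def fetch_with_redeclare(broker, name):
    """Fetch from a queue, recreating it once if the broker has lost it."""
    try:
        return broker.fetch(name)
    except NotFound:
        broker.declare(name)
        return broker.fetch(name)
```

After `broker.restart()`, a bare `broker.fetch(...)` raises `NotFound`, while `fetch_with_redeclare(...)` recreates the queue and returns `None` because the recreated queue is empty.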

[0]:  https://github.com/pulp/kombu/commit/3af57873414ec2375448988822b13c6bdb0986f9

Comment 3 Brian Bouterse 2014-05-28 21:13:53 UTC
After the Transport rewrite, the reconnect seems to be behaving perfectly.

PR available at: https://github.com/pulp/kombu/pull/7

Comment 4 Brian Bouterse 2014-05-30 20:55:07 UTC
This needed to be implemented at the Celery level, and not the qpid.messaging level, so the real PR is available here:

https://github.com/pulp/kombu/pull/8

Comment 5 Brian Bouterse 2014-06-02 18:53:02 UTC
This has been merged to pulp/kombu on branch:  pulp-dep-3.0.15-with-qpid

This should not be marked MODIFIED, until it has been committed to pulp/pulp in the deps folder with a new patch added. That should be done just before the next BETA release to avoid doing it multiple times.

Comment 6 Brian Bouterse 2014-06-06 15:48:19 UTC
This has been merged to pulp/kombu on branch:  pulp-dep-3.0.15-with-qpid

This fix is included in the patch that is part of the python-kombu testing candidate python-kombu-3.0.15-9.pulp which is built on the branch bmbouter-python-kombu-testing-6-6-2014[0].

python-kombu-3.0.15-9.pulp will be available for testing in the Pulp testing repo.

This should not be marked MODIFIED, until bmbouter-python-kombu-testing-6-6-2014 has been merged onto master with PR review, and the tags have been built and pushed to github. That should be done just before the next BETA release to avoid doing it multiple times.

[0]:  https://github.com/pulp/pulp/tree/bmbouter-python-kombu-testing-6-6-2014

Comment 7 Brian Bouterse 2014-06-06 19:01:10 UTC
One correction. The testing candidate containing this will be python-kombu-3.0.15-10.pulp

Comment 8 Brian Bouterse 2014-06-10 19:32:56 UTC
Merged onto pulp-2.4 and master

Comment 9 Jeff Ortel 2014-06-11 01:33:04 UTC
build: 2.4.0-0.20.beta

Comment 10 Preethi Thomas 2014-07-01 10:24:56 UTC
Fails-qa
[root@mgmt3 ~]# rpm -q python-kombu
python-kombu-3.0.15-10.pulp.el6.noarch
[root@mgmt3 ~]# rpm -q pulp-server
pulp-server-2.4.0-0.23.beta.el6.noarch
[root@mgmt3 ~]# 

I stopped qpidd and then restarted it, but the Pulp services still don't seem to have reconnected

Jul  1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: d4e655fe-1fff-4f1d-96e9-e79906da46be
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: Traceback (most recent call last):
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/gofer/transport/qpid/consumer.py", line 113, in get
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     return self.__receiver.fetch(timeout=timeout)
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "<string>", line 6, in fetch
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 1015, in fetch
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     self._ecwait(lambda: self.linked)
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     result = self._ewait(lambda: self.closed or predicate(), timeout)
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 979, in _ewait
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     self.check_error()
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 968, in check_error
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     raise self.error
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: NotFound: no such queue: pulp.task
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: eea89a4a-2198-48e0-b7b9-1261e77a8037
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: Traceback (most recent call last):
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/gofer/transport/qpid/consumer.py", line 113, in get
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     return self.__receiver.fetch(timeout=timeout)
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "<string>", line 6, in fetch
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 1015, in fetch
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     self._ecwait(lambda: self.linked)
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     result = self._ewait(lambda: self.closed or predicate(), timeout)
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 979, in _ewait
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     self.check_error()
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 968, in check_error
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     raise self.error
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: NotFound: no such queue: pulp.task
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: 2459cc3e-b7be-409f-9941-39dbba95df00
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: Traceback (most recent call last):
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/gofer/transport/qpid/consumer.py", line 113, in get
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     return self.__receiver.fetch(timeout=timeout)
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "<string>", line 6, in fetch
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 1015, in fetch
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     self._ecwait(lambda: self.linked)
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     result = self._ewait(lambda: self.closed or predicate(), timeout)
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 979, in _ewait
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     self.check_error()
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 968, in check_error
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     raise self.error
Jul  1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: NotFound: no such queue: pulp.task

Comment 11 Brian Bouterse 2014-07-03 13:52:34 UTC
I believe that the QE environment that was used to verify this bug is missing either the -store or the -linearstore package. See the note in the docs here[0] about what to install. I am putting the status back to ON_QA so it can be retested.

[0]:  Step 2 under the server section here: http://pulp-user-guide.readthedocs.org/en/latest/installation.html
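
As a rough illustration of why a missing persistence package matters, the toy model below shows a durable queue surviving a restart only when a store is present. It is a sketch under stated assumptions, not qpidd's implementation: `RestartableBroker` and its methods are hypothetical, the `store` argument merely stands in for the -store / -linearstore module, and it assumes that pulp.task is declared durable.

```python
class RestartableBroker:
    """Toy model of a broker restart; not qpidd's actual implementation."""

    def __init__(self, store=None):
        # store stands in for the -store / -linearstore persistence module.
        self.store = store
        self.queues = set()

    def declare(self, name, durable=False):
        self.queues.add(name)
        # A durable queue can only be recorded if a store is available.
        if durable and self.store is not None:
            self.store.add(name)

    def restart(self):
        # Only queues recorded in the store come back after a restart.
        self.queues = set(self.store) if self.store is not None else set()
```

Declaring `pulp.task` as durable on `RestartableBroker()` and restarting leaves no queues behind, whereas the same steps on `RestartableBroker(store=set())` keep `pulp.task`.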

Comment 12 Preethi Thomas 2014-07-03 14:38:09 UTC
I installed the qpid-cpp-server-store package

[root@mgmt3 ~]# rpm -qa qpid-cpp-server
qpid-cpp-server-0.18-17.el6_4.x86_64
[root@mgmt3 ~]# 
After that I restarted qpid

But the pulp-* services don't seem to have reconnected

Jul  3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: 6ab78897-f2c5-4baf-ac2c-53ba145b32ed
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: Traceback (most recent call last):
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/gofer/transport/qpid/consumer.py", line 113, in get
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     return self.__receiver.fetch(timeout=timeout)
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "<string>", line 6, in fetch
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 1015, in fetch
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     self._ecwait(lambda: self.linked)
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     result = self._ewait(lambda: self.closed or predicate(), timeout)
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 979, in _ewait
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     self.check_error()
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 968, in check_error
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     raise self.error
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: NotFound: no such queue: pulp.task
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: 427919a3-2b9f-40fd-a47e-112daa81b90b
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: Traceback (most recent call last):
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/gofer/transport/qpid/consumer.py", line 113, in get
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     return self.__receiver.fetch(timeout=timeout)
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "<string>", line 6, in fetch
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 1015, in fetch
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     self._ecwait(lambda: self.linked)
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     result = self._ewait(lambda: self.closed or predicate(), timeout)
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 979, in _ewait
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     self.check_error()
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 968, in check_error
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     raise self.error
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: NotFound: no such queue: pulp.task
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: 8371a53f-2e9f-4581-941b-3d50db2fcd3f
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: Traceback (most recent call last):
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/gofer/transport/qpid/consumer.py", line 113, in get
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     return self.__receiver.fetch(timeout=timeout)
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "<string>", line 6, in fetch
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 1015, in fetch
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     self._ecwait(lambda: self.linked)
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     result = self._ewait(lambda: self.closed or predicate(), timeout)
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 979, in _ewait
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     self.check_error()
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 968, in check_error
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR:     raise self.error
Jul  3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: NotFound: no such queue: pulp.task

Comment 13 Preethi Thomas 2014-07-03 14:39:20 UTC
Failing

Comment 14 Brian Bouterse 2014-07-03 19:33:32 UTC
After investigating the machine where the QA failed, I've determined two things:

1. The NotFound: no such queue: pulp.task Exception was environmental due to the -store package not being present.

2. The reconnect support is not working. I've also reproduced it in a FC20 environment.

I'm looking into the root cause now.

Comment 15 Brian Bouterse 2014-07-03 23:22:18 UTC
PR available at:

https://github.com/pulp/kombu/pull/13

Comment 16 Brian Bouterse 2014-07-07 17:47:00 UTC
Merged to pulp-dep-3.0.15-with-qpid, and then merged pulp-dep-3.0.15-with-qpid to 'qpid-transport'. Waiting until the patch including this code is added to pulp/pulp before moving to MODIFIED.

Comment 17 Brian Bouterse 2014-07-10 16:15:26 UTC
The patch containing this fix has been merged to pulp/pulp branch 2.4, and has been tagged into python-kombu-3.0.15-11. Moving to MODIFIED.

Comment 18 Brian Bouterse 2014-07-11 20:30:26 UTC
python-kombu-3.0.15-11 is included in 2.4.0-0.24.beta

Comment 19 Preethi Thomas 2014-07-14 17:38:03 UTC
verified
stopped qpidd and restarted it. services all reconnected.
[root@mgmt3 ~]#  rpm -q python-kombu
python-kombu-3.0.15-11.pulp.el6.noarch
[root@mgmt3 ~]# rpm -q pulp-server
pulp-server-2.4.0-0.24.beta.el6.noarch

Comment 20 Randy Barlow 2014-08-09 06:55:58 UTC
This has been fixed in Pulp 2.4.0-1.

