Pulp uses a few different network dependencies, such as MongoDB and a message broker. If those network dependencies are unavailable when Pulp starts, or become unavailable after Pulp starts, Pulp does not automatically reconnect. Currently, Pulp must be restarted to reconnect to these services. This will be a problem for users who wish to deploy these services on a separate host from the Pulp server, since we cannot rely on the init system to handle the network dependencies. It is also important that Pulp respond reliably to outages in these services.
I wanted to clarify that the specific service I've noticed us failing to reconnect to is qpidd. I do want us to verify as part of this ticket that we can handle mongod outages too, but I have not personally experienced issues with Mongo reconnections.
I've been testing a small adjustment[0] to the qpid transport for kombu that adds reconnect capability and aligns the initial connection timeout with the connection timeout set by Celery. I want to write tests for it and then open a PR against the deps branch 'pulp-dep-3.0.15-with-qpid'. With that code, any Celery process instructs qpid.messaging to wait up to 4 seconds (the Celery default) while qpid.messaging attempts to connect. If it cannot connect within 4 seconds, you'll get a traceback like this one:

====================START TRACEBACK ================================
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: Unrecoverable error: Timeout('Connection attach timed out',)
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: Traceback (most recent call last):
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: File "/usr/lib/python2.7/site-packages/celery/worker/__init__.py", line 206, in start
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: self.blueprint.start(self)
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: File "/usr/lib/python2.7/site-packages/celery/bootsteps.py", line 123, in start
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: step.start(parent)
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: File "/usr/lib/python2.7/site-packages/celery/bootsteps.py", line 373, in start
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: return self.obj.start()
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: File "/usr/lib/python2.7/site-packages/celery/worker/consumer.py", line 278, in start
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: blueprint.start(self)
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: File "/usr/lib/python2.7/site-packages/celery/bootsteps.py", line 123, in start
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: step.start(parent)
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: File "/usr/lib/python2.7/site-packages/celery/worker/consumer.py", line 478, in start
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: c.connection = c.connect()
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: File "/usr/lib/python2.7/site-packages/celery/worker/consumer.py", line 375, in connect
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: callback=maybe_shutdown,
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: File "/home/bmbouter/kombu/kombu/connection.py", line 373, in ensure_connection
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: interval_start, interval_step, interval_max, callback)
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: File "/home/bmbouter/kombu/kombu/utils/__init__.py", line 243, in retry_over_time
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: return fun(*args, **kwargs)
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: File "/home/bmbouter/kombu/kombu/connection.py", line 241, in connect
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: return self.connection
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: File "/home/bmbouter/kombu/kombu/connection.py", line 758, in connection
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: self._connection = self._establish_connection()
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: File "/home/bmbouter/kombu/kombu/connection.py", line 713, in _establish_connection
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: conn = self.transport.establish_connection()
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: File "/home/bmbouter/kombu/kombu/transport/qpid.py", line 1676, in establish_connection
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: conn = self.Connection(**opts)
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: File "/home/bmbouter/kombu/kombu/transport/qpid.py", line 1514, in __init__
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: self._qpid_conn = qpid.messaging.Connection.establish(**self.connection_options)
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 68, in establish
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: conn.open(timeout=timeout)
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: File "<string>", line 6, in open
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 273, in open
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: self.attach(timeout=timeout)
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: File "<string>", line 6, in attach
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 293, in attach
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: raise Timeout("Connection attach timed out")
May 17 16:23:01 dhcp129-138.rdu.redhat.com pulp[29879]: celery.worker:ERROR: Timeout: Connection attach timed out
===============================================

I think that is the correct behavior, because Pulp should not start if it can't work.
If you haven't started the message bus, then Celery shouldn't start, so that you know to go start the message bus. I believe this is also consistent with RabbitMQ. The good news is that once started, you can stop qpidd for any amount of time, and qpid.messaging will wait gracefully in the same way that gofer waits. Once Qpid returns, the qpid.messaging connection picks back up.

There is one problem: Celery declares its queues as exclusive, so that a queue is deleted when the Celery client disconnects. When Qpid shuts down, those queues are deleted, and when it starts again they are not recreated. The qpid.messaging client expects them to be there and tries to reattach, but the queue cannot be found. That shows up as a traceback in celerybeat:

============= START TRACEBACK =======================
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: Exception in thread Thread-8:
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: Traceback (most recent call last):
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: self.run()
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: File "/home/bmbouter/kombu/kombu/transport/qpid.py", line 1362, in run
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: timeout=FDShimThread.block_timeout)
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: File "<string>", line 6, in fetch
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 1046, in fetch
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: self._ecwait(lambda: not self.draining)
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: result = self._ewait(lambda: self.closed or predicate(), timeout)
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 999, in _ewait
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: self.check_error()
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 988, in check_error
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: raise self.error
May 17 15:54:59 dhcp129-138.rdu.redhat.com celery[27720]: NotFound: no such queue: celeryev.b33e6f0d-8078-4ac4-9d9c-deee4c3eb136
===============================================================

Or in a Celery worker as:

==================== START TRACEBACK =============================
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: Exception in thread Thread-13:
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: Traceback (most recent call last):
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: self.run()
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: File "/home/bmbouter/kombu/kombu/transport/qpid.py", line 1362, in run
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: timeout=FDShimThread.block_timeout)
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: File "<string>", line 6, in fetch
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 1046, in fetch
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: self._ecwait(lambda: not self.draining)
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: result = self._ewait(lambda: self.closed or predicate(), timeout)
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 999, in _ewait
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: self.check_error()
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 988, in check_error
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: raise self.error
May 17 16:21:39 dhcp129-138.rdu.redhat.com celery[29735]: NotFound: no such queue: resource_manager.redhat.com.celery.pidbox
==================================================================

If we can recover from these queues being deleted unexpectedly, I believe qpid.messaging would recover from any number or length of qpidd outages.

[0]: https://github.com/pulp/kombu/commit/3af57873414ec2375448988822b13c6bdb0986f9
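The two behaviors argued for in this comment (fail fast if the broker is unreachable at startup, then survive outages once running, provided that vanished queues get redeclared) can be sketched as a small Python model. This is a hypothetical, self-contained illustration, not the kombu qpid transport and not the code in the linked commit: `FakeBroker`, `connect`, `fetch_with_redeclare`, and the stand-in `NotFound`/`ConnectTimeout` exceptions are illustrative names, and the 4-second timeout only mirrors the Celery default mentioned above.

```python
import time


class NotFound(Exception):
    """Stand-in for qpid.messaging's NotFound ("no such queue: ...")."""


class ConnectTimeout(Exception):
    """Stand-in for the "Connection attach timed out" Timeout."""


class FakeBroker:
    """Hypothetical in-memory broker. Queues are not persisted, so a
    restart deletes them, like the exclusive Celery queues described above."""

    def __init__(self):
        self.up = True
        self.queues = {}  # queue name -> list of pending messages

    def declare(self, name):
        self.queues.setdefault(name, [])

    def restart(self):
        # Nothing survives a restart in this model.
        self.queues.clear()

    def fetch(self, name):
        if name not in self.queues:
            raise NotFound("no such queue: %s" % name)
        pending = self.queues[name]
        return pending.pop(0) if pending else None


def connect(broker, timeout=4.0, interval=0.5,
            clock=time.monotonic, sleep=time.sleep):
    """Startup: wait at most `timeout` seconds for the broker, then fail
    loudly so the operator knows to start the message bus."""
    deadline = clock() + timeout
    while not broker.up:
        if clock() >= deadline:
            raise ConnectTimeout("Connection attach timed out")
        sleep(interval)
    return broker


def fetch_with_redeclare(broker, queue):
    """Steady state: if the queue vanished during an outage, redeclare it
    and retry the fetch once instead of letting the consumer thread die."""
    try:
        return broker.fetch(queue)
    except NotFound:
        broker.declare(queue)
        return broker.fetch(queue)


if __name__ == "__main__":
    broker = connect(FakeBroker())
    broker.declare("celeryev.example")
    broker.restart()  # simulated qpidd outage wipes the queue
    print(fetch_with_redeclare(broker, "celeryev.example"))  # prints None instead of raising NotFound
```

Redeclaring on `NotFound` is only one possible recovery strategy; re-running the consumer's whole declaration step after every reconnect would be another, and the clock and sleep parameters exist only so the timeout path can be exercised without real waiting.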
After the Transport rewrite, the reconnect seems to be behaving perfectly. PR available at: https://github.com/pulp/kombu/pull/7
This needed to be implemented at the Celery level, and not the qpid.messaging level, so the real PR is available here: https://github.com/pulp/kombu/pull/8
This has been merged to pulp/kombu on branch pulp-dep-3.0.15-with-qpid. This should not be marked MODIFIED until it has been committed to pulp/pulp in the deps folder with a new patch added. That should be done just before the next BETA release to avoid doing it multiple times.
This has been merged to pulp/kombu on branch pulp-dep-3.0.15-with-qpid. This fix is included in the patch that is part of the python-kombu testing candidate python-kombu-3.0.15-9.pulp, which is built on the branch bmbouter-python-kombu-testing-6-6-2014[0]. python-kombu-3.0.15-9.pulp will be available for testing in the Pulp testing repo. This should not be marked MODIFIED until bmbouter-python-kombu-testing-6-6-2014 has been merged onto master with PR review, and the tags have been built and pushed to GitHub. That should be done just before the next BETA release to avoid doing it multiple times.

[0]: https://github.com/pulp/pulp/tree/bmbouter-python-kombu-testing-6-6-2014
One correction. The testing candidate containing this will be python-kombu-3.0.15-10.pulp
Merged onto pulp-2.4 and master
build: 2.4.0-0.20.beta
Fails QA.

[root@mgmt3 ~]# rpm -q python-kombu
python-kombu-3.0.15-10.pulp.el6.noarch
[root@mgmt3 ~]# rpm -q pulp-server
pulp-server-2.4.0-0.23.beta.el6.noarch
[root@mgmt3 ~]#

I stopped qpidd and then restarted it, but the Pulp services still don't seem to have reconnected:

Jul 1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 1 06:18:52 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: d4e655fe-1fff-4f1d-96e9-e79906da46be
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: Traceback (most recent call last):
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/gofer/transport/qpid/consumer.py", line 113, in get
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: return self.__receiver.fetch(timeout=timeout)
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "<string>", line 6, in fetch
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 1015, in fetch
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: self._ecwait(lambda: self.linked)
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: result = self._ewait(lambda: self.closed or predicate(), timeout)
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 979, in _ewait
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: self.check_error()
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 968, in check_error
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: raise self.error
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: NotFound: no such queue: pulp.task
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: eea89a4a-2198-48e0-b7b9-1261e77a8037
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: Traceback (most recent call last):
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/gofer/transport/qpid/consumer.py", line 113, in get
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: return self.__receiver.fetch(timeout=timeout)
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "<string>", line 6, in fetch
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 1015, in fetch
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: self._ecwait(lambda: self.linked)
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: result = self._ewait(lambda: self.closed or predicate(), timeout)
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 979, in _ewait
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: self.check_error()
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 968, in check_error
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: raise self.error
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: NotFound: no such queue: pulp.task
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: 2459cc3e-b7be-409f-9941-39dbba95df00
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: Traceback (most recent call last):
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/gofer/transport/qpid/consumer.py", line 113, in get
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: return self.__receiver.fetch(timeout=timeout)
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "<string>", line 6, in fetch
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 1015, in fetch
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: self._ecwait(lambda: self.linked)
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: result = self._ewait(lambda: self.closed or predicate(), timeout)
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 979, in _ewait
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: self.check_error()
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 968, in check_error
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: raise self.error
Jul 1 06:18:53 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: NotFound: no such queue: pulp.task
I believe that the QE environment used to verify this bug is missing either the -store or the -linearstore package. See the note in the docs[0] about what to install. I am putting the status back to ON_QA so it can be retested.

[0]: Step 2 under the server section here: http://pulp-user-guide.readthedocs.org/en/latest/installation.html
I installed the qpid-cpp-server-store package:

[root@mgmt3 ~]# rpm -qa qpid-cpp-server
qpid-cpp-server-0.18-17.el6_4.x86_64
[root@mgmt3 ~]#

After that I restarted qpid, but the pulp-* services don't seem to have reconnected:

Jul 3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 3 10:33:15 mgmt3 pulp: kombu.transport.qpid:ERROR: connection aborted
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: 6ab78897-f2c5-4baf-ac2c-53ba145b32ed
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: Traceback (most recent call last):
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/gofer/transport/qpid/consumer.py", line 113, in get
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: return self.__receiver.fetch(timeout=timeout)
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "<string>", line 6, in fetch
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 1015, in fetch
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: self._ecwait(lambda: self.linked)
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: result = self._ewait(lambda: self.closed or predicate(), timeout)
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 979, in _ewait
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: self.check_error()
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 968, in check_error
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: raise self.error
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: NotFound: no such queue: pulp.task
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: 427919a3-2b9f-40fd-a47e-112daa81b90b
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: Traceback (most recent call last):
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/gofer/transport/qpid/consumer.py", line 113, in get
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: return self.__receiver.fetch(timeout=timeout)
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "<string>", line 6, in fetch
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 1015, in fetch
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: self._ecwait(lambda: self.linked)
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: result = self._ewait(lambda: self.closed or predicate(), timeout)
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 979, in _ewait
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: self.check_error()
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 968, in check_error
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: raise self.error
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: NotFound: no such queue: pulp.task
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: 8371a53f-2e9f-4581-941b-3d50db2fcd3f
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: Traceback (most recent call last):
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/gofer/transport/qpid/consumer.py", line 113, in get
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: return self.__receiver.fetch(timeout=timeout)
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "<string>", line 6, in fetch
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 1015, in fetch
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: self._ecwait(lambda: self.linked)
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: result = self._ewait(lambda: self.closed or predicate(), timeout)
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 979, in _ewait
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: self.check_error()
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 968, in check_error
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: raise self.error
Jul 3 10:33:22 mgmt3 pulp: gofer.transport.qpid.consumer:ERROR: NotFound: no such queue: pulp.task
Failing
After investigating the machine where QA failed, I've determined two things:

1. The "NotFound: no such queue: pulp.task" exception was environmental, due to the -store package not being present.
2. The reconnect support is not working. I've also reproduced this in an FC20 environment.

I'm looking into the root cause now.
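The environmental half of this diagnosis can be modeled in a few lines of Python. This is a simplified, hypothetical model (`SimulatedQpidd` is not a real API), and it assumes that `pulp.task` is declared durable and that qpidd restores durable queues after a restart only when a store module (from the -store or -linearstore package) is loaded. Under those assumptions, a restart without the store leaves no `pulp.task` queue behind, which would match the `NotFound: no such queue: pulp.task` errors in the QA logs.

```python
class SimulatedQpidd:
    """Hypothetical model of broker queue state across a restart.

    Assumption: durable queues are recovered after a restart only when a
    persistence store module is loaded; non-durable queues are always lost.
    """

    def __init__(self, store_loaded):
        self.store_loaded = store_loaded
        self.queues = {}  # queue name -> durable flag

    def declare(self, name, durable=False):
        self.queues[name] = durable

    def restart(self):
        if self.store_loaded:
            # Only durable queues are recovered from the store.
            self.queues = {n: d for n, d in self.queues.items() if d}
        else:
            # No store module: nothing survives the restart.
            self.queues = {}

    def has_queue(self, name):
        return name in self.queues


if __name__ == "__main__":
    for store in (True, False):
        broker = SimulatedQpidd(store_loaded=store)
        broker.declare("pulp.task", durable=True)  # assumed durable
        broker.restart()
        print("store loaded:", store, "pulp.task present:", broker.has_queue("pulp.task"))
```

The model deliberately ignores reconnect logic: it only separates the environmental failure (queue gone because nothing was persisted) from the separate reconnect bug in item 2.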
PR available at: https://github.com/pulp/kombu/pull/13
Merged to pulp-dep-3.0.15-with-qpid, and then merged pulp-dep-3.0.15-with-qpid to 'qpid-transport'. Waiting until the patch including this code is added to pulp/pulp before moving to MODIFIED.
The patch containing this fix has been merged to pulp/pulp branch 2.4, and has been tagged into python-kombu-3.0.15-11. Moving to MODIFIED.
python-kombu-3.0.15-11 is included in 2.4.0-0.24.beta
Verified. I stopped qpidd and restarted it, and all services reconnected.

[root@mgmt3 ~]# rpm -q python-kombu
python-kombu-3.0.15-11.pulp.el6.noarch
[root@mgmt3 ~]# rpm -q pulp-server
pulp-server-2.4.0-0.24.beta.el6.noarch
This has been fixed in Pulp 2.4.0-1.