Bug 1096935 - Restarting Qpid Multiple Times Crash qpid.messaging Endpoints
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Pulp
Classification: Retired
Component: consumers
Version: 2.4 Beta
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: ---
Target Release: 2.4.0
Assignee: Brian Bouterse
QA Contact: Preethi Thomas
URL:
Whiteboard:
Depends On:
Blocks: 1096539
Reported: 2014-05-12 17:55 UTC by Brian Bouterse
Modified: 2014-08-09 06:56 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-08-09 06:56:00 UTC
Embargoed:



Description Brian Bouterse 2014-05-12 17:55:16 UTC
On a RHEL 6.5 machine with Qpid 0.18, if you start httpd and then restart qpidd several times, you'll receive this traceback:


May 12 14:48:43 pulp-24-server pulp: gofer.transport.qpid.consumer:ERROR: 78639a50-33d3-428c-8978-e594a2db4341
May 12 14:48:43 pulp-24-server pulp: gofer.transport.qpid.consumer:ERROR: Traceback (most recent call last):
May 12 14:48:43 pulp-24-server pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/gofer/transport/qpid/consumer.py", line 113, in get
May 12 14:48:43 pulp-24-server pulp: gofer.transport.qpid.consumer:ERROR:     return self.__receiver.fetch(timeout=timeout)
May 12 14:48:43 pulp-24-server pulp: gofer.transport.qpid.consumer:ERROR:   File "<string>", line 6, in fetch
May 12 14:48:43 pulp-24-server pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 1026, in fetch
May 12 14:48:43 pulp-24-server pulp: gofer.transport.qpid.consumer:ERROR:     self._ecwait(lambda: not self.draining)
May 12 14:48:43 pulp-24-server pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
May 12 14:48:43 pulp-24-server pulp: gofer.transport.qpid.consumer:ERROR:     result = self._ewait(lambda: self.closed or predicate(), timeout)
May 12 14:48:43 pulp-24-server pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 979, in _ewait
May 12 14:48:43 pulp-24-server pulp: gofer.transport.qpid.consumer:ERROR:     self.check_error()
May 12 14:48:43 pulp-24-server pulp: gofer.transport.qpid.consumer:ERROR:   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 968, in check_error
May 12 14:48:43 pulp-24-server pulp: gofer.transport.qpid.consumer:ERROR:     raise self.error
May 12 14:48:43 pulp-24-server pulp: gofer.transport.qpid.consumer:ERROR: NotFound: no such queue: pulp.task


To reproduce:
1.  Stop all services including qpidd, httpd, pulp_*.  Leave mongo running.
2.  Start httpd using `sudo service httpd start`.  It will boot, report that it can't connect to qpidd, and then sleep until qpidd is available.
3.  Start qpidd using `sudo service qpidd start`.  You'll see in the logs that httpd finds it eventually.
4.  Stop qpidd using `sudo service qpidd stop`.  You'll see httpd go back into its sleep/discovery state with qpidd.
5.  Start qpidd using `sudo service qpidd start`.
6.  Observe traceback.


The output of `qpid-stat -q` after step 3 is as expected:
Queues
  queue                                     dur  autoDel  excl  msg   msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind
  =========================================================================================================================
  79d26a27-5c08-47b9-8a87-a077151fbbd9:0.0       Y        Y        0     0      0       0      0        0         1     2
  pulp.task                                 Y                      0     0      0       0      0        0         3     1


The output of `qpid-stat -q` after step 6 is not as expected:
[root@pulp-24-server ~]# qpid-stat -q
Queues
  queue                                     dur  autoDel  excl  msg   msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind
  =========================================================================================================================
  092cc825-973b-4400-9dae-d0f11c10ee38:0.0       Y        Y        0     0      0       0      0        0         1     2
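
The difference between the two listings above can be checked mechanically. The following is a minimal sketch, assuming the plain-text `qpid-stat -q` layout shown in this report (a row of `=` characters followed by one queue per line); the function names are hypothetical and the sample rows are copied from the listings above.

```python
def queue_names(qpid_stat_output):
    """Return the queue names found in `qpid-stat -q` text output."""
    names = []
    in_table = False
    for line in qpid_stat_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("="):
            # The row of '=' characters separates the header from the queue rows.
            in_table = True
            continue
        if in_table and stripped:
            # The queue name is the first whitespace-delimited column.
            names.append(stripped.split()[0])
    return names


def has_queue(qpid_stat_output, name="pulp.task"):
    """True if the named queue appears in the listing."""
    return name in queue_names(qpid_stat_output)


# Sample rows copied from this report (after step 3 and after step 6).
AFTER_STEP_3 = """Queues
  queue                                     dur  autoDel  excl  msg   msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind
  =========================================================================================================================
  79d26a27-5c08-47b9-8a87-a077151fbbd9:0.0       Y        Y        0     0      0       0      0        0         1     2
  pulp.task                                 Y                      0     0      0       0      0        0         3     1
"""

AFTER_STEP_6 = """[root@pulp-24-server ~]# qpid-stat -q
Queues
  queue                                     dur  autoDel  excl  msg   msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind
  =========================================================================================================================
  092cc825-973b-4400-9dae-d0f11c10ee38:0.0       Y        Y        0     0      0       0      0        0         1     2
"""

if __name__ == "__main__":
    print("pulp.task present after step 3:", has_queue(AFTER_STEP_3))
    print("pulp.task present after step 6:", has_queue(AFTER_STEP_6))
```

In practice the text would come from running `qpid-stat -q` on the broker host rather than from embedded strings.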



I tried to reproduce this on my FC20 machine with qpidd 0.24 and qpid tools at 0.26, and was unable to reproduce the behavior.

Comment 1 Brian Bouterse 2014-05-17 17:24:33 UTC
The root cause of this bug is that on qpid-cpp-server 0.18, to have a "durable" queue, you need to also install the 'qpid-cpp-server-store' rpm.  This behavior was changed in later versions of qpid, and has been verified to not be needed with qpid 0.24.

On qpid 0.18 without the qpid-cpp-server-store package, a queue that is marked as durable does not survive a restart.  Gofer is also set to automatically declare the pulp.task queue; however, I believe the queue auto-creation does not occur when qpid.messaging reconnects.  Thus, if qpidd is bounced, gofer cannot find the pulp.task queue when it comes back.  I do not understand why this occurs on the second bounce of qpidd and not the first.
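
To illustrate the failure mode described above, here is a rough sketch of one way a consumer could re-declare a missing queue by opening a fresh receiver when the fetch fails. This is not how gofer handles it and not the fix chosen for this bug; `fetch_with_redeclare`, its parameters, and the address string in the comments are assumptions made for illustration only.

```python
import time


def fetch_with_redeclare(open_receiver, not_found_error, timeout=10,
                         max_redeclares=3, delay=1.0, sleep=time.sleep):
    """Fetch one message, opening a new receiver if the queue has vanished.

    open_receiver   -- hypothetical zero-argument callable that returns an
                       object with a fetch(timeout=...) method
    not_found_error -- exception class raised when the queue is missing
    """
    receiver = open_receiver()
    for attempt in range(max_redeclares + 1):
        try:
            return receiver.fetch(timeout=timeout)
        except not_found_error:
            if attempt == max_redeclares:
                # Give up and surface the original error to the caller.
                raise
            sleep(delay)
            # A new receiver gives the client a chance to re-declare the queue,
            # provided the address asks for creation.
            receiver = open_receiver()


# Assumed wiring with python-qpid (not executed here; the broker URL and
# address string are illustrative, not taken from gofer):
#
#   from qpid.messaging import Connection
#   from qpid.messaging.exceptions import NotFound
#
#   conn = Connection("localhost:5672", reconnect=True)
#   conn.open()
#   session = conn.session()
#   address = "pulp.task; {create: always, node: {durable: True}}"
#   message = fetch_with_redeclare(lambda: session.receiver(address), NotFound)
```

Whether the session is still usable after a `NotFound` error is not something this report establishes, so treat the wiring in the comments as untested.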

The issue was reproduced on a machine. Then the qpid-cpp-server-store package was installed, and the issue could not be reproduced.

The resolution is to update the docs with a note about this behavior, and to add a FAQ entry relating that message to this issue.  No dependencies will be adjusted to include the 'qpid-cpp-server-store' rpm because it isn't needed for any recent version of qpid, and pulp does not require any qpid components in its dependencies because it is broker agnostic.

Comment 2 Brian Bouterse 2014-05-17 18:27:58 UTC
PR available at:

https://github.com/pulp/pulp/pull/974

Comment 3 Randy Barlow 2014-05-20 23:45:39 UTC
Fixed in 2.4.0-0.17.beta.

Comment 4 Preethi Thomas 2014-05-29 17:40:46 UTC
Verified.
I was able to reproduce the error with the steps above, and installing qpid-cpp-server-store fixed the issue for me.


[root@ibm-x3250m4-04 ~]# rpm -qa pulp-server
pulp-server-2.4.0-0.18.beta.el6.noarch
[root@ibm-x3250m4-04 ~]# 
[root@ibm-x3250m4-04 ~]# 
[root@ibm-x3250m4-04 ~]# 
[root@ibm-x3250m4-04 ~]# rpm -qa |grep qpid
qpid-cpp-client-0.18-17.el6_4.x86_64
python-qpid-0.18-5.el6_4.noarch
qpid-cpp-server-0.18-17.el6_4.x86_64
qpid-qmf-0.18-18.el6_4.x86_64
python-qpid-qmf-0.18-18.el6_4.x86_64
python-gofer-qpid-1.0.13-1.el6.noarch
qpid-cpp-server-store-0.18-17.el6_4.x86_64
qpid-tools-0.18-10.el6_4.noarch
[root@ibm-x3250m4-04 ~]#
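
A package listing like the one above can be checked for the problematic combination. The sketch below assumes `rpm -qa` style names, one per line, and only flags qpid-cpp-server 0.18, since that is the only version this report confirms as needing the store package; `needs_store_package` is a hypothetical helper name.

```python
import re


def needs_store_package(rpm_names):
    """True if qpid-cpp-server 0.18 is installed without qpid-cpp-server-store."""
    server_018 = False
    store = False
    for name in rpm_names:
        name = name.strip()
        if name.startswith("qpid-cpp-server-store-"):
            store = True
        elif re.match(r"qpid-cpp-server-0\.18(\D|$)", name):
            # Matches 0.18 exactly (for example 0.18-17.el6_4), not 0.180.
            server_018 = True
    return server_018 and not store


# Package names copied from the listing in this comment.
INSTALLED = [
    "qpid-cpp-client-0.18-17.el6_4.x86_64",
    "python-qpid-0.18-5.el6_4.noarch",
    "qpid-cpp-server-0.18-17.el6_4.x86_64",
    "qpid-qmf-0.18-18.el6_4.x86_64",
    "python-qpid-qmf-0.18-18.el6_4.x86_64",
    "python-gofer-qpid-1.0.13-1.el6.noarch",
    "qpid-cpp-server-store-0.18-17.el6_4.x86_64",
    "qpid-tools-0.18-10.el6_4.noarch",
]

if __name__ == "__main__":
    print("store package missing:", needs_store_package(INSTALLED))
```

Feeding it the output of `rpm -qa | grep qpid` split into lines would give the same check on a live machine.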

Comment 5 Preethi Thomas 2014-05-30 15:41:11 UTC
Moving it back to on_qa since I have seen conversation on this behavior. Will have to check with different versions of qpid.

Comment 6 Preethi Thomas 2014-07-01 19:13:12 UTC
verified again that this issue has been resolved

Comment 7 Randy Barlow 2014-08-09 06:56:00 UTC
This has been fixed in Pulp 2.4.0-1.

