Bug 490506 - Joining cluster that has recovered durable messages fails
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1
Hardware: All Linux
Priority: urgent  Severity: high
Target Milestone: 1.1.1
Target Release: ---
Assigned To: messaging-bugs
QA Contact: Frantisek Reznicek
Depends On:
Blocks:
 
Reported: 2009-03-16 14:34 EDT by Gordon Sim
Modified: 2015-11-15 19:07 EST (History)
1 user

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-04-21 12:17:14 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Gordon Sim 2009-03-16 14:34:30 EDT
Description of problem:

If a cluster is started by recovering queues and messages from disk, a new node that then tries to join that cluster fails to connect.

Version-Release number of selected component (if applicable):

qpidd-cluster-0.5.752581-1.el5
rhm-0.5.3153-1.el5

How reproducible:

100%

Steps to Reproduce:
1. Start two node cluster
2. Create a durable queue
3. Send a persistent message to that queue
4. Stop the cluster
5. Restart one node using the durable store containing this queue and message data
6. Try to start another node (with an empty store, as required)
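
The steps above can be sketched as a shell session. This is illustrative only: the module path, data directories, and the message-sending tool (qpid-send) are assumptions, and exact option spellings depend on the 0.5-era packaging.

```shell
# Node A: start the first cluster member (cluster support is a loadable
# module in this qpidd release; module path is an assumption).
qpidd --load-module /usr/lib/qpid/daemon/cluster.so \
      --cluster-name grs --data-dir /var/lib/qpidd-a --port 5674 --auth no &

# Create a durable queue and put one persistent message on it
# (qpid-send is illustrative; any client sending delivery-mode=2 works).
qpid-config -a localhost:5674 add queue durable-test-queue --durable
qpid-send -b localhost:5674 -a durable-test-queue --durable yes \
          --content-string test

# Stop the cluster, then restart node A only, reusing its durable store:
qpidd --quit --port 5674
qpidd --load-module /usr/lib/qpid/daemon/cluster.so \
      --cluster-name grs --data-dir /var/lib/qpidd-a --port 5674 --auth no &

# Node B: join with an empty store. Before the fix this catch-up failed
# with "framing-error: Unexpected command start frame".
rm -rf /var/lib/qpidd-b && mkdir -p /var/lib/qpidd-b
qpidd --load-module /usr/lib/qpid/daemon/cluster.so \
      --cluster-name grs --data-dir /var/lib/qpidd-b --port 5675 --auth no &
```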
  
Actual results:
2009-mar-16 14:28:40 notice Journal "TplStore": Created
2009-mar-16 14:28:40 notice Store module initialized; dir=test-data-1
2009-mar-16 14:28:40 notice Recovering from cluster, no recovery from local journal
2009-mar-16 14:28:40 notice SASL disabled: No Authentication Performed
2009-mar-16 14:28:40 notice Listening on TCP port 5674
2009-mar-16 14:28:40 notice 20.0.10.15:20727(INIT) joining cluster grs with url=amqp:tcp:10.16.44.222:5674,tcp:20.0.10.15:5674,tcp:192.168.122.1:5674
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
2009-mar-16 14:28:40 notice Broker running
2009-mar-16 14:28:40 notice Journal "durable-test-queue": Created
2009-mar-16 14:28:41 error Connection exception: framing-error: Unexpected command start frame. (qpid/SessionState.cpp:57)
2009-mar-16 14:28:41 error Connection 10.16.44.222:60749 closed by error: Unexpected command start frame. (qpid/SessionState.cpp:57)(501)
2009-mar-16 14:28:41 error Channel exception: not-attached: receiving Frame[Bbe; channel=1; {MessageTransferBody: destination=qpid.cluster-update; accept-mode=1; acquire-mode=0; }]: channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:79)
2009-mar-16 14:28:41 error Channel exception: not-attached: receiving Frame[be; channel=1; header (61 bytes); properties={{MessageProperties: content-length=3; application-headers={sn:F4:int32(2)}; }{DeliveryProperties: delivery-mode=2; exchange=; routing-key=durable-test-queue; }}]: channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:79)
2009-mar-16 14:28:41 error Channel exception: not-attached: receiving Frame[BEbe; channel=1; content (3 bytes) eos...]: channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:79)
2009-mar-16 14:28:41 error Channel exception: not-attached: receiving Frame[BEbe; channel=1; {ExchangeUnbindBody: queue=durable-test-queue; exchange=qpid.cluster-update; binding-key=; }]: channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:79)
2009-mar-16 14:28:41 error Channel exception: not-attached: receiving Frame[BEbe; channel=1; {QueueDeclareBody: queue=qpid.cluster-update; alternate-exchange=; auto-delete=1; arguments={}; }]: channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:79)
2009-mar-16 14:28:41 error Channel exception: not-attached: receiving Frame[BEbe; channel=1; {ExecutionSyncBody: }]: channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:79)
2009-mar-16 14:28:41 critical 20.0.10.15:20727(UPDATEE) catch-up connection closed prematurely 20.0.10.15:20727-1(local,catchup)
2009-mar-16 14:28:41 notice 20.0.10.15:20727(LEFT) leaving cluster grs
2009-mar-16 14:28:41 notice Shut down


Expected results:

The second node should join the cluster as expected.
Comment 1 Gordon Sim 2009-03-17 12:40:03 EDT
Fixed on trunk by r755316.
Comment 3 Frantisek Reznicek 2009-04-01 05:01:11 EDT
The issue has been fixed, validated on RHEL 5.2 / 5.3 i386 / x86_64 on packages:
[root@intel-d3x1311-01 bz490506]# rpm -qa | egrep '(qpid|rhm)' | sort -u
python-qpid-0.5.752581-1.el5
qpidc-0.5.752581-3.el5
qpidc-devel-0.5.752581-3.el5
qpidc-perftest-0.5.752581-3.el5
qpidc-rdma-0.5.752581-3.el5
qpidc-ssl-0.5.752581-3.el5
qpidd-0.5.752581-3.el5
qpidd-acl-0.5.752581-3.el5
qpidd-cluster-0.5.752581-3.el5
qpidd-devel-0.5.752581-3.el5
qpidd-rdma-0.5.752581-3.el5
qpidd-ssl-0.5.752581-3.el5
qpidd-xml-0.5.752581-3.el5
qpid-java-client-0.5.751061-1.el5
qpid-java-common-0.5.751061-1.el5
rhm-0.5.3206-1.el5
rhm-docs-0.5.756148-1.el5


->VERIFIED
Comment 5 errata-xmlrpc 2009-04-21 12:17:14 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0434.html
