Bug 490506 - Joining cluster that has recovered durable messages fails
Summary: Joining cluster that has recovered durable messages fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: 1.1.1
Assignee: messaging-bugs
QA Contact: Frantisek Reznicek
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-03-16 18:34 UTC by Gordon Sim
Modified: 2015-11-16 00:07 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-04-21 16:17:14 UTC
Target Upstream Version:
Embargoed:




Links
System ID: Red Hat Product Errata RHEA-2009:0434
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Enterprise MRG Messaging and Grid Version 1.1.1
Last Updated: 2009-04-21 16:15:50 UTC

Description Gordon Sim 2009-03-16 18:34:30 UTC
Description of problem:

If a cluster is started and recovers queues and messages from its durable store, a new node that subsequently tries to join that cluster fails to connect.

Version-Release number of selected component (if applicable):

qpidd-cluster-0.5.752581-1.el5
rhm-0.5.3153-1.el5

How reproducible:

100%

Steps to Reproduce:
1. Start two node cluster
2. Create a durable queue
3. Send a persistent message to that queue
4. Stop the cluster
5. Restart one node using the durable store containing this queue and message data
6. Try to start another node (with an empty store, as required); a scripted sketch of these steps follows
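
A minimal scripted sketch of the above, assuming two brokers on one host: the data directories, the port numbers (5672/5674, the latter matching the log below), and the exact option spellings are illustrative, and qpid-send may not ship with this 0.5-era release (any AMQP 0-10 client that sends with delivery-mode=2 will do):

# 1-3: start a two-node cluster, create a durable queue, send one persistent message
qpidd --cluster-name grs --data-dir data-0 --port 5672 --auth no --daemon
qpidd --cluster-name grs --data-dir data-1 --port 5674 --auth no --daemon
qpid-config -a localhost:5672 add queue durable-test-queue --durable
qpid-send -b localhost:5672 -a durable-test-queue --durable yes --content-string test
# 4: stop both nodes
qpidd --quit --port 5674
qpidd --quit --port 5672
# 5: restart one node; it recovers the queue and message from its store
qpidd --cluster-name grs --data-dir data-0 --port 5672 --auth no --daemon
# 6: start a second node with an empty store; this is where the join fails
qpidd --cluster-name grs --data-dir data-empty --port 5674 --auth no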
  
Actual results:
2009-mar-16 14:28:40 notice Journal "TplStore": Created
2009-mar-16 14:28:40 notice Store module initialized; dir=test-data-1
2009-mar-16 14:28:40 notice Recovering from cluster, no recovery from local journal
2009-mar-16 14:28:40 notice SASL disabled: No Authentication Performed
2009-mar-16 14:28:40 notice Listening on TCP port 5674
2009-mar-16 14:28:40 notice 20.0.10.15:20727(INIT) joining cluster grs with url=amqp:tcp:10.16.44.222:5674,tcp:20.0.10.15:5674,tcp:192.168.122.1:5674
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
2009-mar-16 14:28:40 notice Broker running
2009-mar-16 14:28:40 notice Journal "durable-test-queue": Created
2009-mar-16 14:28:41 error Connection exception: framing-error: Unexpected command start frame. (qpid/SessionState.cpp:57)
2009-mar-16 14:28:41 error Connection 10.16.44.222:60749 closed by error: Unexpected command start frame. (qpid/SessionState.cpp:57)(501)
2009-mar-16 14:28:41 error Channel exception: not-attached: receiving Frame[Bbe; channel=1; {MessageTransferBody: destination=qpid.cluster-update; accept-mode=1; acquire-mode=0; }]: channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:79)
2009-mar-16 14:28:41 error Channel exception: not-attached: receiving Frame[be; channel=1; header (61 bytes); properties={{MessageProperties: content-length=3; application-headers={sn:F4:int32(2)}; }{DeliveryProperties: delivery-mode=2; exchange=; routing-key=durable-test-queue; }}]: channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:79)
2009-mar-16 14:28:41 error Channel exception: not-attached: receiving Frame[BEbe; channel=1; content (3 bytes) eos...]: channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:79)
2009-mar-16 14:28:41 error Channel exception: not-attached: receiving Frame[BEbe; channel=1; {ExchangeUnbindBody: queue=durable-test-queue; exchange=qpid.cluster-update; binding-key=; }]: channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:79)
2009-mar-16 14:28:41 error Channel exception: not-attached: receiving Frame[BEbe; channel=1; {QueueDeclareBody: queue=qpid.cluster-update; alternate-exchange=; auto-delete=1; arguments={}; }]: channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:79)
2009-mar-16 14:28:41 error Channel exception: not-attached: receiving Frame[BEbe; channel=1; {ExecutionSyncBody: }]: channel 1 is not attached (qpid/amqp_0_10/SessionHandler.cpp:79)
2009-mar-16 14:28:41 critical 20.0.10.15:20727(UPDATEE) catch-up connection closed prematurely 20.0.10.15:20727-1(local,catchup)
2009-mar-16 14:28:41 notice 20.0.10.15:20727(LEFT) leaving cluster grs
2009-mar-16 14:28:41 notice Shut down


Expected results:

The second node should join the cluster as expected.

Comment 1 Gordon Sim 2009-03-17 16:40:03 UTC
Fixed on trunk by r755316.
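
For reference, the change can be inspected with Subversion, assuming the fix landed on Apache Qpid's upstream trunk (the usual home of qpid-cpp development at the time; the repository URL is an assumption):

# show the changed paths and the diff for the fix revision
svn log -v -c 755316 http://svn.apache.org/repos/asf/qpid
svn diff -c 755316 http://svn.apache.org/repos/asf/qpid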

Comment 3 Frantisek Reznicek 2009-04-01 09:01:11 UTC
The issue has been fixed; validated on RHEL 5.2 / 5.3 (i386 / x86_64) with the following packages:
[root@intel-d3x1311-01 bz490506]# rpm -qa | egrep '(qpid|rhm)' | sort -u
python-qpid-0.5.752581-1.el5
qpidc-0.5.752581-3.el5
qpidc-devel-0.5.752581-3.el5
qpidc-perftest-0.5.752581-3.el5
qpidc-rdma-0.5.752581-3.el5
qpidc-ssl-0.5.752581-3.el5
qpidd-0.5.752581-3.el5
qpidd-acl-0.5.752581-3.el5
qpidd-cluster-0.5.752581-3.el5
qpidd-devel-0.5.752581-3.el5
qpidd-rdma-0.5.752581-3.el5
qpidd-ssl-0.5.752581-3.el5
qpidd-xml-0.5.752581-3.el5
qpid-java-client-0.5.751061-1.el5
qpid-java-common-0.5.751061-1.el5
rhm-0.5.3206-1.el5
rhm-docs-0.5.756148-1.el5


->VERIFIED

Comment 5 errata-xmlrpc 2009-04-21 16:17:14 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0434.html

