Description of problem:
Knowledge of client sessions attached on channel 0 of their respective
connections is not propagated to new nodes that join a cluster. This can
result in errors when further commands are issued over those sessions after a
new node has joined.

Version-Release number of selected component (if applicable):
qpidd-0.5.752581-14.el5

How reproducible:
100%

Steps to Reproduce:
1. Start one cluster node.
2. Start the python pubsub example's topic_subscriber.py.
3. Start a second node for the cluster.
4. Run the python pubsub example's topic_publisher.py.

Actual results:
The new node exits with:

2009-jun-10 05:31:35 error Channel exception: not-attached: session.completed: channel 0 is not attached (qpid/amqp_0_10/SessionHandler.cpp:232)
2009-jun-10 05:31:35 critical 10.16.44.221:15217(READY/error) Error 312 did not occur on 10.16.44.221:15149
2009-jun-10 05:31:35 error Error delivering frames: Aborted by local failure that did not occur on all replicas

Expected results:
No node crashes.

Additional info:
You need to be quick to start step 3 after step 2, or the subscriber will
time out. Editing the timeout in the example makes that a little easier.
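The failure mode can be sketched in a few lines of Python. This is an illustrative model only, not qpid code: the Node class, the NotAttached exception, and the skip_channel_zero flag are all invented for the sketch. It mimics how a joining node that receives a state update omitting channel-0 sessions later fails with "not attached" when a frame for channel 0 is delivered to it.

```python
# Toy model of the bug -- NOT qpid code. All names here are invented.

class NotAttached(Exception):
    """Stand-in for qpid's 'not-attached' channel exception."""

class Node:
    def __init__(self):
        self.attached = {}            # channel number -> session name

    def attach(self, channel, session):
        self.attached[channel] = session

    def update_from(self, other, skip_channel_zero):
        # Replicate session state from an existing node to a joiner.
        # The bug is modelled by skip_channel_zero=True: the update
        # silently drops any session attached on channel 0.
        for channel, session in other.attached.items():
            if skip_channel_zero and channel == 0:
                continue
            self.attach(channel, session)

    def deliver(self, channel):
        # Deliver a frame for `channel`; fail like SessionHandler does
        # when no session is attached on that channel.
        if channel not in self.attached:
            raise NotAttached("channel %d is not attached" % channel)
        return self.attached[channel]

first = Node()
first.attach(0, "topic_subscriber")   # subscriber's session on channel 0

newcomer = Node()
newcomer.update_from(first, skip_channel_zero=True)   # buggy update

try:
    newcomer.deliver(0)               # publisher traffic arrives
except NotAttached as e:
    print("new node error:", e)
```

With skip_channel_zero=False the newcomer would have the session and deliver() would succeed, which is the behaviour the fix restores.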
Same error can be observed using the JMS pubsub example as well.
Another reproducer: start first node, connect with qpid-tool, add second node, invoke echo command on broker through the qpid-tool session.
Reproduced on RHEL5 qpidd-0.5.752581-14.el5 using qpid-tool.

--------------------------------------------------------------------------
qpid: list broker
Objects of type org.apache.qpid.broker:broker
ID   Created   Destroyed  Index
====================================
102  12:55:43  -          101.5672
qpid: call 102 echo 1 ahoj
qpid:
Call Result: echo  0 (OK) {'body': u'ahoj', 'sequence': 1L}
--------------------------------------------------------------------------

The second qpidd node then leaves the cluster and shuts down with these
messages:

-------------------------------------------------------------------------------
2009-jun-10 14:59:26 info Passivating links
2009-jun-10 14:59:30 trace 192.168.254.254:17492(READY) DLVR: Event[192.168.254.1:2575-1 data 237 bytes]
2009-jun-10 14:59:30 trace 192.168.254.254:17492(READY) DLVR 168: Frame[Bbe; channel=0; {MessageTransferBody: destination=qpid.management; }] data 192.168.254.1:2575-1
2009-jun-10 14:59:30 debug Exception constructed: receiving Frame[Bbe; channel=0; {MessageTransferBody: destination=qpid.management; }]: channel 0 is not attached (qpid/amqp_0_10/SessionHandler.cpp:79)
2009-jun-10 14:59:30 error Channel exception: not-attached: receiving Frame[Bbe; channel=0; {MessageTransferBody: destination=qpid.management; }]: channel 0 is not attached (qpid/amqp_0_10/SessionHandler.cpp:79)
2009-jun-10 14:59:30 debug 192.168.254.254:17492(READY/error) Session error 168 unresolved: 192.168.254.1:2575 192.168.254.254:17492
2009-jun-10 14:59:30 trace MCAST 192.168.254.254:17492-0: {ClusterErrorCheckBody: type=1; frame-seq=168; }
2009-jun-10 14:59:30 trace 192.168.254.254:17492(READY/error) DLVR: Event[192.168.254.254:17492-0 control 27 bytes]
2009-jun-10 14:59:30 debug 192.168.254.254:17492(READY/error) Error 168 outcome agrees with 192.168.254.254:17492
2009-jun-10 14:59:30 debug 192.168.254.254:17492(READY/error) Error 168 still unresolved: 192.168.254.1:2575
2009-jun-10 14:59:30 trace 192.168.254.254:17492(READY/error) DLVR: Event[192.168.254.1:2575-1 control 22 bytes]
2009-jun-10 14:59:30 trace 192.168.254.254:17492(READY/error) DLVR: Event[192.168.254.1:2575-0 control 27 bytes]
2009-jun-10 14:59:30 critical 192.168.254.254:17492(READY/error) Error 168 did not occur on 192.168.254.1:2575
2009-jun-10 14:59:30 debug Exception constructed: Aborted by local failure that did not occur on all replicas
2009-jun-10 14:59:30 error Error delivering frames: Aborted by local failure that did not occur on all replicas
2009-jun-10 14:59:30 notice 192.168.254.254:17492(LEFT/error) leaving cluster jasan
2009-jun-10 14:59:30 debug 192.168.254.254:17492(LEFT/error) deleted connection: 192.168.254.1:2575-1(shadow)
2009-jun-10 14:59:30 debug Message 0x90326f8 enqueued on mgmt-rhel5.virtual.jsn.2587[0x9075738]
2009-jun-10 14:59:30 debug Shutting down CPG
2009-jun-10 14:59:30 notice Shut down
2009-jun-10 14:59:30 debug Journal "TplStore": Destroyed
-------------------------------------------------------------------------------
Created attachment 347216 [details]
Fix for this issue.

The patch fixes both sides: the broker handles channel 0 correctly, and the
python client no longer uses channel 0 by default.
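The client-side half of the fix can be sketched as follows. This is a hypothetical illustration, not the actual patch: next_channel is an invented helper showing the idea of allocating session channels starting at 1 so that channel 0 is simply never used by the client.

```python
# Hypothetical sketch of the client-side fix (not the actual patch):
# allocate channels from 1 upward, never handing out channel 0.

def next_channel(in_use):
    """Return the lowest free channel number, skipping channel 0."""
    ch = 1
    while ch in in_use:
        ch += 1
    return ch

print(next_channel(set()))     # first session gets channel 1, not 0
print(next_channel({1, 2}))    # next free channel after 1 and 2 is 3
```

The broker-side half of the patch is complementary: even if a client does attach on channel 0, that session is now included when state is replicated to a joining node.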
Reproduced on RHEL5-i386 <=> RHEL5-i386.

Verified on the qpidc-15 build in these scenarios:

First in cluster | Second
-----------------+-------------
RHEL5-i386       | RHEL5-i386
RHEL5-x86_64     | RHEL5-i386
RHEL5-i386       | RHEL5-x86_64
RHEL5-x86_64     | RHEL5-x86_64
An advisory has been issued which should help the problem described in this
bug report. This report is therefore being closed with a resolution of
ERRATA. For more information on the solution and/or where to find the updated
files, please follow the link below. You may reopen this bug report if the
solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1097.html