Bug 504977 - No knowledge of sessions on channel 0 is transferred to new cluster nodes
Summary: No knowledge of sessions on channel 0 is transferred to new cluster nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.0
Hardware: All
OS: Linux
high
high
Target Milestone: 1.1.2
: ---
Assignee: messaging-bugs
QA Contact: Jan Sarenik
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2009-06-10 09:40 UTC by Gordon Sim
Modified: 2009-06-12 17:39 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-06-12 17:39:31 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Fix for this issue. (2.78 KB, patch)
2009-06-10 13:08 UTC, Alan Conway
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2009:1097 0 normal SHIPPED_LIVE Red Hat Enterprise MRG Messaging bug fixing update 2009-06-12 17:38:48 UTC

Description Gordon Sim 2009-06-10 09:40:23 UTC
Description of problem:

Knowledge of client sessions that are attached on channel 0 of their respective connections is not propagated to new nodes that join a cluster. This can result in  errors when further commands are issued over those sessions after a new node has joined.

Version-Release number of selected component (if applicable):

qpidd-0.5.752581-14.el5

How reproducible:

100%

Steps to Reproduce:
1. start one cluster node
2. start python pubsub example's topic_subscriber.py
3. start second node for cluster
4. run python pubsub example's topic_publisher.py
  
Actual results:

New node exits with:

2009-jun-10 05:31:35 error Channel exception: not-attached: session.completed: channel 0 is not attached (qpid/amqp_0_10/SessionHandler.cpp:232)
2009-jun-10 05:31:35 critical 10.16.44.221:15217(READY/error) Error 312 did not occur on 10.16.44.221:15149
2009-jun-10 05:31:35 error Error delivering frames: Aborted by local failure that did not occur on all replicas


Expected results:

No node crashes.

Additional info:

You need to be quick to start step 3 above after 2 or the subscriber will timeout. Editing the timeout in the example makes that a little easier.

Comment 1 Gordon Sim 2009-06-10 09:47:50 UTC
Same error can be observed using the JMS pubsub example as well.

Comment 2 Gordon Sim 2009-06-10 10:25:06 UTC
Another reproducer: start first node, connect with qpid-tool, add second node, invoke echo command on broker through the qpid-tool session.

Comment 3 Jan Sarenik 2009-06-10 13:03:43 UTC
Reproduced on RHEL5 qpidd-0.5.752581-14.el5
using qpid-tool.

--------------------------------------------------------------------------
qpid: list broker
Objects of type org.apache.qpid.broker:broker
    ID   Created   Destroyed  Index
    ====================================
    102  12:55:43  -          101.5672
qpid: call 102 echo 1 ahoj
qpid: Call Result: echo  0 (OK) {'body': u'ahoj', 'sequence': 1L}
--------------------------------------------------------------------------

The second qpidd cluster then leaves the cluster and shuts down
with these messages:

-------------------------------------------------------------------------------
2009-jun-10 14:59:26 info Passivating links
2009-jun-10 14:59:30 trace 192.168.254.254:17492(READY) DLVR: Event[192.168.254.1:2575-1 data 237 bytes]
2009-jun-10 14:59:30 trace 192.168.254.254:17492(READY) DLVR 168:  Frame[Bbe; channel=0; {MessageTransferBody: destination=qpid.management; }] data 192.168.254.1:2575-1
2009-jun-10 14:59:30 debug Exception constructed: receiving Frame[Bbe; channel=0; {MessageTransferBody: destination=qpid.management; }]: channel 0 is not attached (qpid/amqp_0_10/SessionHandler.cpp:79)
2009-jun-10 14:59:30 error Channel exception: not-attached: receiving Frame[Bbe; channel=0; {MessageTransferBody: destination=qpid.management; }]: channel 0 is not attached (qpid/amqp_0_10/SessionHandler.cpp:79)
2009-jun-10 14:59:30 debug 192.168.254.254:17492(READY/error) Session error 168 unresolved: 192.168.254.1:2575 192.168.254.254:17492 
2009-jun-10 14:59:30 trace MCAST 192.168.254.254:17492-0: {ClusterErrorCheckBody: type=1; frame-seq=168; }
2009-jun-10 14:59:30 trace 192.168.254.254:17492(READY/error) DLVR: Event[192.168.254.254:17492-0 control 27 bytes]
2009-jun-10 14:59:30 debug 192.168.254.254:17492(READY/error) Error 168 outcome agrees with 192.168.254.254:17492
2009-jun-10 14:59:30 debug 192.168.254.254:17492(READY/error) Error 168 still unresolved: 192.168.254.1:2575 
2009-jun-10 14:59:30 trace 192.168.254.254:17492(READY/error) DLVR: Event[192.168.254.1:2575-1 control 22 bytes]
2009-jun-10 14:59:30 trace 192.168.254.254:17492(READY/error) DLVR: Event[192.168.254.1:2575-0 control 27 bytes]
2009-jun-10 14:59:30 critical 192.168.254.254:17492(READY/error) Error 168 did not occur on 192.168.254.1:2575
2009-jun-10 14:59:30 debug Exception constructed: Aborted by local failure that did not occur on all replicas
2009-jun-10 14:59:30 error Error delivering frames: Aborted by local failure that did not occur on all replicas
2009-jun-10 14:59:30 notice 192.168.254.254:17492(LEFT/error) leaving cluster jasan
2009-jun-10 14:59:30 debug 192.168.254.254:17492(LEFT/error) deleted connection: 192.168.254.1:2575-1(shadow)
2009-jun-10 14:59:30 debug Message 0x90326f8 enqueued on mgmt-rhel5.virtual.jsn.2587[0x9075738]
2009-jun-10 14:59:30 debug Shutting down CPG
2009-jun-10 14:59:30 notice Shut down
2009-jun-10 14:59:30 debug Journal "TplStore": Destroyed
-------------------------------------------------------------------------------

Comment 4 Alan Conway 2009-06-10 13:08:35 UTC
Created attachment 347216 [details]
Fix for this issue.

The patch fixes both sides: the broker handles channel 0 correctly and the python client no longer uses channel 0 by default.

Comment 5 Jan Sarenik 2009-06-11 07:13:47 UTC
Reproduced on RHEL5-i386 <=> RHEL5-i386

Verified on qpidc-15 build in these scenarios

  First in cluster | Second
   RHEL5-i386      |  RHEL5-i386
   RHEL5-x86_64    |  RHEL5-i386
   RHEL5-i386      |  RHEL5-x86_64
   RHEL5-x86_64    |  RHEL5-x86_64

Comment 7 errata-xmlrpc 2009-06-12 17:39:31 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1097.html


Note You need to log in before you can comment on or make changes to this bug.