504977 – No knowledge of sessions on channel 0 is transferred to new cluster nodes

Bug 504977 - No knowledge of sessions on channel 0 is transferred to new cluster nodes

Summary: No knowledge of sessions on channel 0 is transferred to new cluster nodes

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise MRG
Classification:	Red Hat
Component:	qpid-cpp
Sub Component:
Version:	1.0
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	1.1.2
Target Release:	---
Assignee:	messaging-bugs
QA Contact:	Jan Sarenik
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2009-06-10 09:40 UTC by Gordon Sim
Modified:	2009-06-12 17:39 UTC (History)
CC List:	1 user (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-06-12 17:39:31 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
Fix for this issue. (2.78 KB, patch) 2009-06-10 13:08 UTC, Alan Conway	no flags	Details \| Diff
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2009:1097	0	normal	SHIPPED_LIVE	Red Hat Enterprise MRG Messaging bug fixing update	2009-06-12 17:38:48 UTC

Description Gordon Sim 2009-06-10 09:40:23 UTC

Description of problem:

Knowledge of client sessions that are attached on channel 0 of their respective connections is not propagated to new nodes that join a cluster. This can result in  errors when further commands are issued over those sessions after a new node has joined.

Version-Release number of selected component (if applicable):

qpidd-0.5.752581-14.el5

How reproducible:

100%

Steps to Reproduce:
1. start one cluster node
2. start python pubsub example's topic_subscriber.py
3. start second node for cluster
4. run python pubsub example's topic_publisher.py
  
Actual results:

New node exits with:

2009-jun-10 05:31:35 error Channel exception: not-attached: session.completed: channel 0 is not attached (qpid/amqp_0_10/SessionHandler.cpp:232)
2009-jun-10 05:31:35 critical 10.16.44.221:15217(READY/error) Error 312 did not occur on 10.16.44.221:15149
2009-jun-10 05:31:35 error Error delivering frames: Aborted by local failure that did not occur on all replicas


Expected results:

No node crashes.

Additional info:

You need to be quick to start step 3 above after 2 or the subscriber will timeout. Editing the timeout in the example makes that a little easier.

Comment 1 Gordon Sim 2009-06-10 09:47:50 UTC

Same error can be observed using the JMS pubsub example as well.

Comment 2 Gordon Sim 2009-06-10 10:25:06 UTC

Another reproducer: start first node, connect with qpid-tool, add second node, invoke echo command on broker through the qpid-tool session.

Comment 3 Jan Sarenik 2009-06-10 13:03:43 UTC

Reproduced on RHEL5 qpidd-0.5.752581-14.el5
using qpid-tool.

--------------------------------------------------------------------------
qpid: list broker
Objects of type org.apache.qpid.broker:broker
    ID   Created   Destroyed  Index
    ====================================
    102  12:55:43  -          101.5672
qpid: call 102 echo 1 ahoj
qpid: Call Result: echo  0 (OK) {'body': u'ahoj', 'sequence': 1L}
--------------------------------------------------------------------------

The second qpidd cluster then leaves the cluster and shuts down
with these messages:

-------------------------------------------------------------------------------
2009-jun-10 14:59:26 info Passivating links
2009-jun-10 14:59:30 trace 192.168.254.254:17492(READY) DLVR: Event[192.168.254.1:2575-1 data 237 bytes]
2009-jun-10 14:59:30 trace 192.168.254.254:17492(READY) DLVR 168:  Frame[Bbe; channel=0; {MessageTransferBody: destination=qpid.management; }] data 192.168.254.1:2575-1
2009-jun-10 14:59:30 debug Exception constructed: receiving Frame[Bbe; channel=0; {MessageTransferBody: destination=qpid.management; }]: channel 0 is not attached (qpid/amqp_0_10/SessionHandler.cpp:79)
2009-jun-10 14:59:30 error Channel exception: not-attached: receiving Frame[Bbe; channel=0; {MessageTransferBody: destination=qpid.management; }]: channel 0 is not attached (qpid/amqp_0_10/SessionHandler.cpp:79)
2009-jun-10 14:59:30 debug 192.168.254.254:17492(READY/error) Session error 168 unresolved: 192.168.254.1:2575 192.168.254.254:17492 
2009-jun-10 14:59:30 trace MCAST 192.168.254.254:17492-0: {ClusterErrorCheckBody: type=1; frame-seq=168; }
2009-jun-10 14:59:30 trace 192.168.254.254:17492(READY/error) DLVR: Event[192.168.254.254:17492-0 control 27 bytes]
2009-jun-10 14:59:30 debug 192.168.254.254:17492(READY/error) Error 168 outcome agrees with 192.168.254.254:17492
2009-jun-10 14:59:30 debug 192.168.254.254:17492(READY/error) Error 168 still unresolved: 192.168.254.1:2575 
2009-jun-10 14:59:30 trace 192.168.254.254:17492(READY/error) DLVR: Event[192.168.254.1:2575-1 control 22 bytes]
2009-jun-10 14:59:30 trace 192.168.254.254:17492(READY/error) DLVR: Event[192.168.254.1:2575-0 control 27 bytes]
2009-jun-10 14:59:30 critical 192.168.254.254:17492(READY/error) Error 168 did not occur on 192.168.254.1:2575
2009-jun-10 14:59:30 debug Exception constructed: Aborted by local failure that did not occur on all replicas
2009-jun-10 14:59:30 error Error delivering frames: Aborted by local failure that did not occur on all replicas
2009-jun-10 14:59:30 notice 192.168.254.254:17492(LEFT/error) leaving cluster jasan
2009-jun-10 14:59:30 debug 192.168.254.254:17492(LEFT/error) deleted connection: 192.168.254.1:2575-1(shadow)
2009-jun-10 14:59:30 debug Message 0x90326f8 enqueued on mgmt-rhel5.virtual.jsn.2587[0x9075738]
2009-jun-10 14:59:30 debug Shutting down CPG
2009-jun-10 14:59:30 notice Shut down
2009-jun-10 14:59:30 debug Journal "TplStore": Destroyed
-------------------------------------------------------------------------------

Comment 4 Alan Conway 2009-06-10 13:08:35 UTC

Created attachment 347216 [details]
Fix for this issue.

The patch fixes both sides: the broker handles channel 0 correctly and the python client no longer uses channel 0 by default.

Comment 5 Jan Sarenik 2009-06-11 07:13:47 UTC

Reproduced on RHEL5-i386 <=> RHEL5-i386

Verified on qpidc-15 build in these scenarios

  First in cluster | Second
   RHEL5-i386      |  RHEL5-i386
   RHEL5-x86_64    |  RHEL5-i386
   RHEL5-i386      |  RHEL5-x86_64
   RHEL5-x86_64    |  RHEL5-x86_64

Comment 7 errata-xmlrpc 2009-06-12 17:39:31 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1097.html

Note You need to log in before you can comment on or make changes to this bug.